Abstract:
Techniques for processing data sets and, more particularly, for constructing a synthetic data set (test data set) from real data sets (input data sets) in accordance with user feedback. The technique effectively mimics real data sets to generate corresponding synthetic ones. Multiple real data sets may be used to create a test data set that combines the characteristics of all of them. Users of the technique can modify the characteristics of the data sets to create a new data set with the features they want. For example, a user may change the shape or size of the different patterns in the data, or distort them, to create a new data set. A user may also choose to inject noise into the data.
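The abstract does not say how the mimicking is done. As a rough illustration only, the sketch below assumes one simple approach: pool the real data sets, resample real rows, resize the patterns about the pooled centroid (`scale`), translate them (`shift`), add small Gaussian jitter, and replace a user-chosen fraction of rows with uniform noise. The function name and every parameter are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence

Row = List[float]


def synthesize(real_sets: Sequence[Sequence[Row]], n: int, scale: float = 1.0,
               shift: Optional[Sequence[float]] = None, jitter: float = 0.05,
               noise_fraction: float = 0.0, seed: int = 0) -> List[Row]:
    """Generate n synthetic rows that mimic the pooled real data sets (hypothetical scheme)."""
    rng = random.Random(seed)
    # Combine several real data sets by pooling their rows, so the synthetic
    # data inherits the characteristics of all of them.
    pooled = [list(row) for data in real_sets for row in data]
    cols = list(zip(*pooled))
    centroid = [statistics.fmean(c) for c in cols]
    spread = [statistics.pstdev(c) for c in cols]
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    shift = list(shift) if shift is not None else [0.0] * len(cols)

    out: List[Row] = []
    for _ in range(n):
        if rng.random() < noise_fraction:
            # User-requested noise: uniform over the bounding box of the real data.
            out.append([rng.uniform(lo, hi) for lo, hi in zip(lows, highs)])
            continue
        base = rng.choice(pooled)
        # Resize the pattern about the centroid, translate it, then jitter it
        # by a fraction of each dimension's standard deviation.
        out.append([c + scale * (x - c) + s + rng.gauss(0.0, jitter * sd)
                    for x, c, s, sd in zip(base, centroid, shift, spread)])
    return out
```

With `jitter=0.0` and `noise_fraction=0.0` every output row is a (possibly rescaled and shifted) copy of a real row, which makes the effect of the user's `scale` and `shift` choices easy to inspect.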
Abstract:
Techniques are provided for incorporating human (user) interaction into the design and/or performance of data mining applications such as similarity determination and classification. Such user-centered techniques permit the mining of interesting characteristics of data in a data or feature space. For example, the interesting characteristics that may be determined with the user-centered mining techniques of the invention include the similarity among different data objects, as well as individual class labels. These techniques allow data mining applications to be performed effectively on high-dimensional data.
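The abstract does not describe the mechanism by which user interaction shapes similarity. One plausible illustration, swapped in here as an assumption, is inverse-variance feature weighting: the user marks pairs of objects as similar, features on which those pairs agree receive high weight, and the learned weights then drive a weighted nearest-neighbor classifier. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def learn_weights(similar_pairs: Sequence[Tuple[Vec, Vec]], eps: float = 1e-6) -> List[float]:
    """Weight each feature inversely to its mean squared difference over user-marked similar pairs."""
    d = len(similar_pairs[0][0])
    msd = [sum((a[j] - b[j]) ** 2 for a, b in similar_pairs) / len(similar_pairs)
           for j in range(d)]
    raw = [1.0 / (eps + m) for m in msd]
    total = sum(raw)
    return [r / total for r in raw]  # normalized so the weights sum to 1


def weighted_distance(x: Vec, y: Vec, w: Sequence[float]) -> float:
    return math.sqrt(sum(wj * (a - b) ** 2 for a, b, wj in zip(x, y, w)))


def classify(query: Vec, labeled: Sequence[Tuple[Vec, str]], w: Sequence[float]) -> str:
    """1-nearest-neighbor class label under the user-derived weights."""
    return min(labeled, key=lambda item: weighted_distance(query, item[0], w))[1]
```

If the user says `(1, 0)` and `(1, 10)` are similar, feature 1 is effectively ignored, so a query such as `(1, 10)` is labeled by its neighbor on feature 0 rather than by raw Euclidean distance.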
Abstract:
A method for mining incomplete data sets that avoids extrapolating the missing attributes and instead uses conceptual representations to mine the data sets. The idea behind conceptual representations is that, even when many attributes are missing, it is possible to accurately estimate the behavior of the data along certain pre-specified directions, i.e., the conceptual directions of the data set.
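As an illustration only, the sketch below assumes that each conceptual direction is a hypothetical mapping from attribute index to weight, and that a record's coordinate along a direction can be estimated from its observed attributes alone by rescaling the partial projection. Missing values are `None`; no attribute is ever imputed. The rescaling rule and the distance function are assumptions, not the patented procedure.

```python
import math
from typing import Dict, List, Optional, Sequence

Record = Sequence[Optional[float]]
Concept = Dict[int, float]  # hypothetical: attribute index -> weight along the direction


def concept_coordinates(record: Record, concepts: Sequence[Concept]) -> List[Optional[float]]:
    """Estimate a record's position along each conceptual direction from observed attributes only."""
    coords: List[Optional[float]] = []
    for direction in concepts:
        observed = {j: w for j, w in direction.items() if record[j] is not None}
        if not observed:
            coords.append(None)  # nothing observed along this direction
            continue
        total_w = sum(abs(w) for w in direction.values())
        obs_w = sum(abs(w) for w in observed.values())
        partial = sum(w * record[j] for j, w in observed.items())
        # Rescale as if the missing attributes behaved like the observed ones.
        coords.append(partial * total_w / obs_w)
    return coords


def concept_distance(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """Root-mean-square distance over the directions known for both records."""
    shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum(shared) / len(shared)) if shared else float("inf")
```

Mining (clustering, nearest-neighbor search, and so on) can then run on the concept coordinates instead of on the sparse raw attributes.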
Abstract:
A system and method for generating classifications using time sequences comprise: inputting a set of time-dependent feature variable graphs along with a set of time-dependent category variable graphs; finding frequent shapes in the time-dependent feature variable graphs; using the frequent shapes to generate combinations of frequent shapes; generating rules that relate one or more patterns of combinations of frequent shapes to a category variable; and performing a categorization using the generated rules.
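The five steps can be sketched under simplifying assumptions that are not in the abstract: each series carries a single category label, a "shape" is a length-`k` run of up/down/flat moves, combinations are limited to one or two shapes, and rules are ranked by confidence. All names and thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple

Rule = Tuple[FrozenSet[str], str, float]


def to_symbols(series: Sequence[float], tol: float = 0.0) -> str:
    """Discretize consecutive moves into U (up), D (down) or F (flat)."""
    return "".join("U" if b - a > tol else "D" if a - b > tol else "F"
                   for a, b in zip(series, series[1:]))


def shapes_of(series: Sequence[float], k: int) -> Set[str]:
    s = to_symbols(series)
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def mine_rules(train: Sequence[Tuple[Sequence[float], str]], k: int = 2,
               min_support: float = 0.3, min_conf: float = 0.6) -> List[Rule]:
    shape_sets = [shapes_of(series, k) for series, _ in train]
    # Step 2: frequent shapes across the training series.
    counts = Counter(sh for shs in shape_sets for sh in shs)
    frequent = {sh for sh, c in counts.items() if c / len(train) >= min_support}
    # Step 3: combinations (size 1 and 2) of frequent shapes present in each series.
    combo_total: Counter = Counter()
    combo_label = defaultdict(Counter)
    for shs, (_, label) in zip(shape_sets, train):
        present = sorted(shs & frequent)
        for r in (1, 2):
            for combo in combinations(present, r):
                key = frozenset(combo)
                combo_total[key] += 1
                combo_label[key][label] += 1
    # Step 4: rules combination -> category, kept if frequent and confident.
    rules: List[Rule] = []
    for combo, total in combo_total.items():
        if total / len(train) < min_support:
            continue
        label, hits = combo_label[combo].most_common(1)[0]
        if hits / total >= min_conf:
            rules.append((combo, label, hits / total))
    rules.sort(key=lambda r: (-r[2], -len(r[0])))
    return rules


def categorize(series: Sequence[float], rules: Sequence[Rule], k: int = 2,
               default: Optional[str] = None) -> Optional[str]:
    """Step 5: apply the first (most confident) rule whose shapes all occur."""
    shs = shapes_of(series, k)
    for combo, label, _ in rules:
        if combo <= shs:
            return label
    return default
```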
Abstract:
In accordance with the present invention, a method for selecting a channel and delivery time for digital objects in a broadcast delivery service that includes multiple channels of varying bandwidths includes the steps of: selecting digital objects to be sent over the multiple channels; generating a schedule and pricing for the digital objects based on the selected digital objects and existing delivery commitments; and manipulating the schedule and pricing to provide profitable delivery of the digital objects. A system is also included.
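A minimal greedy sketch of the selection, scheduling, and pricing steps, assuming a cost model that the abstract does not give: each channel has a bandwidth, a hypothetical operating cost per second, and a `busy_until` time standing for its existing commitments. Objects are taken in earliest-deadline-first order and placed on the channel that meets the deadline with the highest profit; unprofitable deliveries are declined.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Channel:
    name: str
    bandwidth: float         # data units per second
    cost_per_second: float   # hypothetical operating cost
    busy_until: float = 0.0  # end of existing delivery commitments


@dataclass
class DigitalObject:
    name: str
    size: float
    deadline: float
    price_per_unit: float    # hypothetical price charged per data unit


def schedule(objects: Sequence[DigitalObject],
             channels: Sequence[Channel]) -> List[Tuple[str, str, float, float, float]]:
    """Return (object, channel, start, finish, profit) for every accepted delivery."""
    plan = []
    for obj in sorted(objects, key=lambda o: o.deadline):  # earliest deadline first
        best: Optional[Tuple[Channel, float, float]] = None
        for ch in channels:
            duration = obj.size / ch.bandwidth
            finish = ch.busy_until + duration
            if finish > obj.deadline:
                continue  # this channel cannot honor the deadline
            profit = obj.price_per_unit * obj.size - ch.cost_per_second * duration
            if profit > 0 and (best is None or profit > best[2]):
                best = (ch, finish, profit)
        if best is not None:
            ch, finish, profit = best
            plan.append((obj.name, ch.name, ch.busy_until, finish, profit))
            ch.busy_until = finish  # the new delivery becomes an existing commitment
    return plan
```

A real service would also adjust prices rather than only accept or reject; the greedy rule is simply the smallest example that ties channel choice, timing, and profit together.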
Abstract:
A method for scheduling delivery of digital objects over a network, in accordance with the invention, includes the steps of: providing a user interface for selecting objects to be transmitted to it; selecting at least one object to be transmitted to the user interface; identifying and receiving in-progress object transmissions corresponding to the at least one selected object; identifying the portions of the at least one object not yet received and requesting transmission of those portions; and receiving the remaining portions of the at least one object during additional in-progress transmissions. A system is also included.
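The joining-and-patching idea can be simulated with a simple assumed model: an object is split into `n_blocks` blocks, each in-progress transmission sends block `b` at tick `start + b`, and a client that selects the object at `request_time` can listen to several transmissions at once. Blocks that no in-progress transmission will still deliver are requested and, as a hypothetical policy, sent back-to-back once the last heard transmission ends.

```python
from typing import Dict, List, Sequence, Tuple


def download_schedule(n_blocks: int, starts: Sequence[int],
                      request_time: int) -> Tuple[Dict[int, int], List[int]]:
    """Map each block to the earliest tick the client obtains it; also return requested blocks."""
    got: Dict[int, int] = {}
    # Receive whatever the in-progress transmissions still have to send.
    for start in starts:
        for b in range(n_blocks):
            t = start + b  # block b of this transmission is sent at tick start + b
            if t >= request_time and (b not in got or t < got[b]):
                got[b] = t
    # Identify the portions not yet covered and request them.
    missing = [b for b in range(n_blocks) if b not in got]
    t = max([request_time] + [s + n_blocks for s in starts])
    for b in missing:
        got[b] = t  # delivered by the additional, requested transmission
        t += 1
    return got, missing
```

For example, joining a 4-block transmission that began at tick 0 when the request arrives at tick 2 yields blocks 2 and 3 immediately, while blocks 0 and 1 must be requested separately.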
Abstract:
A system and method for generating itemset associations in a memory storage system comprising many transactions, with each transaction including one or more items capable of forming the itemset associations. The method involves generating a lexicographic tree structure whose nodes represent itemset associations meeting a minimum support criterion. Recursively, for each lexicographically least itemset (node) P of the lexicographic tree structure, the candidate extensions of node P are first determined. Then the support of each candidate extension is counted to determine the frequent extension itemsets of node P, and itemsets not meeting a predetermined support criterion are eliminated. Child nodes are created for the frequent extensions that meet the predetermined support criterion. For each frequent child of node P, all itemset associations for all descendants of that child are generated first, so the lexicographic tree structure is generated in a depth-first manner. By projecting transactions onto the lexicographic tree structure in a depth-first manner, the CPU time for counting large itemsets is substantially reduced.
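The recursion described above can be sketched as follows. This is a simplified illustration, not the patented implementation: support is an absolute transaction count, items are ordered lexicographically, candidate extensions of a child are its later frequent siblings, and each child's transactions are projected onto those candidates before its subtree is explored in full.

```python
from typing import Dict, Iterable, List, Set, Tuple

Itemset = Tuple[str, ...]


def depth_first_itemsets(transactions: Iterable[Iterable[str]],
                         min_support: int) -> Dict[Itemset, int]:
    """Frequent itemsets (as lexicographically ordered tuples) with their support counts."""
    result: Dict[Itemset, int] = {}

    def extend(node: Itemset, projected: List[Set[str]], candidates: List[str]) -> None:
        # Count the support of every candidate extension of `node`
        # using only the transactions projected onto `node`.
        counts = {c: 0 for c in candidates}
        for t in projected:
            for c in candidates:
                if c in t:
                    counts[c] += 1
        # Eliminate extensions below the support threshold.
        frequent = [c for c in candidates if counts[c] >= min_support]
        for i, item in enumerate(frequent):
            child = node + (item,)
            result[child] = counts[item]
            child_candidates = frequent[i + 1:]  # later frequent siblings
            if not child_candidates:
                continue
            keep = set(child_candidates)
            # Project: keep transactions containing `item`, restricted to the child's candidates.
            child_projected = [t & keep for t in projected if item in t]
            # Finish this child's whole subtree before the next sibling (depth-first).
            extend(child, [t for t in child_projected if t], child_candidates)

    base = [set(t) for t in transactions]
    extend((), base, sorted(set().union(*base)))
    return result
```

Projection shrinks the transactions at every level, so counting deep in the tree touches far fewer items than rescanning the full database would.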
Abstract:
A video-on-demand (VOD) scheduler maintains a queue of pending performance requests for each video. Using the notion of a queue selection factor, a batching policy is devised that schedules the video with the highest selection factor. Selection factors are obtained by applying discriminatory weighting factors to the adjusted queue lengths associated with each video, where the weight decreases as the popularity of the respective video increases and the queue length is adjusted to account for defection.
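A minimal sketch of the selection rule, with two assumptions that the abstract leaves open: the weight is taken as `1/sqrt(popularity)`, one decreasing function among many, and defection is modeled by counting each waiting request with its probability of not yet having left under exponentially distributed patience with a hypothetical mean.

```python
import math
from typing import Dict, Optional, Sequence


def adjusted_queue_length(wait_times: Sequence[float], mean_patience: float) -> float:
    """Expected number of requests still waiting (exponential-patience defection model, assumed)."""
    return sum(math.exp(-w / mean_patience) for w in wait_times)


def selection_factor(wait_times: Sequence[float], popularity: float,
                     mean_patience: float) -> float:
    # Discriminatory weight that decreases as popularity increases (assumed form).
    return adjusted_queue_length(wait_times, mean_patience) / math.sqrt(popularity)


def pick_video(queues: Dict[str, Sequence[float]], popularity: Dict[str, float],
               mean_patience: float = 300.0) -> Optional[str]:
    """Batch and schedule the video with the highest selection factor."""
    pending = {v: q for v, q in queues.items() if q}
    if not pending:
        return None
    return max(pending, key=lambda v: selection_factor(pending[v], popularity[v], mean_patience))
```

With this weighting, a short queue for an unpopular video can outrank a longer queue for a popular one, and long-waiting requests count for less because many of them are likely to have defected.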
Abstract:
A system that labels an unlabeled message of a social stream. The system includes a memory device storing instructions to execute a training model, the training model being trained on labeled messages and partitioned into a plurality of class partitions, each of which comprises statistical information and a class label; and a central processing unit (CPU) that computes a confidence for each of the class partitions based on information in the unlabeled message and the statistical information of the respective class partition, and that labels the unlabeled message according to the confidences of the class partitions.
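The abstract does not specify the statistics or the confidence measure. The sketch below assumes that each class partition stores term counts from its training messages, uses one partition per label, and takes the cosine similarity between the message's term vector and each partition's counts as the confidence. The class and function names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


@dataclass
class ClassPartition:
    label: str
    term_counts: Counter = field(default_factory=Counter)  # statistical information
    n_messages: int = 0

    def add(self, text: str) -> None:
        self.term_counts.update(tokenize(text))
        self.n_messages += 1


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(labeled: Sequence[Tuple[str, str]]) -> List[ClassPartition]:
    partitions = {}
    for text, label in labeled:
        partitions.setdefault(label, ClassPartition(label)).add(text)
    return list(partitions.values())


def label_message(text: str, partitions: Sequence[ClassPartition],
                  min_confidence: float = 0.0) -> Optional[str]:
    """Label by the most confident partition, or None if no partition is confident enough."""
    vec = Counter(tokenize(text))
    best_conf, best_label = max((cosine(vec, p.term_counts), p.label) for p in partitions)
    return best_label if best_conf > min_confidence else None
```

Because each partition keeps only aggregate counts, new labeled messages from the stream can be folded in with `add` without retraining from scratch.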
Abstract:
Mechanisms are provided for anonymizing data comprising a plurality of graph data sets. The mechanisms receive input data comprising a plurality of graph data sets, where each graph data set comprises data for generating a graph separate from the graphs associated with the other graph data sets. The mechanisms perform clustering on the graph data sets to generate a plurality of clusters. At least one of the clusters comprises a plurality of graph data sets; each of the other clusters comprises one or more graph data sets. The mechanisms also determine the aggregate properties of each cluster and, from those aggregate properties, generate pseudo-synthetic data representing that cluster.
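As an illustration only, the sketch below assumes graphs are edge sets over a shared node vocabulary and substitutes a deliberately simple clustering: graphs are sorted by edge count and grouped into clusters of at least `k`. The aggregate property of a cluster is the fraction of its graphs containing each edge, and one pseudo-synthetic graph is sampled per member from those frequencies, so no output graph is tied to a single input.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def cluster_graphs(graphs: Sequence[Graph], k: int) -> List[List[int]]:
    """Group graph indices by edge count into clusters of at least k (simplified clustering)."""
    order = sorted(range(len(graphs)), key=lambda i: len(graphs[i]))
    clusters = [order[i:i + k] for i in range(0, len(order), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:
        clusters[-2].extend(clusters.pop())  # merge an undersized tail cluster
    return clusters


def aggregate(cluster: Sequence[int], graphs: Sequence[Graph]) -> Dict[Edge, float]:
    """Aggregate property: fraction of the cluster's graphs that contain each edge."""
    freq = Counter(e for i in cluster for e in graphs[i])
    return {e: c / len(cluster) for e, c in freq.items()}


def anonymize(graphs: Sequence[Graph], k: int = 2, seed: int = 0) -> List[Graph]:
    rng = random.Random(seed)
    out: List[Graph] = []
    for cluster in cluster_graphs(graphs, k):
        props = aggregate(cluster, graphs)
        for _ in cluster:
            # Sample each edge with its cluster-level frequency.
            out.append(frozenset(e for e, p in sorted(props.items()) if rng.random() < p))
    return out
```

Edges shared by every graph in a cluster always survive, while edges that identify a single graph appear only with reduced probability.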