Abstract:
Mechanisms are provided for transforming an original graph data set into a representative form having fewer dimensions than the original graph data set. The mechanisms generate a graph transformation basis structure based on an input graph data structure. The mechanisms further transform the original graph data set based on an intersection of the graph transformation basis structure and the input graph data structure, thereby generating a transformed graph data set data structure. The transformed graph data set data structure has lower dimensionality than the input graph data structure but represents characteristics of the original graph data set. Moreover, the mechanisms perform an application-specific operation on the transformed graph data set data structure to output the record in the transformed graph data set that is most similar to a target component.
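The abstract does not say how the basis is built or how similarity is measured, so the following is only a minimal sketch under stated assumptions. Graphs are assumed to be sets of undirected edges; the basis (a hypothetical `build_basis`) is assumed to be k graphs chosen greedily to cover the most unseen edges; each graph is reduced to one coordinate per basis element, namely the size of its intersection with that element; and the "closest similarity record" is assumed to be the Euclidean nearest neighbour in the reduced space.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def edge(u: str, v: str) -> Edge:
    """Store an undirected edge in canonical (sorted) order."""
    return (u, v) if u <= v else (v, u)


def graph(*pairs: Tuple[str, str]) -> Graph:
    """Build a graph as a frozen set of canonical edges."""
    return frozenset(edge(u, v) for u, v in pairs)


def build_basis(graphs: Sequence[Graph], k: int) -> List[Graph]:
    """Hypothetical basis: greedily pick k graphs that cover the most unseen edges."""
    basis: List[Graph] = []
    covered: set = set()
    remaining = list(graphs)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda g: len(g - covered))
        basis.append(best)
        covered |= best
        remaining.remove(best)
    return basis


def transform(g: Graph, basis: Sequence[Graph]) -> List[int]:
    """One coordinate per basis element: the size of its intersection with g."""
    return [len(g & b) for b in basis]


def closest_record(target: Graph, graphs: Sequence[Graph], basis: Sequence[Graph]) -> int:
    """Index of the record whose reduced vector is nearest (Euclidean) to the target's."""
    t = transform(target, basis)
    vectors = [transform(g, basis) for g in graphs]
    return min(range(len(graphs)), key=lambda i: math.dist(t, vectors[i]))
```

With k much smaller than the number of distinct edges, every graph becomes a short integer vector, and the nearest-record search runs on those vectors instead of on the graphs themselves.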
Abstract:
Distributed privacy-preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol, such that the privacy of the summary information is preserved. The summary information associated with an entity relates to the data stored at that entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.
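The abstract does not name the sharing protocol. As a hedged illustration only, the sketch below swaps in the well-known ring-based secure sum, applied to the summary information mentioned above (per-entity counts of transactions containing an itemset). All entities are simulated in one process, the modulus `2**64` is an assumption, and non-collusion between entities is assumed.

```python
import secrets
from typing import FrozenSet, Iterable, Sequence

Transaction = FrozenSet[str]
MODULUS = 2 ** 64  # assumed to exceed any possible global count


def local_count(transactions: Sequence[Transaction], itemset: FrozenSet[str]) -> int:
    """Summary information one entity computes over its own data."""
    return sum(1 for t in transactions if itemset <= t)


def secure_sum(local_values: Sequence[int], modulus: int = MODULUS) -> int:
    """Ring-based secure sum, simulated in a single process.

    The initiator masks its value with a random offset; each other entity adds
    its own value modulo `modulus` and forwards the result, so every forwarded
    message is uniformly distributed on its own. The initiator removes the mask
    and learns only the total.
    """
    mask = secrets.randbelow(modulus)
    message = (mask + local_values[0]) % modulus
    for value in local_values[1:]:
        message = (message + value) % modulus
    return (message - mask) % modulus


def global_count(sites: Sequence[Sequence[Transaction]], itemset: Iterable[str]) -> int:
    """Number of transactions, over all entities, that contain the itemset."""
    items = frozenset(itemset)
    return secure_sum([local_count(db, items) for db in sites])


def rule_confidence(sites, antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Confidence of antecedent -> consequent computed from privately summed counts."""
    both = global_count(sites, set(antecedent) | set(consequent))
    base = global_count(sites, antecedent)
    return both / base if base else 0.0
```

For example, with one entity holding `{a,b}`, `{a}`, `{b,c}` and another holding `{a,b,c}`, `{a,b}`, the global count of `{a,b}` is 3 and the confidence of `a -> b` is 3/4, yet neither entity's local count is revealed to the other.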
Abstract:
An object and the attributes that describe that object are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class, a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using all of the hash functions of each sketch table, yielding a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of these values across all sketch tables is evaluated for each attribute pattern, producing a discriminatory power for each attribute pattern. Attribute patterns having a discriminatory power above a given threshold are selected, and their values are added to the sums of the associated sketch tables. The sketch table with the largest overall sum is identified, and its associated class is assigned to the object described by the attribute patterns.
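A minimal sketch of the steps above, under several assumptions not stated in the abstract: attribute patterns are taken to be every single attribute plus every pair of attributes; each sketch table is a count-min style structure with SHA-256-derived hash functions; and the discriminatory power of a pattern is taken to be the share of its estimates held by the top class. The names `SketchTable`, `SketchClassifier`, and the default `threshold`, `depth`, and `width` values are hypothetical.

```python
import hashlib
from itertools import combinations
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def attribute_patterns(attrs: Dict[str, str]) -> List[Pattern]:
    """Hypothetical grouping: every single attribute plus every pair of attributes."""
    items = sorted(f"{name}={value}" for name, value in attrs.items())
    return [(item,) for item in items] + list(combinations(items, 2))


class SketchTable:
    """Count-min style sketch: `depth` parallel hash tables of `width` counters."""

    def __init__(self, depth: int = 4, width: int = 64) -> None:
        self.depth, self.width = depth, width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, pattern: Pattern) -> int:
        digest = hashlib.sha256(f"{row}|{pattern}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, pattern: Pattern) -> None:
        for row in range(self.depth):
            self.rows[row][self._index(row, pattern)] += 1

    def estimate(self, pattern: Pattern) -> int:
        """Lowest value across the parallel hash tables."""
        return min(self.rows[row][self._index(row, pattern)] for row in range(self.depth))


class SketchClassifier:
    def __init__(self, classes, threshold: float = 0.6, depth: int = 4, width: int = 64) -> None:
        self.tables = {c: SketchTable(depth, width) for c in classes}
        self.threshold = threshold

    def train(self, attrs: Dict[str, str], label) -> None:
        for pattern in attribute_patterns(attrs):
            self.tables[label].add(pattern)

    def classify(self, attrs: Dict[str, str]):
        totals = {c: 0 for c in self.tables}
        for pattern in attribute_patterns(attrs):
            values = {c: table.estimate(pattern) for c, table in self.tables.items()}
            mass = sum(values.values())
            if mass == 0:
                continue
            # Hypothetical discriminatory power: share of the estimates held by the top class.
            power = max(values.values()) / mass
            if power > self.threshold:
                for c, v in values.items():
                    totals[c] += v
        # The class whose sketch table accumulated the largest sum wins.
        return max(totals, key=totals.get)
```

Patterns that appear about equally often under every class get a low power and are skipped, so only patterns that actually separate the classes contribute to the final sums.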
Abstract:
Techniques for determining a shortest path in a disk-based network are provided. The techniques include creating a compressed representation of an underlying disk-resident network graph by determining one or more dense regions in the disk-resident graph and compacting the one or more dense regions into one or more compressed nodes; associating one or more node penalties with the one or more compressed nodes, wherein the node penalties reflect the distance of a sub-path within a compressed node; and performing a query on the underlying disk-resident network graph using the compressed representation and the node penalties to determine a shortest path in the disk-based network while reducing the number of accesses to a physical disk.
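The following is a small in-memory sketch of the compression and penalty idea; it does not model disk access. It assumes the dense region is already known (detecting it is out of scope here), keeps the cheapest edge between the region and each outside node, uses the region's internal diameter as a hypothetical node penalty, and charges that penalty whenever a Dijkstra path passes through the compressed node.

```python
import heapq
import math
from typing import Dict, Optional, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def undirected(edges) -> Graph:
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = float(w)
        g.setdefault(v, {})[u] = float(w)
    return g


def dijkstra(graph: Graph, source: str, penalty: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Shortest distances; a node's penalty is paid when a path passes through it."""
    penalty = penalty or {}
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        extra = 0.0 if u == source else penalty.get(u, 0.0)
        for v, w in graph.get(u, {}).items():
            nd = d + extra + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def compress(graph: Graph, region: Set[str], name: str) -> Tuple[Graph, float]:
    """Collapse a dense region into one node; penalty = the region's internal diameter."""
    def mapped(n: str) -> str:
        return name if n in region else n

    out: Graph = {}
    for u, nbrs in graph.items():
        cu = mapped(u)
        out.setdefault(cu, {})
        for v, w in nbrs.items():
            cv = mapped(v)
            if cu != cv and w < out[cu].get(cv, math.inf):
                out[cu][cv] = w  # keep the cheapest edge into or out of the region
    inner = {u: {v: w for v, w in graph[u].items() if v in region} for u in region}
    diameter = max(max(dijkstra(inner, u).values()) for u in region)
    return out, diameter
```

For instance, compressing the triangle `a`, `b`, `c` (unit weights) in a graph with edges `s-a`, `c-t` (weight 1) and `s-t` (weight 10) yields the supernode penalty 1.0, and the compressed query from `s` to `t` returns 3.0, matching the uncompressed shortest path. In general the penalty is only an estimate of the true sub-path length.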
Abstract:
Techniques for optimizing the steady-state flow of a network are provided. The techniques include determining a first set of two or more nodes in a network, computing a steady-state flow probability of the first set, and iteratively interchanging nodes from a second set of two or more nodes into the first set to determine an optimum total steady-state flow of the network, wherein nodes are interchanged until no further improvement in steady-state flow over the computed steady-state flow probability can be obtained.
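The abstract does not define the flow objective, so the sketch below assumes a hypothetical one: the steady-state probability mass that a random walk, restarting uniformly into the first set with probability `alpha`, places on that set (computed by power iteration). The second set is assumed to be every node outside the first set. The interchange loop swaps one inside node for one outside node whenever the objective strictly improves and stops at a local optimum.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # node -> successors; every successor is also a key


def steady_state(graph: Graph, restart: Set[str], alpha: float = 0.15, iters: int = 200) -> Dict[str, float]:
    """Power iteration for a random walk that restarts uniformly into `restart`."""
    nodes = list(graph)
    r = {v: (1.0 / len(restart) if v in restart else 0.0) for v in nodes}
    pi = dict(r)
    for _ in range(iters):
        nxt = {v: alpha * r[v] for v in nodes}
        for u in nodes:
            successors = graph[u]
            if successors:
                share = (1 - alpha) * pi[u] / len(successors)
                for v in successors:
                    nxt[v] += share
            else:  # dangling node: return its mass through the restart vector
                for v in nodes:
                    nxt[v] += (1 - alpha) * pi[u] * r[v]
        pi = nxt
    return pi


def flow(graph: Graph, nodes: Set[str]) -> float:
    """Hypothetical objective: steady-state probability mass held by `nodes`."""
    pi = steady_state(graph, nodes)
    return sum(pi[v] for v in sorted(nodes))


def interchange(graph: Graph, initial: Set[str]) -> Tuple[Set[str], float]:
    """Swap one inside node for one outside node while the objective strictly improves."""
    current = set(initial)
    best = flow(graph, current)
    improved = True
    while improved:
        improved = False
        for u, v in product(sorted(current), sorted(set(graph) - current)):
            candidate = (current - {u}) | {v}
            value = flow(graph, candidate)
            if value > best + 1e-12:
                current, best, improved = candidate, value, True
                break  # rescan from the improved set
    return current, best
```

Because the objective strictly increases with every accepted swap and there are finitely many sets of a given size, the loop always terminates, mirroring the "until no additional improvement" stopping rule.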
Abstract:
Improved techniques are disclosed for detecting patterns of interaction among a set of entities and analyzing community evolution in a stream environment. By way of example, a technique for processing data from a data stream includes the following steps/operations. A data point of the data stream representing an interaction event is obtained. An interaction graph is updated on-line based on the data point representing the interaction event. The updated interaction graph is stored in a nonvolatile memory. An interaction evolution is determined off-line from the updated interaction graph stored in the nonvolatile memory.
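A minimal sketch of the on-line/off-line split described above, under assumptions of our own: edge weights decay exponentially with a hypothetical `half_life`, a JSON file stands in for the nonvolatile memory, and "interaction evolution" is approximated as the set of edges whose weight changed by more than a threshold between two stored snapshots.

```python
import json
import math
import os
import tempfile
from typing import Dict, Tuple


class InteractionGraph:
    """On-line interaction graph with exponentially decayed edge weights (hypothetical model)."""

    def __init__(self, half_life: float) -> None:
        self.decay = math.log(2) / half_life
        self.edges: Dict[str, Tuple[float, float]] = {}  # "a|b" -> (weight, last update time)

    def update(self, a: str, b: str, t: float) -> None:
        """Fold one interaction event into the graph."""
        key = "|".join(sorted((a, b)))
        weight, last = self.edges.get(key, (0.0, t))
        self.edges[key] = (weight * math.exp(-self.decay * (t - last)) + 1.0, t)

    def snapshot(self, t: float) -> Dict[str, float]:
        """All edge weights decayed to time t."""
        return {k: w * math.exp(-self.decay * (t - last)) for k, (w, last) in self.edges.items()}


def save_snapshot(path: str, t: float, edges: Dict[str, float]) -> None:
    """Persist a snapshot to disk (standing in for nonvolatile memory)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"time": t, "edges": edges}, f)


def load_snapshot(path: str) -> Tuple[float, Dict[str, float]]:
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["time"], data["edges"]


def evolution(old: Dict[str, float], new: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Off-line step: edges whose weight changed by more than `threshold`."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        delta = new.get(key, 0.0) - old.get(key, 0.0)
        if abs(delta) > threshold:
            changes[key] = delta
    return changes


if __name__ == "__main__":
    graph = InteractionGraph(half_life=10.0)
    with tempfile.TemporaryDirectory() as tmp:
        for a, b, t in [("alice", "bob", 0), ("alice", "bob", 1), ("bob", "carol", 2)]:
            graph.update(a, b, t)
        save_snapshot(os.path.join(tmp, "t2.json"), 2, graph.snapshot(2))
        for a, b, t in [("carol", "dave", 5), ("carol", "dave", 6)]:
            graph.update(a, b, t)
        save_snapshot(os.path.join(tmp, "t6.json"), 6, graph.snapshot(6))
        _, old = load_snapshot(os.path.join(tmp, "t2.json"))
        _, new = load_snapshot(os.path.join(tmp, "t6.json"))
        print(evolution(old, new, threshold=0.5))
```

Each event costs only a constant-time update, which is what keeps the on-line part cheap; the comparatively expensive comparison of snapshots is deferred to the off-line step.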
Abstract:
A technique for processing a data stream includes the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
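A hedged sketch of the maintenance and assignment steps: each cluster keeps additive statistics (count, per-dimension sum and sum of squares); its projected dimensions are assumed to be the `n_proj` dimensions with the lowest variance; and the distance used for assignment is assumed to be the average absolute deviation from the centroid over those dimensions. The `radius` and `max_clusters` rules for opening new clusters are hypothetical.

```python
from typing import List, Sequence


class ProjectedCluster:
    """Additive summary (count, per-dimension sum and sum of squares) of one cluster."""

    def __init__(self, point: Sequence[float]) -> None:
        self.n = 1
        self.s1 = [float(x) for x in point]
        self.s2 = [float(x) * x for x in point]

    def add(self, point: Sequence[float]) -> None:
        self.n += 1
        for i, x in enumerate(point):
            self.s1[i] += x
            self.s2[i] += x * x

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.s1]

    def projected_dims(self, n_proj: int) -> List[int]:
        """The n_proj dimensions along which the cluster's points vary least."""
        var = [max(b / self.n - (a / self.n) ** 2, 0.0) for a, b in zip(self.s1, self.s2)]
        return sorted(range(len(var)), key=lambda i: (var[i], i))[:n_proj]

    def distance(self, point: Sequence[float], n_proj: int) -> float:
        """Average absolute deviation from the centroid over the projected dimensions."""
        c, dims = self.centroid(), self.projected_dims(n_proj)
        return sum(abs(point[i] - c[i]) for i in dims) / len(dims)


class ProjectedStreamClusterer:
    def __init__(self, max_clusters: int, n_proj: int, radius: float) -> None:
        self.max_clusters, self.n_proj, self.radius = max_clusters, n_proj, radius
        self.clusters: List[ProjectedCluster] = []

    def process(self, point: Sequence[float]) -> int:
        """Assign one incoming point and return the index of its cluster."""
        if self.clusters:
            dists = [c.distance(point, self.n_proj) for c in self.clusters]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.radius or len(self.clusters) >= self.max_clusters:
                self.clusters[best].add(point)
                return best
        self.clusters.append(ProjectedCluster(point))
        return len(self.clusters) - 1
```

Because the statistics are additive, each arriving point updates its cluster in constant time per dimension, and the projected dimensions can be recomputed at any moment without revisiting earlier points.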
Abstract:
Methods and apparatus are provided for generating decision trees using linear discriminant analysis and for using such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises recursively splitting a decision tree such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables with which to recursively split the training data and create the decision tree. The decision tree is then used to classify input test data.
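One way to find a "combination of variables" for a split is Fisher's two-class linear discriminant, sketched below in pure Python. At each node it separates the most frequent class from all other classes, projects the records onto the discriminant direction, and picks the threshold with the lowest weighted Gini impurity. The one-vs-rest scheme, the Gini criterion, the ridge term, and the depth and size limits are all assumptions made for this sketch.

```python
from collections import Counter
from typing import Dict, List, Sequence


def mean(rows: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a small system a @ x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_direction(pos, neg, ridge: float = 1e-6) -> List[float]:
    """w = Sw^-1 (m_pos - m_neg), with a small ridge keeping Sw invertible."""
    mp, mn = mean(pos), mean(neg)
    d = len(mp)
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((pos, mp), (neg, mn)):
        for x in rows:
            diff = [a - b for a, b in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    return solve(sw, [a - b for a, b in zip(mp, mn)])


def gini(labels) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(X, y, depth: int = 0, max_depth: int = 4, min_size: int = 2) -> Dict:
    counts = Counter(y)
    majority = counts.most_common(1)[0][0]
    if len(counts) == 1 or depth >= max_depth or len(y) < min_size:
        return {"leaf": majority}
    pos = [x for x, c in zip(X, y) if c == majority]
    neg = [x for x, c in zip(X, y) if c != majority]
    w = fisher_direction(pos, neg)
    proj = [sum(a * b for a, b in zip(w, x)) for x in X]
    values = sorted(set(proj))
    best = None
    for lo, hi in zip(values, values[1:]):  # candidate thresholds between projections
        t = (lo + hi) / 2
        left = [c for p, c in zip(proj, y) if p <= t]
        right = [c for p, c in zip(proj, y) if p > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:
        return {"leaf": majority}
    t = best[1]
    li = [i for i, p in enumerate(proj) if p <= t]
    ri = [i for i, p in enumerate(proj) if p > t]
    return {"w": w, "t": t,
            "left": build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_size),
            "right": build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_size)}


def classify(tree: Dict, x: Sequence[float]):
    """Traversal mode: records carry only feature variables."""
    while "leaf" not in tree:
        p = sum(a * b for a, b in zip(tree["w"], x))
        tree = tree["left"] if p <= tree["t"] else tree["right"]
    return tree["leaf"]
```

Unlike an axis-parallel split on a single feature, each internal node here tests a weighted sum of all features, which is what lets a single split separate classes divided by a diagonal boundary.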
Abstract:
A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
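A minimal sketch of class-specific clustering, assuming a simple micro-cluster scheme that the abstract does not specify: each training point is absorbed into the nearest cluster of its own class if it lies within a hypothetical `radius` (or if that class already has `max_per_class` clusters), and otherwise opens a new cluster. A test instance receives the label of the nearest cluster centroid across all classes.

```python
import math
from typing import Dict, Hashable, List, Optional, Sequence


class MicroCluster:
    """Running centroid of the training points absorbed by this cluster."""

    def __init__(self, point: Sequence[float]) -> None:
        self.n = 1
        self.sum = [float(x) for x in point]

    def add(self, point: Sequence[float]) -> None:
        self.n += 1
        self.sum = [s + x for s, x in zip(self.sum, point)]

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.sum]


class ClassClusterClassifier:
    def __init__(self, radius: float, max_per_class: int) -> None:
        self.radius, self.max_per_class = radius, max_per_class
        self.clusters: Dict[Hashable, List[MicroCluster]] = {}

    def train(self, point: Sequence[float], label: Hashable) -> None:
        """Absorb a labelled training point into a cluster of its own class."""
        clusters = self.clusters.setdefault(label, [])
        if clusters:
            nearest = min(clusters, key=lambda c: math.dist(point, c.centroid()))
            if math.dist(point, nearest.centroid()) <= self.radius or len(clusters) >= self.max_per_class:
                nearest.add(point)
                return
        clusters.append(MicroCluster(point))

    def classify(self, point: Sequence[float]) -> Optional[Hashable]:
        """Label of the nearest class-specific cluster centroid."""
        best_label, best_dist = None, math.inf
        for label, clusters in self.clusters.items():
            for c in clusters:
                d = math.dist(point, c.centroid())
                if d < best_dist:
                    best_label, best_dist = label, d
        return best_label
```

Keeping clusters separate per class means a class with several distinct modes (two clusters for class "a" in the test below) is still represented faithfully, which a single centroid per class would blur.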