Abstract:
A technique for clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
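The two phases described above can be illustrated with a minimal Python sketch. It is not the patented method: the `Group` summary (a count plus a per-dimension sum), the `radius` absorption rule, and the function names `update_online` and `recluster_offline` are all assumptions introduced for illustration.

```python
import math
import random


class Group:
    """Additive summary of a data group: point count and per-dimension sum."""

    def __init__(self, point):
        self.n = 1
        self.sum = list(point)

    def centroid(self):
        return [s / self.n for s in self.sum]

    def add(self, point):
        self.n += 1
        self.sum = [s + x for s, x in zip(self.sum, point)]


def update_online(groups, point, radius):
    """Online step: absorb the point into the nearest group within
    `radius` (an assumed threshold), otherwise start a new group."""
    best = min(groups, key=lambda g: math.dist(g.centroid(), point), default=None)
    if best is not None and math.dist(best.centroid(), point) <= radius:
        best.add(point)
    else:
        groups.append(Group(point))


def recluster_offline(groups, k, seed=0):
    """Offline step: sample k group centroids and assign every group
    to the nearest sampled centroid."""
    rng = random.Random(seed)
    seeds = [g.centroid() for g in rng.sample(groups, k)]
    clusters = [[] for _ in seeds]
    for g in groups:
        i = min(range(k), key=lambda j: math.dist(seeds[j], g.centroid()))
        clusters[i].append(g)
    return clusters
```

Because each group stores only a count and a sum, the online step needs constant memory per group, and the offline step can run on demand over the summaries rather than over the raw stream.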
Abstract:
A computer-implemented method, apparatus, and computer-usable program code are provided for processing multi-way stream correlations. Stream data are received for correlation. A task is formed for continuously partitioning a multi-way stream correlation workload into smaller workload pieces. Each of the smaller workload pieces may be processed by a single host. The stream data are sent to different hosts for correlation processing.
Abstract:
Improved techniques are disclosed for detecting patterns of interaction among a set of entities and analyzing community evolution in a stream environment. By way of example, a technique for processing data from a data stream includes the following steps/operations. A data point of the data stream representing an interaction event is obtained. An interaction graph is updated on-line based on the data point representing the interaction event. The updated interaction graph is stored in a nonvolatile memory. An interaction evolution is determined off-line from the updated interaction graph stored in the nonvolatile memory.
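As a rough illustration of the online update, storage, and offline comparison steps, the following Python sketch keeps the interaction graph as edge counts and writes snapshots to disk as JSON. The edge-key format, the JSON storage, and every function name here are assumptions, not details from the abstract.

```python
import json
from collections import Counter


def update_graph(graph, event):
    """Online step: count one interaction between the two entities in `event`.
    The "a|b" key format is an assumption and requires names without "|"."""
    a, b = sorted(event)
    graph[f"{a}|{b}"] += 1


def save_snapshot(graph, path):
    """Store the current graph on disk (standing in for nonvolatile memory)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(graph, f)


def load_snapshot(path):
    with open(path, encoding="utf-8") as f:
        return Counter(json.load(f))


def evolution(old, new):
    """Offline step: per-edge change in interaction count between snapshots."""
    return {e: new[e] - old[e] for e in set(old) | set(new) if new[e] != old[e]}
```

Comparing two stored snapshots offline keeps the online path limited to a single counter increment per event.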
Abstract:
Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
Abstract:
One embodiment of the present method and apparatus for adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream, and the values of the individual tuples with respect to an expected output.
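The selection step can be sketched in Python under simplifying assumptions. The sketch uses only two of the dynamic parameters named above: the arrival rate, and a per-tuple `utility` score standing in for the tuple's value with respect to the expected output. The `capacity` parameter and the function name `shed_load` are hypothetical.

```python
from collections import deque


def shed_load(window, arrival_rate, capacity, utility):
    """Return the tuples from the sliding window that should be processed.

    `capacity` is an assumed processing rate in the same units as
    `arrival_rate`; tuples that are not returned are ignored.
    """
    keep_fraction = min(1.0, capacity / arrival_rate) if arrival_rate > 0 else 1.0
    keep = int(len(window) * keep_fraction)
    # Prefer the tuples expected to contribute most to the output.
    ranked = sorted(window, key=utility, reverse=True)
    return ranked[:keep]


window = deque([1, 5, 3, 9], maxlen=4)  # a first sliding window of memory
selected = shed_load(window, arrival_rate=200, capacity=100, utility=lambda t: t)
```

In this example the stream arrives twice as fast as it can be processed, so half of the window is kept. A fuller treatment would also account for time delays and the join direction.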
Abstract:
A method for implementing a multi-stage, multi-classification sales opportunity modeling system is provided. The method includes receiving operational data relating to past sales activities and receiving parameters identified as being relevant in determining a likelihood of whether exploitation of a sales opportunity will be successful. The method also includes generating a multi-stage model by applying the operational data and the parameters to an analytic engine for evaluating different factors affecting success of the sales opportunity.
Abstract:
Techniques for graph indexing are provided. In one aspect, a method for indexing graphs in a database, the graphs comprising graphic data, comprises the following steps. Frequent subgraphs among one or more of the graphs in the database are identified, the frequent subgraphs appearing in at least a threshold number of the graphs in the database. One or more of the frequent subgraphs are used to create an index of the graphs in the database.
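The indexing idea can be shown with a deliberately reduced Python sketch. As a loudly stated simplification, it treats only single labeled edges as subgraphs, so it does not perform general frequent-subgraph mining. The names `edge`, `build_index`, and `candidates` are hypothetical.

```python
from collections import defaultdict


def edge(a, b):
    """An undirected edge between two vertex labels."""
    return frozenset((a, b))


def build_index(graphs, min_support):
    """Map each frequent edge to the IDs of the graphs that contain it.

    graphs: dict of graph ID -> set of edges.
    An edge is frequent if it appears in at least `min_support` graphs.
    """
    postings = defaultdict(set)
    for gid, edges in graphs.items():
        for e in edges:
            postings[e].add(gid)
    return {e: gids for e, gids in postings.items() if len(gids) >= min_support}


def candidates(index, graphs, query_edges):
    """Intersect the posting lists of the query's indexed edges."""
    result = set(graphs)
    for e in query_edges:
        if e in index:
            result &= index[e]
    return result


graphs = {
    "g1": {edge("C", "O"), edge("C", "N")},
    "g2": {edge("C", "O")},
    "g3": {edge("C", "N"), edge("N", "O")},
}
index = build_index(graphs, min_support=2)
```

The returned set is only a candidate filter: each candidate would still need a full subgraph check against the query.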
Abstract:
Most recent research on scalable inductive learning over very large streaming datasets focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the dataset and use sophisticated control mechanisms and data structures. There is discussed herein a general inductive learning framework that scans the dataset exactly once. Then, there is proposed an extension based on Hoeffding's inequality that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.
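The abstract does not spell out how Hoeffding's inequality is applied, so the following Python sketch shows only one common use of the bound: stopping a scan early once the observed gap between two candidates exceeds the bound's error margin. The function names and the stopping rule are assumptions. The formula itself is the standard one: after n independent observations of a quantity with range R, the sample mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Error margin of a sample mean after n observations, holding with
    probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_stop(best_mean, runner_up_mean, value_range, delta, n):
    """Assumed stopping rule: the leader's observed advantage exceeds the
    margin, so reading more data is unlikely to change the ranking."""
    return (best_mean - runner_up_mean) > hoeffding_epsilon(value_range, delta, n)
```

Because the margin shrinks as n grows, a scan can often stop before reaching the end of the dataset, which is the sense in which the data is read "less than once".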
Abstract:
A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
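A minimal Python sketch of classification with class-specific clusters follows. It assumes each cluster is summarized by a count and a per-dimension sum, that a training point joins the nearest cluster of its own class within a fixed `radius`, and that a test instance takes the label of the nearest cluster overall. These rules and the class name `ClassClusters` are illustrative assumptions, not the claimed method.

```python
import math
from collections import defaultdict


class ClassClusters:
    def __init__(self, radius):
        self.radius = radius
        # label -> list of clusters, each stored as [count, sum_vector]
        self.clusters = defaultdict(list)

    def train(self, point, label):
        """Absorb a labeled training point into a cluster of its own class."""
        best = None
        for c in self.clusters[label]:
            center = [s / c[0] for s in c[1]]
            d = math.dist(center, point)
            if d <= self.radius and (best is None or d < best[0]):
                best = (d, c)
        if best is None:
            self.clusters[label].append([1, list(point)])
        else:
            c = best[1]
            c[0] += 1
            c[1] = [s + x for s, x in zip(c[1], point)]

    def classify(self, point):
        """Return the label of the cluster whose centroid is nearest."""
        best_label, best_d = None, math.inf
        for label, cs in self.clusters.items():
            for n, s in cs:
                d = math.dist([v / n for v in s], point)
                if d < best_d:
                    best_label, best_d = label, d
        return best_label
```

Keeping the clusters separate per class means a training point never merges into a cluster of a different class, so every stored cluster has a single label.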
Abstract:
A method for distributing and utilizing software is provided. In the method of distribution, a software application is provided on a hardware device by a manufacturer of the software application, wherein the software application is executable on the hardware device. The hardware device is enclosed within a box and distributed. The manufacturer provides continued services for the software application, wherein the hardware device is connectable between at least one end user's computer and the manufacturer. The hardware device is adapted to provide the continued services via a communication link between the hardware device and the manufacturer.