Abstract:
One embodiment of the present method and apparatus for adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. Both the number of tuples selected and which specific tuples are selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, the direction of a join operation performed on the data stream, and the values of the individual tuples with respect to an expected output.
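As a minimal Python sketch of the selection step, the class below joins two streams over time-based sliding windows. When the combined arrival rate exceeds an assumed processing capacity, it keeps only the tuples expected to produce the most join output in the current join direction. The class name `SheddingWindowJoin`, the `capacity` parameter, and the match-count ranking are hypothetical illustrations rather than the claimed method, and time delays are not modeled.

```python
from collections import Counter, deque


class SheddingWindowJoin:
    """Hypothetical sketch: symmetric sliding-window equi-join of two streams
    that sheds (ignores) the tuples least likely to produce join output."""

    def __init__(self, window_size, capacity):
        self.window_size = window_size  # time span covered by each sliding window
        self.capacity = capacity        # assumed tuples/sec the join can process
        self.windows = {0: deque(), 1: deque()}  # (timestamp, key) tuples per stream

    def keep_fraction(self, rate_a, rate_b):
        """Fraction of arriving tuples to keep, given both streams' arrival rates."""
        total = rate_a + rate_b
        return 1.0 if total <= self.capacity else self.capacity / total

    def _expire(self, now):
        # Drop tuples that have slid out of either window.
        for window in self.windows.values():
            while window and now - window[0][0] > self.window_size:
                window.popleft()

    def process(self, stream, batch, now, fraction):
        """Join a batch of (timestamp, key) tuples arriving on `stream` (0 or 1)
        against the other stream's window, keeping only `fraction` of the batch."""
        self._expire(now)
        opposite = self.windows[1 - stream]
        matches = Counter(key for _, key in opposite)
        # Rank by expected contribution: how many matches each tuple would
        # produce in the opposite window (i.e. in this direction of the join).
        ranked = sorted(batch, key=lambda t: matches[t[1]], reverse=True)
        selected = ranked[:int(round(fraction * len(batch)))]
        results = []
        # Insert in timestamp order so each window stays sorted for expiry.
        for ts, key in sorted(selected):
            results.extend(
                (ts, key, other_ts) for other_ts, other_key in opposite if other_key == key
            )
            self.windows[stream].append((ts, key))
        return results
```

Tuples that fall outside `selected` are simply dropped, corresponding to the ignored tuples above; the sketch assumes batches arrive in nondecreasing timestamp order.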
Abstract:
Load shedding schemes for mining data streams are described. A scoring function is used to rank the importance of stream elements, and elements with high importance are investigated. Because the exact feature values of a data stream may not be known, a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, classification decisions can be made that maximize the expected benefits. In addition, a quality of decision (QoD) metric is proposed herein to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as the one presented herein assigns available resources to multiple data streams so as to maximize the quality of classification decisions. Furthermore, such a scheme is able to learn and adapt to changing data characteristics in the data streams.
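A minimal sketch of Markov-model prediction, expected-benefit classification, and QoD-guided resource assignment follows. It assumes discrete feature states, a known transition matrix, and a known benefit matrix. The functions `predict_distribution`, `decide`, and `allocate`, and the use of the margin between the best and second-best expected benefit as the QoD metric (which also serves as the ranking score here), are assumptions for illustration.

```python
def predict_distribution(transition, last_state, steps):
    """Propagate a one-hot distribution over feature states `steps` ticks forward
    through a first-order Markov transition matrix (each row sums to 1)."""
    n = len(transition)
    dist = [0.0] * n
    dist[last_state] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return dist


def decide(dist, benefit):
    """benefit[s][c] is the benefit of choosing class c when the true feature
    state is s. Returns (best class, QoD); QoD is assumed here to be the margin
    between the best and second-best expected benefit (needs >= 2 classes)."""
    n_classes = len(benefit[0])
    expected = [sum(p * benefit[s][c] for s, p in enumerate(dist)) for c in range(n_classes)]
    order = sorted(range(n_classes), key=lambda c: expected[c], reverse=True)
    qod = expected[order[0]] - expected[order[1]]
    return order[0], qod


def allocate(qods, budget):
    """Spend the `budget` available observations on the streams whose current
    decisions are least certain (lowest QoD); the other streams are shed."""
    return sorted(range(len(qods)), key=lambda i: qods[i])[:budget]
```

Streams left out of `allocate` keep using their predicted distributions, so resources go where a real observation is most likely to change the decision.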
Abstract:
Arrangements and methods for performing structural clustering of different time series. Time series data relating to a plurality of time series is accepted, structural features of the time series data are ascertained, and at least one distance between different time series is determined using the structural features. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query, based on the at least one distance, may be returned.
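The sketch below illustrates one way to compute a structure-based distance and answer a k-closest-matches query. It assumes autocorrelation coefficients at a few lags as the structural features (the abstract does not name specific features); these are unchanged by shifting or positively scaling a series, so series with the same shape compare as close. All function names are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def structural_features(series, max_lag=4):
    # Assumed structural features: autocorrelations at lags 1..max_lag.
    return [autocorrelation(series, lag) for lag in range(1, max_lag + 1)]


def structural_distance(a, b, max_lag=4):
    """Euclidean distance between the structural feature vectors of two series."""
    return math.dist(structural_features(a, max_lag), structural_features(b, max_lag))


def k_closest(query, database, k, max_lag=4):
    """Names of the k series in `database` structurally closest to `query`."""
    ranked = sorted(database, key=lambda name: structural_distance(query, database[name], max_lag))
    return ranked[:k]
```

The same `structural_distance` could be passed to any distance-based clustering algorithm to produce the partitioning into clusters.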
Abstract:
The present invention provides a method for controlling an entity's access to a resource based on observed behavior of the entity. The method assigns the entity a default authorization meta-tag. The method monitors the entity's behavior and updates the entity's meta-tag based upon the observed behavior. Accordingly, dynamic behavior-based access control is achieved.
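A minimal sketch of behavior-based access control follows. The three tag values, the scoring rule (+1 per compliant action, -2 per violation), and the thresholds are hypothetical; the abstract specifies only that each entity starts with a default meta-tag that is updated from observed behavior.

```python
# Hypothetical meta-tag values, ordered from least to most privileged.
TAGS = ["restricted", "default", "trusted"]


class BehaviorBasedAccess:
    def __init__(self, promote_threshold=3, demote_threshold=-1):
        self.scores = {}  # running behavior score per entity (assumed scheme)
        self.promote_threshold = promote_threshold
        self.demote_threshold = demote_threshold

    def meta_tag(self, entity):
        """Current meta-tag; entities never observed get the default tag."""
        score = self.scores.get(entity, 0)
        if score >= self.promote_threshold:
            return "trusted"
        if score <= self.demote_threshold:
            return "restricted"
        return "default"

    def observe(self, entity, compliant):
        """Record one observed action and return the updated meta-tag."""
        delta = 1 if compliant else -2  # violations weigh more than good behavior
        self.scores[entity] = self.scores.get(entity, 0) + delta
        return self.meta_tag(entity)

    def allowed(self, entity, required_tag):
        """Grant access when the entity's tag is at least `required_tag`."""
        return TAGS.index(self.meta_tag(entity)) >= TAGS.index(required_tag)
```

Penalizing violations more heavily than rewarding compliance is one design choice among many; it makes trust slow to earn and quick to lose.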
Abstract:
The present invention derives product characterizations for products offered at an e-commerce site from the text descriptions of those products provided at the site. A customer characterization is generated for any customer browsing the e-commerce site; it includes an aggregation of the derived product characterizations associated with products bought and/or browsed by that customer. A peer group is formed by clustering customers having similar customer characterizations. Recommendations are then made to a customer based on the processed characterization and peer group data.
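The following sketch walks through the pipeline with deliberately simple stand-ins: term counts as product characterizations, summed counts as customer characterizations, cosine similarity, and single-pass leader clustering for peer groups. These choices and all function names are assumptions for illustration, not the invention's actual techniques.

```python
import math
import re
from collections import Counter


def product_vector(description):
    """Hypothetical product characterization: term counts from its text description."""
    return Counter(re.findall(r"[a-z]+", description.lower()))


def customer_vectors(products, history):
    """Aggregate the characterizations of the products each customer bought or browsed."""
    pvecs = {pid: product_vector(text) for pid, text in products.items()}
    result = {}
    for cid, pids in history.items():
        total = Counter()
        for pid in pids:
            total.update(pvecs[pid])
        result[cid] = total
    return result


def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items())  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_customers(vectors, threshold):
    """Leader clustering: each customer joins the first cluster whose leader is
    at least `threshold`-similar, otherwise starts a new cluster."""
    clusters = []  # list of (leader_vector, members)
    for cid, vec in vectors.items():
        for leader, members in clusters:
            if cosine(leader, vec) >= threshold:
                members.append(cid)
                break
        else:
            clusters.append((vec, [cid]))
    return [members for _, members in clusters]


def recommend(target, history, clusters, k):
    """Recommend the products most common among the target's peers that the
    target has not already bought or browsed."""
    peers = next(m for m in clusters if target in m)
    seen = set(history[target])
    counts = Counter(p for peer in peers if peer != target for p in history[peer] if p not in seen)
    return [p for p, _ in counts.most_common(k)]
```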
Abstract:
A method of distributing a set of data among a plurality of disks that provides load balancing in the event of a disk failure. In accordance with the method, the disks in an array are divided into a number of clusters. The data blocks are then stored in each cluster such that each cluster contains a complete set of the data and the data block placement in each cluster is a unique permutation of the data block placement in the other clusters. In the event of a disk failure, data block accesses to the failed disk are redirected to a disk in another cluster that holds a copy of the data block, and further accesses to the disks that remain operational are rebalanced.
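A minimal sketch of permuted placement across mirrored clusters follows. The rotation rule (cluster `c` shifts row `r` of the striped layout by `c * r` disks) is a hypothetical choice of permutation. With two clusters, the blocks held by any single disk in cluster 0 are spread across all the disks of cluster 1, so the redirected load after a failure is not concentrated on one mirror disk. The class and method names are assumptions.

```python
class ClusteredArray:
    """Hypothetical sketch: the disks are split into `num_clusters` clusters of
    `disks_per_cluster` disks, and every cluster stores a full copy of the blocks."""

    def __init__(self, disks_per_cluster, num_clusters=2):
        self.n = disks_per_cluster
        self.k = num_clusters
        self.failed = set()  # (cluster, disk) pairs that are currently down

    def location(self, block, cluster):
        """Disk (within `cluster`) holding `block`. Cluster 0 stripes blocks
        round-robin; cluster c rotates row r of that layout by c * r disks,
        an assumed permutation that differs from cluster to cluster."""
        row, col = divmod(block, self.n)
        return (col + cluster * row) % self.n

    def read_target(self, block, preferred_cluster=None):
        """Pick a live (cluster, disk) copy of `block`. Without a preference,
        reads alternate between clusters by block number to balance load."""
        if preferred_cluster is None:
            preferred_cluster = block % self.k
        for offset in range(self.k):
            cluster = (preferred_cluster + offset) % self.k
            disk = self.location(block, cluster)
            if (cluster, disk) not in self.failed:
                return cluster, disk
        raise OSError(f"no surviving copy of block {block}")
```

For example, with four disks per cluster, blocks 1, 5, 9, and 13 share disk 1 in cluster 0 but sit on four different disks in cluster 1, so losing that disk adds one redirected block to each surviving mirror disk.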
Abstract:
A system and method for performing variable-speed scanning or browsing, in which a user controls the playout speed of a movie, without requiring additional disk or network bandwidth resources. In a preferred embodiment, the method provides scanning operations for a Moving Picture Experts Group (MPEG) video stream. The method satisfies the constraints of the MPEG decoder (in the user's set-top box) and requires a minimum of additional system resources. The embodiments of the present invention include (a) a storage method, (b1) a segment sampling method, (b2) a segment placement method, and (c) a playout method, where (b1) and (b2) are two alternatives for segment selection. Thus, two sets of solutions are provided to support variable-speed scanning in a disk-array-based video server: one using (a), (b1), and (c), and the other using (a), (b2), and (c).
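The segment-sampling alternative (b1) can be sketched as follows, under stated assumptions: the movie is stored as fixed-length segments that each begin with an MPEG I-frame (so each can be decoded on its own), segments are striped round-robin across the disks, and scanning at `speed`× plays every `speed`-th segment at the normal rate, which keeps disk and network bandwidth equal to normal playout. The function names are hypothetical.

```python
from math import gcd


def disk_of(segment, num_disks):
    """Assumed round-robin striping of segments across the disk array."""
    return segment % num_disks


def sample_segments(num_segments, speed, start=0):
    """Segment sampling for scanning at `speed`x: play every `speed`-th segment
    at the normal rate. Each segment is assumed to start with an I-frame, so
    the set-top box decoder can decode it without the skipped segments."""
    return list(range(start, num_segments, speed))


def disks_touched(segments, num_disks):
    """Sorted list of the disks that serve the given segments."""
    return sorted({disk_of(s, num_disks) for s in segments})


def is_load_balanced(speed, num_disks):
    # Segments s, s+k, s+2k, ... cycle through num_disks / gcd(k, num_disks)
    # distinct disks, so all disks share the load only when the gcd is 1.
    return gcd(speed, num_disks) == 1
```

The load check shows why segment selection matters: with 4 disks, scanning at 3× still visits every disk, while 2× visits only half of them. A placement-based alternative such as (b2) is one way to avoid that imbalance.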
Abstract:
Disclosed is a method and structure for searching data in databases using an ensemble of models. First, the invention performs training. This training orders the models within the ensemble by prediction accuracy and joins different numbers of models together to form sub-ensembles, adding models to each sub-ensemble in order of prediction accuracy. Next in the training process, the invention calculates a confidence value for each sub-ensemble. The confidence is a measure of how closely results from the sub-ensemble will match results from the full ensemble. The size of each sub-ensemble varies with the level of confidence, whereas the size of the ensemble is fixed. After training, the invention can make a prediction. First, the invention selects a sub-ensemble that meets a given level of confidence: as the level of confidence is raised, a sub-ensemble with more models is selected, and as it is lowered, a sub-ensemble with fewer models is selected. Finally, the invention applies the selected sub-ensemble, in place of the ensemble, to an example to make a prediction.
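A minimal sketch of training and prediction with sub-ensembles follows. It assumes that models are callables returning a class label, that majority voting combines them, and that confidence is measured as agreement with the full ensemble on the training examples. The function names are hypothetical.

```python
from collections import Counter


def order_by_accuracy(models, X, y):
    """Sort models from most to least accurate on the training examples."""
    def accuracy(model):
        return sum(model(x) == t for x, t in zip(X, y)) / len(y)
    return sorted(models, key=accuracy, reverse=True)


def majority_vote(models, x):
    # Ties go to the label first voted for, i.e. by the more accurate model.
    return Counter(m(x) for m in models).most_common(1)[0][0]


def train_confidences(ordered, X):
    """conf[i] = fraction of examples on which the sub-ensemble made of the
    i + 1 most accurate models agrees with the full ensemble."""
    full = [majority_vote(ordered, x) for x in X]
    conf = []
    for size in range(1, len(ordered) + 1):
        sub = ordered[:size]
        agree = sum(majority_vote(sub, x) == f for x, f in zip(X, full))
        conf.append(agree / len(X))
    return conf


def predict(ordered, conf, x, level):
    """Use the smallest sub-ensemble whose confidence meets `level`."""
    size = next(i + 1 for i, c in enumerate(conf) if c >= level)
    return majority_vote(ordered[:size], x)
```

Because the full ensemble always agrees with itself (confidence 1.0), `predict` finds a sub-ensemble for any level up to 1.0, and raising `level` can only select an equal or larger sub-ensemble.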
Abstract:
An object and the attributes that describe it are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class, a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using all of the hash functions of each sketch table, yielding a plurality of values in each sketch table for a single attribute pattern. The lowest of these values is selected for each sketch table. The distribution of the selected values across all sketch tables is then evaluated for each attribute pattern, producing a discriminatory power for that pattern. Attribute patterns whose discriminatory power exceeds a given threshold are selected, and their values are added to the running sums of the associated sketch tables. The sketch table with the largest overall sum is identified, and its associated class is assigned to the object described by the attribute patterns.
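The sketch below uses a count-min sketch as the per-class sketch table: its rows are the parallel hash tables, and taking the minimum across rows corresponds to the "lowest value" step. Treating every pair of attributes as an attribute pattern, and defining discriminatory power as the share of a pattern's counts held by its most frequent class, are assumptions, as are all class and function names.

```python
import hashlib
from itertools import combinations


class CountMinSketch:
    def __init__(self, depth=4, width=256):
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]  # parallel hash tables

    def _index(self, row, item):
        # One hash function per row, derived by salting SHA-256 with the row number.
        digest = hashlib.sha256(f"{row}|{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        for r in range(self.depth):
            self.table[r][self._index(r, item)] += 1

    def estimate(self, item):
        # Lowest value across rows; count-min never underestimates the true count.
        return min(self.table[r][self._index(r, item)] for r in range(self.depth))


def attribute_patterns(attributes, size=2):
    """Assumed grouping: every combination of `size` (name, value) attributes."""
    return list(combinations(sorted(attributes.items()), size))


class SketchClassifier:
    def __init__(self, classes, threshold=0.6, **sketch_args):
        self.sketches = {c: CountMinSketch(**sketch_args) for c in classes}
        self.threshold = threshold

    def train(self, attributes, label):
        for pattern in attribute_patterns(attributes):
            self.sketches[label].add(pattern)

    def classify(self, attributes):
        totals = {c: 0 for c in self.sketches}
        for pattern in attribute_patterns(attributes):
            counts = {c: s.estimate(pattern) for c, s in self.sketches.items()}
            total = sum(counts.values())
            # Hypothetical discriminatory power: share held by the top class.
            if total and max(counts.values()) / total > self.threshold:
                for c, v in counts.items():
                    totals[c] += v
        return max(totals, key=totals.get)
```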
Abstract:
Uncertain data is classified by constructing an error adjusted probability density estimate for the data, and applying a subspace exploration process to the probability density estimate to classify the data.
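As a rough illustration, the sketch below builds an error-adjusted Gaussian kernel density estimate in which each point's kernel width in dimension j is widened to sqrt(h^2 + err_j^2) by that point's error. It then explores every subspace of up to two dimensions and classifies with the subspace where one class most dominates the prior-weighted density. Both the bandwidth rule and this subspace-selection criterion are assumptions, since the abstract specifies neither, and the function names are hypothetical.

```python
import math
from itertools import combinations


def error_adjusted_density(x, points, errors, dims, h):
    """Gaussian KDE at x restricted to `dims`; each point's kernel width in
    dimension j is sqrt(h^2 + err_j^2), so uncertain points are smoothed more."""
    total = 0.0
    for p, e in zip(points, errors):
        k = 1.0
        for j in dims:
            s = math.sqrt(h * h + e[j] * e[j])
            k *= math.exp(-0.5 * ((x[j] - p[j]) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        total += k
    return total / len(points)


def classify(x, data, h=0.5, max_dims=2):
    """data maps label -> (points, errors). Explores all subspaces of up to
    `max_dims` dimensions and returns the label of the class with the largest
    share of prior-weighted density in the most decisive subspace."""
    n_total = sum(len(pts) for pts, _ in data.values())
    best_share, best_label = -1.0, None
    for size in range(1, max_dims + 1):
        for dims in combinations(range(len(x)), size):
            dens = {c: len(pts) / n_total * error_adjusted_density(x, pts, errs, dims, h)
                    for c, (pts, errs) in data.items()}
            z = sum(dens.values())
            if z == 0:
                continue  # every kernel underflowed in this subspace
            label = max(dens, key=dens.get)
            if dens[label] / z > best_share:
                best_share, best_label = dens[label] / z, label
    return best_label
```

Searching subspaces lets an uninformative or noisy dimension be ignored when another dimension separates the classes cleanly.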