Abstract:
A policy-enabled caching system uses policy rules to determine whether a request from a client is directed to a cache or to a server. The client is coupled to a plurality of caches and to at least one server. The caches may store a subset of the data stored on the server. The system stores policy rules, each comprising at least one matching condition; every request that satisfies a matching condition falls into an associated class. Each class has an associated routing rule, which defines the type of routing for all requests that fall into that class. The system receives the request from the client and classifies it according to the policy rules. The request is then routed according to the routing rule associated with the class to which it belongs.
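The classify-then-route flow described in this abstract can be sketched as follows. This is a minimal illustration only: the names `PolicyRule`, `classify`, `route`, the example rules, and the `"default"` fallback class are assumptions for the sketch and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Request = Dict[str, str]  # hypothetical request shape, e.g. {"path": "/a.png"}


@dataclass
class PolicyRule:
    class_name: str                      # class a matching request falls into
    matches: Callable[[Request], bool]   # the matching condition


def classify(request: Request, rules: List[PolicyRule],
             default_class: str = "default") -> str:
    """Return the class of the first rule whose condition matches."""
    for rule in rules:
        if rule.matches(request):
            return rule.class_name
    return default_class  # assumption: unmatched requests get a fallback class


def route(request: Request, rules: List[PolicyRule],
          routing_rules: Dict[str, str]) -> str:
    """Look up the routing rule associated with the request's class."""
    return routing_rules[classify(request, rules)]


# Illustrative policy: static assets go to a cache, scripts go to the server.
RULES = [
    PolicyRule("static", lambda r: r["path"].endswith((".png", ".css"))),
    PolicyRule("dynamic", lambda r: r["path"].startswith("/cgi-bin/")),
]
ROUTING = {"static": "cache", "dynamic": "server", "default": "server"}
```

Keeping classification separate from the routing table means a routing rule can be changed for a whole class without touching the matching conditions.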
Abstract:
Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
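One way to picture the steps above is the sketch below, which assumes one-dimensional numeric objects, a fixed distance `radius` for joining a cluster, and the rule that a cluster holding less than `min_fraction` of all objects signals an abnormality. All of these choices, and every name in the code, are assumptions made for illustration rather than details from the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Cluster:
    n: int = 0          # number of objects absorbed so far
    total: float = 0.0  # running sum, enough to recover the mean

    @property
    def mean(self) -> float:
        return self.total / self.n

    def add(self, x: float) -> None:
        self.n += 1
        self.total += x


def build_clusters(stream: Iterable[float], radius: float) -> List[Cluster]:
    """Assign each object to the nearest cluster, or open a new one."""
    clusters: List[Cluster] = []
    for x in stream:
        nearest = min(clusters, key=lambda c: abs(c.mean - x), default=None)
        if nearest is None or abs(nearest.mean - x) > radius:
            nearest = Cluster()
            clusters.append(nearest)
        nearest.add(x)
    return clusters


def abnormal_clusters(clusters: List[Cluster], min_fraction: float) -> List[Cluster]:
    """Flag clusters whose share of the stream is below min_fraction."""
    total = sum(c.n for c in clusters)
    return [c for c in clusters if c.n / total < min_fraction]
```

Only the per-cluster statistics are consulted when looking for abnormalities, so the raw objects do not need to be retained.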
Abstract:
Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.
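As a rough illustration of a representation built from mean values and uncertainty values, the sketch below stores each record as a `(mean, sigma)` pair, keeps the records sorted by mean, and answers a probabilistic range query. The Gaussian error model, the class name `UncertainIndex`, and the `width` pruning heuristic are assumptions for this sketch; the abstract does not specify them.

```python
import bisect
import math
from typing import List, Tuple

Record = Tuple[float, float]  # (mean value, uncertainty as a standard deviation)


class UncertainIndex:
    def __init__(self, records: List[Record]) -> None:
        self.records = sorted(records)
        self.means = [m for m, _ in self.records]
        self.max_sigma = max((s for _, s in self.records), default=0.0)

    @staticmethod
    def _prob_in_range(mean: float, sigma: float, lo: float, hi: float) -> float:
        """P(lo <= X <= hi) for X ~ Normal(mean, sigma), an assumed model."""
        if sigma == 0:
            return 1.0 if lo <= mean <= hi else 0.0

        def cdf(x: float) -> float:
            return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

        return cdf(hi) - cdf(lo)

    def range_query(self, lo: float, hi: float, min_prob: float,
                    width: float = 3.0) -> List[Record]:
        """Records whose probability of lying in [lo, hi] is >= min_prob.

        Heuristic: means further than width * max_sigma outside the range
        are skipped, so very small min_prob values may miss records.
        """
        pad = width * self.max_sigma
        start = bisect.bisect_left(self.means, lo - pad)
        end = bisect.bisect_right(self.means, hi + pad)
        return [(m, s) for m, s in self.records[start:end]
                if self._prob_in_range(m, s, lo, hi) >= min_prob]
```

Sorting by mean lets the query narrow the candidate set with two binary searches before any probability is computed.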
Abstract:
Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application. Principles of the invention are capable of handling both static and dynamic data sets.
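The three steps (assign privacy levels, build condensed groups with summary statistics, generate pseudo-data) might look like the simplified sketch below. It assumes one-dimensional records, interprets a privacy level as the minimum size of the group a record may belong to, and draws pseudo-data from a Gaussian fitted to each group. These are illustrative assumptions, not the procedure claimed in the abstract.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Group:
    n: int = 0        # number of records in the group
    s1: float = 0.0   # sum of values
    s2: float = 0.0   # sum of squared values
    need: int = 0     # largest privacy level among members

    def add(self, value: float, level: int) -> None:
        self.n += 1
        self.s1 += value
        self.s2 += value * value
        self.need = max(self.need, level)

    def merge(self, other: "Group") -> None:
        self.n += other.n
        self.s1 += other.s1
        self.s2 += other.s2
        self.need = max(self.need, other.need)

    @property
    def mean(self) -> float:
        return self.s1 / self.n

    @property
    def std(self) -> float:
        return math.sqrt(max(self.s2 / self.n - self.mean ** 2, 0.0))


def condense(records: List[Tuple[float, int]]) -> List[Group]:
    """Greedily group sorted (value, level) records until levels are met."""
    groups: List[Group] = []
    current = Group()
    for value, level in sorted(records):
        current.add(value, level)
        if current.n >= current.need:
            groups.append(current)
            current = Group()
    # Leftover records absorb earlier groups until their level is satisfied.
    while current.n and current.n < current.need and groups:
        current.merge(groups.pop())
    if current.n:
        groups.append(current)
    return groups


def pseudo_data(groups: List[Group], rng: random.Random) -> List[float]:
    """Draw as many synthetic values per group as it had real records."""
    out: List[float] = []
    for g in groups:
        out.extend(rng.gauss(g.mean, g.std) for _ in range(g.n))
    return out
```

Because each group keeps only additive statistics, a new record can be folded in with `add` as it arrives, which is one way a dynamic data set could be handled.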
Abstract:
Techniques are disclosed for aggregation in uncertain data in data processing systems. For example, a method of aggregation in an application that involves an uncertain data set includes the following steps. The uncertain data set along with uncertainty information is obtained. One or more clusters of data points are constructed from the data set. Aggregate statistics of the one or more clusters and uncertainty information are stored. The data set may be data from a data stream. It is realized that the use of even modest uncertainty information during an application such as a data mining process is sufficient to greatly improve the quality of the underlying results.
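A cluster summary that stores aggregate statistics together with uncertainty information could be sketched as below. The sketch assumes one-dimensional points, each with a known error standard deviation, and independent zero-mean errors; the class `UncertainCluster`, the `assign` helper, and the expected-distance formula are illustrative assumptions rather than the abstract's own definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UncertainCluster:
    n: int = 0             # number of points
    sum_x: float = 0.0     # sum of observed values
    sum_x2: float = 0.0    # sum of squared observed values
    sum_err2: float = 0.0  # sum of squared error standard deviations

    def add(self, x: float, err: float) -> None:
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x
        self.sum_err2 += err * err

    @property
    def centroid(self) -> float:
        return self.sum_x / self.n

    def expected_sq_distance(self, x: float, err: float) -> float:
        """Expected squared distance from an uncertain point to the centroid.

        Under independent zero-mean errors this is the observed squared
        distance plus the point's error variance plus the variance of the
        centroid itself (sum_err2 / n**2).
        """
        return (x - self.centroid) ** 2 + err * err + self.sum_err2 / self.n ** 2


def assign(clusters: List[UncertainCluster], x: float, err: float,
           max_sq_distance: float) -> UncertainCluster:
    """Add the point to the closest cluster in expectation, or open a new one."""
    best = min(clusters, key=lambda c: c.expected_sq_distance(x, err), default=None)
    if best is None or best.expected_sq_distance(x, err) > max_sq_distance:
        best = UncertainCluster()
        clusters.append(best)
    best.add(x, err)
    return best
```

All four statistics are additive, so the summary can be updated one point at a time, which suits data arriving from a stream.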
Abstract:
A system and method for rights protection of a dataset that includes multiple trajectory objects includes determining an intensity power for embedding a watermarking key in a data trajectory. The data trajectory is modified to embed a watermarking key at the intensity power such that the intensity power guarantees an original pair-wise relationship between distance-based neighboring objects before and after embedding of the key such that a modified trajectory provides a watermarked version of the data trajectory.
Abstract:
Techniques for perturbing an evolving data stream are provided. The evolving data stream is received. An online linear transformation is applied to received values of the evolving data stream generating a plurality of transform coefficients. A plurality of significant transform coefficients are selected from the plurality of transform coefficients. Noise is embedded into each of the plurality of significant transform coefficients, thereby perturbing the evolving data stream. A total noise variance does not exceed a defined noise variance threshold.
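The pipeline of transform, coefficient selection, and noise embedding can be illustrated with the sketch below. It substitutes a batched orthonormal Haar transform over a fixed window whose length is a power of two for the online transformation, treats a coefficient as significant when its magnitude reaches `threshold`, and splits `noise_budget` evenly across the selected coefficients. Each of these choices, and every function name, is an assumption made for illustration.

```python
import math
import random
from typing import List

SQRT2 = math.sqrt(2.0)


def haar(values: List[float]) -> List[float]:
    """Orthonormal Haar transform; len(values) must be a power of two."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:n] = avg + diff
        n = half
    return out


def inverse_haar(coeffs: List[float]) -> List[float]:
    """Invert haar() level by level."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged: List[float] = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out


def perturb_window(window: List[float], noise_budget: float,
                   threshold: float, rng: random.Random) -> List[float]:
    """Add Gaussian noise only to significant coefficients of one window.

    The per-coefficient variances sum to noise_budget, so the total noise
    variance injected into the window does not exceed that budget.
    """
    coeffs = haar(window)
    significant = [i for i, c in enumerate(coeffs) if abs(c) >= threshold]
    if not significant:
        return list(window)
    sigma = math.sqrt(noise_budget / len(significant))
    for i in significant:
        coeffs[i] += rng.gauss(0.0, sigma)
    return inverse_haar(coeffs)
```

Because the transform is orthonormal, the noise energy added in the coefficient domain equals the noise energy in the reconstructed values, which is what makes a single variance budget meaningful.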
Abstract:
The present invention provides an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure in which each element in the sequence is associated with a weight. A series of network events, for instance, is a weighted-sequence because each event is associated with a timestamp. Querying a large sequence database by events' occurrence patterns is a first step towards understanding the temporal causal relationships among the events. The index structure proposed herein enables the efficient retrieval from the database of all subsequences (contiguous and non-contiguous) that match a given query sequence both by events and by weights. The index structure also takes into consideration the nonuniform frequency distribution of events in the sequence data.
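To make the idea of matching "both by events and by weights" concrete, the sketch below uses a plain inverted index from event to timestamps and looks for occurrences whose time offsets equal those of the query exactly. This is a deliberately naive stand-in, not the index structure the abstract proposes; `build_index`, `find_matches`, and the exact-offset matching rule are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Event = Tuple[str, int]  # (event symbol, timestamp used as the weight)


def build_index(sequence: List[Event]) -> Dict[str, Set[int]]:
    """Map each event symbol to the set of timestamps at which it occurs."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for event, timestamp in sequence:
        index[event].add(timestamp)
    return index


def find_matches(index: Dict[str, Set[int]],
                 query: List[Tuple[str, int]]) -> List[int]:
    """Return start timestamps where the query matches.

    The query lists (event, offset) pairs with the first offset equal to 0.
    A match may skip over intervening events, so non-contiguous
    subsequences are found as well as contiguous ones.
    """
    first_event = query[0][0]
    starts = []
    for t0 in sorted(index.get(first_event, ())):
        if all((t0 + offset) in index.get(event, ()) for event, offset in query):
            starts.append(t0)
    return starts
```

For example, the query `[("a", 0), ("b", 6), ("c", 7)]` asks for an `a` followed by a `b` six time units later and a `c` seven time units later, regardless of what happens in between.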
Abstract:
A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
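The split between online statistics and offline reclustering can be sketched as follows. The sketch assumes one-dimensional data points, a fixed `radius` for joining a group online, and seed points supplied by the caller for the offline step (in practice these could be sampled from the stream). The names `Group`, `update_online`, and `recluster_offline` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    n: int         # number of points in the group
    total: float   # sum of the points, enough to recover the centroid

    @property
    def centroid(self) -> float:
        return self.total / self.n


def update_online(groups: List[Group], x: float, radius: float) -> None:
    """Online step: fold x into the nearest group or start a new group."""
    nearest = min(groups, key=lambda g: abs(g.centroid - x), default=None)
    if nearest is not None and abs(nearest.centroid - x) <= radius:
        nearest.n += 1
        nearest.total += x
    else:
        groups.append(Group(1, x))


def recluster_offline(groups: List[Group], seeds: List[float]) -> List[Group]:
    """Offline step: merge each group into the cluster of its nearest seed."""
    merged = {seed: Group(0, 0.0) for seed in seeds}
    for g in groups:
        seed = min(seeds, key=lambda s: abs(s - g.centroid))
        merged[seed].n += g.n
        merged[seed].total += g.total
    # Report only the seeds that attracted at least one group.
    return [m for m in merged.values() if m.n]
```

The offline step reads only the group statistics, so it can be run on demand without revisiting the original data points.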