Abstract:
A system and method for rights protection of a dataset containing multiple trajectory objects determine an intensity power for embedding a watermarking key in a data trajectory. The data trajectory is modified to embed the watermarking key at that intensity power, which is chosen to guarantee that the original pair-wise relationships between distance-based neighboring objects are the same before and after the key is embedded. The modified trajectory is a watermarked version of the data trajectory.
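The idea of picking an embedding power that leaves neighbor relationships intact can be illustrated with a brute-force sketch. Everything here is an assumption for illustration and not the patented method: trajectories are flattened coordinate lists of equal length, the mark is a key-seeded sequence of +1/-1 values applied multiplicatively, and `max_safe_power` simply tries candidate powers in increasing order and keeps the largest one that preserves every object's nearest neighbor.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def watermark(key, length):
    # Hypothetical mark: a key-seeded sequence of +1/-1 values.
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(trajectory, mark, power):
    # Multiplicative embedding: each coordinate is scaled by (1 + power * mark).
    return [v * (1.0 + power * w) for v, w in zip(trajectory, mark)]

def nearest_neighbors(dataset):
    # Index of the closest other trajectory, for every trajectory.
    result = []
    for i, a in enumerate(dataset):
        others = [j for j in range(len(dataset)) if j != i]
        result.append(min(others, key=lambda j: euclidean(a, dataset[j])))
    return result

def max_safe_power(dataset, key, candidates):
    # Largest candidate power (tried in increasing order) that keeps
    # every nearest-neighbor relationship unchanged; 0.0 if none does.
    mark = watermark(key, len(dataset[0]))
    original = nearest_neighbors(dataset)
    best = 0.0
    for power in sorted(candidates):
        marked = [embed(t, mark, power) for t in dataset]
        if nearest_neighbors(marked) != original:
            break
        best = power
    return best
```

The exhaustive neighbor check costs quadratic time per candidate power, so it is only meant to make the preservation condition concrete on small datasets.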
Abstract:
There are provided a method and a system for computation of optimal distance bounds on compressed time-series data. In a method for similarity search, the method includes the step of transforming sequence data into a compressed sequence represented by top-k coefficients of the sequence data and a sum of the energy of omitted coefficients of the sequence data. The method further includes the step of computing at least one of a lower bound and an upper bound on a distance range between a query sequence and the compressed sequence, given a first and a second constraint. The first constraint is that a sum of squares of the omitted coefficients is less than a sum of the energy of the omitted coefficients. The second constraint is that the energy of the omitted coefficients is less than the energy of a lowest energy one of the top-k coefficients.
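The compressed representation described above (top-k coefficients plus the energy of the omitted ones) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a naive orthonormal DFT, the function names are hypothetical, and the bounds come from the triangle inequality on the omitted part using only the stored energy. They are valid but looser than the optimal bounds the abstract refers to, which also exploit the second constraint.

```python
import cmath
import math

def dft(x):
    # Orthonormal DFT, so Euclidean distances are preserved (Parseval).
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]

def compress(x, k):
    # Keep the k largest-magnitude coefficients and the total energy of the rest.
    coeffs = dft(x)
    top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    kept = {i: coeffs[i] for i in top}
    omitted_energy = sum(abs(c) ** 2 for i, c in enumerate(coeffs) if i not in kept)
    return kept, omitted_energy

def distance_bounds(query, kept, omitted_energy):
    # The query is fully known; the compressed sequence is known only at `kept`.
    q = dft(query)
    known = sum(abs(q[i] - c) ** 2 for i, c in kept.items())
    q_rest = math.sqrt(sum(abs(c) ** 2 for i, c in enumerate(q) if i not in kept))
    e = math.sqrt(omitted_energy)
    # Triangle inequality on the omitted coefficients (not the optimal bound).
    lower = math.sqrt(known + (q_rest - e) ** 2)
    upper = math.sqrt(known + (q_rest + e) ** 2)
    return lower, upper
```

In a similarity search, a candidate whose lower bound already exceeds the best distance found so far could be discarded without decompressing it.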
Abstract:
Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
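One way to picture cluster-based abnormality monitoring is the sketch below. The abstract does not specify the clustering rule or the statistics used, so the choices here are assumptions: each cluster keeps only a count and a running centroid, a point joins the nearest cluster within a hypothetical `radius`, and clusters that stay smaller than `min_count` are reported as abnormal.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Cluster:
    """Statistical summary of a cluster: object count and running centroid."""

    def __init__(self, point):
        self.count = 1
        self.centroid = list(point)

    def add(self, point):
        # Incremental mean update, so raw points need not be stored.
        self.count += 1
        self.centroid = [c + (p - c) / self.count for c, p in zip(self.centroid, point)]

def monitor(stream, radius, min_count):
    clusters = []
    for point in stream:
        nearest = min(clusters, key=lambda c: euclidean(c.centroid, point), default=None)
        if nearest is not None and euclidean(nearest.centroid, point) <= radius:
            nearest.add(point)
        else:
            clusters.append(Cluster(point))
    # Sparse clusters are treated as abnormalities (an illustrative criterion).
    return [c for c in clusters if c.count < min_count]
```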
Abstract:
Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.
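A record represented by a mean value and an uncertainty value can be queried probabilistically, as in this minimal sketch. It assumes one-dimensional records, a Gaussian error model with a strictly positive standard deviation, and a linear scan in place of a real index structure; the function names and the `threshold` parameter are hypothetical.

```python
import math

def prob_in_range(mean, std, lo, hi):
    # Probability that a Gaussian(mean, std) value falls in [lo, hi].
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)

def range_query(records, lo, hi, threshold):
    # records maps a record id to its (mean, std) representation.
    return [
        rid for rid, (mean, std) in records.items()
        if prob_in_range(mean, std, lo, hi) >= threshold
    ]
```

For example, with records `{"a": (5.0, 1.0), "b": (5.0, 10.0)}` and the range [3, 7], record `a` qualifies at a 0.9 threshold while `b`, which has the same mean but far more uncertainty, does not.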
Abstract:
Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application. Principles of the invention are capable of handling both static and dynamic data sets.
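The three steps above (per-record privacy levels, condensed groups with summary statistics, pseudo-data generation) can be sketched for a static, one-dimensional data set. The details are assumptions made for illustration: a privacy level is read as the minimum group size a record requires, groups are formed greedily over sorted values, and pseudo-data is drawn from a Gaussian fitted to each group's statistics.

```python
import math
import random

def condense(records):
    # records: list of (value, privacy_level) pairs.
    groups, current = [], []
    for value, level in sorted(records):
        current.append((value, level))
        # Close the group once it is as large as its strictest member requires.
        if len(current) >= max(lv for _, lv in current):
            groups.append(current)
            current = []
    # Leftover records absorb earlier groups until their strictest level is met.
    while current and groups and len(current) < max(lv for _, lv in current):
        current = groups.pop() + current
    if current:
        groups.append(current)
    # Only summary statistics are retained for each condensed group.
    return [
        {"n": len(g), "sum": sum(v for v, _ in g), "sum_sq": sum(v * v for v, _ in g)}
        for g in groups
    ]

def pseudo_data(stats, rng):
    # Regenerate as many pseudo-records as each group originally held.
    out = []
    for s in stats:
        mean = s["sum"] / s["n"]
        var = max(s["sum_sq"] / s["n"] - mean * mean, 0.0)
        out.extend(rng.gauss(mean, math.sqrt(var)) for _ in range(s["n"]))
    return out
```

Because count, sum, and sum of squares are all additive, the same statistics could be updated incrementally as records arrive, which is one reason such summaries suit dynamic data sets as well.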
Abstract:
Techniques are disclosed for aggregation in uncertain data in data processing systems. For example, a method of aggregation in an application that involves an uncertain data set includes the following steps. The uncertain data set along with uncertainty information is obtained. One or more clusters of data points are constructed from the data set. Aggregate statistics of the one or more clusters and uncertainty information are stored. The data set may be data from a data stream. It is realized that the use of even modest uncertainty information during an application such as a data mining process is sufficient to greatly improve the quality of the underlying results.
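Storing aggregate statistics together with uncertainty information might look like the sketch below. It is an illustration under assumptions, not the disclosed design: values are one-dimensional, each arrives with an error standard deviation, and the hypothetical `UncertainCluster` class keeps four additive sums. The spread estimate treats every record as a distribution centered on its reported value and applies the law of total variance.

```python
class UncertainCluster:
    """Additive summary of uncertain 1-D points (illustrative names)."""

    def __init__(self):
        self.n = 0
        self.value_sum = 0.0     # sum of reported values
        self.value_sq_sum = 0.0  # sum of squared reported values
        self.error_sq_sum = 0.0  # sum of squared error standard deviations

    def add(self, value, error_std):
        self.n += 1
        self.value_sum += value
        self.value_sq_sum += value * value
        self.error_sq_sum += error_std * error_std

    def mean(self):
        return self.value_sum / self.n

    def total_variance(self):
        # Variance of the reported values plus the average error variance.
        spread = self.value_sq_sum / self.n - self.mean() ** 2
        return max(spread, 0.0) + self.error_sq_sum / self.n
```

Two clusters holding the same reported values but different error levels are thus distinguishable from their summaries alone, without the raw points.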
Abstract:
Techniques for perturbing an evolving data stream are provided. The evolving data stream is received. An online linear transformation is applied to received values of the evolving data stream generating a plurality of transform coefficients. A plurality of significant transform coefficients are selected from the plurality of transform coefficients. Noise is embedded into each of the plurality of significant transform coefficients, thereby perturbing the evolving data stream. A total noise variance does not exceed a defined noise variance threshold.
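The perturbation steps above can be sketched for a single window of the stream. Several choices here are assumptions: the abstract names no specific transform, so a naive orthonormal DCT stands in for the online linear transformation; "significant" is taken to mean the k largest-magnitude coefficients; and the noise budget is split across them in proportion to their energy so that the per-coefficient variances sum exactly to the budget.

```python
import math
import random

def _scale(k, n):
    return math.sqrt((1.0 if k == 0 else 2.0) / n)

def dct(x):
    # Naive orthonormal DCT-II.
    n = len(x)
    return [
        _scale(k, n) * sum(v * math.cos(math.pi * (t + 0.5) * k / n) for t, v in enumerate(x))
        for k in range(n)
    ]

def idct(coeffs):
    # Inverse of the orthonormal DCT-II above.
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (t + 0.5) * k / n) for k, c in enumerate(coeffs))
        for t in range(n)
    ]

def allocate_noise(coeffs, k, budget):
    # Map each of the k most significant coefficients to a noise variance;
    # the variances sum to `budget`, the assumed total-variance threshold.
    top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    energy = sum(coeffs[i] ** 2 for i in top)
    if energy == 0.0:
        return {i: budget / len(top) for i in top}
    return {i: budget * coeffs[i] ** 2 / energy for i in top}

def perturb_window(window, k, budget, rng):
    coeffs = dct(window)
    for i, var in allocate_noise(coeffs, k, budget).items():
        coeffs[i] += rng.gauss(0.0, math.sqrt(var))
    return idct(coeffs)
```

Because the transform is orthonormal, the noise energy added to the coefficients equals the noise energy that appears in the reconstructed window, which is what lets the budget be enforced in the coefficient domain.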