Abstract:
The invention comprises a method and apparatus for determining the rank of a query value. Specifically, the method comprises: receiving a rank query request; determining, for each of at least one remote monitor, a predicted lower-bound rank value and a predicted upper-bound rank value, both determined according to at least one respective prediction model used by that remote monitor to compute at least one local quantile summary; computing a predicted average rank value for each remote monitor from its predicted lower-bound and upper-bound rank values; and computing the rank of the query value using the predicted average rank values of the remote monitors.
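The rank computation described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `PredictedBounds` type and both function names are hypothetical, the average is assumed to be the midpoint of the two bounds, and the global rank is assumed to be the sum of the per-monitor averages (the abstract says only that the averages are "used").

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PredictedBounds:
    """Predicted rank bounds for one remote monitor (hypothetical type)."""
    lower: float  # predicted lower-bound rank of the query value
    upper: float  # predicted upper-bound rank of the query value


def predicted_average_rank(bounds: PredictedBounds) -> float:
    # Assumption: the "predicted average rank" is the midpoint of the bounds.
    return (bounds.lower + bounds.upper) / 2.0


def global_rank(all_bounds: Iterable[PredictedBounds]) -> float:
    # Assumption: the rank over all monitors is the sum of the
    # per-monitor predicted average ranks.
    return sum(predicted_average_rank(b) for b in all_bounds)


if __name__ == "__main__":
    monitors = [PredictedBounds(10, 20), PredictedBounds(5, 7)]
    print(global_rank(monitors))
```

Under these assumptions, the coordinator needs only two numbers per monitor to answer a rank query, with no extra communication at query time.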
Abstract:
The Group-Count Sketch (GCS), a stream synopsis, provides the first fast solution to the problem of tracking wavelet representations of one-dimensional and multi-dimensional data streams. By imposing a hierarchical structure of groups over the data and applying the GCS, our algorithms can quickly recover the most important wavelet coefficients with guaranteed accuracy. Varying the hierarchical structure of groups establishes a tradeoff between query time and update time, allowing the right balance to be found for specific data streams. Experimental analysis confirmed this tradeoff and showed that all the methods significantly outperformed previously known methods in both update time and query time, while maintaining a high level of accuracy.
Abstract:
A method of distributed approximate query tracking relies on tracking general-purpose randomized sketch summaries of local streams at remote sites along with concise prediction models of local site behavior in order to produce highly communication-efficient and space/time-efficient solutions. A powerful approximate query tracking framework readily incorporates several complex analysis queries, including distributed join and multi-join aggregates and approximate wavelet representations, thus giving the first known low-overhead tracking solution for such queries in the distributed-streams model.
Abstract:
Methods and apparatus are disclosed to anonymize a dataset of spatial data. An example method includes generating a spatial indexing structure with spatial data, establishing a height value associated with the spatial indexing structure to generate a plurality of tree nodes, each of the plurality of tree nodes associated with spatial data counts, calculating a localized noise budget value for respective ones of the tree nodes based on the height value and an overall noise budget, and anonymizing the plurality of tree nodes with an anonymization process, the anonymization process using the localized noise budget value for respective ones of the tree nodes.
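One way to picture the per-node noise budget is the sketch below. It rests on assumptions the abstract does not state: the overall budget is split evenly across the `height + 1` levels of the tree, and the anonymization process adds Laplace noise to each node count with scale `1 / budget`. All function names are hypothetical, and other allocation strategies are equally compatible with the text.

```python
import random
from typing import List


def uniform_level_budgets(height: int, total_budget: float) -> List[float]:
    # Assumption: every level (root = level 0 .. leaves = level `height`)
    # receives an equal share of the overall noise budget.
    levels = height + 1
    return [total_budget / levels] * levels


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def anonymize_counts(counts_by_level: List[List[int]],
                     total_budget: float,
                     rng: random.Random) -> List[List[float]]:
    # counts_by_level[i] holds the spatial data counts of the nodes at level i.
    height = len(counts_by_level) - 1
    budgets = uniform_level_budgets(height, total_budget)
    noisy = []
    for counts, budget in zip(counts_by_level, budgets):
        # A smaller localized budget means a larger noise scale.
        noisy.append([c + laplace_noise(1.0 / budget, rng) for c in counts])
    return noisy


if __name__ == "__main__":
    tree = [[100], [60, 40], [35, 25, 30, 10]]  # toy counts, height 2
    print(anonymize_counts(tree, total_budget=1.0, rng=random.Random(0)))
```

Because the per-level budgets sum to the overall budget, the height of the tree directly controls how much noise each node count receives.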
Abstract:
A system, method, and computer program product are provided for distributed monitoring of local thresholds at each of a number of monitoring nodes, with communication initiated only after the locally observed data exceeds the local threshold. Both static thresholds and adaptive thresholds are considered. In the static case, a combination of two alternate strategies for considering thresholds minimizes communication overhead. In the adaptive case, local thresholds are adjusted based on the observed distributions of updated information in the distributed monitoring system. Both approaches yield significant savings over the naïve approach of performing processing at a centralized location.
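The static case can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the disclosed method: each node's local threshold is assumed to be an equal share of a global threshold, a node is assumed to report once when its count first exceeds that share, and the `MonitorNode` class and `count_messages` function are hypothetical names.

```python
from typing import Iterable, Tuple


class MonitorNode:
    """A monitoring node that stays silent until its local threshold is exceeded."""

    def __init__(self, local_threshold: float) -> None:
        self.local_threshold = local_threshold
        self.count = 0
        self.reported = False

    def observe(self, amount: int = 1) -> bool:
        # Returns True only the first time the local count exceeds the threshold.
        self.count += amount
        if not self.reported and self.count > self.local_threshold:
            self.reported = True
            return True
        return False


def count_messages(updates: Iterable[Tuple[int, int]],
                   num_nodes: int,
                   global_threshold: float) -> int:
    # Assumption: the global threshold is divided evenly among the nodes.
    nodes = [MonitorNode(global_threshold / num_nodes) for _ in range(num_nodes)]
    messages = 0
    for node_id, amount in updates:
        if nodes[node_id].observe(amount):
            messages += 1  # the node contacts the coordinator
    return messages


if __name__ == "__main__":
    stream = [(0, 1), (0, 1), (1, 1), (0, 1)]  # (node id, increment) pairs
    print(count_messages(stream, num_nodes=2, global_threshold=4))
```

In this toy stream the naïve centralized approach would forward all four updates, whereas the threshold scheme sends a message only when node 0 passes its local share.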