Abstract:
Certain exemplary embodiments provide a method comprising: automatically: receiving a plurality of elements for each of a plurality of continuous data streams; treating the plurality of elements as a first data stream matrix that defines a first dimensionality; reducing the first dimensionality of the first data stream matrix to obtain a second data stream matrix; computing a singular value decomposition of the second data stream matrix; and based on the singular value decomposition of the second data stream matrix, quantifying approximate linear correlations between the plurality of elements.
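The steps named in this abstract (stream matrix, dimensionality reduction, singular value decomposition, correlation measure) can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: the reduction is a Gaussian random projection, the singular values are obtained from Jacobi eigenvalue sweeps on the Gram matrix, and a very small trailing singular value is read as evidence of an approximate linear relation among the streams. The function names `random_projection` and `singular_values` are hypothetical.

```python
import math
import random


def random_projection(rows, k, rng):
    """Reduce a w x n matrix (w time steps, n streams) to k x n.

    Assumption: a Gaussian random projection stands in for the
    unspecified dimensionality-reduction step.
    """
    w, n = len(rows), len(rows[0])
    scale = 1.0 / math.sqrt(k)
    reduced = []
    for _ in range(k):
        r = [rng.gauss(0.0, 1.0) * scale for _ in range(w)]
        reduced.append([sum(r[t] * rows[t][j] for t in range(w)) for j in range(n)])
    return reduced


def singular_values(mat, sweeps=30):
    """Singular values of mat, largest first.

    Computed as square roots of the eigenvalues of the Gram matrix
    mat^T mat, which is diagonalized with cyclic Jacobi rotations.
    """
    n = len(mat[0])
    g = [[sum(row[i] * row[j] for row in mat) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if g[p][q] == 0.0:
                    continue
                # Rotation angle in [-pi/4, pi/4] that zeroes g[p][q].
                d = g[q][q] - g[p][p]
                sign = 1.0 if d >= 0.0 else -1.0
                theta = 0.5 * math.atan2(2.0 * g[p][q] * sign, abs(d))
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):  # update columns p and q
                    gip, giq = g[i][p], g[i][q]
                    g[i][p], g[i][q] = c * gip - s * giq, s * gip + c * giq
                for j in range(n):  # update rows p and q
                    gpj, gqj = g[p][j], g[q][j]
                    g[p][j], g[q][j] = c * gpj - s * gqj, s * gpj + c * gqj
    return sorted((math.sqrt(max(g[i][i], 0.0)) for i in range(n)), reverse=True)


# Three synthetic streams; the third is almost exactly the sum of the first two.
rng = random.Random(0)
window = []
for _ in range(200):
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    window.append([a, b, a + b + 0.001 * rng.gauss(0.0, 1.0)])

sigma = singular_values(random_projection(window, 20, rng))
```

In this example the 200 x 3 window is reduced to 20 x 3 before decomposition. The last entry of `sigma` is far smaller than the first, which is how the near-dependence among the three streams shows up in the reduced matrix.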
Abstract:
A system, method and computer-readable medium are disclosed for identifying representative data using sketches. The method embodiment comprises generating a plurality of vectors from a data set, modifying each of the vectors, and selecting one of the generated vectors according to a comparison of a summed distance between the modified vector associated with the selected generated vector and the remaining modified vectors. Modifying the generated vectors may involve reducing each generated vector to a lower dimensional vector. The summed distance then represents a summed distance between the lower dimensional vector and the remaining lower dimensional vectors.
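One way to read the selection step is as a medoid search carried out on the reduced vectors. The sketch below makes two assumptions that the abstract does not state: each vector is reduced with a shared Gaussian random projection, and the selected vector is the one whose reduced form has the smallest summed Euclidean distance to the others. The names `make_projection`, `reduce_vector` and `most_representative` are hypothetical.

```python
import math
import random


def make_projection(dim, k, seed=0):
    """k x dim Gaussian projection matrix (an assumed choice of reduction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(k)]


def reduce_vector(vec, proj):
    """Map a dim-dimensional vector to its k-dimensional sketch."""
    return [sum(r * v for r, v in zip(row, vec)) for row in proj]


def most_representative(vectors, k, seed=0):
    """Index of the vector whose sketch has the smallest summed distance
    to the sketches of all remaining vectors (assumed selection rule)."""
    proj = make_projection(len(vectors[0]), k, seed)
    sketches = [reduce_vector(v, proj) for v in vectors]

    def summed_distance(i):
        return sum(math.dist(sketches[i], s)
                   for j, s in enumerate(sketches) if j != i)

    return min(range(len(vectors)), key=summed_distance)


def shifted(base, index, delta):
    """Copy of base with delta added to one coordinate."""
    out = list(base)
    out[index] += delta
    return out


# A centre vector, four small perturbations of it, and one distant outlier.
center = [1.0] * 16
vectors = [
    shifted(center, 0, 0.1),
    shifted(center, 0, -0.1),
    center,
    shifted(center, 1, 0.1),
    shifted(center, 1, -0.1),
    [10.0] * 16,
]
chosen = most_representative(vectors, k=8)
```

All pairwise distances are computed on 8-dimensional sketches rather than the 16-dimensional originals. With many long vectors, that reduction is where the saving would come from.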
Abstract:
A method and system for identifying representative data trends using sketches. A sketch is a lower dimensional vector used to represent higher dimensional data. Sketches have three useful properties: they reduce data dimensionality, new sketches can be synthesized from other sketches, and the distance between two sketches is comparable to the distance between the data the sketches represent. Exemplary embodiments include identifying relaxed periods and average trends.
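The three properties can be checked numerically under one assumed construction, a linear Gaussian random projection. The abstract does not say how its sketches are built, so the snippet below is only an illustration. It also reads "synthesized from other sketches" as linearity, meaning the sketch of a sum equals the sum of the sketches. The helper names are hypothetical.

```python
import math
import random


def make_projection(dim, k, seed=0):
    """k x dim Gaussian projection matrix (assumed sketch construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(k)]


def sketch(vec, proj):
    """Lower dimensional sketch of vec under the projection."""
    return [sum(r * v for r, v in zip(row, vec)) for row in proj]


rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(512)]
y = [rng.uniform(-1.0, 1.0) for _ in range(512)]

# Property 1: dimensionality reduction, 512 values down to 64.
proj = make_projection(512, 64)
sx, sy = sketch(x, proj), sketch(y, proj)

# Property 2: the sketch of x + y can be synthesized from the two sketches.
direct = sketch([a + b for a, b in zip(x, y)], proj)
synthesized = [a + b for a, b in zip(sx, sy)]

# Property 3: the sketch distance is comparable to the original distance.
ratio = math.dist(sx, sy) / math.dist(x, y)
```

Here `direct` and `synthesized` agree up to floating-point rounding. `ratio` should be close to 1, and it concentrates more tightly around 1 as the sketch dimension grows.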
Abstract:
An approach for multidimensional substring selectivity estimation utilizes set hashing to generate cross-counts as needed, instead of storing cross-counts for the most frequently co-occurring substrings. Set hashing is a Monte Carlo technique that is used to succinctly represent the set of tuples containing a given substring. Then, any combination of set hashes will yield a cross-count when intersected. Thus, the set hashing technique is useful in three-, four- and other multidimensional situations, since only an intersection function is required.
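A minimal sketch of how set hashes could yield a cross-count on demand, assuming a min-wise hashing scheme. Each substring keeps a short signature of the set of tuple IDs that contain it. The fraction of signature positions on which all chosen signatures agree estimates the ratio of intersection size to union size. The 64-bit mixing function, the way the union size is estimated, and every name below are assumptions made for illustration. None of them is taken from the abstract.

```python
import random

MASK = (1 << 64) - 1


def mix64(x):
    """64-bit integer mixer in the style of SplitMix64 (assumed hash)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return x ^ (x >> 31)


def make_seeds(m, seed=0):
    """One random 64-bit seed per hash function."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(m)]


def set_hash(tuple_ids, seeds):
    """Signature of a non-empty set of non-negative integer tuple IDs:
    the minimum hash value under each seeded hash function."""
    return [min(mix64((t ^ s) & MASK) for t in tuple_ids) for s in seeds]


def estimate_cross_count(signatures, sizes):
    """Estimate the size of the intersection of the underlying sets.

    signatures: one set hash per substring; sizes: the exact set sizes.
    """
    m = len(signatures[0])
    # Component-wise minimum is the signature of the union of the sets.
    union_sig = [min(sig[i] for sig in signatures) for i in range(m)]
    # Positions where every signature agrees estimate |intersection| / |union|.
    agree = sum(1 for i in range(m)
                if all(sig[i] == signatures[0][i] for sig in signatures))
    # Matches between the largest set and the union estimate |largest| / |union|.
    big = max(range(len(sizes)), key=lambda j: sizes[j])
    hits = sum(1 for i in range(m) if signatures[big][i] == union_sig[i])
    if hits == 0:
        return 0.0
    union_size = sizes[big] * m / hits
    return (agree / m) * union_size


# Three substrings and the tuple IDs that contain them; the true cross-count is 100.
seeds = make_seeds(512)
sets = [set(range(0, 300)), set(range(150, 450)), set(range(200, 500))]
signatures = [set_hash(s, seeds) for s in sets]
estimate = estimate_cross_count(signatures, [len(s) for s in sets])
```

The same function handles two, three or more signatures. Only the per-substring signatures and sizes need to be stored, not cross-counts for substring combinations. The estimate is approximate, and its variance shrinks as the number of hash functions grows.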