摘要:
The present invention relates to a method and apparatus for optimizing queries. The present invention discloses an efficient method for providing answers to queries under parametric aggregation constraints.
摘要:
Certain exemplary embodiments provide a method comprising: automatically: receiving a plurality of elements for each of a plurality of continuous data streams; treating the plurality of elements as a first data stream matrix that defines a first dimensionality; reducing the first dimensionality of the first data stream matrix to obtain a second data stream matrix; computing a singular value decomposition of the second data stream matrix; and based on the singular value decomposition of the second data stream matrix, quantifying approximate linear correlations between the plurality of elements.
摘要:
Certain exemplary embodiments provide a method comprising: automatically: receiving a plurality of elements for each of a plurality of continuous data streams; treating the plurality of elements as a first data stream matrix that defines a first dimensionality; reducing the first dimensionality of the first data stream matrix to obtain a second data stream matrix; computing a singular value decomposition of the second data stream matrix; and based on the singular value decomposition of the second data stream matrix, quantifying approximate linear correlations between the plurality of elements.
摘要:
The present invention relates to a method and apparatus for optimizing queries. The present invention discloses an efficient method for providing answers to queries under parametric aggregation constraints.
摘要:
A method and system for performing a data stream query. A data stream query requiring a join operation on multiple data streams is approximated without performing the join operation. It is determined whether conditions of the query are proper to accurately approximate the join operation, and if the conditions are proper the join operation is approximated. The join operation is approximated by independently aggregating values of the data streams and comparing the independently aggregated values.
摘要:
A vast amount of information currently accessible over the Web, and in corporate networks, is stored in a variety of databases, and is being exported as XML data. However, querying this totality of information in a declarative and timely fashion is problematic because this set of databases is dynamic, and a common schema is difficult to maintain. The present invention provides a solution to the problem of issuing declarative, ad hoc XPath queries against such a dynamic collection of XML databases, and receiving timely answers. There is proposed a decentralized architectures, under the open and the agreement cooperation models between a set of sites, for processing queries and updates to XML data. Each site consists of XML data nodes. (which export their data as XML, and also pose queries) and one XML router node (which manages the query and update interactions between sites). The architectures differ in the degree of knowledge individual router nodes have about data nodes containing specific XML data. There is therefore provided a method for accessing data over a wide area network comprising: providing a decentralized architecture comprising a plurality of data nodes each having a database, a query processor and a path index, and a plurality of router nodes each having a routing state, maintaining a routing state in each of the router nodes, broadcasting routing state updates from each of the databases to the router nodes, routing path queries to each of the databases by accessing the routing state.
摘要:
An organization's data records are often noisy: because of transcription errors, incomplete information, and lack of standard formats for textual data. A fundamental task during data cleansing and integration is matching strings—perhaps across multiple relations—that refer to the same entity (e.g., organization name or address). Furthermore, it is desirable to perform this matching within an RDBMS, which is where the data is likely to reside. In this paper, We adapt the widely used and established cosine similarity metric from the information retrieval field to the relational database context in order to identify potential string matches across relations. We then use this similarity metric to characterize this key aspect of data cleansing and integration as a join between relations on textual attributes, where the similarity of matches exceeds a specified threshold. Computing an exact answer to the text join can be expensive. For query processing efficiency, we propose an approximate, sampling-based approach to the join problem that can be easily and efficiently executed in a standard, unmodified RDBMS. Therefore the present invention includes a system for string matching across multiple relations in a relational database management system comprising generating a set of strings from a set of characters, decomposing each string into a subset of tokens, establishing at least two relations within the strings, establishing a similarity threshold for the relations, sampling the at least two relations, correlating the relations for the similarity threshold and returning all of the tokens which meet the criteria of the similarity threshold.
摘要:
A system, method and computer-readable medium are disclosed for identifying representative data using sketches. The method embodiment comprises generating a plurality of vectors from a data set, modifying each of the vectors of the plurality of vectors and selecting one of the plurality of generated vectors according to a comparison of a summed distance between a modified vector associated with the selected generated vector and remaining modified vectors. Modifying the generated vectors may involve reduced each generated vector to a lower dimensional vector. The summed distance then represents a summed distance between the lower dimensional vector and remaining lower dimensional vectors.
摘要:
Structural join mechanisms provide efficient query pattern matching. In one embodiment, tree-merge mechanisms are provided. In another embodiment, stack-tree mechanisms are provided.
摘要:
A system, method and computer-readable medium are disclosed for identifying representative data using sketches. The method embodiment comprises generating a plurality of vectors from a data set, modifying each of the vectors of the plurality of vectors and selecting one of the plurality of generated vectors according to a comparison of a summed distance between a modified vector associated with the selected generated vector and remaining modified vectors. Modifying the generated vectors may involve reduced each generated vector to a lower dimensional vector. The summed distance then represents a summed distance between the lower dimensional vector and remaining lower dimensional vectors.