Abstract:
The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).
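A minimal sketch of the sequence idea follows. It assumes a simplified tree representation of `(label, children)` tuples. The code encodes a tree as a pre-order sequence of `(symbol, prefix)` pairs, where the prefix is the path from the root to the node's parent. It then tests whether a query sequence is a non-contiguous subsequence of a document sequence. The names `encode` and `is_subsequence` are hypothetical. Wildcards, the B+Tree-based index, and dynamic updates are not modeled.

```python
from typing import List, Tuple

# Hypothetical tree representation: (label, children). Text values are modeled as leaf labels.
Tree = Tuple[str, list]
Pair = Tuple[str, str]  # (symbol, path from the root to the node's parent)


def encode(tree: Tree, prefix: str = "") -> List[Pair]:
    """Pre-order traversal that emits one (symbol, prefix) pair per node."""
    label, children = tree
    seq = [(label, prefix)]
    for child in children:
        seq.extend(encode(child, prefix + "/" + label))
    return seq


def is_subsequence(query: List[Pair], doc: List[Pair]) -> bool:
    """True if every query pair occurs in doc in the same order, gaps allowed."""
    it = iter(doc)
    # Each any() consumes the shared iterator up to and including its match.
    return all(any(q == d for d in it) for q in query)


# Document: P with a child S (holding N='v1' and L='v2') and a child B.
doc = ("P", [("S", [("N", [("v1", [])]), ("L", [("v2", [])])]), ("B", [])])
# Branching query P/S[N='v1']/L, matched as one sequence rather than joined sub-queries.
query = ("P", [("S", [("N", [("v1", [])]), ("L", [])])])
match = is_subsequence(encode(query), encode(doc))
```

The whole branching query is matched as a single sequence, so no per-branch results need to be joined. Plain subsequence matching on this naive encoding can still report false alarms. For example, two query branches under one node may match branches of two different document nodes. This sketch does not guard against that case.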
Abstract:
The method trains an inductive model to output multiple models, and trains an error correlation model to estimate the average output of the predictions made by the multiple models. The method can then use the error correlation model to determine an error estimate for each of the multiple models.
Abstract:
Range query techniques are disclosed for use in data stream processing systems. In one aspect of the invention, a technique is provided for indexing continual range queries for use in data stream processing. For example, a technique for use in processing a data stream comprises obtaining at least one range query to be associated with the data stream, and building a range query index based on the at least one range query using one or more virtual constructs such that the query index is adaptive to one or more changes in the distribution of range query sizes. The step/operation of building the range query index may further comprise building the range query index such that it accommodates one or more changes in query positions outside a monitoring area of the at least one range query. In another aspect of the invention, a technique is provided for incrementally processing continual range queries against moving objects. For example, a technique for evaluating one or more continual range queries over one or more moving objects comprises maintaining a query index with one or more containment-encoded virtual constructs associated with the one or more continual range queries over the one or more moving objects, and incrementally evaluating the one or more continual range queries using the query index.
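The following is a one-dimensional sketch of containment-encoded virtual constructs. It assumes integer positions in a power-of-two domain and treats each moving object as a single point. Virtual intervals are numbered like a binary heap, so the parent of virtual interval `i` is `i // 2`. Containment can therefore be computed from the ids alone. Each range query is decomposed into a few virtual intervals and registered in their lists. Finding the queries that contain a point means walking from that point's unit interval up through the parent ids. The following are not modeled: the class and method names (hypothetical), the 2-D monitoring area, adaptation to query-size distributions, and queries outside the monitoring area.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ContainmentEncodedIndex:
    """Hypothetical 1-D range-query index over the integer domain [0, 2**levels).

    Virtual interval id 1 covers the whole domain; the parent of id i is i // 2,
    and the unit interval holding position x has id 2**levels + x.
    """

    def __init__(self, levels: int) -> None:
        self.size = 1 << levels
        self.lists: Dict[int, Set[str]] = defaultdict(set)  # virtual id -> query ids
        self.decomposition: Dict[str, List[int]] = {}       # query id -> virtual ids

    def _decompose(self, lo: int, hi: int) -> List[int]:
        """Cover the half-open range [lo, hi) with aligned virtual intervals (at most two per level)."""
        ids: List[int] = []
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:
                ids.append(lo)
                lo += 1
            if hi & 1:
                hi -= 1
                ids.append(hi)
            lo >>= 1
            hi >>= 1
        return ids

    def add_query(self, qid: str, lo: int, hi: int) -> None:
        ids = self._decompose(lo, hi)
        for vid in ids:
            self.lists[vid].add(qid)
        self.decomposition[qid] = ids

    def remove_query(self, qid: str) -> None:
        for vid in self.decomposition.pop(qid):
            self.lists[vid].discard(qid)

    def move_query(self, qid: str, lo: int, hi: int) -> None:
        """Incremental update: only the moved query's entries are touched."""
        self.remove_query(qid)
        self.add_query(qid, lo, hi)

    def queries_containing(self, x: int) -> Set[str]:
        """Walk from the unit interval holding x up to the root via parent ids."""
        result: Set[str] = set()
        vid = self.size + x
        while vid >= 1:
            result |= self.lists.get(vid, set())
            vid >>= 1
        return result


index = ContainmentEncodedIndex(levels=4)  # positions 0..15
index.add_query("q1", 2, 7)
index.add_query("q2", 5, 12)
index.add_query("q3", 0, 16)
```

Each point lookup visits only `levels + 1` virtual ids, whatever the number of registered queries. Moving a query changes only that query's list entries.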
Abstract:
A method which identifies different types of substructures within a graph and encodes them using techniques suitable to the characteristics of each of them. The method is embodied by an efficient two-phase algorithm, where the first phase identifies and encodes strongly connected components as well as tree substructures, and the second phase encodes the remaining reachability relationships by compressing dense rectangular submatrices in the transitive closure matrix.
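The two phases can be sketched as follows, under simplifying assumptions. Phase 1 collapses strongly connected components with Tarjan's algorithm. It then encodes a DFS spanning forest of the condensed DAG with `[start, end)` intervals, so that reachability along tree edges reduces to interval containment. Phase 2 in the abstract compresses dense rectangular submatrices of the transitive closure. Here, as a plain stand-in, the leftover reachable pairs are stored as an explicit set. `tarjan_scc` and `ReachabilityIndex` are hypothetical names, and the recursion suits only small graphs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tarjan_scc(nodes: List[str], adj: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each node to a strongly connected component id (recursive Tarjan)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    comp: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    next_id = [0]

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in adj.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop it off the stack
            cid = next_id[0]
            next_id[0] += 1
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = cid
                if w == v:
                    break

    for v in nodes:
        if v not in index:
            visit(v)
    return comp


class ReachabilityIndex:
    def __init__(self, edges: List[Tuple[str, str]]) -> None:
        adj: Dict[str, List[str]] = defaultdict(list)
        for u, v in edges:
            adj[u].append(v)
        nodes = sorted({n for edge in edges for n in edge})
        # Phase 1a: collapse each strongly connected component into one vertex.
        self.comp = tarjan_scc(nodes, adj)
        dag: Dict[int, Set[int]] = defaultdict(set)
        for u, v in edges:
            if self.comp[u] != self.comp[v]:
                dag[self.comp[u]].add(self.comp[v])
        comps = sorted(set(self.comp.values()))
        has_parent = {d for targets in dag.values() for d in targets}
        # Phase 1b: DFS spanning forest; tree reachability becomes interval containment.
        self.interval: Dict[int, Tuple[int, int]] = {}
        clock = [0]

        def dfs(c: int) -> None:
            start = clock[0]
            clock[0] += 1
            self.interval[c] = (start, -1)  # mark as visited
            for d in sorted(dag.get(c, ())):
                if d not in self.interval:
                    dfs(d)
            self.interval[c] = (start, clock[0])

        for c in comps:
            if c not in has_parent:
                dfs(c)
        # Phase 2 (simplified stand-in): keep reachable pairs the tree does not cover.
        self.residual: Set[Tuple[int, int]] = set()
        for c in comps:
            seen: Set[int] = set()
            todo = list(dag.get(c, ()))
            while todo:
                d = todo.pop()
                if d not in seen:
                    seen.add(d)
                    todo.extend(dag.get(d, ()))
            self.residual |= {(c, d) for d in seen if not self._tree_reaches(c, d)}

    def _tree_reaches(self, c: int, d: int) -> bool:
        (s1, e1), (s2, e2) = self.interval[c], self.interval[d]
        return s1 <= s2 and e2 <= e1

    def reachable(self, a: str, b: str) -> bool:
        ca, cb = self.comp[a], self.comp[b]
        return ca == cb or self._tree_reaches(ca, cb) or (ca, cb) in self.residual
```

With edges `a->b, b->a, b->c, c->d, a->e, e->d`, the cycle `{a, b}` becomes one vertex. Most pairs are answered by interval containment, and only one of `c->d` or `e->d` ends up in the residual set.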
Abstract:
Arrangements and methods are provided for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features of the time series data are ascertained, and at least one distance between different time series is determined using the structural features. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query may be returned based on the at least one distance.
Abstract:
A method (and system) of assigning a sales opportunity includes creating an assignment model based on clustering historical sales opportunities, and providing a scoring mechanism for a plurality of sales agents to automatically optimize the assignment of at least one sales opportunity to at least one of the sales agents.
Abstract:
Sequence-based XML indexing aims to avoid expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. Herein, the problem of query equivalence with respect to this transformation is addressed, and a performance-oriented principle for sequencing tree structures is introduced. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. A class of sequencing methods that serves this purpose is identified, and a novel subsequence matching algorithm that observes query equivalence is presented. The performance-oriented principle guides the sequencing of tree structures: for any given XML dataset, it finds an optimal sequencing strategy according to the dataset's schema and data distribution. A novel method that realizes this principle is also presented herein.
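The sketch below shows why query equivalence matters, using a naive sequencing. It is an assumed pre-order `(symbol, prefix)` encoding, not the sequencing method described here. The query asks for a single `S` element with both an `N` child and an `L` child. The document has two different `S` elements, one holding `N` and the other holding `L`. Plain subsequence matching still reports a match, which is a false alarm. The helper names are hypothetical.

```python
from typing import List, Tuple

Tree = Tuple[str, list]  # hypothetical (label, children) representation
Pair = Tuple[str, str]   # (symbol, path from the root to the node's parent)


def encode(tree: Tree, prefix: str = "") -> List[Pair]:
    """Naive pre-order sequencing into (symbol, prefix) pairs."""
    label, children = tree
    out = [(label, prefix)]
    for child in children:
        out += encode(child, prefix + "/" + label)
    return out


def is_subsequence(query: List[Pair], doc: List[Pair]) -> bool:
    it = iter(doc)
    return all(any(q == d for d in it) for q in query)


# Query: one S element that has both an N child and an L child.
query = ("P", [("S", [("N", []), ("L", [])])])
# Document: two distinct S elements, one with N and one with L.
doc = ("P", [("S", [("N", [])]), ("S", [("L", [])])])
# Sequence match succeeds even though no single S in doc has both children.
false_alarm = is_subsequence(encode(query), encode(doc))
```

A sequencing method with query equivalence rules out such matches by construction, so no post-processing step is needed to filter them.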
Abstract:
An improved universal remote control (URC) unit for controlling electronic appliance units. The URC unit has the typical remote controller module for controlling appliances such as a TV, stereo, VCR, or DVD player. Additionally, the URC unit has a scratch pad memory for storing telephone numbers and web site information entered through its alphanumeric keys. When the scratch pad function is activated, keypad entries are stored in the memory instead of being used to control the appliance. The URC unit further has a digital recorder module that can be implemented with a microphone, a voice recorder chip, and a speaker, all integrated into the URC unit. The digital recorder module can even run on the battery that typically powers the URC unit. The URC unit further has a display screen for displaying the information stored in and recalled from the memory.
Abstract:
Techniques are provided for performing structural joins for answering containment queries. Such inventive techniques may be used to perform efficient structural joins of two interval lists which are neither sorted nor pre-indexed. For example, in an illustrative aspect of the invention, a technique for performing structural joins of two element sets of a tree-structured document, wherein one of the two element sets is an ancestor element set and the other of the two element sets is a descendant element set, and further wherein each element is represented as an interval representing a start position and an end position of the element in the document, comprises the following steps/operations. An index is dynamically built for the ancestor element set. Then, one or more structural joins are performed by searching the index with the interval start position of each element in the descendant element set.
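The sketch below joins two unsorted interval lists. It assumes that element intervals in a tree-structured document are properly nested or disjoint, and that all positions are distinct. The index on the ancestor set is built on the fly. It is a sorted array of start positions plus, for each ancestor, a link to its nearest enclosing ancestor in the set. This is an illustrative stand-in and not necessarily the index the invention builds. Each descendant's start position probes the array, and the join then climbs the enclosing links. `build_ancestor_index` and `structural_join` are hypothetical names.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) position of an element in the document


def build_ancestor_index(ancestors: List[Interval]):
    """Sort ancestor intervals by start and link each to its nearest enclosing ancestor."""
    order = sorted(ancestors)
    parent: List[Optional[int]] = [None] * len(order)
    open_stack: List[int] = []
    for i, (start, _end) in enumerate(order):
        # Drop intervals that close before this one opens; the top then encloses it.
        while open_stack and order[open_stack[-1]][1] < start:
            open_stack.pop()
        parent[i] = open_stack[-1] if open_stack else None
        open_stack.append(i)
    starts = [s for s, _ in order]
    return order, parent, starts


def structural_join(ancestors: List[Interval],
                    descendants: List[Interval]) -> List[Tuple[Interval, Interval]]:
    order, parent, starts = build_ancestor_index(ancestors)
    pairs: List[Tuple[Interval, Interval]] = []
    for desc in descendants:
        pos = desc[0]
        # Index probe: the ancestor with the largest start position before pos.
        i = bisect_left(starts, pos) - 1
        node: Optional[int] = i if i >= 0 else None
        # Skip candidates that end before pos; every ancestor above the first hit also contains pos.
        while node is not None and order[node][1] < pos:
            node = parent[node]
        while node is not None:
            pairs.append((order[node], desc))
            node = parent[node]
    return pairs
```

For example, `structural_join([(10, 19), (1, 20), (2, 5)], [(3, 4), (11, 12), (6, 7), (21, 22)])` pairs `(3, 4)` with `(2, 5)` and `(1, 20)`, pairs `(11, 12)` with `(10, 19)` and `(1, 20)`, and pairs `(6, 7)` only with `(1, 20)`.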
Abstract:
Distributed privacy-preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol, such that the privacy of the summary information is preserved. The summary information of an entity relates to the data stored at that entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain from the second entity, via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or the number of transactions in which a particular rule is satisfied.