摘要:
The present invention discloses a document descriptor extraction method and system. The document descriptor extraction method and system creates a document descriptor by generalizing input sequences within a document; factoring the input sequences and generalized input sequences; and selecting a document descriptor from the input sequences, generalized sequences, and factored sequences, preferably using minimum descriptor length (MDL) principles. Novel algorithms are employed to perform the generalizing, factoring, and selecting.
摘要:
The present invention provides a method and system for sequential pattern mining with a given constraint. A Regular Expression (RE) is used for identifying the family of interesting frequent patterns. A family of methods that enforce the RE constraint to different degrees within the generating and pruning of candidate patterns during the mining process is utilized. This is accomplished by employing different relaxations of the RE constraint in the mining loop. Those sequences which satisfy the given constraint are thus identified most expeditiously.
摘要:
A method for querying electronic data. The query method comprises creating wavelet-coefficient synopses of the electronic data and then querying the synopses in the wavelet-coefficient domain to obtain a wavelet-coefficient query result. The wavelet-coefficient query result is then rendered to provide an approximate result.
摘要:
Physical connectivity is determined between elements such as switches and routers in a multiple subnet communication network. Each element has one or more interfaces each of which is physically linked with an interface of another network element. Address sets are generated for each interface of the network elements, wherein members of a given address set correspond to network elements that can be reached from the corresponding interface for which the given address set was generated. The members of first address sets generated for corresponding interfaces of a given network element, are compared with the members of second address sets generated for corresponding interfaces of network elements other than the given element. A set of candidate connections between an interface of the given network element and one or more interfaces of other network elements, are determined. If more than one candidate connection is determined, connections with network elements that are in the same subnet as the given network element are eliminated from the set.
摘要:
A method and system for answering set-expression cardinality queries while lowering data communication costs by utilizing a coordinator site to provide global knowledge of the distribution of certain frequently occurring stream elements to significantly reduce the transmission of element state information to the central site and, optionally, capturing the semantics of the input set expression in a Boolean logic formula and using models of the formula to determine whether an element state change at a remote site can affect the set expression result.
摘要:
The invention comprises a method and apparatus for determining a rank of a query value. Specifically, the method comprises receiving a rank query request, determining, for each of the at least one remote monitor, a predicted lower-bound rank value and upper-bound rank value, wherein the predicted lower-bound rank value and upper-bound rank value are determined according to at least one respective prediction model used by each of the at least one remote monitor to compute the at least one local quantile summary, computing a predicted average rank value for each of the at least one remote monitor using the at least one predicted lower-bound rank value and the at least one predicted upper-bound rank value associated with the respective at least one remote monitor, and computing the rank of the query value using the at least one predicted average rank value associated with the respective at least one remote monitor.
摘要:
The invention provides methods and systems for summarizing multiple continuous update streams such that an approximate answer to a query over one or more of the continuous update streams (such as a Query requiring a join operation followed by a duplicate elimination step) may be rapidly provided. The systems and methods use multiple (parallel) Join Distinct (JD) Sketch data structures corresponding to hash buckets of at least one initial attribute.
摘要:
A system for, and method of, determining a physical topology of a network having multiple subnets. In one embodiment, the system includes: (1) a skeleton path initializer that uses addressing information from elements in the network to develop a collection of skeleton paths of direct physical connections between labeled ones of the elements, the skeleton paths traversing multiple of the subnets and (2) a skeleton path refiner, coupled to the skeleton path initializer, that refines the collection by inferring, from the direct physical connections and path constraints derived therefrom, other physical connections in the skeleton paths involving unlabeled ones of the elements.
摘要:
Method for performing information-preserving DTD schema embeddings between a source schema when matching a source schema and a target schema. The preservation is realized by a matching process between the two schemas that finds a first string marking of the target schema, evaluates a legality of the first string marking, determines an estimated minimal cost of the first string marking and subsequently adjusts the estimated minimal cost based upon one to one mapping of source schema and target schema subcomponents.
摘要:
A system for, and method of compressing a data table using models and a database management system incorporating the system or the method. In one embodiment, the system includes: (1) a table modeller that discovers data mining models with guaranteed error bounds of at least one attribute in the data table in terms of other attributes in the data table and (2) a model selector, associated with the table modeller, that selects a subset of the at least one model to form a basis upon which to compress the data table.