Abstract:
Distribution of content between publishers and consumers is accomplished using an overlay network that may use XML to facilitate content identification. The overlay network includes a plurality of routers that may be in communication with each other and with the publishers and consumers on the Internet. Content and queries are identified by content descriptors that are routed from the originator to a nearest router in the overlay network. For each unique content descriptor, the nearest router generates a hash identification of the content descriptor, which the remaining routers in the overlay network use to provide the appropriate functions with respect to that content descriptor. In particular, this allows all routers in the overlay network except the nearest router to route content properly without processing every content descriptor.
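The hash-based routing idea can be illustrated with a minimal sketch. The SHA-256 digest, the 16-character truncation, and the `Router` class with its `add_route` and `forward` methods are all assumptions made for illustration; the abstract does not specify a hash function or a routing-table layout.

```python
import hashlib


def descriptor_hash(descriptor: str) -> str:
    """Hash a content descriptor once, at the router nearest the originator.

    SHA-256 truncated to 16 hex characters is an illustrative choice only.
    """
    return hashlib.sha256(descriptor.encode("utf-8")).hexdigest()[:16]


class Router:
    """Hypothetical overlay router that routes on hash identifiers alone."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # hash id -> name of the next hop

    def add_route(self, hash_id, next_hop):
        self.table[hash_id] = next_hop

    def forward(self, hash_id):
        # Downstream routers only look up the hash; they never parse
        # the (possibly XML) content descriptor itself.
        return self.table.get(hash_id)
```

In this sketch only the nearest router would call `descriptor_hash`; every other router would call `forward` with the resulting identifier.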
Abstract:
Techniques are disclosed for generating conditional functional dependency (CFD) pattern tableaux having the desirable properties of support, confidence and parsimony. These techniques include both a greedy algorithm for generating a tableau and, for large data sets, an “on-demand” algorithm that outperforms the basic greedy algorithm in running time by an order of magnitude. In addition, a range tableau, as a generalization of a pattern tableau, can achieve even more parsimony.
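As a rough illustration of the greedy idea, the sketch below picks patterns in a set-cover style until a support threshold is met. It is not necessarily the disclosed algorithm: the function name `greedy_tableau`, the input format, and the assumption that every candidate pattern has already passed a confidence check are all hypothetical.

```python
def greedy_tableau(candidates, n_tuples, min_support):
    """Greedily pick patterns until enough tuples are covered.

    candidates: dict mapping a pattern to the set of tuple ids it covers.
    Every candidate is assumed to have passed a confidence check already.
    Returns the chosen patterns and the fraction of tuples they cover.
    """
    covered = set()
    tableau = []
    target = min_support * n_tuples
    remaining = dict(candidates)
    while len(covered) < target and remaining:
        # Choose the pattern that covers the most still-uncovered tuples.
        best = max(remaining, key=lambda p: len(remaining[p] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining pattern adds coverage
        tableau.append(best)
        covered |= gain
    return tableau, len(covered) / n_tuples
```

Choosing the largest marginal gain at each step keeps the tableau small, which is one way to read the parsimony goal.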
Abstract:
Structural join mechanisms provide efficient query pattern matching. In one embodiment, tree-merge mechanisms are provided. In another embodiment, stack-tree mechanisms are provided.
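A stack-based structural join can be sketched as follows. The sketch assumes each node is encoded as a `(start, end)` interval, that intervals from the same document are either nested or disjoint, and that both input lists are sorted by `start`. The encoding and the function name `stack_tree_join` are illustrative assumptions, not details taken from the abstract.

```python
def stack_tree_join(ancestors, descendants):
    """Return (ancestor, descendant) pairs where the ancestor contains the descendant.

    Both inputs are lists of (start, end) intervals sorted by start.
    Intervals are assumed to be properly nested or disjoint.
    """
    out = []
    stack = []  # ancestors that are still open, outermost at the bottom
    i = j = 0
    while j < len(descendants):
        d = descendants[j]
        if i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            # Drop ancestors that closed before this one opens.
            while stack and stack[-1][1] < a[0]:
                stack.pop()
            stack.append(a)
            i += 1
        else:
            # Drop ancestors that closed before this descendant opens.
            while stack and stack[-1][1] < d[0]:
                stack.pop()
            # Everything left on the stack contains d.
            out.extend((a, d) for a in stack)
            j += 1
    return out
```

Each input list is scanned once, and the stack never holds more entries than the depth of the tree.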
Abstract:
The present invention discloses the use of generalized queries, referred to as query templates, obtained by generalizing individual user queries, as the semantic basis for low-overhead, high-benefit directory caches for handling declarative queries. Caching effectiveness can be improved by maintaining a set of generalizations of queries and admitting such generalizations into the cache when their estimated benefits are sufficiently high. In a preferred embodiment of the invention, query templates can be admitted into the cache in what the inventors refer to as a “revolutionary” fashion, followed by stable periods in which cache admission and replacement are done incrementally in an evolutionary fashion. The present invention can lead to considerably higher hit rates and lower server-side execution and communication costs than conventional caching of directory queries, while keeping the client-side computational overhead comparable to that of query caching.
Abstract:
A method of providing content comprises making the content available on a central server, and surveying a plurality of peers for a portion of the content. The portion of the content from one of the peers is obtained when the portion of the content is available from the one of the peers, and obtained from the central server when the portion of the content is not available from the plurality of peers.
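The peer-first retrieval with a central-server fallback can be sketched in a few lines. Modeling peers and the central server as plain dictionaries, and the name `fetch_portion`, are assumptions for illustration only.

```python
def fetch_portion(portion_id, peers, central_server):
    """Survey the peers for a portion; fall back to the central server.

    peers: list of dicts mapping portion ids to data (hypothetical model).
    central_server: dict holding every portion of the content.
    Returns the data and a label saying where it came from.
    """
    for peer in peers:
        data = peer.get(portion_id)
        if data is not None:
            return data, "peer"
    # No peer had the portion, so the central server supplies it.
    return central_server[portion_id], "server"
```

For example, with `peers = [{}, {"p1": b"abc"}]` the call `fetch_portion("p1", peers, server)` would be served by the second peer, while a portion held by no peer would come from `server`.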
Abstract:
A method and system for monitoring traffic in a data communication network and for extracting useful statistics and information are disclosed. In accordance with an embodiment of the invention, a network interface card has a run-time system and one or more processing modules executing on the network interface. The run-time system feeds information derived from a network packet to the processing modules, which process the information and generate output such as condensed statistics about the packets traveling through the network.
Abstract:
Approximate substring indexing is accomplished by decomposing each string in a database into overlapping “positional q-grams”: sequences of a predetermined length q, each carrying information regarding the “position” of the q-gram within the string (e.g., 1st q-gram, 4th q-gram, etc.). An index (for example, a B-tree index or a hash index) is then formed over the tuples of positional q-gram data. Each query applied to the database is similarly parsed into a plurality of positional q-grams (of the same length), and a candidate set of matches is found. Position-directed filtering is then used to remove candidates whose q-grams are in the wrong order and/or too far apart, forming a “verified” output of matching candidates. If errors are permitted (defined in terms of an edit distance between each candidate and the query), an edit distance calculation can then be performed to produce the final set of matching strings.
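A minimal sketch of positional q-grams and position-directed filtering follows. It covers only the error-free case, in which every query q-gram must appear in a string at one consistent offset. It uses 0-based positions and a plain in-memory dictionary in place of a B-tree or hash index; those choices and the function names are assumptions.

```python
from collections import defaultdict


def positional_qgrams(s, q):
    """Return (position, q-gram) pairs for every overlapping q-gram of s."""
    return [(i, s[i:i + q]) for i in range(len(s) - q + 1)]


def build_index(strings, q):
    """Map each q-gram to the (string id, position) pairs where it occurs."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for pos, gram in positional_qgrams(s, q):
            index[gram].append((sid, pos))
    return index


def exact_substring_matches(index, query, q):
    """Return ids of strings containing the query (query length >= q assumed).

    A string qualifies only if all query q-grams occur in it at the same
    offset, which rules out q-grams that are out of order or too far apart.
    """
    grams = positional_qgrams(query, q)
    counts = defaultdict(int)  # (string id, offset) -> matching q-grams
    for qpos, gram in grams:
        for sid, pos in index.get(gram, []):
            counts[(sid, pos - qpos)] += 1
    return sorted({sid for (sid, _), c in counts.items() if c == len(grams)})
```

Allowing errors would mean relaxing the offset and count conditions and then verifying the surviving candidates with an edit distance computation, which this sketch omits.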
Abstract:
A method and apparatus for using tag topology for enhancing search capabilities, e.g., searching over the web, are disclosed. For example, the present method receives a user query containing a search term from a user. The method then generates a search result containing at least one entity, wherein the at least one entity is found based on a plurality of user-provided tags that are associated with the at least one entity.
Abstract:
A method of adaptively evaluating a top-k query involves (1204) forming servers having respective server queues storing candidate answers, (1322) processing the candidate answers, and (1232) providing a top-k set as a query evaluation. Processing includes (1402) adaptively choosing a winning server to whose queue a current candidate answer should be sent; (1404) sending the current candidate answer to the winning server's queue; (1334) adaptively choosing a next candidate answer to process from the winning server's queue; (1336) computing a join between the current candidate answer and next candidate answers at the winning server, so as to produce a new current candidate answer; and (1338) updating the top-k set with the new current candidate answer only if a score of the new current candidate answer exceeds a score of a top-k answer in the top-k set. A method of calculating scores for candidate answers is also provided.
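The final update step, admitting a new candidate only when its score beats an answer already in the top-k set, can be sketched with a min-heap. Comparing against the lowest-scoring member, and admitting unconditionally while the set holds fewer than k answers, are assumptions of this sketch. The server queues, adaptive routing, and join steps are not modeled.

```python
import heapq


def update_top_k(top_k, k, score, answer):
    """Try to add (score, answer) to the top-k set.

    top_k is a list managed as a min-heap of (score, answer) pairs.
    Returns True if the set changed.
    """
    if len(top_k) < k:
        # Assumption: while the set is not yet full, admit every candidate.
        heapq.heappush(top_k, (score, answer))
        return True
    if score > top_k[0][0]:
        # The new score beats the current lowest, so evict that answer.
        heapq.heapreplace(top_k, (score, answer))
        return True
    return False
```

Keeping the lowest score at the root of the heap makes the comparison constant-time and each replacement logarithmic in k.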