Abstract:
A data exploration tool has a graphical user interface that employs directed graphs to provide histories of data exploration operations. Nodes in the directed graphs represent operations on data; edges represent relationships between the operations. One type of directed graph is the derivation graph, in which the root of the graph is a node representing a data set and an edge leading from a first node to a second node indicates that the operation represented by the second node is performed on the result of the operation represented by the first node. Operations include query, segmentation, aggregation, and data view operations. A user may edit the derivation graph and may select a node for execution. When that is done, all of the operations represented by the nodes between the root node and the selected node are performed as indicated in the graph. The operations are performed using lazy evaluation, and results are cached with the nodes. Another type of directed graph is the subsumption graph, in which an edge leading from a first node to a second node indicates that the second node stands in a subsumption relationship to the first node. If a result of the operation represented by the first node has been computed, that result is available for calculating the result of the operation represented by the second node.
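The derivation-graph behavior described above (selecting a node runs every operation on the path from the root, lazily, with results cached at the nodes) can be illustrated with a minimal sketch. The `Node` class, its `runs` counter, and the sample rows are hypothetical names introduced only for illustration; the abstract does not specify an implementation.

```python
class Node:
    """One operation in a derivation graph (hypothetical illustration).

    The root wraps the raw data set; every other node applies its
    operation to the result of its parent.
    """

    def __init__(self, name, op, parent=None):
        self.name = name
        self.op = op              # function: parent result -> this node's result
        self.parent = parent
        self._cached = None
        self._has_result = False
        self.runs = 0             # how many times op was actually executed

    def result(self):
        # Lazy evaluation: nothing is computed until a node is selected,
        # and a cached result is reused instead of being recomputed.
        if not self._has_result:
            source = self.parent.result() if self.parent else None
            self._cached = self.op(source)
            self._has_result = True
            self.runs += 1
        return self._cached


rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 25},
    {"region": "east", "sales": 5},
]

root = Node("data set", lambda _: rows)
east = Node("query: region == east",
            lambda data: [r for r in data if r["region"] == "east"], root)
total = Node("aggregate: sum(sales)",
             lambda data: sum(r["sales"] for r in data), east)
count = Node("aggregate: count", lambda data: len(data), east)

print(total.result())   # runs root, then the query, then the sum
print(count.result())   # reuses the cached query result
print(east.runs)        # the query itself executed only once
```

Selecting `count` after `total` reuses the result already cached on the shared `east` node, which is the practical benefit of storing results with the nodes.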
Abstract:
Recommendation systems are widely used in Internet applications. In current recommendation systems, users only play a passive role and have limited control over the recommendation generation process. As a result, there is often considerable mismatch between the recommendations made by these systems and the actual user interests, which are fine-grained and constantly evolving. With a user-powered distributed recommendation architecture, individual users can flexibly define fine-grained communities of interest in a declarative fashion and obtain recommendations accurately tailored to their interests by aggregating opinions of users in such communities. By combining a progressive sampling technique with data perturbation methods, the recommendation system is both scalable and privacy-preserving.
Abstract:
The specification describes data processes for analyzing large data streams for target anomalies. “Sequential dependencies” (SDs) are chosen for ordered data, and a framework is presented for discovering which subsets of the data obey a given sequential dependency. Given an interval G, an SD on attributes X and Y, written as X →_G Y, denotes that the distance between the Y-values of any two consecutive records, when sorted on X, is within G. SDs may be extended to Conditional Sequential Dependencies (CSDs), each consisting of an underlying SD plus a representation of the subsets of the data that satisfy the SD. Conditional approximate sequential dependencies may be expressed as pattern tableaux, i.e., compact representations of the subsets of the data that satisfy the underlying dependency.
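The definition of a sequential dependency can be made concrete with a small sketch. It assumes that "distance" means the signed difference between consecutive Y-values and that G is a closed interval `(lo, hi)`; the function names and the polling-log example are hypothetical. The run-finding helper is a naive stand-in for the idea of locating subsets that satisfy the SD, not the tableau discovery method of the specification.

```python
def satisfies_sd(records, x, y, interval):
    """Return True if consecutive records (sorted on x) have y-gaps within interval."""
    lo, hi = interval
    ordered = sorted(records, key=lambda r: r[x])
    return all(lo <= b[y] - a[y] <= hi for a, b in zip(ordered, ordered[1:]))


def satisfying_runs(records, x, y, interval):
    """Return maximal x-ranges (of at least two records) whose gaps all lie in interval."""
    lo, hi = interval
    ordered = sorted(records, key=lambda r: r[x])
    runs, start = [], 0
    for i in range(1, len(ordered)):
        gap = ordered[i][y] - ordered[i - 1][y]
        if not (lo <= gap <= hi):
            if i - 1 > start:                      # close a run of length >= 2
                runs.append((ordered[start][x], ordered[i - 1][x]))
            start = i
    if ordered and len(ordered) - 1 > start:
        runs.append((ordered[start][x], ordered[-1][x]))
    return runs


# Hypothetical polling log: a poll is expected roughly every 10 seconds.
polls = [
    {"seq": 1, "time": 0}, {"seq": 2, "time": 10}, {"seq": 3, "time": 20},
    {"seq": 4, "time": 50}, {"seq": 5, "time": 60}, {"seq": 6, "time": 70},
]

print(satisfies_sd(polls, "seq", "time", (9, 11)))     # False: one gap is 30
print(satisfying_runs(polls, "seq", "time", (9, 11)))  # [(1, 3), (4, 6)]
```

The two ranges returned for the example play the role of a very small tableau: they describe where the dependency holds even though it fails on the data as a whole.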
Abstract:
An online data fusion system receives a query, probes a first source for an answer to the query, returns the answer from the first source, refreshes the answer while probing an additional source, and applies fusion techniques on data associated with an answer that is retrieved from the additional source. For each retrieved answer, the online data fusion system computes the probability that the answer is correct and stops retrieving data for the answer after gaining enough confidence that data retrieved from the unprocessed sources are unlikely to change the answer. The online data fusion system returns correct answers and terminates probing additional sources in an expeditious manner without sacrificing the quality of the answers.
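The probe, refresh, and stop loop described above can be sketched as follows. The abstract does not say how the correctness probability is computed, so this sketch swaps in a much simpler majority-vote rule: probing stops once the leading answer is ahead by more votes than there are unprobed sources, meaning the remaining sources cannot change it. The function `fuse_online` and the callable sources are hypothetical.

```python
from collections import Counter


def fuse_online(sources, query):
    """Probe sources one at a time, yielding (best_answer, probed, settled).

    Assumption: each source is a callable that returns one answer.
    Simplification: majority voting replaces the probabilistic
    confidence test, which the abstract leaves unspecified.
    """
    votes = Counter()
    for probed, source in enumerate(sources, start=1):
        votes[source(query)] += 1
        remaining = len(sources) - probed
        ranked = votes.most_common(2)
        best, best_votes = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        # Settled when the unprobed sources cannot overturn the leader.
        settled = best_votes - runner_up > remaining
        yield best, probed, settled
        if settled:
            return


answers = ["Paris", "Paris", "Lyon", "Paris", "Paris"]
sources = [lambda q, a=a: a for a in answers]

steps = list(fuse_online(sources, "capital of France"))
for best, probed, settled in steps:
    print(probed, best, "final" if settled else "tentative")
```

In this example the answer is returned after the first probe and refreshed after each later one; the fifth source is never probed because the fourth probe already settles the result.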
Abstract:
Methods and systems are described to store state used to forward multicast traffic. The system includes a receiving module to receive a request to add a first node to a membership tree. The membership tree includes a first plurality of nodes associated with a multicast group. The system further includes a processing module to identify a second node in the first plurality of nodes and to communicate a node identifier that identifies the first node over a network to the second node. The node identifier is to be stored at the second node to add the first node to the membership tree. The node identifier is further to be stored in the membership tree exclusively at the second node to enable the second node to forward the multicast traffic to the first node.
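A minimal in-process sketch of the membership tree described above, in which a new member's identifier is stored only at the one node chosen to forward traffic to it. How the second node is identified is not stated in the abstract; this sketch assumes a breadth-first search for the first node with spare capacity. `MemberNode`, `add_member`, and `max_children` are hypothetical, and object references stand in for network communication.

```python
from collections import deque


class MemberNode:
    """A member of a multicast group (hypothetical illustration)."""

    def __init__(self, node_id, max_children=2):
        self.node_id = node_id
        self.max_children = max_children
        # The only place in the tree where a child's identifier is stored.
        self.children = {}

    def forward(self, packet, delivered):
        """Record delivery locally, then forward the packet to stored children."""
        delivered.append(self.node_id)
        for child in self.children.values():
            child.forward(packet, delivered)


def add_member(root, new_node):
    """Attach new_node under the first node (breadth-first) with spare capacity."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if len(node.children) < node.max_children:
            node.children[new_node.node_id] = new_node
            return node
        queue.extend(node.children.values())
    raise RuntimeError("no node in the tree can accept another child")


root = MemberNode("A")
parents = {}
for name in ["B", "C", "D", "E"]:
    parents[name] = add_member(root, MemberNode(name)).node_id

delivered = []
root.forward("hello group", delivered)
print(parents)
print(delivered)
```

Because each identifier lives at exactly one parent, no node needs the full membership list in order to forward traffic.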
Abstract:
The invention relates to a system and/or methodology for selectivity estimation of set similarity queries. More specifically, the invention relates to a selectivity estimation technique employing hashed sampling. The invention provides for samples constructed a priori that can efficiently and quickly provide accurate estimates for arbitrary queries, and can be updated efficiently as well.
Abstract:
A distributed transformation network provides delivery of content from a content publisher to a content recipient. Content from the content publisher is received at an entry node of the distributed transformation network and transmitted to a transformation node in the distributed transformation network. The content is transformed according to publisher, recipient, or network administrator specifications and transmitted to delivery nodes, which deliver the transformed content to the content recipient. The published content may be in an XML-based format and transformed into an XML-related format or any other structured language format as desired in the provided specification.
Abstract:
Described is a system and method for receiving a data stream of multi-dimensional items, collecting a sample of the data stream having a predetermined number of items and dividing the sample into a plurality of subsamples, each subsample corresponding to a single dimension of each of the predetermined number of items. A query is then executed on a particular item in at least two of the subsamples to generate data for the corresponding subsample. This data is combined into a single value.
Abstract:
A disclosed method for implementing time decay in the analysis of streaming data objects is based on the age, referred to herein as the forward age, of a data object measured from a landmark time in the past to a time associated with the occurrence of the data object, e.g., an object's timestamp. A forward time decay function is parameterized on the forward age. Because a data object's forward age does not depend on the current time, a value of the forward time decay function is determined just once for each data object. A scaling factor or weight associated with a data object may be weighted according to its decay function value. Forward time decay functions are beneficial in determining decayed aggregates, including decayed counts, sums, and averages, decayed minimums and maximums, and for drawing decay-influenced samples.
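The key property above, that each object's decay function value is computed once at arrival because the forward age does not depend on the current time, can be sketched as follows. The abstract does not state how a query-time result is obtained; this sketch assumes the running totals are normalized by the decay function evaluated at the query time. `ForwardDecayAggregate` and the quadratic decay function are hypothetical choices for illustration.

```python
class ForwardDecayAggregate:
    """Decayed sum, count, and average under forward time decay (sketch)."""

    def __init__(self, landmark, g):
        self.landmark = landmark      # fixed reference time in the past
        self.g = g                    # decay function of the forward age
        self.weighted_sum = 0.0
        self.weighted_count = 0.0

    def add(self, timestamp, value):
        # Forward age = timestamp - landmark; it never changes afterwards,
        # so g is evaluated exactly once per object.
        w = self.g(timestamp - self.landmark)
        self.weighted_sum += w * value
        self.weighted_count += w

    # Assumption: queries normalize by g at the query time, which must be
    # later than the landmark so that the divisor is non-zero.
    def decayed_sum(self, now):
        return self.weighted_sum / self.g(now - self.landmark)

    def decayed_count(self, now):
        return self.weighted_count / self.g(now - self.landmark)

    def decayed_average(self):
        # The normalizer cancels, so the average needs no current time.
        return self.weighted_sum / self.weighted_count


agg = ForwardDecayAggregate(landmark=0, g=lambda age: age ** 2)
agg.add(timestamp=1, value=4)   # weight 1
agg.add(timestamp=2, value=1)   # weight 4

print(agg.decayed_sum(now=4))     # 0.5
print(agg.decayed_count(now=4))   # 0.3125
print(agg.decayed_average())
```

No stored weight is ever revised as time passes; only the single normalizing factor depends on when the query is asked.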