摘要:
A method of labeling unlabeled nodes in a graph that represents objects that have an explicit structure between them. A computing device can use a labeling engine to labeled nodes in a graph that are labeled and can identify an unlabeled node in the graph that is structurally associated with the labeled nodes. The labeling engine can label the unlabeled node with the label of the labeled node based on the structural association between the unlabeled node and the labeled node.
摘要:
The present disclosure is directed to systems, methods, and computer-readable storage media for anonymizing data over multiple temporal releases. Data is received, and nodes and connections in the data are identified. The data also is analyzed to identify predicted connections. The nodes, the connections, and the predicted connections are analyzed to determine how to group the nodes in the data. The data is published, and the grouping of the nodes is extended to subsequent temporal releases of the data, the nodes of which are grouped in accordance with the grouping used with the data.
摘要:
The present disclosure is directed to systems, methods, and computer-readable storage media for anonymizing data over multiple temporal releases. Data is received, and nodes and connections in the data are identified. The data also is analyzed to identify predicted connections. The nodes, the connections, and the predicted connections are analyzed to determine how to group the nodes in the data. The data is published, and the grouping of the nodes is extended to subsequent temporal releases of the data, the nodes of which are grouped in accordance with the grouping used with the data.
摘要:
The present disclosure is directed to systems, methods, and computer-readable storage media for sampling from distributed data streams. Data elements are received at site servers configured to collect and report data to a coordinator device. The site servers assign a binary string to each of the data elements. Each bit of the binary strings can be independently set to a 0 or a 1 with a probability of one half. The binary string is used to sample from the received data elements, and the data elements and/or the sampled data elements can be transmitted to a coordinator device. The coordinator device can examine one or more bits of the binary string to draw samples of the received data elements in accordance with desired probabilities.
摘要:
Disclosed are method and apparatus for identifying members of a social network who have a high likelihood of providing a useful response to a query. A query engine examines the personal pages of a set of members and automatically gleans semantic information relevant to the query. From the automatically-gleaned semantic information, a score indicative of the likelihood that the member may provide a useful response is calculated.
摘要:
Described is a system and method for receiving a data stream of multi-dimensional items, collecting a sample of the data stream having a predetermined number of items and dividing the sample into a plurality of subsamples, each subsample corresponding to a single dimension of each of the predetermined number of items. A query is then executed on a particular item in at least two of the subsamples to generate data for the corresponding subsample. This data is combined into a single value.
摘要:
Methods and apparatus for representing probabilistic data using a probabilistic histogram are disclosed. An example method comprises partitioning a plurality of ordered data items into a plurality of buckets, each of the data items capable of having a data value from a plurality of possible data values with a probability characterized by a respective individual probability distribution function (PDF), each bucket associated with a respective subset of the ordered data items bounded by a respective beginning data item and a respective ending data item, and determining a first representative PDF for a first bucket associated with a first subset of the ordered data items by partitioning the plurality of possible data values into a first plurality of representative data ranges and respective representative probabilities based on an error between the first representative PDF and a first plurality of individual PDFs characterizing the first subset of the ordered data items.
摘要:
A disclosed method for implementing time decay in the analysis of streaming data objects is based on the age, referred to herein as the forward age, of a data object measured from a landmark time in the past to a time associated with the occurrence of the data object, e.g., an object's timestamp. A forward time decay function is parameterized on the forward age. Because a data object's forward age does not depend on the current time, a value of the forward time decay function is determined just once for each data object. A scaling factor or weight associated with a data object may be weighted according to its decay function value. Forward time decay functions are beneficial in determining decayed aggregates, including decayed counts, sums, and averages, decayed minimums and maximums, and for drawing decay-influenced samples.
摘要:
Described is a system and method for receiving a signal for transmission and encoding the signal into a plurality of linear projections representing the signal. The encoding includes defining a transform matrix. The transform matrix being defined by processing the signal using a macroseparation matrix, processing the signal using a microseparation matrix and processing the signal using an estimation vector.
摘要:
A method and apparatus for tracking communications in a network are disclosed. For example, the method receives a subscription from a customer for a service to track at least one variable associated with a plurality of communicants of the customer. The method identifies a plurality of members of a social network of the customer, and gathers communication data associated with the plurality of members for tracking the at least one variable. The method then displays at least one result derived from the communication data to the customer.