摘要:
Advertising slots on a search engine results page may be determined based on keywords and/or results to a user query. Advertisers may use the keywords and/or the results to the query to place their ads into the advertising slots. Rules may be applied to determine how ads are displayed or not displayed. For example, a larger set of keywords may be inferred from the initial set of keywords on which the ad provider placed her bids. This greatly increases the potential reach of an advertiser's ad campaign or a search engine provider's revenue from ad placement.
摘要:
Techniques for contingency table release provide an accurate and consistent set of tables while guaranteeing that privacy is preserved. A positive and integral database is constructed that corresponds to these tables. Therefore, a database can be generated that preserves low-order marginals up to a small error. Moreover, a gracefully degrading version of the results is provided as a database can be computed such that the error in the low-order marginals is small, and increases smoothly with the order of the marginal.
摘要:
While consulting indexes to conduct a search, a determination is made from time to time as to whether it is more efficient to consult individual indexes in a set or to merge the indexes and consult the merged index. The cost of merging indexes is compared with the cost of individually querying indexes. In accordance with the result of this comparison, the indexes are merged and the merged index is consulted, or the indexes are individually consulted. A cost-balance invariant in the form of an inequality is used to equate the cost of merging indexes to a weighted cost of individually querying indexes. As query events are received, the costs are updated. As long as the cost-balance invariant is not violated, indexes are merged and the merged index is queried. If the cost-balance invariant is violated, indexes are not merged, and the indexes are individually queried.
摘要:
An input or query is determined for which a search engine's static ranking computation is the answer. By understanding how this input or query differs from the posed input or query, the precise termination point of an iterative convergence problem can be determined. An iterative process provides the following inputs to the system: a graph of hyperlinks, and a vector of how the probability mass is redistributed. Given the set of ranks (the output results), it is determined how the input (e.g., the query) would have to be changed to get the rank(s) as the answer or result. Backward answer analysis is provided in the web page context. The difference between what was asked and what should have been asked is determined. After the difference is computed, it is determined if the iterative process should be stopped or not.
摘要:
Methods and systems are described for computing page rankings more efficiently. Using an interconnectivity matrix describing the interconnection of web pages, a new matrix is computed. The new matrix is used to compute the average of values associated with each web page's neighboring web pages. The secondary eigenvector of this new matrix is computed, and indices for web pages are relabeled according to the eigenvector. The data structure storing the interconnectivity information is preferably also physically sorted according to the eigenvector. By reorganizing the matrix used in the web page ranking computations, caching is performed more efficiently, resulting in faster page ranking techniques. Methods for efficiently allocating the distribution of resources are also described.
摘要:
Methods and systems are provided for efficiently computing personalized rankings of web pages or other interconnected objects. The personalized rankings are produced by efficiently computing an approximation matrix to an ideal personalized page ranking matrix. The methods and systems provided herein can be used to produce search results with particular relevance to an individual searcher.
摘要:
Methods and systems for finding a low rank approximation for an m×n matrix A are described. The described embodiments can independently sample and/or quantize the entries of an input matrix A, and can thus speed up computation by reducing the number of non-zero entries and/or their representation length. The embodiments can be used in connection with Singular Value Decomposition techniques to greatly benefit the processing of high-dimensional data sets in terms of storage, transmission and computation.
摘要:
General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. A set of extensions to a sequential high-level computing language are provided to support distributed parallel computations and to facilitate generation and optimization of distributed execution plans. The extensions are fully integrated with the programming language, thereby enabling developers to write sequential language programs using known constructs while providing the ability to invoke the extensions to enable better generation and optimization of the execution plan for a distributed computing environment.
摘要:
General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. A set of extensions to a sequential high-level computing language are provided to support distributed parallel computations and to facilitate generation and optimization of distributed execution plans. The extensions are fully integrated with the programming language, thereby enabling developers to write sequential language programs using known constructs while providing the ability to invoke the extensions to enable better generation and optimization of the execution plan for a distributed computing environment.
摘要:
Systems and methods are provided for controlling privacy loss associated with database participation. In general, privacy loss can be evaluated based on information available to a hypothetical adversary with access to a database under two scenarios: a first scenario in which the database does not contain data about a particular privacy principal, and a second scenario in which the database does contain data about the privacy principal. Such evaluation can be made for example by a mechanism for determining sensitivity of at least one database query output to addition to the database of data associated with a privacy principal. An appropriate noise distribution can be calculated based on the sensitivity measurement and optionally a privacy parameter. A noise value is selected from the distribution and added to query outputs.