Abstract:
A multi-stage query processing system and method enable multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. At one or more stages of the system, a set of relevancy scores is used to select a subset of documents for presentation to a user as an ordered list. The set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in earlier stages. In some embodiments, the system can execute one or more passes on a user query. Information from each pass is used to expand the query for a subsequent pass, improving the relevancy of the documents in the ordered list.
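The staged scoring described above can be illustrated with a minimal Python sketch. It assumes a hypothetical scorer interface, two toy scorers (a cheap term count and a stand-in for a costlier "all terms present" check), and a per-stage weight that carries earlier scores forward. The query-expansion helper uses simple pseudo-relevance feedback, which is a swapped-in technique and not necessarily the one the abstract has in mind. All names are illustrative.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a scorer maps (query terms, document text) to a score.
Scorer = Callable[[Sequence[str], str], float]

def term_count_score(query: Sequence[str], doc: str) -> float:
    """Cheap early-stage score: total occurrences of the query terms."""
    words = doc.lower().split()
    return float(sum(words.count(t.lower()) for t in query))

def all_terms_score(query: Sequence[str], doc: str) -> float:
    """Stand-in for a costlier later-stage score: 1.0 if every term occurs."""
    words = set(doc.lower().split())
    return 1.0 if all(t.lower() in words for t in query) else 0.0

def multi_stage_rank(query: Sequence[str], docs: Dict[str, str],
                     stages: List[Tuple[Scorer, float, int]]) -> List[str]:
    """Each stage is (scorer, weight on the prior stage's score, documents kept)."""
    candidates = list(docs)
    prior = {d: 0.0 for d in candidates}
    for scorer, prior_weight, keep in stages:
        # The new score is derived partly from the score of the earlier stage.
        scores = {d: prior_weight * prior[d] + scorer(query, docs[d]) for d in candidates}
        candidates = sorted(candidates, key=lambda d: (-scores[d], d))[:keep]
        prior = scores
    return candidates

def expand_query(query: Sequence[str], docs: Dict[str, str],
                 top_ids: Sequence[str], n_terms: int = 1) -> List[str]:
    """Pseudo-relevance feedback: append the most frequent new terms from top results."""
    seen = {t.lower() for t in query}
    counts = Counter(w for d in top_ids for w in docs[d].lower().split() if w not in seen)
    extra = [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]]
    return list(query) + extra
```

A second pass would call `multi_stage_rank` again with the output of `expand_query`. Narrowing the candidate set at each stage keeps the expensive scorers confined to a few documents.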
Abstract:
The present invention is directed to a client-server network system that implements a multi-tier caching strategy to give users efficient access to documents. The system comprises a client cache assistant serving as a proxy for web browsers, a remote cache server managing user-requested documents, and a search engine repository storing a very large number of documents as a backup for the remote cache server. Upon receipt of a document request, the client cache assistant examines its client cache for the requested document. If this lookup is unsuccessful, the remote cache server identifies a copy of the requested document in its own cache and transmits the content difference between the two copies to the client cache assistant. If the server copy is not fresh, or no server copy is found, the remote cache server retrieves another copy of the requested document from the search engine repository and transmits a further content difference to the client cache assistant. The client cache assistant then merges the content differences with its original copy to produce a new copy of the requested document.
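The three-tier lookup and the delta merge described above can be sketched in Python as follows. Several details are assumptions not stated in the abstract. Freshness is modeled as a maximum age in seconds. The "content difference" is built from `difflib.SequenceMatcher` opcodes. The three tiers are plain dictionaries mapping a URL to `(fetch time, content)`. The server is also assumed to know the client's copy, whereas a real system would presumably identify it by version rather than by content.

```python
import difflib
from typing import Dict, List, Tuple

Delta = List[Tuple[str, int, int, str]]  # (opcode tag, start, end in old text, new text)
Entry = Tuple[float, str]                # (time the copy was fetched, content)

def make_delta(old: str, new: str) -> Delta:
    """Encode `new` as a content difference against `old`."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_delta(old: str, delta: Delta) -> str:
    """Rebuild the new text by splicing the replacement text into `old`."""
    out, pos = [], 0
    for _tag, i1, i2, text in delta:
        out.append(old[pos:i1])  # unchanged stretch before this edit
        out.append(text)         # inserted or replacing text ('' for a deletion)
        pos = i2
    out.append(old[pos:])
    return "".join(out)

def is_fresh(fetched_at: float, now: float, max_age: float) -> bool:
    return now - fetched_at <= max_age

def get_document(url: str, client_cache: Dict[str, Entry], server_cache: Dict[str, Entry],
                 repository: Dict[str, Entry], now: float, max_age: float = 60.0) -> str:
    base_ts, base = client_cache.get(url, (float("-inf"), ""))
    if is_fresh(base_ts, now, max_age):
        return base                             # tier 1: client cache hit
    deltas: List[Delta] = []
    current, ts = base, None
    if url in server_cache:                     # tier 2: remote cache server
        ts, server_copy = server_cache[url]
        deltas.append(make_delta(current, server_copy))
        current = server_copy
    if ts is None or not is_fresh(ts, now, max_age):
        ts, repo_copy = repository[url]         # tier 3: search engine repository
        deltas.append(make_delta(current, repo_copy))
        server_cache[url] = (ts, repo_copy)
    merged = base                               # client merges every delta into its copy
    for delta in deltas:
        merged = apply_delta(merged, delta)
    client_cache[url] = (ts, merged)
    return merged
```

The deltas are applied in the order they were produced, so the client ends up with the newest copy while only edit operations cross the network.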
Abstract:
An example method for passive compaction of a cache includes determining first metadata associated with first data and second metadata associated with second data. The first metadata includes a first retrieval time, and the second metadata includes a second retrieval time. The example method further includes obtaining a first metadata key including a first unique identifier and obtaining a second metadata key including a second unique identifier. The example method also includes generating a first data key and generating a second data key. The example method further includes writing, at a client device, the first and second data to the cache. Each of the first and second data occupies one or more contiguous blocks of physical memory in the cache, and the first and second data are stored in the cache in an order based on the relative values of the first and second retrieval times.
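Here is a minimal Python sketch of the write path described above. A `bytearray` stands in for physical memory, and the block size, the `meta/<uid>` and `data/<uid>` key scheme, and the append-only allocator are all hypothetical. Within each batch, items are written in ascending retrieval-time order, and every item occupies whole contiguous blocks.

```python
import math
import uuid
from dataclasses import dataclass
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # bytes per block; deliberately tiny for illustration (assumption)

@dataclass
class Metadata:
    retrieval_time: float  # when the data was retrieved
    size: int              # payload length in bytes

class BlockCache:
    def __init__(self, num_blocks: int):
        self.memory = bytearray(num_blocks * BLOCK_SIZE)     # simulated physical memory
        self.index: Dict[str, Tuple[int, int, int]] = {}     # data key -> (first block, blocks, size)
        self.metadata: Dict[str, Tuple[Metadata, str]] = {}  # metadata key -> (metadata, data key)
        self.next_block = 0                                  # next free block (append-only)

    def write_batch(self, items: List[Tuple[bytes, float]]) -> List[Tuple[str, str]]:
        """Write (data, retrieval_time) pairs, oldest first; keys are returned in that order."""
        keys = []
        for data, retrieval_time in sorted(items, key=lambda item: item[1]):
            uid = uuid.uuid4().hex                              # unique identifier
            meta_key, data_key = f"meta/{uid}", f"data/{uid}"   # hypothetical key scheme
            n_blocks = max(1, math.ceil(len(data) / BLOCK_SIZE))
            start = self.next_block
            if (start + n_blocks) * BLOCK_SIZE > len(self.memory):
                raise MemoryError("cache is full")
            offset = start * BLOCK_SIZE
            self.memory[offset:offset + len(data)] = data       # occupies contiguous blocks
            self.index[data_key] = (start, n_blocks, len(data))
            self.metadata[meta_key] = (Metadata(retrieval_time, len(data)), data_key)
            self.next_block += n_blocks
            keys.append((meta_key, data_key))
        return keys

    def read(self, data_key: str) -> bytes:
        start, _n_blocks, size = self.index[data_key]
        offset = start * BLOCK_SIZE
        return bytes(self.memory[offset:offset + size])
```

Laying entries out by retrieval time keeps entries of similar age next to each other. One plausible reading of "passive" compaction is that removing the oldest entries then frees adjacent blocks without any data having to be moved. The sketch only shows the ordered, contiguous layout, not eviction.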
Abstract:
A distributed storage system is provided. The system includes multiple front-end servers and zones that manage data for clients. Data within the system is associated with a plurality of accounts and divided into a plurality of groups. Each group includes a plurality of splits, each split being associated with a respective account, and each group has multiple tablets, each managed by a respective tablet server of the distributed storage system. Data associated with different accounts may be replicated within the system using different data replication policies. Because new splits can be added to the system, the amount of data stored for an account is not limited. In response to a client request for a particular account's data, a front-end server forwards the request to a zone that holds the requested data and returns that data to the requesting client.
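The account, group, split, and zone relationships described above can be modeled with a toy Python class. Many details here are assumptions made for illustration. A group is keyed by its replication policy. One group's data inside one zone stands in for a tablet. A split holds at most `MAX_KEYS_PER_SPLIT` keys before a new split is added. Replication is a synchronous write to every zone the account's policy names.

```python
from typing import Dict, List, Optional, Tuple

MAX_KEYS_PER_SPLIT = 2  # tiny so the example creates new splits (assumption)

class FrontEndServer:
    """Toy model: zone -> group id -> split id -> {key: value}.
    The data of one group inside one zone stands in for a tablet."""

    def __init__(self, zones: List[str], policies: Dict[str, List[str]]):
        self.storage: Dict[str, Dict[str, Dict[str, Dict[str, str]]]] = {z: {} for z in zones}
        self.policies = policies  # account -> zones that must hold a replica
        self.directory: Dict[str, List[Tuple[str, str]]] = {}  # account -> [(group id, split id)]

    def _group_id(self, account: str) -> str:
        # Accounts with the same replication policy share a group (illustrative choice).
        return "g:" + ",".join(sorted(self.policies[account]))

    def _split(self, zone: str, group_id: str, split_id: str) -> Dict[str, str]:
        return self.storage[zone].setdefault(group_id, {}).setdefault(split_id, {})

    def put(self, account: str, key: str, value: str) -> None:
        group_id, zones = self._group_id(account), self.policies[account]
        splits = self.directory.setdefault(account, [])
        target = next((s for g, s in splits if key in self._split(zones[0], g, s)), None)
        if target is None:
            if splits and len(self._split(zones[0], *splits[-1])) < MAX_KEYS_PER_SPLIT:
                target = splits[-1][1]
            else:  # the account outgrew its splits: add a new one instead of hitting a limit
                target = f"{account}/split-{len(splits)}"
                splits.append((group_id, target))
        for zone in zones:  # replicate to every zone named in the account's policy
            self._split(zone, group_id, target)[key] = value

    def get(self, account: str, key: str, preferred_zone: Optional[str] = None) -> Optional[str]:
        zones = self.policies[account]
        zone = preferred_zone if preferred_zone in zones else zones[0]  # route to a zone with the data
        for group_id, split_id in self.directory.get(account, []):
            data = self.storage[zone].get(group_id, {}).get(split_id, {})
            if key in data:
                return data[key]
        return None
```

Because each account may have its own policy, one account can be replicated to two zones while another stays in a single zone.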
Abstract:
Upon receipt of a document request, a client assistant examines its cache for the document. If this is unsuccessful, a server searches for the requested document in its cache. If the server copy is not fresh, or no server copy is found, the server seeks the document from its host. If the host cannot provide a copy, the server seeks it from a document repository. Certain documents in the document repository are identified as being fresh or stable. Information about each of these identified documents is transmitted to the server, which inserts an entry into an index for each such document that does not already have one. If and when one of these documents is requested, the document itself will not be present on the server; however, the server will contain an entry directing it to obtain the document from the document repository rather than from the document's web host.
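The fallback chain and the repository-supplied index entries described above can be sketched in Python. The sketch makes several simplifying assumptions. Cache hits are treated as fresh, so freshness checks are omitted. `host_fetch` is a hypothetical callable that returns `None` when the host cannot supply the document. The index maps a URL to a source hint. The `source` strings exist only to make the path each request took visible.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

class Server:
    def __init__(self, host_fetch: Callable[[str], Optional[str]], repository: Dict[str, str]):
        self.cache: Dict[str, str] = {}  # server document cache (entries assumed fresh)
        self.index: Dict[str, str] = {}  # url -> where to fetch it from
        self.host_fetch = host_fetch     # hypothetical: returns None if the host fails
        self.repository = repository     # document repository, url -> content

    def register_stable(self, urls: Iterable[str]) -> None:
        """Repository reports fresh/stable documents; add entries only where none exist."""
        for url in urls:
            self.index.setdefault(url, "repository")

    def fetch(self, url: str) -> Tuple[Optional[str], str]:
        if url in self.cache:
            return self.cache[url], "server-cache"
        if self.index.get(url) == "repository" and url in self.repository:
            doc, source = self.repository[url], "repository"  # bypass the web host
        else:
            doc, source = self.host_fetch(url), "host"
            if doc is None:                                    # host failed: fall back
                doc, source = self.repository.get(url), "repository"
        if doc is not None:
            self.cache[url] = doc
        return doc, source

class ClientAssistant:
    def __init__(self, server: Server):
        self.cache: Dict[str, str] = {}
        self.server = server

    def get(self, url: str) -> Tuple[Optional[str], str]:
        if url in self.cache:
            return self.cache[url], "client-cache"
        doc, source = self.server.fetch(url)
        if doc is not None:
            self.cache[url] = doc
        return doc, source
```

Using `setdefault` in `register_stable` mirrors the rule that an entry is inserted only when the index does not already have one for that document.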
Abstract:
In one aspect, the present disclosure can be embodied in a method that includes identifying a collection of entities from one or more data sources, calculating a score for subsets of entities from the collection based on one or more seed entities associated with the collection, identifying one or more entities from each of the subsets based on the calculated score, assigning the calculated score to the identified one or more entities from the respective subset, and ranking the one or more entities based on the assigned score, so as to identify entities in the collection that are related to the one or more seed entities.
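The scoring-and-ranking steps described above can be sketched in Python. This is one simple instantiation under stated assumptions, not the disclosed method. Each subset is a set of entity names, for example a list or table drawn from a data source. A subset's score is the fraction of its members that are seed entities. Each non-seed member receives the score of every subset it appears in, and those scores are summed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def rank_related_entities(subsets: Iterable[Set[str]], seeds: Set[str]) -> List[Tuple[str, float]]:
    """Score each subset by seed overlap, pass the score to its non-seed members, rank."""
    totals: Dict[str, float] = defaultdict(float)
    for subset in subsets:
        if not subset:
            continue
        subset_score = len(subset & seeds) / len(subset)  # fraction of members that are seeds
        if subset_score == 0:
            continue  # subsets without any seed entity contribute nothing
        for entity in subset - seeds:
            totals[entity] += subset_score  # assign the subset's score to each identified entity
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with seeds `{"python", "java"}`, an entity that co-occurs with both seeds in a small list outranks one that appears next to only one seed. Entities seen only in seedless subsets are not ranked.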