-
Publication No.: US10210256B2
Publication Date: 2019-02-19
Application No.: US15088670
Application Date: 2016-04-01
Applicant: Google Inc.
Inventor: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target-document-to-source-document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
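Illustrative sketch (not taken from the patent): the abstract describes inverting a link log of source-to-target pairings into an anchor map ordered by target document identifier. The Python below shows one simple way that inversion could look. The `LinkRecord` layout, the integer document identifiers, and the function name `build_sorted_anchor_map` are assumptions made for illustration only; the abstract does not specify any of them.

```python
from collections import namedtuple

# Hypothetical record layout: one source document and the targets it links to.
LinkRecord = namedtuple("LinkRecord", ["source_id", "target_ids"])


def build_sorted_anchor_map(link_log):
    """Invert source->target pairings into (target, source) pairs sorted by target id."""
    pairs = [
        (target_id, record.source_id)
        for record in link_log
        for target_id in record.target_ids
    ]
    # Tuples sort by target id first, then by source id as a tie-breaker.
    pairs.sort()
    return pairs


# Example link log: document 3 links to 7 and 2, document 1 links to 7.
link_log = [
    LinkRecord(source_id=3, target_ids=[7, 2]),
    LinkRecord(source_id=1, target_ids=[7]),
]
anchor_map = build_sorted_anchor_map(link_log)
print(anchor_map)
```

Sorting by target identifier groups together all sources that link to the same document, which is what would let an indexer read every inbound link for a target in one sequential pass.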
-
Publication No.: US10216847B2
Publication Date: 2019-02-26
Application No.: US15617634
Application Date: 2017-06-08
Applicant: Google Inc.
Inventor: Huican Zhu, Anurag Acharya, Max Ibel, Howard B. Gobioff
Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.
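Illustrative sketch (not taken from the patent): the abstract describes setting a per-document reuse flag from a query-independent score, then crawling so that flagged documents are served from a previously downloaded copy instead of being fetched again. The Python below is a minimal model of that flow. The rule "reuse when the score is below a threshold", the function names, the dictionary cache, and the example URLs are all assumptions for illustration; the abstract only says the flag is based on the score and must meet a predefined criterion.

```python
def set_reuse_flags(scores, threshold):
    """Assumed rule: flag a document for reuse when its score is below the threshold."""
    return {url: score < threshold for url, score in scores.items()}


def crawl(urls, reuse_flags, cache, fetch):
    """Return page content per URL, reusing the cached copy when the flag allows it."""
    results = {}
    for url in urls:
        if reuse_flags.get(url, False) and url in cache:
            results[url] = cache[url]    # reuse previously downloaded version
        else:
            results[url] = fetch(url)    # download the current version
    return results


# Hypothetical scores and cache contents.
scores = {"a.example/1": 0.9, "b.example/2": 0.1}
flags = set_reuse_flags(scores, threshold=0.5)
cache = {"a.example/1": "old A", "b.example/2": "old B"}

fetched = []


def fake_fetch(url):
    """Stand-in for a network download; records which URLs were requested."""
    fetched.append(url)
    return "fresh " + url


pages = crawl(list(scores), flags, cache, fake_fetch)
print(pages)
print(fetched)
```

In this sketch only the higher-scoring document triggers a download, so crawl bandwidth is spent where the assumed score says freshness matters most.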
-
Publication No.: US09679056B2
Publication Date: 2017-06-13
Application No.: US14245806
Application Date: 2014-04-04
Applicant: Google Inc.
Inventor: Huican Zhu, Anurag Acharya, Max Ibel, Howard Bradley Gobioff
IPC: G06F17/30
CPC classification number: G06F17/30864
Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.
-