LARGE-SCALE REAL-TIME FETCH SERVICE
    Invention application
    Status: under examination (published)

    Publication number: US20130117252A1

    Publication date: 2013-05-09

    Application number: US13644297

    Filing date: 2012-10-04

    Applicant: Google Inc.

    CPC classification number: G06F16/951

    Abstract: A system and method for fetching embedded object content as part of a batch crawl. A fetch server receives a request on a request thread to retrieve content for objects embedded in a document, such as a web page. The fetch server attempts to locate the content of each object first in cache and then in disk storage. If the content is not located in the cache, the fetch server may switch the request to a worker thread. If the content is not located in the disk storage, the fetch server may schedule a request to retrieve the content of the embedded object through a batch web crawl. Scheduling a request may include either determining that a request to crawl the content of the object has already been scheduled or inserting a new request into a scheduling queue.
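
The lookup order described in the abstract can be sketched as follows. This is a minimal, single-threaded illustration and not the implementation claimed in the application: the class name `FetchServer`, its fields, and its methods are all hypothetical, plain dictionaries stand in for the cache and disk storage, and the hand-off from request thread to worker thread is marked only by a comment.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional, Set


@dataclass
class FetchServer:
    """Hypothetical sketch of the cache -> disk -> schedule flow."""
    cache: Dict[str, bytes] = field(default_factory=dict)    # stand-in for the in-memory cache
    disk: Dict[str, bytes] = field(default_factory=dict)     # stand-in for disk storage
    crawl_queue: Deque[str] = field(default_factory=deque)   # scheduling queue for the batch crawl
    scheduled: Set[str] = field(default_factory=set)         # URLs already in the queue

    def fetch(self, url: str) -> Optional[bytes]:
        # Step 1: look in the cache (on the request thread in the abstract).
        content = self.cache.get(url)
        if content is not None:
            return content
        # Cache miss: the abstract says the request may be switched to a
        # worker thread here. This sketch simply continues in-line.
        return self._fetch_from_disk_or_schedule(url)

    def _fetch_from_disk_or_schedule(self, url: str) -> Optional[bytes]:
        # Step 2: look in disk storage.
        content = self.disk.get(url)
        if content is not None:
            return content
        # Step 3: not stored anywhere, so schedule a batch crawl.
        self._schedule(url)
        return None

    def _schedule(self, url: str) -> bool:
        # Scheduling either finds an existing request or inserts a new one.
        if url in self.scheduled:
            return False
        self.scheduled.add(url)
        self.crawl_queue.append(url)
        return True


server = FetchServer(cache={"a.css": b"A"}, disk={"b.js": b"B"})
first = server.fetch("a.css")    # served from cache
second = server.fetch("b.js")    # served from disk
third = server.fetch("c.png")    # missing: scheduled for crawl
fourth = server.fetch("c.png")   # missing again: already scheduled
```

Tracking scheduled URLs in a separate set is one simple way to make the "already scheduled" check cheap; the abstract does not say how that check is performed.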

