-
Publication No.: US20110016471A1
Publication date: 2011-01-20
Application No.: US12503147
Filing date: 2009-07-15
CPC classification: G06F16/951
Abstract: Balancing resource allocations based on priority may be provided. First, a plurality of repositories may be divided into at least two categories. Next, a first portion of computing resources may be dedicated to a first one of the at least two categories. Then a second portion of the computing resources may be dedicated to a second one of the at least two categories. A crawl may then be performed on the plurality of repositories with the computing resources.
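The allocation steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Repository` type, the `high_priority` flag, the two category names, and the 75% default share are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    high_priority: bool  # hypothetical flag used to pick the category


def split_workers(total_workers: int, high_share: float) -> tuple[int, int]:
    """Dedicate one portion of the workers to each of two categories."""
    if total_workers < 2:
        raise ValueError("need at least one worker per category")
    high = round(total_workers * high_share)
    # Clamp so that neither category is left without any worker.
    high = min(max(high, 1), total_workers - 1)
    return high, total_workers - high


def plan_crawl(repos: list[Repository], total_workers: int,
               high_share: float = 0.75) -> dict[str, dict]:
    """Divide repositories into two categories and assign workers to each."""
    high, low = split_workers(total_workers, high_share)
    return {
        "high": {"workers": high,
                 "repositories": [r.name for r in repos if r.high_priority]},
        "low": {"workers": low,
                "repositories": [r.name for r in repos if not r.high_priority]},
    }
```

A crawl scheduler could then start `plan["high"]["workers"]` workers on the high-priority repositories and the remainder on the rest.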
-
Publication No.: US07644107B2
Publication date: 2010-01-05
Application No.: US10956891
Filing date: 2004-09-30
CPC classification: G06F17/30861
Abstract: A process takes advantage of a structure of a server hosting a network site that includes a change log stored in a database to batch index documents for search queries. The content of the site is batched and shipped in bulk from the server to an indexer. The change log keeps track of the changes to the content of the site. The indexer incrementally requests updates to the index using the change log and batches the changes so that the bandwidth usage and processor overhead costs are reduced.
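The change-log idea in this abstract can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented process: the `Change` record, its `seq`/`op` fields, the `fetch_docs` callback, and the batch size are hypothetical, and the change log is assumed to be ordered by sequence number.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Change:
    seq: int      # position in the change log (assumed to be increasing)
    doc_id: str
    op: str       # "upsert" or "delete" (hypothetical operation names)


class Indexer:
    def __init__(self) -> None:
        self.index: dict[str, str] = {}
        self.last_seq = 0  # last change-log entry already applied

    def sync(self, change_log: list[Change],
             fetch_docs: Callable[[list[str]], dict[str, str]],
             batch_size: int = 100) -> int:
        """Apply only the changes made since the last sync, in batches.

        Returns the number of bulk fetch requests sent to the server.
        """
        pending = [c for c in change_log if c.seq > self.last_seq]
        requests = 0
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            # Keep only the most recent change per document in this batch.
            latest = {c.doc_id: c for c in batch}
            upserts = [d for d, c in latest.items() if c.op == "upsert"]
            if upserts:
                self.index.update(fetch_docs(upserts))  # one bulk request
                requests += 1
            for doc_id, change in latest.items():
                if change.op == "delete":
                    self.index.pop(doc_id, None)
            self.last_seq = batch[-1].seq
        return requests
```

Because the indexer remembers `last_seq`, a second call with an unchanged log sends no requests at all, which is where the bandwidth saving would come from.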
-
Publication No.: US20060074911A1
Publication date: 2006-04-06
Application No.: US10956891
Filing date: 2004-09-30
IPC classification: G06F17/30
CPC classification: G06F17/30861
Abstract: A process takes advantage of a structure of a server hosting a network site that includes a change log stored in a database to batch index documents for search queries. The content of the site is batched and shipped in bulk from the server to an indexer. The change log keeps track of the changes to the content of the site. The indexer incrementally requests updates to the index using the change log and batches the changes so that the bandwidth usage and processor overhead costs are reduced.
-
Publication No.: US08108388B2
Publication date: 2012-01-31
Application No.: US11412725
Filing date: 2006-04-26
IPC classification: G06F17/30
CPC classification: G06F17/30699
Abstract: An alert search mechanism is used with search engines such as a crawler to search for desired documents and/or resources. Particular documents are found by using search queries. The search mechanism tracks values of a set of relevant properties in queries. Whenever a document is searched for by the system, the values of this set of properties are matched with the old values. If there is no match, this is an indication that the document has changed.
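The property comparison described in this abstract can be sketched in a few lines of Python. The class name, method name, and property names are hypothetical, and treating a first sighting as "not changed" is an assumption; the abstract does not say how a new document is handled.

```python
class AlertTracker:
    """Remember the tracked property values last seen for each document."""

    def __init__(self, tracked_properties: list[str]) -> None:
        self.tracked = tuple(tracked_properties)
        self._seen: dict[str, tuple] = {}

    def check(self, doc_id: str, properties: dict) -> bool:
        """Return True if the tracked values differ from the stored ones."""
        current = tuple(properties.get(p) for p in self.tracked)
        previous = self._seen.get(doc_id)
        self._seen[doc_id] = current  # the new values become the old values
        # Assumption: a document seen for the first time is not "changed".
        return previous is not None and previous != current
```

Only the tracked properties are compared, so an edit to any other property does not raise an alert.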
-
Publication No.: US20110125726A1
Publication date: 2011-05-26
Application No.: US12625603
Filing date: 2009-11-25
CPC classification: G06F16/951
Abstract: A smart algorithm for processing transactions from a crawl queue. If the crawler has in memory a predetermined number of URLs for a given host, the crawler reads from the crawl queue URLs from other hosts. As a result the crawler processes multiple hosts concurrently, and thus, uses machine resources more effectively and efficiently to process the URLs. The smart algorithm can further consider other criteria in deciding which URLs to read from the queue. These criteria can include the response time for each repository (host) the crawler processes. Additionally, the crawler can allocate its resources according to content groups (e.g., two pools), one group for faster content delivery and the second group for slower content delivery. Thus, crawler resources can be partitioned or divided across different pools depending on repository response time. Other criteria can be provided and considered as well.
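The per-host limit in this abstract can be sketched as below. This is only an illustration of the first criterion (the cap on in-memory URLs per host); the function name, its parameters, and the decision to return skipped URLs to the front of the queue are assumptions, and the response-time pools are not modeled.

```python
from collections import Counter, deque
from urllib.parse import urlsplit


def load_batch(queue: deque, in_memory: list[str],
               max_per_host: int, batch_size: int) -> list[str]:
    """Read up to batch_size URLs from the queue, skipping saturated hosts."""
    # Count the URLs already held in memory for each host.
    counts = Counter(urlsplit(u).hostname for u in in_memory)
    loaded: list[str] = []
    skipped: deque = deque()
    while queue and len(loaded) < batch_size:
        url = queue.popleft()
        host = urlsplit(url).hostname
        if counts[host] >= max_per_host:
            skipped.append(url)      # host is at its cap; try other hosts
        else:
            counts[host] += 1
            loaded.append(url)
    # Return skipped URLs to the front of the queue in their original order.
    queue.extendleft(reversed(skipped))
    return loaded
```

With a cap of 2 and a queue that starts with three URLs from one host, the third URL stays queued while URLs from other hosts are loaded, so the batch spans several hosts.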
-
Publication No.: US20070255744A1
Publication date: 2007-11-01
Application No.: US11412725
Filing date: 2006-04-26
IPC classification: G06F7/00
CPC classification: G06F17/30699
Abstract: An alert search mechanism is used with search engines such as a crawler to search for desired documents and/or resources. Particular documents are found by using search queries. The search mechanism tracks values of a set of relevant properties in queries. Whenever a document is searched for by the system, the values of this set of properties are matched with the old values. If there is no match, this is an indication that the document has changed.