System and method for prioritizing websites during a webcrawling process
    3.
    Granted invention patent
    System and method for prioritizing websites during a webcrawling process (status: lapsed)

    Publication No.: US07475069B2

    Publication date: 2009-01-06

    Application No.: US11392856

    Filing date: 2006-03-29

    Abstract: A system and method for prioritizing a fetch order of web pages. The method comprises extracting, by a web crawler, a set of candidate web pages to be crawled. Each web page in the set of candidate web pages is associated with a website in a computer network. A determination is made as to whether a first website score for the website is in a website score database. The first website score is associated with web pages in the set of candidate web pages if the first website score exists in the website score database. The set of candidate web pages is prioritized with respect to an associated website score for each web page in the candidate set of web pages. Content is retrieved from the set of candidate web pages. Hyperlinks are extracted from the content. The hyperlinks are stored in a memory unit.
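
    The steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `site_of`, `prioritize`, and `extract_links` are hypothetical, a plain dictionary stands in for the website score database, a URL's host name is assumed to identify its website, and the fallback score of 0.0 for websites with no stored score is an assumption (the abstract does not say how such pages are ranked). No network access is performed; link extraction runs on an HTML string.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urlparse

DEFAULT_SCORE = 0.0  # assumption: used when a website has no stored score


def site_of(url):
    """Return the host part of a URL, used here as the website key."""
    return urlparse(url).netloc.lower()


def prioritize(candidates, site_scores, default=DEFAULT_SCORE):
    """Order candidate URLs by their website's score, highest first."""
    heap = []
    for order, url in enumerate(candidates):
        score = site_scores.get(site_of(url), default)
        # heapq is a min-heap, so the score is negated;
        # `order` keeps the original order among equal scores.
        heapq.heappush(heap, (-score, order, url))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


class _LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the hyperlinks found in an HTML string, in document order."""
    parser = _LinkParser()
    parser.feed(html)
    return parser.links
```

    In a crawl loop, the list returned by `prioritize` would be fetched in order, and the links returned by `extract_links` would be stored as the next set of candidates.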


    System and method for prioritizing websites during a webcrawling process
    4.
    Granted invention patent
    System and method for prioritizing websites during a webcrawling process (status: lapsed)

    Publication No.: US07966337B2

    Publication date: 2011-06-21

    Application No.: US12143885

    Filing date: 2008-06-23

    Abstract: A system and method for prioritizing a fetch order of web pages. The method comprises extracting, by a web crawler, a set of candidate web pages to be crawled. Each web page in the set of candidate web pages is associated with a website in a computer network. A determination is made as to whether a first website score for the website is in a website score database. The first website score is associated with web pages in the set of candidate web pages if the first website score exists in the website score database. The set of candidate web pages is prioritized with respect to an associated website score for each web page in the candidate set of web pages. Content is retrieved from the set of candidate web pages. Hyperlinks are extracted from the content. The hyperlinks are stored in a memory unit.


    SYSTEM AND METHOD FOR CREATION, REPRESENTATION, AND DELIVERY OF DOCUMENT CORPUS ENTITY CO-OCCURRENCE INFORMATION
    5.
    Invention application
    SYSTEM AND METHOD FOR CREATION, REPRESENTATION, AND DELIVERY OF DOCUMENT CORPUS ENTITY CO-OCCURRENCE INFORMATION (status: under examination, published)

    Publication No.: US20080215585A1

    Publication date: 2008-09-04

    Application No.: US12061022

    Filing date: 2008-04-02

    Abstract: To respond to queries that relate to co-occurring entities on the Web, a compact sparse matrix representing entity co-occurrences is generated and then accessed to satisfy queries. The sparse matrix has groups of sub-rows, with each group corresponding to an entity in a document corpus. The groups are sorted from most occurring entity to least occurring entity. Each sub-row within a group corresponds to an entity that co-occurs in the document corpus, within a co-occurrence criterion, with the entity represented by the group, and to facilitate query response the sub-rows within a group are sorted from most occurring co-occurrence to least occurring co-occurrence.
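
    The ordering described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented data structure: `build_cooccurrence` and `top_cooccurring` are hypothetical names, the co-occurrence criterion is assumed to be "appears in the same document", an entity's occurrence count is assumed to be the number of documents that contain it, and ties are broken alphabetically. A list of (entity, sub-rows) pairs stands in for the compact sparse matrix.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Build sorted co-occurrence groups from an iterable of entity lists.

    Each element of `docs` lists the entities found in one document.
    Returns a list of (entity, sub_rows) groups, most occurring entity
    first; each sub_rows list holds (co_entity, count) pairs, most
    frequent co-occurrence first.
    """
    entity_counts = Counter()
    pair_counts = defaultdict(Counter)
    for entities in docs:
        unique = sorted(set(entities))
        entity_counts.update(unique)  # document frequency (assumption)
        for a, b in combinations(unique, 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1

    groups = []
    ranked = sorted(entity_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for entity, _ in ranked:
        sub_rows = sorted(pair_counts[entity].items(),
                          key=lambda kv: (-kv[1], kv[0]))
        groups.append((entity, sub_rows))
    return groups


def top_cooccurring(groups, entity, k=1):
    """Return the k most frequent co-occurring entities for `entity`."""
    for name, sub_rows in groups:
        if name == entity:
            # Sub-rows are already sorted, so a prefix answers the query.
            return sub_rows[:k]
    return []
```

    Because the sub-rows in each group are stored in descending order of co-occurrence, a "top k" query only needs to read the first k entries of one group.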


    System and method for creation, representation, and delivery of document corpus entity co-occurrence information
    7.
    Granted invention patent
    System and method for creation, representation, and delivery of document corpus entity co-occurrence information (status: lapsed)

    Publication No.: US07587407B2

    Publication date: 2009-09-08

    Application No.: US11442376

    Filing date: 2006-05-26

    Abstract: To respond to queries that relate to co-occurring entities on the Web, a compact sparse matrix representing entity co-occurrences is generated and then accessed to satisfy queries. The sparse matrix has groups of sub-rows, with each group corresponding to an entity in a document corpus. The groups are sorted from most occurring entity to least occurring entity. Each sub-row within a group corresponds to an entity that co-occurs in the document corpus, within a co-occurrence criterion, with the entity represented by the group, and to facilitate query response the sub-rows within a group are sorted from most occurring co-occurrence to least occurring co-occurrence.


    System and method for creation, representation, and delivery of document corpus entity co-occurrence information
    8.
    Invention application
    System and method for creation, representation, and delivery of document corpus entity co-occurrence information (status: lapsed)

    Publication No.: US20070276881A1

    Publication date: 2007-11-29

    Application No.: US11442376

    Filing date: 2006-05-26

    Abstract: To respond to queries that relate to co-occurring entities on the Web, a compact sparse matrix representing entity co-occurrences is generated and then accessed to satisfy queries. The sparse matrix has groups of sub-rows, with each group corresponding to an entity in a document corpus. The groups are sorted from most occurring entity to least occurring entity. Each sub-row within a group corresponds to an entity that co-occurs in the document corpus, within a co-occurrence criterion, with the entity represented by the group, and to facilitate query response the sub-rows within a group are sorted from most occurring co-occurrence to least occurring co-occurrence.


    SYSTEM AND METHOD FOR PRIORITIZING WEBSITES DURING A WEBCRAWLING PROCESS
    9.
    Invention application
    SYSTEM AND METHOD FOR PRIORITIZING WEBSITES DURING A WEBCRAWLING PROCESS (status: lapsed)

    Publication No.: US20080256046A1

    Publication date: 2008-10-16

    Application No.: US12143885

    Filing date: 2008-06-23

    Abstract: A system and method for prioritizing a fetch order of web pages. The method comprises extracting, by a web crawler, a set of candidate web pages to be crawled. Each web page in the set of candidate web pages is associated with a website in a computer network. A determination is made as to whether a first website score for the website is in a website score database. The first website score is associated with web pages in the set of candidate web pages if the first website score exists in the website score database. The set of candidate web pages is prioritized with respect to an associated website score for each web page in the candidate set of web pages. Content is retrieved from the set of candidate web pages. Hyperlinks are extracted from the content. The hyperlinks are stored in a memory unit.

