Interactive System for Extracting Data from a Website
    Patent application (status: pending, published)

    Publication No.: US20110191381A1

    Publication date: 2011-08-04

    Application No.: US12696061

    Filing date: 2010-01-29

    IPC classification: G06F17/30

    CPC classification: G06F16/00

    Abstract: Described is a technology for efficiently labeling a webpage. A wrapper tool labels records of a webpage at the record level. If an existing wrapper is appropriate for labeling a record, the wrapper tool automatically labels that record. For unlabeled records, the tool provides a user interface to label those records, and updates the set of existing wrappers with a new wrapper that is generated based upon the labeling operation; the new wrapper is then applied to any unlabeled records if appropriate for those records. As a result, a user typically needs to label only relatively few records, with the wrappers generated for those records automatically used to label the other unlabeled records of the webpage.
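    The abstract describes a label-once, reuse-everywhere loop. The sketch below illustrates that loop under assumptions that are not stated in the patent: a record is reduced to a structural signature plus its field values, a wrapper counts as "appropriate" when the signature and field count match, and `ask_user` stands in for the labeling user interface. `Record`, `Wrapper`, `label_page`, and `ask_user` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Record:
    structure: str      # hypothetical structural signature, e.g. a DOM tag path
    values: List[str]   # text of each field in the record, in document order


@dataclass
class Wrapper:
    structure: str
    labels: List[str]   # one field label per value position

    def applies_to(self, record: Record) -> bool:
        # Assumption: "appropriate" means same structure and same number of fields.
        return (record.structure == self.structure
                and len(record.values) == len(self.labels))

    def apply(self, record: Record) -> Dict[str, str]:
        return dict(zip(self.labels, record.values))


def label_page(records: List[Record], wrappers: List[Wrapper],
               ask_user: Callable[[Record], List[str]]):
    """Label all records, prompting the user only for records no wrapper fits."""
    results: List[Optional[Dict[str, str]]] = [None] * len(records)

    def apply_wrappers() -> None:
        # Automatically label every still-unlabeled record that some wrapper fits.
        for i, record in enumerate(records):
            if results[i] is None:
                for w in wrappers:
                    if w.applies_to(record):
                        results[i] = w.apply(record)
                        break

    apply_wrappers()
    prompts = 0
    while None in results:
        i = results.index(None)
        # Manual step: the user labels one record and a new wrapper is built from it.
        new_wrapper = Wrapper(records[i].structure, list(ask_user(records[i])))
        if not new_wrapper.applies_to(records[i]):
            raise ValueError("user labels must cover every field of the record")
        prompts += 1
        wrappers.append(new_wrapper)
        apply_wrappers()  # reuse the new wrapper on the remaining unlabeled records
    return results, prompts
```

    With two product records that share a structure and one banner record with a different structure, the user is prompted twice: once for the first product (its wrapper then labels the second product automatically) and once for the banner.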

    Determining relevance of a document to a query based on spans of query terms
    Granted patent (status: in force)

    Publication No.: US07480652B2

    Publication date: 2009-01-20

    Application No.: US11259621

    Filing date: 2005-10-26

    IPC classification: G06F17/30 G06F15/16

    Abstract: A relevance system determines the relevance of a query term to a document based on spans within the document that contain the query term. The relevance system aggregates the relevance of the query terms into an overall relevance for the document. For each query term, the relevance system calculates a span relevance for each span that contains that query term. The relevance system then aggregates the span relevances for a query term into a query term relevance for that document. The relevance system may aggregate the query term relevances into a document relevance.
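    The abstract does not say how spans are formed or scored. As a hedged sketch only, the code below assumes that a span is a run of query-term occurrences separated by at most `max_gap` tokens, scores each span by its density and hit count using made-up exponents `x` and `y`, sums span relevances per query term, and sums those into a document relevance. `find_spans`, `span_relevance`, `document_relevance`, and all parameter values are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = List[Tuple[int, str]]  # (token position, query term) pairs


def find_spans(tokens: List[str], query_terms: Set[str], max_gap: int = 3) -> List[Span]:
    """Chain query-term hits into spans; a gap larger than max_gap starts a new span."""
    spans: List[Span] = []
    for pos, tok in enumerate(tokens):
        if tok not in query_terms:
            continue
        if spans and pos - spans[-1][-1][0] <= max_gap:
            spans[-1].append((pos, tok))
        else:
            spans.append([(pos, tok)])
    return spans


def span_relevance(span: Span, x: float = 0.25, y: float = 0.3) -> float:
    """Hypothetical formula: denser spans with more hits score higher."""
    width = span[-1][0] - span[0][0] + 1
    hits = len(span)
    return (hits / width) ** x * hits ** y


def document_relevance(tokens: List[str], query_terms: Set[str], max_gap: int = 3):
    """Aggregate span relevances per query term, then term relevances per document."""
    term_rel: Dict[str, float] = {t: 0.0 for t in query_terms}
    for span in find_spans(tokens, query_terms, max_gap):
        rel = span_relevance(span)
        for term in {t for _, t in span}:  # credit each distinct term in the span once
            term_rel[term] += rel
    return sum(term_rel.values()), term_rel
```

    Under these assumptions a document in which the query terms appear next to each other outscores one in which they are far apart, which is the intuition behind scoring spans rather than isolated term occurrences.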

    Method and system for calculating importance of a block within a display page
    Granted patent (status: lapsed)

    Publication No.: US08095478B2

    Publication date: 2012-01-10

    Application No.: US12101109

    Filing date: 2008-04-10

    IPC classification: G06F17/00 G06F17/20

    Abstract: A method and system for identifying the importance of information areas of a display page. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.
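    The abstract says only that a model is learned from block features and user-specified importance. Purely as an illustration, the sketch below fits a linear model by batch gradient descent on squared error; the three features (vertical position, area fraction, link density), the 1 to 4 importance scale, and the sample blocks are assumptions, and a real system might use a different learner entirely. `train_importance` is a hypothetical name.

```python
from typing import Callable, List, Sequence


def train_importance(features: List[Sequence[float]], labels: List[float],
                     lr: float = 0.5, epochs: int = 5000) -> Callable[[Sequence[float]], float]:
    """Fit importance ~ w.x + b by batch gradient descent on (half) mean squared error."""
    n_feat, m = len(features[0]), len(features)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(features, labels):
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical user-labeled blocks: [vertical center (0 = top), area fraction, link density]
blocks = [
    [0.05, 0.05, 0.90],  # navigation bar
    [0.40, 0.50, 0.05],  # main article
    [0.50, 0.15, 0.60],  # sidebar links
    [0.95, 0.05, 0.80],  # footer
    [0.70, 0.20, 0.10],  # related content
]
importance = [1, 4, 2, 1, 3]  # user-assigned importance levels (1 = least important)
predict = train_importance(blocks, importance)
```

    Once trained, `predict` plays the role of the importance function: it takes the feature vector of any block, including blocks on pages users never labeled, and returns an importance score.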

    METHOD AND SYSTEM FOR CALCULATING IMPORTANCE OF A BLOCK WITHIN A DISPLAY PAGE
    Patent application (status: lapsed)

    Publication No.: US20080256068A1

    Publication date: 2008-10-16

    Application No.: US12101109

    Filing date: 2008-04-10

    IPC classification: G06F7/00

    Abstract: A method and system for identifying the importance of information areas of a display page. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.

    Using Anchor Text With Hyperlink Structures for Web Searches
    Patent application (status: in force)

    Publication No.: US20110238644A1

    Publication date: 2011-09-29

    Application No.: US12748903

    Filing date: 2010-03-29

    IPC classification: G06F3/14 G06F17/30

    CPC classification: G06F17/30887

    Abstract: This document describes tools for adjusting anchor text weight to provide more relevant search engine results. Specifically, these tools use a site-relationship model that considers not only the relationship between an anchor text source site and a destination page but also the relationships among multiple anchor text source sites, in order to improve web searches. Considering these relationships helps determine a new anchor text weight, which in turn yields more relevant search results.
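    The abstract names a site-relationship model but not its form. As an assumption-laden sketch: source sites that are related to one another (here, naively, sites sharing their last two host labels) are grouped with union-find so that the whole group casts a single vote for its anchor text, and anchors from a group related to the destination page's site are discounted by a made-up `self_discount`. `same_owner`, `cluster_sites`, `anchor_text_weights`, and the discount value are hypothetical, not taken from the patent.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Related = Callable[[str, str], bool]


def same_owner(a: str, b: str) -> bool:
    """Naive relationship test: same last two host labels (ignores suffixes like .co.uk)."""
    return a.split(".")[-2:] == b.split(".")[-2:]


def cluster_sites(sites: Iterable[str], related: Related) -> Dict[str, str]:
    """Union-find over source sites: related sites end up with the same root."""
    site_list = list(sites)
    parent = {s: s for s in site_list}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for i, a in enumerate(site_list):
        for b in site_list[i + 1:]:
            if related(a, b):
                parent[find(a)] = find(b)
    return {s: find(s) for s in site_list}


def anchor_text_weights(anchors: List[Tuple[str, str]], dest_site: str,
                        related: Related, self_discount: float = 0.2) -> Dict[str, float]:
    """anchors: (source_site, anchor_text) pairs pointing at a page on dest_site."""
    cluster_of = cluster_sites({src for src, _ in anchors}, related)
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for src, text in anchors:
        groups[cluster_of[src]].append((src, text))
    weights: Dict[str, float] = defaultdict(float)
    for members in groups.values():
        share = 1.0 / len(members)  # a group of related sites votes once in total
        if any(related(src, dest_site) for src, _ in members):
            share *= self_discount  # links from the destination's own group count less
        for _, text in members:
            weights[text] += share
    return dict(weights)
```

    In this sketch two sibling subdomains repeating the same anchor text count as one vote, while two unrelated sites using the same text count as two, which is one way relationships between source sites could change an anchor text's weight.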

    Determining relevance of a document to a query based on spans of query terms
    Patent application (status: in force)

    Publication No.: US20070094234A1

    Publication date: 2007-04-26

    Application No.: US11259621

    Filing date: 2005-10-26

    IPC classification: G06F17/30

    Abstract: A relevance system determines the relevance of a query term to a document based on spans within the document that contain the query term. The relevance system aggregates the relevance of the query terms into an overall relevance for the document. For each query term, the relevance system calculates a span relevance for each span that contains that query term. The relevance system then aggregates the span relevances for a query term into a query term relevance for that document. The relevance system may aggregate the query term relevances into a document relevance.

    Method and system for calculating importance of a block within a display page
    Patent application (status: lapsed)

    Publication No.: US20050246296A1

    Publication date: 2005-11-03

    Application No.: US10834639

    Filing date: 2004-04-29

    Abstract: A method and system for identifying the importance of information areas of a display page. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.

    Extracting query dimensions from search results

    Publication No.: US09785704B2

    Publication date: 2017-10-10

    Application No.: US13343621

    Filing date: 2012-01-04

    IPC classification: G06F17/30

    Abstract: Techniques are described for automatically mining query dimensions from web pages resulting from execution of a search query. Lists of items such as words, terms, or phrases are extracted from the web pages based on the recognition of free text, metadata tag, or repeated region patterns within the web page text. Extracted item lists are weighted according to document matching and/or inverse document frequency, and item lists are clustered based on shared or similar items within the lists to generate query dimensions. The generated query dimensions, and the items within each query dimension, are ranked according to quality, and high-quality query dimensions are provided for display alongside top search results.
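    The abstract outlines a weight, cluster, and rank pipeline over lists extracted from result pages. The sketch below assumes list extraction (free text, tag, and repeated-region patterns) has already happened and makes up concrete choices for the remaining steps: a list's weight is the number of result pages mentioning at least two of its items times the average of a caller-supplied IDF table, lists are clustered single-link on Jaccard overlap, and dimensions and their items are ranked by summed list weight. `list_weight`, `jaccard`, `mine_dimensions`, the threshold, and the sample data are hypothetical; none of these formulas come from the patent.

```python
from typing import Dict, List, Set, Tuple


def list_weight(items: List[str], pages: List[str], idf: Dict[str, float]) -> float:
    """Hypothetical weight: document matching times average IDF of the list's items."""
    # Document matching: number of result pages that mention at least two items.
    support = sum(1 for page in pages if sum(item in page for item in items) >= 2)
    avg_idf = sum(idf.get(item, 0.0) for item in items) / len(items)
    return support * avg_idf


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mine_dimensions(lists: List[List[str]], pages: List[str], idf: Dict[str, float],
                    threshold: float = 0.3, top_k: int = 5) -> List[Tuple[float, List[str]]]:
    weighted = sorted(((list_weight(lst, pages, idf), set(lst)) for lst in lists),
                      key=lambda t: t[0], reverse=True)
    clusters: List[List[Tuple[float, Set[str]]]] = []
    for weight, items in weighted:
        if weight <= 0:
            continue  # lists no result page supports (e.g. navigation menus) are dropped
        for cluster in clusters:
            # Single-link clustering: join the first cluster holding a similar list.
            if max(jaccard(items, other) for _, other in cluster) >= threshold:
                cluster.append((weight, items))
                break
        else:
            clusters.append([(weight, items)])

    dimensions = []
    for cluster in clusters:
        quality = sum(w for w, _ in cluster)  # dimension score: total weight of its lists
        item_score: Dict[str, float] = {}
        for w, items in cluster:
            for item in items:
                item_score[item] = item_score.get(item, 0.0) + w
        ranked = sorted(item_score, key=lambda it: (-item_score[it], it))
        dimensions.append((quality, ranked))
    dimensions.sort(key=lambda d: d[0], reverse=True)
    return dimensions[:top_k]
```

    For a query such as "watches", overlapping brand lists from different result pages would merge into one brand dimension, a separate analog/digital list would form a second dimension, and a site navigation list that no page supports would be discarded.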