Extracting query dimensions from search results

    Publication No.: US09785704B2

    Publication date: 2017-10-10

    Application No.: US13343621

    Filing date: 2012-01-04

    IPC classification: G06F17/30

    Abstract: Techniques are described for automatically mining query dimensions from web pages resulting from execution of a search query. Lists of items such as words, terms, or phrases are extracted from the web pages based on the recognition of free text, metadata tag, or repeated region patterns within the web page text. Extracted item lists are weighted according to document matching and/or inverse document frequency, and item lists are clustered based on shared or similar items within the lists to generate query dimensions. The generated query dimensions, and the items within each query dimension, are ranked according to quality, and high-quality query dimensions are provided for display alongside top search results.
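
The clustering step in the abstract above can be illustrated with a minimal sketch. It assumes item lists have already been extracted from the result pages, and it uses Jaccard overlap with a greedy merge, which is a stand-in chosen for illustration and not the patent's stated method. The names `jaccard`, `cluster_lists`, `rank_items`, and the `threshold` value are hypothetical, and the weighting by document matching or inverse document frequency is left out.

```python
from collections import Counter

def jaccard(a, b):
    """Share of items that two lists have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_lists(item_lists, threshold=0.3):
    """Greedily merge each list into the first cluster it overlaps enough with."""
    clusters = []  # each cluster is a list of the item lists merged into it
    for items in item_lists:
        for cluster in clusters:
            pooled = [x for member in cluster for x in member]
            if jaccard(items, pooled) >= threshold:
                cluster.append(items)
                break
        else:
            clusters.append([items])  # no cluster matched: start a new one
    return clusters

def rank_items(cluster):
    """Order items by how many member lists contain them (ties broken by name)."""
    counts = Counter(x for member in cluster for x in set(member))
    return [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical item lists extracted from four result pages.
lists = [
    ["red", "blue", "green"],
    ["blue", "green", "black"],
    ["small", "medium", "large"],
    ["medium", "large", "x-large"],
]
dimensions = [rank_items(c) for c in cluster_lists(lists)]
```

Here the colour lists and the size lists end up in two separate candidate dimensions, with the items shared across lists ranked first.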

    2. Using anchor text with hyperlink structures for web searches
    Granted patent (status: in force)

    Publication No.: US08380722B2

    Publication date: 2013-02-19

    Application No.: US12748903

    Filing date: 2010-03-29

    IPC classification: G06F17/30

    CPC classification: G06F17/30887

    Abstract: This document describes tools for adjusting anchor text weight to provide more relevant search engine results. Specifically, these tools take advantage of a site-relationship model to consider relationships not only between an anchor text source site and a destination page but also relationships between multiple anchor text source sites to improve web searches. Consideration of these relationships aids in determining a new anchor text weight, which in turn results in more relevant search results.
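
As a rough illustration of the idea in the abstract above, the sketch below adjusts an anchor text weight using two kinds of relationship: source site to destination, and source site to source site. The abstract does not give a formula, so everything here is an assumption: the function name `anchor_weight`, the `related` set of site pairs, the rule that affiliated sources earn no credit, and the 0.5 discount for a source related to one already counted.

```python
def anchor_weight(source_sites, dest_site, related):
    """Sum credit over anchor sources, discounting related sites.

    `related` is a hypothetical set of frozenset({site_a, site_b}) pairs
    standing in for a site-relationship model.
    """
    def is_related(a, b):
        return a == b or frozenset((a, b)) in related

    counted = []  # sources that already received full credit
    weight = 0.0
    for site in source_sites:
        if is_related(site, dest_site):
            continue  # same or affiliated site: no credit in this sketch
        if any(is_related(site, seen) for seen in counted):
            weight += 0.5  # assumed discount for a source related to a counted one
        else:
            weight += 1.0
            counted.append(site)
    return weight

# Hypothetical relationships: a.com and b.com are affiliated, mirror.com mirrors dest.com.
related = {frozenset({"a.com", "b.com"}), frozenset({"dest.com", "mirror.com"})}
weight = anchor_weight(["a.com", "b.com", "c.com", "mirror.com"], "dest.com", related)
```

With these inputs, four links carrying the same anchor text yield a weight of 2.5 instead of 4, because one source is affiliated with the destination and two sources are affiliated with each other.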

    3. Using Anchor Text With Hyperlink Structures for Web Searches
    Patent application (status: in force)

    Publication No.: US20110238644A1

    Publication date: 2011-09-29

    Application No.: US12748903

    Filing date: 2010-03-29

    IPC classification: G06F3/14 G06F17/30

    CPC classification: G06F17/30887

    Abstract: This document describes tools for adjusting anchor text weight to provide more relevant search engine results. Specifically, these tools take advantage of a site-relationship model to consider relationships not only between an anchor text source site and a destination page but also relationships between multiple anchor text source sites to improve web searches. Consideration of these relationships aids in determining a new anchor text weight, which in turn results in more relevant search results.

    4. Extracting Query Dimensions from Search Results
    Patent application (status: in force)

    Publication No.: US20130173605A1

    Publication date: 2013-07-04

    Application No.: US13343621

    Filing date: 2012-01-04

    IPC classification: G06F17/30

    Abstract: Techniques are described for automatically mining query dimensions from web pages resulting from execution of a search query. Lists of items such as words, terms, or phrases are extracted from the web pages based on the recognition of free text, metadata tag, or repeated region patterns within the web page text. Extracted item lists are weighted according to document matching and/or inverse document frequency, and item lists are clustered based on shared or similar items within the lists to generate query dimensions. The generated query dimensions, and the items within each query dimension, are ranked according to quality, and high-quality query dimensions are provided for display alongside top search results.

    5. INFORMATION SENSORS FOR SENSING WEB DYNAMICS
    Patent application (status: under examination, published)

    Publication No.: US20160125083A1

    Publication date: 2016-05-05

    Application No.: US14896339

    Filing date: 2013-06-07

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: Disclosed herein are techniques and systems for building “information sensors,” which are programmable “focused crawlers” that periodically discover, extract, analyze and aggregate structured information around a topic from the Web. A platform for building an information sensor allows a user to specify one or more data elements within a data source that the user desires to monitor, and an update frequency at which the data elements are to be extracted. Code may be generated based on the user specifications for creation and submission of the information sensor for storage in a database with metadata containing the code and update frequency. Once created, information sensors are scanned to check if running conditions are met, and if met, they may be executed by retrieving the metadata using a sensor identifier (ID). The code is executed to locate a data source, and periodically extract specified data elements therefrom to output structured time-series data.
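
The scan-and-execute loop described in the abstract above might look roughly like the following sketch. It assumes an in-memory dictionary in place of the database, an elapsed-time check as the only running condition, and a plain callable in place of the generated extraction code. `Sensor`, `run_due`, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Sensor:
    sensor_id: str
    update_every: float                  # seconds between extractions (user-specified)
    extract: Callable[[], object]        # stands in for the generated extraction code
    last_run: float = float("-inf")      # sentinel meaning "never run"
    series: list = field(default_factory=list)  # (timestamp, value) pairs

def run_due(sensors, now):
    """Scan all sensors and execute those whose update interval has elapsed."""
    ran = []
    for sensor in sensors.values():
        if now - sensor.last_run >= sensor.update_every:
            sensor.series.append((now, sensor.extract()))
            sensor.last_run = now
            ran.append(sensor.sensor_id)
    return ran

# A hypothetical sensor that "extracts" a constant value every 60 seconds.
sensors = {"price": Sensor("price", 60.0, lambda: 42)}
first = run_due(sensors, now=0.0)    # runs: never executed before
second = run_due(sensors, now=30.0)  # skipped: interval not yet elapsed
third = run_due(sensors, now=60.0)   # runs again
```

Each execution appends one timestamped value, so `sensors["price"].series` accumulates the structured time series that the abstract refers to.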

    6. EXPERIMENTAL WEB SEARCH SYSTEM
    Patent application (status: under examination, published)

    Publication No.: US20110078131A1

    Publication date: 2011-03-31

    Application No.: US12569978

    Filing date: 2009-09-30

    IPC classification: G06F7/10 G06F17/30

    CPC classification: G06F16/951

    Abstract: Described is the running of search-related experiments on a full (or partial) offline snapshot copy of the search engine documents of an actual production system. A snapshot experimentation subsystem runs experimental code related to web searches on the offline data. This includes running experimental index-building code to build an experimental index (e.g., to test a new document feature), and/or running experimental search-related code, for example to rank search results according to experimental ranking code, to implement an experimental search strategy, and/or to generate experimental captions.

    7. Data-Centric Search Engine Architecture
    Patent application (status: under examination, published)

    Publication No.: US20110137886A1

    Publication date: 2011-06-09

    Application No.: US12632821

    Filing date: 2009-12-08

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: Described is a data-centric web search engine technology/architecture, in which document metadata, including offline-extracted metadata, is used as part of a search indexing and ranking pipeline. A web data management component receives crawled documents and extracts document metadata from the documents. An indexing component uses the document metadata to build an index for the documents. A serving component uses the index and the document metadata to serve content, e.g., search results. Also described is the use of query metadata extracted from queries of a query log for use in the pipeline.
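
The three components named in the abstract above (web data management, indexing, serving) can be sketched as a toy pipeline. This is only an illustration under simple assumptions: documents are dictionaries with `url` and `text` keys, the metadata is a word count plus a term set, and results are ordered shortest document first, which is an arbitrary ranking choice. All function and field names are hypothetical.

```python
from collections import defaultdict

def extract_metadata(doc):
    """Web data management step: derive metadata from a crawled document."""
    words = doc["text"].lower().split()
    return {"url": doc["url"], "length": len(words), "terms": set(words)}

def build_index(metadata_records):
    """Indexing step: inverted index from term to URLs, built from metadata only."""
    index = defaultdict(set)
    for meta in metadata_records:
        for term in meta["terms"]:
            index[term].add(meta["url"])
    return index

def serve(query, index, metadata_by_url):
    """Serving step: match all query terms, then order by a metadata field."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    # Assumed ranking: shorter documents first, URL as tie-breaker.
    return sorted(matches, key=lambda url: (metadata_by_url[url]["length"], url))

docs = [
    {"url": "a", "text": "Search engine architecture overview"},
    {"url": "b", "text": "search engine"},
    {"url": "c", "text": "cooking recipes"},
]
metas = [extract_metadata(d) for d in docs]
index = build_index(metas)
by_url = {m["url"]: m for m in metas}
results = serve("search engine", index, by_url)
```

The point of the sketch is that both the index and the final ordering are driven by the extracted metadata, not by the raw documents.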

    8. Determining relevance of a document to a query based on spans of query terms
    Granted patent (status: in force)

    Publication No.: US07480652B2

    Publication date: 2009-01-20

    Application No.: US11259621

    Filing date: 2005-10-26

    IPC classification: G06F17/30 G06F15/16

    Abstract: A relevance system determines the relevance of a query term to a document based on spans within the document that contain the query term. The relevance system aggregates the relevance of the query terms into an overall relevance for the document. For each query term, the relevance system calculates a span relevance for each span that contains that query term. The relevance system then aggregates the span relevances for a query term into a query term relevance for that document. The relevance system may aggregate the query term relevances into a document relevance.
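
The two-level aggregation in the abstract above (span relevances into query term relevances, then into a document relevance) can be sketched as follows. The abstract does not define how spans are found or scored, so this sketch assumes that hits separated by more than `max_gap` tokens start a new span, that a span scores its number of distinct query terms divided by its width, and that both aggregations are plain sums. All names are hypothetical.

```python
def find_spans(tokens, query_terms, max_gap=3):
    """Group query-term hits into spans; a gap larger than max_gap starts a new span."""
    hits = [(i, t) for i, t in enumerate(tokens) if t in query_terms]
    spans, current = [], []
    for hit in hits:
        if current and hit[0] - current[-1][0] > max_gap:
            spans.append(current)
            current = []
        current.append(hit)
    if current:
        spans.append(current)
    return spans

def span_relevance(span):
    """Assumed score: distinct query terms in the span divided by the span width."""
    width = span[-1][0] - span[0][0] + 1
    return len({term for _, term in span}) / width

def document_relevance(tokens, query_terms):
    """Sum span relevances per query term, then sum the per-term relevances."""
    spans = find_spans(tokens, set(query_terms))
    term_relevance = {
        term: sum(span_relevance(s) for s in spans if any(t == term for _, t in s))
        for term in query_terms
    }
    return sum(term_relevance.values()), term_relevance

tokens = "cheap paris flights and other cheap hotels".split()
total, per_term = document_relevance(tokens, ["cheap", "paris"])
```

In this example "cheap" appears in two spans and "paris" in one, so "cheap" contributes more to the document relevance.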

    9. Method and system for calculating importance of a block within a display page
    Granted patent (status: lapsed)

    Publication No.: US08095478B2

    Publication date: 2012-01-10

    Application No.: US12101109

    Filing date: 2008-04-10

    IPC classification: G06F17/00 G06F17/20

    Abstract: A method and system for identifying the importance of information areas of a display page. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.
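
Learning an importance function from block features and user-specified labels, as the abstract above describes, could be sketched with a small linear model. The model type is an assumption (the abstract does not name one), as are the three features (relative area, centrality, link density), the training data, and the names `train` and `importance`.

```python
def train(features, labels, epochs=2000, lr=0.1):
    """Fit a linear importance function by stochastic gradient descent on squared error."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = bias + sum(w * v for w, v in zip(weights, x)) - y
            bias -= lr * error
            weights = [w - lr * error * v for w, v in zip(weights, x)]
    return weights, bias

def importance(block, model):
    """Apply the learned importance function to one block's feature vector."""
    weights, bias = model
    return bias + sum(w * v for w, v in zip(weights, block))

# Hypothetical labelled blocks: (relative area, centrality, link density) -> user rating.
features = [
    [0.60, 0.9, 0.1],   # main article body
    [0.10, 0.2, 0.9],   # navigation bar
    [0.20, 0.5, 0.5],   # related-links box
    [0.05, 0.1, 0.8],   # footer
]
labels = [1.0, 0.0, 0.5, 0.0]
model = train(features, labels)

main_score = importance([0.5, 0.8, 0.1], model)  # unseen block resembling a main body
nav_score = importance([0.1, 0.1, 0.9], model)   # unseen block resembling navigation
```

On this toy data the block that resembles a main content area scores higher than the one that resembles navigation.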

    10. METHOD AND SYSTEM FOR CALCULATING IMPORTANCE OF A BLOCK WITHIN A DISPLAY PAGE
    Patent application (status: lapsed)

    Publication No.: US20080256068A1

    Publication date: 2008-10-16

    Application No.: US12101109

    Filing date: 2008-04-10

    IPC classification: G06F7/00

    Abstract: A method and system for identifying the importance of information areas of a display page. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.
