DEEP-CONTENT INDEXING AND CONSOLIDATION
    1.
    Patent application
    DEEP-CONTENT INDEXING AND CONSOLIDATION (status: pending, published)

    Publication No.: US20100082573A1

    Publication date: 2010-04-01

    Application No.: US12235798

    Filing date: 2008-09-23

    IPC classification: G06F17/30

    Abstract: Methods in computer-readable media for searching a large volume of documents are provided. In embodiments, a plurality of related documents is consolidated by a web host into a synthetic search document. The synthetic search document includes a set of descriptive information for each web page consolidated into it. Each set of descriptive information is associated with a subpart identifier that includes information allowing a search engine to provide a link to navigate to an individual document. Web pages consolidated into a synthetic search document may be edited to include an indication that the web page is not to be individually searched or indexed by a search engine. Similarly, the synthetic search document may be designated as a synthetic search document by information included in it.

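    The consolidation described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Page`, `SyntheticSearchDocument`, `consolidate`, and `link_for`, the `part-N` identifier scheme, and the `indexable` flag are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    title: str
    description: str
    indexable: bool = True  # set to False once folded into a synthetic document


@dataclass
class SyntheticSearchDocument:
    # Marks the document as synthetic (hypothetical field name).
    is_synthetic: bool = True
    # Maps a subpart identifier to the descriptive information for one page.
    subparts: dict = field(default_factory=dict)


def consolidate(pages):
    """Fold related pages into one synthetic search document."""
    doc = SyntheticSearchDocument()
    for number, page in enumerate(pages, start=1):
        subpart_id = f"part-{number}"  # assumed identifier scheme
        doc.subparts[subpart_id] = {
            "url": page.url,
            "title": page.title,
            "description": page.description,
        }
        # The original page is flagged so it is not indexed on its own.
        page.indexable = False
    return doc


def link_for(doc, subpart_id):
    """Return the link a search engine could show for one subpart."""
    return doc.subparts[subpart_id]["url"]
```

    A search engine that matches a query against one subpart's description could then call `link_for` to send the user to the individual page rather than to the synthetic document.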

    OPTIMIZING WEB CRAWLING WITH USER HISTORY
    2.
    Patent application
    OPTIMIZING WEB CRAWLING WITH USER HISTORY (status: in force)

    Publication No.: US20130041881A1

    Publication date: 2013-02-14

    Application No.: US13206256

    Filing date: 2011-08-09

    IPC classification: G06F7/00 G06F17/30

    Abstract: A politeness manager estimates traffic to web sites based on historical log data generated and sent by plug-ins or toolbars on client web browsers. The historical log data details the dates and times at which the web browsers visit different web sites and is used to determine the timeframes in which specific web sites are busy and those in which they are not. Crawl rates for different timeframes for a web site are determined based on the historical log data, and web crawlers are scheduled to crawl the web site according to the crawl rates to minimize the chance that web crawler requests are responsible for the site crashing.

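    One way to turn visit logs into per-timeframe crawl rates is sketched below. The abstract does not give a formula, so the linear scaling against peak traffic, the hourly timeframes, and the names `hourly_traffic`, `crawl_rates`, `max_rate`, and `min_rate` are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_traffic(log, site):
    """Count logged browser visits to `site` for each hour of the day.

    `log` is a list of (site, datetime) pairs, standing in for the
    historical data sent by browser plug-ins or toolbars.
    """
    counts = Counter(ts.hour for visited, ts in log if visited == site)
    return [counts.get(hour, 0) for hour in range(24)]


def crawl_rates(traffic, max_rate=10.0, min_rate=1.0):
    """Assign a crawl rate (requests per minute) to each timeframe.

    Assumed policy: crawl fastest when the site is idle and slow down
    linearly as user traffic approaches its observed peak.
    """
    peak = max(traffic)
    if peak == 0:
        return [max_rate] * len(traffic)
    return [max(min_rate, max_rate * (1 - visits / peak)) for visits in traffic]
```

    A scheduler would then look up the rate for the current hour before dispatching crawler requests to the site.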

    SMART, SEARCH-ENABLED WEB ERROR PAGES
    3.
    Patent application
    SMART, SEARCH-ENABLED WEB ERROR PAGES (status: in force)

    Publication No.: US20100106571A1

    Publication date: 2010-04-29

    Application No.: US12257067

    Filing date: 2008-10-23

    IPC classification: G06F7/06 G06F17/30 G06Q30/00

    Abstract: Embodiments of our technology provide a method, system, and media for presenting relevant information incident to an attempt to present information that is unavailable by way of a website. One embodiment of the method includes receiving a request to present a desired web page, determining that the desired web page is unavailable for presentation, determining search criteria associated with the request, dynamically generating a second web page that includes search results obtained based on the search criteria, and presenting the second web page on a display device.

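    The sequence of steps in this abstract can be sketched as a small request handler. The abstract does not say how search criteria are derived, so taking them from the words in the requested path is an assumption, as are the names `search_criteria`, `search`, and `handle_request` and the in-memory `site_pages` mapping.

```python
import html
import re


def search_criteria(path):
    """Derive search terms from the words in a requested URL path."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", path) if t]


def search(site_pages, terms):
    """Return the paths of pages whose text contains any search term."""
    return [
        page_path
        for page_path, text in site_pages.items()
        if any(term in text.lower() for term in terms)
    ]


def handle_request(path, site_pages):
    """Serve the page if it exists, otherwise a generated results page."""
    if path in site_pages:
        return 200, site_pages[path]
    # The desired page is unavailable: build a second page from search results.
    results = search(site_pages, search_criteria(path))
    items = "".join(
        f'<li><a href="{html.escape(p)}">{html.escape(p)}</a></li>'
        for p in results
    )
    body = f"<h1>Page not found</h1><p>Related pages:</p><ul>{items}</ul>"
    return 404, body
```

    A real deployment would likely query a full search engine instead of scanning page text, but the control flow (detect the missing page, derive criteria, generate the second page) is the same.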

    Content signature notification
    4.
    Granted patent
    Content signature notification (status: in force)

    Publication No.: US09043306B2

    Publication date: 2015-05-26

    Application No.: US12861788

    Filing date: 2010-08-23

    IPC classification: G06F17/30

    Abstract: A client application installed on end user computers generates metadata from the content of web pages visited by end users and provides the metadata to a search engine. When an end user visits a web page, the end user's computer downloads the web page and displays it to the end user. The client application may simultaneously access the web page content and generate the metadata, in the form of a content signature of the web page, from that content. The client application then provides the content signature to a search engine. The search engine may employ content signatures to identify new web pages to crawl and index. Additionally, the search engine may employ content signatures to identify changes to web pages and determine the crawl frequency of web pages.

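    A content signature and its use on the search-engine side might look like the sketch below. The abstract does not specify the signature algorithm, so the SHA-256 hash over whitespace-normalized text is an assumption, as are the names `content_signature` and `SignatureIndex` and the "new" / "changed" / "unchanged" labels.

```python
import hashlib


def content_signature(page_content):
    """Client side: hash the page text after collapsing whitespace."""
    normalized = " ".join(page_content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SignatureIndex:
    """Search-engine side: track the last signature reported per URL."""

    def __init__(self):
        self.signatures = {}     # url -> most recent signature
        self.change_counts = {}  # url -> number of observed changes

    def report(self, url, signature):
        """Record a signature and classify the page."""
        previous = self.signatures.get(url)
        self.signatures[url] = signature
        if previous is None:
            return "new"  # candidate for a first crawl
        if previous != signature:
            self.change_counts[url] = self.change_counts.get(url, 0) + 1
            return "changed"  # candidate for a re-crawl
        return "unchanged"
```

    The per-URL change counts give the search engine a simple signal for crawl frequency: pages whose signatures change often could be revisited more often.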

    Optimizing web crawling with user history
    5.
    Granted patent
    Optimizing web crawling with user history (status: in force)

    Publication No.: US08782031B2

    Publication date: 2014-07-15

    Application No.: US13206256

    Filing date: 2011-08-09

    IPC classification: G06F17/30

    Abstract: A politeness manager estimates traffic to web sites based on historical log data generated and sent by plug-ins or toolbars on client web browsers. The historical log data details the dates and times at which the web browsers visit different web sites and is used to determine the timeframes in which specific web sites are busy and those in which they are not. Crawl rates for different timeframes for a web site are determined based on the historical log data, and web crawlers are scheduled to crawl the web site according to the crawl rates to minimize the chance that web crawler requests are responsible for the site crashing.


    CONTENT SIGNATURE NOTIFICATION
    6.
    Patent application
    CONTENT SIGNATURE NOTIFICATION (status: in force)

    Publication No.: US20120047121A1

    Publication date: 2012-02-23

    Application No.: US12861788

    Filing date: 2010-08-23

    IPC classification: G06F17/30

    Abstract: A client application installed on end user computers generates metadata from the content of web pages visited by end users and provides the metadata to a search engine. When an end user visits a web page, the end user's computer downloads the web page and displays it to the end user. The client application may simultaneously access the web page content and generate the metadata, in the form of a content signature of the web page, from that content. The client application then provides the content signature to a search engine. The search engine may employ content signatures to identify new web pages to crawl and index. Additionally, the search engine may employ content signatures to identify changes to web pages and determine the crawl frequency of web pages.


    System and method for generating contextual survey sequence for search results
    7.
    Patent application
    System and method for generating contextual survey sequence for search results (status: pending, published)

    Publication No.: US20060173880A1

    Publication date: 2006-08-03

    Application No.: US11044160

    Filing date: 2005-01-28

    IPC classification: G06F17/00 G06F7/00

    Abstract: A system and related techniques generate a survey to capture user feedback about the quality of search results, in a continuous context with the user's Web page or other search activity. According to embodiments, a survey frame inviting the user to answer a set of search questions may be presented within the same set of page frames that display the search results, so that the user may choose to answer the survey while still viewing their search results, selected Web sites, or other hits. According to further embodiments, rather than being presented within the frame structure of a page, the survey may be presented from within a browser toolbar extension, side-by-side or otherwise arranged within the environment of the user's search activity. Unlike other feedback-gathering platforms, which may force the user to navigate to a new page to view and respond to questions or may transmit email questionnaires after the fact, the invention in one regard may prompt the user into a dialogue to supply feedback about their search experience while still within the contextual workflow of that experience, and while still able to view or review the results or content they have received. User distraction is therefore minimized while feedback quality may be improved. The user feedback, which rates the quality or accuracy of the search results or search experience, may in embodiments be stored and used to train search intelligence, or for other purposes.


    Smart, search-enabled web error pages
    8.
    Granted patent
    Smart, search-enabled web error pages (status: in force)

    Publication No.: US08825740B2

    Publication date: 2014-09-02

    Application No.: US12257067

    Filing date: 2008-10-23

    Abstract: Embodiments of our technology provide a method, system, and media for presenting relevant information incident to an attempt to present information that is unavailable by way of a website. One embodiment of the method includes receiving a request to present a desired web page, determining that the desired web page is unavailable for presentation, determining search criteria associated with the request, dynamically generating a second web page that includes search results obtained based on the search criteria, and presenting the second web page on a display device.
