Systems and methods for personalizing aggregated news content
    21.
    Patent application (status: in force)

    Publication number: US20050165743A1

    Publication date: 2005-07-28

    Application number: US10748663

    Filing date: 2003-12-31

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30867

    Abstract: A system customizes a news document associated with a user of a news aggregation service. The system includes multiple news source servers that store news content and a remote news aggregation server. The news aggregation server creates a customized news document based on one or more personalized search queries received from a user. The news aggregation server fetches the news content from the multiple news source servers, aggregates the news content, and searches the aggregated news content based on the one or more personalized search queries. The news aggregation server provides selected news content to the customized news document based on results of the search.
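
    The fetch, aggregate, and search flow in this abstract can be illustrated with a minimal sketch. It assumes in-memory lists of headlines in place of news source servers and plain substring matching in place of a real search engine; the function name build_custom_news and the sample data are hypothetical and do not come from the patent.

```python
from typing import Dict, List


def build_custom_news(
    sources: Dict[str, List[str]], queries: List[str]
) -> Dict[str, List[str]]:
    """Aggregate headlines from all sources, then select those matching each query."""
    # Aggregation step: pool the content of every source into one list.
    aggregated = [headline for items in sources.values() for headline in items]
    document: Dict[str, List[str]] = {}
    for query in queries:
        terms = query.lower().split()
        # Search step: a headline matches when it contains every query term.
        document[query] = [
            h for h in aggregated if all(t in h.lower() for t in terms)
        ]
    return document


sources = {
    "source-a": ["Crawler patents granted", "Local election results"],
    "source-b": ["New search crawler released"],
}
custom = build_custom_news(sources, ["crawler", "search crawler"])
```

    Each personalized query becomes one section of the customized document, keyed by the query text.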

    Scheduler for search engine crawler
    22.
    Granted patent (status: in force)

    Publication number: US08707313B1

    Publication date: 2014-04-22

    Application number: US13031011

    Filing date: 2011-02-18

    IPC classification: G06F9/46 G06F7/00

    CPC classification: G06F17/30864

    Abstract: A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.
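
    The "unreachable in each of the last X crawls" filter can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the crawl history is assumed to be a per-URL list of booleans (oldest first, True meaning reachable), and the name schedule_next_crawl is hypothetical.

```python
from typing import Dict, Iterable, List


def schedule_next_crawl(
    starting_set: Iterable[str],
    crawl_history: Dict[str, List[bool]],
    x: int,
) -> List[str]:
    """Return the URLs to keep for the next crawl cycle.

    A URL is dropped only if it has at least x recorded crawls and was
    unreachable in every one of the last x.
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    scheduled = []
    for url in starting_set:
        recent = crawl_history.get(url, [])[-x:]
        always_unreachable = len(recent) == x and not any(recent)
        if not always_unreachable:
            scheduled.append(url)
    return scheduled


history = {
    "a": [True, False, False],
    "b": [False, False, False],
    "c": [False],
}
kept = schedule_next_crawl(["a", "b", "c", "d"], history, 3)
```

    In this sketch a URL with fewer than X recorded crawls is always kept, which is one possible reading of "each of the last X crawls"; the abstract does not say how short histories are handled.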

    Identifying multiple versions of documents
    23.
    Granted patent (status: in force)

    Publication number: US08316292B1

    Publication date: 2012-11-20

    Application number: US11283228

    Filing date: 2005-11-18

    CPC classification: G06F17/2211 G06F17/30722

    Abstract: A system and method identifies different versions of the same document in a document collection. The system and method creates multiple candidate identifiers for each document based on information associated with the document, and processes the candidate identifiers according to language specific rules. The system and method compares the processed candidate identifiers for similarity, and identifies different versions of documents based on the similarity.
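
    A rough sketch of the candidate-identifier comparison is given below. The abstract does not specify the language rules or the similarity measure, so everything concrete here is an assumption: the rules are reduced to a hypothetical per-language stop-word list, and similarity is measured with the Jaccard index over tokens.

```python
import re
from typing import FrozenSet, List

# Hypothetical language-specific rules: stop words to drop per language.
STOP_WORDS = {
    "en": {"a", "an", "the", "of"},
    "de": {"der", "die", "das", "und"},
}


def normalize(candidate: str, lang: str) -> FrozenSet[str]:
    """Lowercase, tokenize, and drop the stop words of the given language."""
    tokens = re.findall(r"\w+", candidate.lower())
    return frozenset(t for t in tokens if t not in STOP_WORDS.get(lang, set()))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_document(
    cands_a: List[str], cands_b: List[str], lang: str, threshold: float = 0.8
) -> bool:
    """Treat two documents as versions if any pair of candidates is similar enough."""
    norm_a = [normalize(c, lang) for c in cands_a]
    norm_b = [normalize(c, lang) for c in cands_b]
    return any(jaccard(x, y) >= threshold for x in norm_a for y in norm_b)
```

    For example, same_document(["The Anatomy of a Search Engine"], ["Anatomy of Search Engine"], "en") is true because both titles reduce to the same token set once English stop words are removed.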

    Generation of document snippets based on queries and search results
    24.
    Granted patent (status: in force)

    Publication number: US08145617B1

    Publication date: 2012-03-27

    Application number: US11282560

    Filing date: 2005-11-18

    IPC classification: G06F7/00

    Abstract: A document retrieval system generates snippets of documents for display as part of a user interface screen with search results. The snippet may be generated based on the type of query or the location of the query terms in the document. Different snippet generation algorithms may be used depending on the query type. Alternatively, snippets may be generated based on an analysis of the location of the query terms in the document.
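
    The location-based alternative can be sketched as a fixed-width window around the earliest query term found in the document. This is a simplified assumption about what "analysis of the location of the query terms" might look like; the function name make_snippet and the fallback to the document's opening text are hypothetical.

```python
import re


def make_snippet(document: str, query: str, width: int = 60) -> str:
    """Return up to `width` characters centered near the earliest query-term hit."""
    positions = []
    for term in query.split():
        # Case-insensitive literal match; positions refer to the original text.
        match = re.search(re.escape(term), document, re.IGNORECASE)
        if match:
            positions.append(match.start())
    if not positions:
        # No term found: fall back to the start of the document.
        return document[:width].strip()
    start = max(0, min(positions) - width // 2)
    return document[start:start + width].strip()


doc = "Crawlers fetch pages. The scheduler decides which URLs to crawl next."
snippet = make_snippet(doc, "Scheduler", width=20)
```

    A production system would likely snap the window to word or sentence boundaries; this sketch cuts at raw character offsets.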

    Systems and methods for syndicating and hosting customized news content
    25.
    Granted patent (status: in force)

    Publication number: US08126865B1

    Publication date: 2012-02-28

    Application number: US10748661

    Filing date: 2003-12-31

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30893

    Abstract: A system provides client access to customized news content. The system includes a custom news source server and a news search server. The custom news source server periodically sends one or more customized search queries to a news search server. The news search server fetches news content from multiple news source servers and aggregates the news content. The news search server also periodically receives the one or more search queries from the custom news source server, searches the aggregated news content based on the one or more search queries, and periodically provides selected news content to the custom news server based on results of the searches. The custom news source server permits access to clients, from across a network, to the selected news content provided by the news search server.

    Library citation integration
    26.
    Granted patent (status: in force)

    Publication number: US07526475B1

    Publication date: 2009-04-28

    Application number: US11432039

    Filing date: 2006-05-10

    IPC classification: G06F7/00 G06F17/00

    Abstract: An online search system generates an index of documents using index information received from a library. Some documents have restricted access; some documents may not be available online. The search system provides links to documents in the library as well as other sites based on a search, and may include link resolvers received from the library. The search system provides access links to the link resolvers if an identifier, such as a user identification or IP address, matches an affiliation list from the library.
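
    The affiliation check at the end of this abstract can be sketched with Python's standard ipaddress module. The data layout is an assumption: each library's affiliation list is modeled as a set of user IDs plus a list of CIDR networks, and the Library class, the resolver URL, and the addresses (taken from documentation-reserved ranges) are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Library:
    name: str
    resolver_url: str
    affiliated_users: Set[str] = field(default_factory=set)
    affiliated_networks: List[str] = field(default_factory=list)  # CIDR strings


def resolver_links(
    libraries: List[Library],
    user_id: Optional[str] = None,
    ip: Optional[str] = None,
) -> List[str]:
    """Return resolver URLs of libraries whose affiliation list matches the caller."""
    addr = ipaddress.ip_address(ip) if ip else None
    links = []
    for lib in libraries:
        by_user = user_id is not None and user_id in lib.affiliated_users
        by_ip = addr is not None and any(
            addr in ipaddress.ip_network(net) for net in lib.affiliated_networks
        )
        if by_user or by_ip:
            links.append(lib.resolver_url)
    return links


libs = [
    Library("Example University", "https://resolver.example.edu/",
            {"alice"}, ["192.0.2.0/24"]),
]
```

    Here resolver_links(libs, ip="192.0.2.55") returns the resolver URL because the address falls inside the library's network, while an unaffiliated user on an outside address gets an empty list.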

    Query modification
    27.
    Granted patent (status: in force)

    Publication number: US08819000B1

    Publication date: 2014-08-26

    Application number: US13461315

    Filing date: 2012-05-01

    IPC classification: G06F17/30

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for query modification. In one aspect, a method includes receiving an original query including a first limitation. First search results responsive to a modified query are obtained, where the first limitation has been omitted from the modified query. One or more common characteristics shared by two or more resources are identified. Each of the two or more resources corresponds to a different highly-ranked result of the first search results. A second modified query including the original query and a second limitation representing the one or more common characteristics is generated. Second search results responsive to the second modified query are obtained. The second search results are provided in a response to the original query.
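
    The sequence of steps in this abstract can be traced with a toy example. The sketch assumes a small in-memory corpus in which each resource carries a set of tags, treats limitations and "common characteristics" as tags, and ranks by a precomputed score; the functions search and modified_search and the sample corpus are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def search(corpus: List[Dict], terms: List[str], limits: Set[str]) -> List[Dict]:
    """Toy search: all terms must occur in the text and all limits in the tags."""
    hits = [
        d for d in corpus
        if all(t in d["text"] for t in terms) and limits <= d["tags"]
    ]
    return sorted(hits, key=lambda d: d["score"], reverse=True)


def modified_search(
    corpus: List[Dict], terms: List[str], first_limit: str, top_k: int = 3
) -> List[Dict]:
    # Step 1: run the query with the first limitation omitted.
    first_results = search(corpus, terms, set())
    # Step 2: find tags shared by two or more of the top-ranked results.
    counts = Counter(tag for d in first_results[:top_k] for tag in d["tags"])
    common = {tag for tag, n in counts.items() if n >= 2}
    # Step 3: original query (with its first limitation) plus the second limitation.
    return search(corpus, terms, {first_limit} | common)


corpus = [
    {"text": "jaguar speed", "tags": {"animal", "2012"}, "score": 0.9},
    {"text": "jaguar habitat", "tags": {"animal", "2011"}, "score": 0.8},
    {"text": "jaguar car review", "tags": {"auto", "2012"}, "score": 0.7},
    {"text": "jaguar cub", "tags": {"animal", "2012"}, "score": 0.1},
]
results = modified_search(corpus, ["jaguar"], "2012")
```

    With this corpus, the top results of the relaxed query share the tags "animal" and "2012", so the final query returns only the 2012 animal pages and leaves out the car review that the original query would have matched.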

    Assigning Document Identification Tags
    29.
    Patent application (status: in force)

    Publication number: US20120173552A1

    Publication date: 2012-07-05

    Application number: US13419349

    Filing date: 2012-03-13

    IPC classification: G06F17/30

    Abstract: Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags are produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
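
    One way to obtain tags that are "approximately ordered" by a query-independent score is to split the tag space into tiers and hand out tags sequentially within each tier as documents arrive. The abstract does not describe the assignment scheme, so this tiering approach, the TagAssigner class, and the thresholds are assumptions made only for illustration.

```python
from typing import List


class TagAssigner:
    """Assign integer tags so that lower tags roughly mean higher scores."""

    def __init__(self, tier_bounds: List[float], tier_size: int) -> None:
        # tier_bounds: descending score thresholds, e.g. [0.8, 0.5] gives
        # three tiers: score >= 0.8, 0.5 <= score < 0.8, and score < 0.5.
        self.tier_bounds = tier_bounds
        self.tier_size = tier_size
        self.next_in_tier = [0] * (len(tier_bounds) + 1)

    def assign(self, score: float) -> int:
        # The tier index is the number of thresholds the score falls below.
        tier = sum(score < bound for bound in self.tier_bounds)
        offset = self.next_in_tier[tier]
        if offset >= self.tier_size:
            raise OverflowError("tier is full")
        self.next_in_tier[tier] += 1
        return tier * self.tier_size + offset


assigner = TagAssigner([0.8, 0.5], tier_size=1000)
tags = [assigner.assign(s) for s in (0.6, 0.9, 0.2, 0.95)]
```

    Because each document gets its tag on arrival, no global re-sort is needed, which fits the real-time indexing goal; the ordering is only approximate because documents within a tier are numbered by arrival rather than by score.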

    Scheduler for search engine crawler
    30.
    Granted patent (status: in force)

    Publication number: US08042112B1

    Publication date: 2011-10-18

    Application number: US10882956

    Filing date: 2004-06-30

    IPC classification: G06F9/46 G06F7/00

    CPC classification: G06F17/30864

    Abstract: A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.
