1. Index Optimization for Ranking Using a Linear Model
    Type: Patent application
    Legal status: In force

    Publication number: US20090327266A1
    Publication date: 2009-12-31
    Application number: US12147666
    Filing date: 2008-06-27
    IPC classification: G06F17/30
    CPC classification: G06F17/30657

    Abstract: Technologies are described herein for providing a more efficient approach to ranking search results. One method reduces an amount of ranking data analyzed at query time. In the method, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A simple rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a high ranking index or a low ranking index based on the simple rank.
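
    Illustrative sketch (not the patented implementation): the index-time split in this abstract can be expressed in a few lines of Python. The abstract does not say how the simple rank is computed or where the cut-off between the two indexes falls. The feature names and weights in `WEIGHTS`, the `df_threshold` parameter, and the fixed-size high-ranking tier (`high_size`) are therefore hypothetical choices made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: int
    # Hypothetical query-independent features (e.g. link- or click-based signals).
    static_features: dict[str, float] = field(default_factory=dict)


# Hypothetical weights of the linear model; the abstract gives no values.
WEIGHTS = {"pagerank": 2.0, "click_rate": 1.0}


def simple_rank(doc: Doc) -> float:
    """Linear combination of static features (assumed form of the simple rank)."""
    return sum(WEIGHTS.get(name, 0.0) * value
               for name, value in doc.static_features.items())


def split_index(master_index: dict[str, list[int]], docs: dict[int, Doc],
                df_threshold: int, high_size: int):
    """For every term whose posting list is longer than df_threshold, assign
    its documents to a high-ranking or a low-ranking index by simple rank."""
    high_index: dict[str, list[int]] = {}
    low_index: dict[str, list[int]] = {}
    for term, postings in master_index.items():
        if len(postings) <= df_threshold:
            continue  # infrequent terms stay only in the master index
        ranked = sorted(postings, key=lambda d: simple_rank(docs[d]), reverse=True)
        high_index[term] = ranked[:high_size]
        low_index[term] = ranked[high_size:]
    return high_index, low_index
```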

2. INDEX OPTIMIZATION FOR RANKING USING A LINEAR MODEL
    Type: Patent application
    Legal status: In force

    Publication number: US20100121838A1
    Publication date: 2010-05-13
    Application number: US12690100
    Filing date: 2010-01-19
    IPC classification: G06F17/30
    CPC classification: G06F17/30864

    Abstract: Technologies are described herein for providing a more efficient approach to ranking search results. An illustrative technology reduces an amount of ranking data analyzed at query time. In the technology, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a top document list or a bottom document list based on the rank. Predefined values of at least part of the rank are stored in the top document list for documents in the top document list and are not stored in the bottom document list for documents in the bottom document list.
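
    Illustrative sketch, under assumptions: the abstract does not say which part of the rank is stored. This Python example assumes a single precomputed query-independent score per document. Top-list entries carry that score next to the document ID, and bottom-list entries store the ID alone. `TopEntry`, `build_lists`, `score_top`, `top_size`, and the `dynamic_score` callback are hypothetical names.

```python
from typing import Callable, NamedTuple


class TopEntry(NamedTuple):
    doc_id: int
    static_score: float  # assumed precomputed part of the rank, stored in the posting


def build_lists(postings: list[int], static_scores: dict[int, float], top_size: int):
    """Split one term's postings into a top list (doc ID plus stored score)
    and a bottom list (doc IDs only, no stored score)."""
    ranked = sorted(postings, key=lambda d: static_scores[d], reverse=True)
    top = [TopEntry(d, static_scores[d]) for d in ranked[:top_size]]
    bottom = ranked[top_size:]
    return top, bottom


def score_top(top: list[TopEntry], dynamic_score: Callable[[int], float]):
    """Score top-list documents at query time from the stored static part
    plus a hypothetical query-dependent component; highest score first."""
    scored = [(e.doc_id, e.static_score + dynamic_score(e.doc_id)) for e in top]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

    Because the static part travels with each top-list posting, this sketch never needs a separate lookup for those documents at query time. That is one plausible reading of the reduced ranking data, not a claim about the patented design.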

3. Index optimization for ranking using a linear model
    Type: Granted patent
    Legal status: In force

    Publication number: US08171031B2
    Publication date: 2012-05-01
    Application number: US12690100
    Filing date: 2010-01-19
    IPC classification: G06F17/30
    CPC classification: G06F17/30864

    Abstract: Technologies are described herein for providing a more efficient approach to ranking search results. An illustrative technology reduces an amount of ranking data analyzed at query time. In the technology, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a top document list or a bottom document list based on the rank. Predefined values of at least part of the rank are stored in the top document list for documents in the top document list and are not stored in the bottom document list for documents in the bottom document list.

4. Index optimization for ranking using a linear model
    Type: Granted patent
    Legal status: In force

    Publication number: US08161036B2
    Publication date: 2012-04-17
    Application number: US12147666
    Filing date: 2008-06-27
    IPC classification: G06F17/30
    CPC classification: G06F17/30657

    Abstract: Technologies are described herein for providing a more efficient approach to ranking search results. One method reduces an amount of ranking data analyzed at query time. In the method, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A simple rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a high ranking index or a low ranking index based on the simple rank.

5. Tenantization of search result ranking
    Type: Granted patent
    Legal status: In force

    Publication number: US08694507B2
    Publication date: 2014-04-08
    Application number: US13287656
    Filing date: 2011-11-02
    IPC classification: G06F7/00 G06F17/30

    Abstract: This disclosure describes methods and systems for searching documents in a multi-tenant hosting environment. According to embodiments, to conserve hardware resources, a plurality of documents associated with a plurality of tenants may be mapped to the same search index in the multi-tenant hosting environment. In order to search documents associated only with a single tenant in the multi-tenant hosting environment, a tenant identifier is prepended to every key stored in the search index that is associated with the plurality of documents of the single tenant. Moreover, where one document links to another document within the multi-tenant hosting environment, the link is stored in a web graph when a source tenant identifier matches a target tenant identifier for the link. According to embodiments, when conducting a search, the link is resolved only if the link is stored in the web graph.
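
    Illustrative sketch of the two mechanisms in this abstract. The first is tenant-prefixed index keys. The second is a web graph that only admits links whose source and target tenants match. The `SharedIndex` class, the `|` key separator, and the assumption that document IDs are unique across tenants are hypothetical choices made for this example only.

```python
class SharedIndex:
    """One search index shared by several tenants (hypothetical sketch)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = {}
        # Edges (source_doc, target_doc); doc IDs assumed unique across tenants.
        self.web_graph: set[tuple[str, str]] = set()

    @staticmethod
    def tenant_key(tenant_id: str, term: str) -> str:
        # Prepend the tenant ID to every stored key; "|" is an assumed separator.
        return f"{tenant_id}|{term}"

    def add_document(self, tenant_id: str, doc_id: str, terms: list[str]) -> None:
        for term in terms:
            self.postings.setdefault(self.tenant_key(tenant_id, term), set()).add(doc_id)

    def add_link(self, source_tenant: str, source_doc: str,
                 target_tenant: str, target_doc: str) -> None:
        # Store the link only when source and target tenant IDs match.
        if source_tenant == target_tenant:
            self.web_graph.add((source_doc, target_doc))

    def search(self, tenant_id: str, term: str) -> set[str]:
        # Lookups use the prefixed key, so one tenant never sees another's postings.
        return self.postings.get(self.tenant_key(tenant_id, term), set())

    def resolve_link(self, source_doc: str, target_doc: str) -> bool:
        # A link is resolved only if it was stored in the web graph.
        return (source_doc, target_doc) in self.web_graph
```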

6. TENANTIZATION OF SEARCH RESULT RANKING
    Type: Patent application
    Legal status: In force

    Publication number: US20130110828A1
    Publication date: 2013-05-02
    Application number: US13287656
    Filing date: 2011-11-02
    IPC classification: G06F17/30

    Abstract: This disclosure describes methods and systems for searching documents in a multi-tenant hosting environment. According to embodiments, to conserve hardware resources, a plurality of documents associated with a plurality of tenants may be mapped to the same search index in the multi-tenant hosting environment. In order to search documents associated only with a single tenant in the multi-tenant hosting environment, a tenant identifier is prepended to every key stored in the search index that is associated with the plurality of documents of the single tenant. Moreover, where one document links to another document within the multi-tenant hosting environment, the link is stored in a web graph when a source tenant identifier matches a target tenant identifier for the link. According to embodiments, when conducting a search, the link is resolved only if the link is stored in the web graph.

7. Method and system for generating a document summary
    Type: Patent application
    Legal status: Pending (published)

    Publication number: US20060200464A1
    Publication date: 2006-09-07
    Application number: US11072734
    Filing date: 2005-03-03
    IPC classification: G06F17/30 G06F7/00
    CPC classification: G06F16/338 G06F16/345

    Abstract: A text document is segmented into word and sentence information when the document is first presented and indexed. A memory stream is generated for the document. The memory stream includes document title information, word offsets, sentence offsets, the alternate list, and the contents of the document. The memory stream is used to determine which sentences in the document include query terms. The sentences that include query terms are ranked according to a ranking algorithm. The ranking algorithm determines which sentences include the highest number of query terms and the number of occurrences of the query terms in each sentence. A predetermined number of sentences that together contain as many query terms as possible are selected such that the sentences that are most representative of the document with respect to the query are included in the summary. The summary is generated at query time by concatenating the selected sentences with the query terms highlighted.
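
    Illustrative sketch of the query-time summary step. For brevity, it re-splits the text with a regular expression instead of reading the precomputed word and sentence offsets from the memory stream the abstract describes. The two-pass greedy coverage rule, the `max_sentences` parameter, and the `**term**` highlighting are assumptions, not details from the source.

```python
import re


def summarize(text: str, query_terms: list[str], max_sentences: int = 2) -> str:
    """Build a query-biased summary from the sentences that contain query terms."""
    terms = {t.lower() for t in query_terms}
    # Naive sentence split after ., ! or ? (the abstract instead stores offsets).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    candidates = []  # (position, sentence, distinct terms, occurrences)
    for pos, sentence in enumerate(sentences):
        words = re.findall(r"\w+", sentence.lower())
        distinct = terms.intersection(words)
        if distinct:
            occurrences = sum(1 for w in words if w in terms)
            candidates.append((pos, sentence, distinct, occurrences))

    # Rank by number of distinct query terms, then by total occurrences.
    candidates.sort(key=lambda c: (len(c[2]), c[3]), reverse=True)

    chosen, covered = [], set()
    # First pass: take sentences that add query terms not yet covered.
    for c in candidates:
        if len(chosen) == max_sentences:
            break
        if c[2] - covered:
            chosen.append(c)
            covered |= c[2]
    # Second pass: fill any remaining slots with the next-best sentences.
    for c in candidates:
        if len(chosen) == max_sentences:
            break
        if c not in chosen:
            chosen.append(c)

    chosen.sort(key=lambda c: c[0])  # restore document order
    highlight = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in sorted(terms)) + r")\b", re.IGNORECASE
    )
    return " ".join(highlight.sub(r"**\1**", c[1]) for c in chosen)
```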

8. Search results ranking using editing distance and document information
    Type: Granted patent
    Legal status: In force

    Publication number: US08812493B2
    Publication date: 2014-08-19
    Application number: US12101951
    Filing date: 2008-04-11
    IPC classification: G06F7/00
    CPC classification: G06F17/2211 G06F17/30864

    Abstract: Architecture for extracting document information from documents received as search results based on a query string, and computing an edit distance between the data string and the query string. The edit distance is employed in determining relevance of the document as part of result ranking by detecting near-matches of a whole query or part of the query. The edit distance evaluates how close the query string is to a given data stream that includes document information such as TAUC (title, anchor text, URL, clicks) information, etc. The architecture includes the index-time splitting of compound terms in the URL to allow the more effective discovery of query terms. Additionally, index-time filtering of anchor text is utilized to find the top N anchors of one or more of the document results. The TAUC information can be input to a neural network (e.g., 2-layer) to improve relevance metrics for ranking the search results.
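
    Illustrative sketch: the abstract does not specify the edit-distance variant or how the distance feeds the ranker. This Python example therefore assumes plain character-level Levenshtein distance and a hypothetical normalized similarity taken over a few TAUC-style fields. It shows only how a near-match of the whole query can be detected. URL compound splitting, anchor filtering, and the neural network are omitted, and `best_field_match` is a hypothetical name.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def best_field_match(query: str, fields: dict[str, str]) -> tuple[str, float]:
    """Return the field whose text is closest to the whole query, with a
    similarity in [0, 1] (1.0 means an exact, case-insensitive match)."""
    best_name, best_sim = "", 0.0
    q = query.lower()
    for name, value in fields.items():
        v = value.lower()
        longest = max(len(q), len(v)) or 1  # avoid dividing by zero
        sim = 1.0 - edit_distance(q, v) / longest
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name, best_sim
```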

9. Techniques to perform relative ranking for search results
    Type: Granted patent
    Legal status: In force

    Publication number: US08266144B2
    Publication date: 2012-09-11
    Application number: US13175043
    Filing date: 2011-07-01
    IPC classification: G06F17/30
    CPC classification: G06F17/3053

    Abstract: Techniques to perform relative ranking for search results are described. An apparatus may include an enhanced search component operative to receive a search query and provide ranked search results responsive to the search query. The enhanced search component may comprise a resource search module operative to search for resources using multiple search terms from the search query, and output a set of resources having some or all of the search terms. The enhanced search component may also comprise a proximity generation module communicatively coupled to the resource search module, the proximity generation module operative to receive the set of resources, retrieve search term position information for each resource, and generate a proximity feature value based on the search term position information. The enhanced search component may further comprise a resource ranking module communicatively coupled to the resource search module and the proximity generation module, the resource ranking module to receive the proximity feature values, and rank the resources based in part on the proximity feature values. Other embodiments are described and claimed.
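
    Illustrative sketch of one possible proximity feature. The abstract does not define how the proximity feature value is computed. This Python example assumes a common choice: the shortest token window containing every matched query term, scaled so that adjacent terms score 1.0. `min_cover_span`, `proximity_feature`, `rank`, and the linear blend with a base score are hypothetical.

```python
from collections import Counter
from typing import Optional


def min_cover_span(positions: dict[str, list[int]]) -> Optional[int]:
    """Length in tokens of the shortest window containing every term in `positions`."""
    events = sorted((p, t) for t, ps in positions.items() for p in ps)
    need, have, left = len(positions), 0, 0
    counts: Counter = Counter()
    best: Optional[int] = None
    for pos, term in events:
        counts[term] += 1
        if counts[term] == 1:
            have += 1
        # Shrink from the left while the window still covers every term.
        while have == need:
            span = pos - events[left][0] + 1
            if best is None or span < best:
                best = span
            left_term = events[left][1]
            counts[left_term] -= 1
            if counts[left_term] == 0:
                have -= 1
            left += 1
    return best


def proximity_feature(positions: dict[str, list[int]]) -> float:
    """Assumed feature: matched terms / span, so adjacent terms score 1.0.
    Query terms that do not occur in the resource are ignored."""
    present = {t: ps for t, ps in positions.items() if ps}
    if not present:
        return 0.0
    return len(present) / min_cover_span(present)


def rank(resources: dict[str, dict[str, list[int]]],
         base_scores: dict[str, float], weight: float = 1.0) -> list[str]:
    """Order resources by base score plus weighted proximity (assumed linear blend)."""
    return sorted(resources,
                  key=lambda r: base_scores[r] + weight * proximity_feature(resources[r]),
                  reverse=True)
```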

10. Techniques to perform relative ranking for search results
    Type: Granted patent
    Legal status: In force

    Publication number: US07974974B2
    Publication date: 2011-07-05
    Application number: US12051847
    Filing date: 2008-03-20
    IPC classification: G06F17/30
    CPC classification: G06F17/3053

    Abstract: Techniques to perform relative ranking for search results are described. An apparatus may include an enhanced search component operative to receive a search query and provide ranked search results responsive to the search query. The enhanced search component may comprise a resource search module operative to search for resources using multiple search terms from the search query, and output a set of resources having some or all of the search terms. The enhanced search component may also comprise a proximity generation module communicatively coupled to the resource search module, the proximity generation module operative to receive the set of resources, retrieve search term position information for each resource, and generate a proximity feature value based on the search term position information. The enhanced search component may further comprise a resource ranking module communicatively coupled to the resource search module and the proximity generation module, the resource ranking module to receive the proximity feature values, and rank the resources based in part on the proximity feature values. Other embodiments are described and claimed.