-
Publication No.: US09495462B2
Publication date: 2016-11-15
Application No.: US13360536
Filing date: 2012-01-27
Applicant: Victor Poznanski, Oivind Wang, Fredrik Holm, Nicolai Bodd, Vladimir Tankovich, Dmitriy Meyerzon
Inventor: Victor Poznanski, Oivind Wang, Fredrik Holm, Nicolai Bodd, Vladimir Tankovich, Dmitriy Meyerzon
CPC classification: G06F17/30867
Abstract: Search results obtained from a ranking model are re-ranked based on user-configured ranking rules. For example, a user may desire to: place certain search results at a top/bottom of a ranking of search results; remove some search results; and/or adjust a ranking of some of the search results. A Graphical User Interface (GUI) allows a user to configure the ranking rules (e.g. enter key/value restrictions and to set a boost value) and to preview an application of one or more of the ranking rules. Query language operators that follow a standard operator syntax are created based on the inputs (e.g. a ranking query operator is created that may include multiple user supplied parameters). The user may also specify a portion of the results from which statistics (e.g. standard deviation, average score) are calculated. For example, a user may specify to calculate statistics for the top N number results.
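The re-ranking described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Rule` class, the four action names, and the choice to express a boost as a multiple of the standard deviation of the top N scores are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rule:
    """Hypothetical ranking rule: a key/value restriction plus an action."""
    key: str
    value: str
    action: str          # one of "boost", "top", "bottom", "remove"
    boost: float = 0.0   # assumed unit: standard deviations of the top-N scores


def rerank(results, rules, top_n):
    """Re-rank model-scored results according to user-configured rules."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    # Statistics are computed only from the user-specified top N results.
    sample = [r["score"] for r in ranked[:top_n]]
    stats = {"average": mean(sample), "stddev": pstdev(sample)}

    adjusted = []
    for result in ranked:
        score, tier, keep = result["score"], 0, True
        for rule in rules:
            if result.get(rule.key) != rule.value:
                continue  # the key/value restriction does not match
            if rule.action == "remove":
                keep = False
            elif rule.action == "top":
                tier = 1
            elif rule.action == "bottom":
                tier = -1
            elif rule.action == "boost":
                score += rule.boost * stats["stddev"]
        if keep:
            adjusted.append((tier, score, result))

    # Pinned tiers dominate; within a tier the adjusted score decides.
    adjusted.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [result for _, _, result in adjusted], stats


results = [
    {"id": "a", "score": 3.0, "type": "doc"},
    {"id": "b", "score": 2.0, "type": "pdf"},
    {"id": "c", "score": 1.0, "type": "spam"},
    {"id": "d", "score": 0.5, "type": "wiki"},
]
rules = [
    Rule("type", "spam", "remove"),
    Rule("type", "wiki", "top"),
    Rule("type", "pdf", "boost", boost=2.0),
]
reranked, stats = rerank(results, rules, top_n=3)
```

Scaling the boost by the standard deviation keeps a rule meaningful across ranking models whose raw scores have different ranges, which is one plausible reason for exposing such statistics to the user.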
-
Publication No.: US20130198174A1
Publication date: 2013-08-01
Application No.: US13360536
Filing date: 2012-01-27
Applicant: Victor Poznanski, Oivind Wang, Fredrik Holm, Nicolai Bodd, Vladimir Tankovich, Dmitriy Meyerzon
Inventor: Victor Poznanski, Oivind Wang, Fredrik Holm, Nicolai Bodd, Vladimir Tankovich, Dmitriy Meyerzon
IPC classification: G06F17/30
CPC classification: G06F17/30867
Abstract: Search results obtained from a ranking model are re-ranked based on user-configured ranking rules. For example, a user may desire to: place certain search results at a top/bottom of a ranking of search results; remove some search results; and/or adjust a ranking of some of the search results. A Graphical User Interface (GUI) allows a user to configure the ranking rules (e.g. enter key/value restrictions and to set a boost value) and to preview an application of one or more of the ranking rules. Query language operators that follow a standard operator syntax are created based on the inputs (e.g. a ranking query operator is created that may include multiple user supplied parameters). The user may also specify a portion of the results from which statistics (e.g. standard deviation, average score) are calculated. For example, a user may specify to calculate statistics for the top N number results.
-
Publication No.: US20110295850A1
Publication date: 2011-12-01
Application No.: US12791756
Filing date: 2010-06-01
IPC classification: G06F17/30
CPC classification: G06F17/30, G06F17/00, G06F17/30657, G06F17/30864
Abstract: Embodiments are directed to ranking search results using a junk profile. For a given corpus of documents, one or more junk profiles may be created and maintained. The junk profile provides reference metrics to represent known junk documents. For example, a junk profile may comprise a dictionary of document data that is automatically inserted into documents created using a particular system or template. A junk profile may also comprise one or more representations (e.g., histograms) of a distribution of a particular junk variable for known junk documents. The junk profile provides a usable representation of known junk documents, and the present systems and methods employ the junk profile to predict the likelihood that documents in the corpus are junk. In embodiments, junk scores are calculated and used to rank such documents higher or lower in response to a search query.
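As a rough illustration of how a junk profile might be applied, the Python sketch below combines the two ingredients the abstract names: a dictionary of template-inserted data and a histogram of a junk variable. The abstract does not give a scoring formula, so the choice of document length as the junk variable, the bin width, and the equal weighting of the two signals are all assumptions.

```python
from collections import Counter


def build_histogram(values, bin_width):
    """Normalized histogram of a junk variable over known junk documents."""
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {bin_id: count / total for bin_id, count in counts.items()}


def junk_score(tokens, junk_dictionary, histogram, bin_width):
    """Estimate how junk-like a document is, in the range 0.0 to 1.0.

    Assumed signals (not taken from the patent):
    - fraction of tokens found in the junk dictionary
    - share of known junk documents whose length falls in the same bin
    """
    if not tokens:
        return 1.0  # assumption: an empty document is treated as junk
    boilerplate = sum(1 for t in tokens if t in junk_dictionary) / len(tokens)
    length_bin = int(len(tokens) // bin_width)
    histogram_share = histogram.get(length_bin, 0.0)
    return 0.5 * boilerplate + 0.5 * histogram_share


# Profile built from known junk: template words and token counts of junk documents.
junk_dictionary = {"created", "with", "template"}
histogram = build_histogram([3, 4, 4, 12], bin_width=5)

stub = "created with template x".split()
article = "ranking models order documents by estimated relevance to a user query".split()
stub_score = junk_score(stub, junk_dictionary, histogram, bin_width=5)
article_score = junk_score(article, junk_dictionary, histogram, bin_width=5)
```

A ranker could then subtract a multiple of the junk score from a document's relevance score, so that likely junk is ranked lower.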
-
Publication No.: US08738635B2
Publication date: 2014-05-27
Application No.: US12791756
Filing date: 2010-06-01
CPC classification: G06F17/30, G06F17/00, G06F17/30657, G06F17/30864
Abstract: Embodiments are directed to ranking search results using a junk profile. For a given corpus of documents, one or more junk profiles may be created and maintained. The junk profile provides reference metrics to represent known junk documents. For example, a junk profile may comprise a dictionary of document data that is automatically inserted into documents created using a particular system or template. A junk profile may also comprise one or more representations (e.g., histograms) of a distribution of a particular junk variable for known junk documents. The junk profile provides a usable representation of known junk documents, and the present systems and methods employ the junk profile to predict the likelihood that documents in the corpus are junk. In embodiments, junk scores are calculated and used to rank such documents higher or lower in response to a search query.
-
Publication No.: US20130110815A1
Publication date: 2013-05-02
Application No.: US13283632
Filing date: 2011-10-28
IPC classification: G06F17/30
CPC classification: G06F16/9566, G06F16/9535
Abstract: Concepts and technologies are described herein for generating and presenting deep links. In accordance with the concepts and technologies disclosed herein a search engine is configured to generate deep links associated with a site. A site is identified by the search engine and the site is analyzed by the search engine with data relating to searches of and/or usage of the site. The search engine identifies links or other resources contained in, associated with, or referenced by the site, generates deep links corresponding to the resources, and associates the deep links with the site. If a site having indexed deep links is identified in search results, the search engine identifies one or more deep links associated with the site and presents the deep links with the search results to provide a searcher with relevant resources that may not satisfy the search query submitted by the searcher.
-
Publication No.: US08694507B2
Publication date: 2014-04-08
Application No.: US13287656
Filing date: 2011-11-02
Applicant: Dmitriy Meyerzon, Mihai Petriuc, Nicolai Bodd
Inventor: Dmitriy Meyerzon, Mihai Petriuc, Nicolai Bodd
CPC classification: G06F17/30867, G06F17/30011, G06F17/30321, G06F17/3053, G06F17/30864
Abstract: This disclosure describes methods and systems for searching documents in a multi-tenant hosting environment. According to embodiments, to conserve hardware resources, a plurality of documents associated with a plurality of tenants may be mapped to the same search index in the multi-tenant hosting environment. In order to search documents associated only with a single tenant in the multi-tenant hosting environment, a tenant identifier is prepended to every key stored in the search index that is associated with the plurality of documents of the single tenant. Moreover, where one document links to another document within the multi-tenant hosting environment, the link is stored in a web graph when a source tenant identifier matches a target tenant identifier for the link. According to embodiments, when conducting a search, the link is resolved only if the link is stored in the web graph.
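The key-prefixing and web-graph rules in this abstract can be sketched in a few lines of Python. The `MultiTenantIndex` class, its method names, and the `:` separator between tenant identifier and term are hypothetical; the sketch also assumes tenant identifiers never contain the separator.

```python
from collections import defaultdict


class MultiTenantIndex:
    """Toy shared index in which every key carries a tenant prefix."""

    def __init__(self):
        self.postings = defaultdict(set)   # "tenant:term" -> document ids
        self.doc_tenant = {}               # document id -> tenant id
        self.web_graph = set()             # (source doc, target doc) pairs

    def _key(self, tenant_id, term):
        # The tenant identifier is prepended to the key (assumed separator ":").
        return f"{tenant_id}:{term}"

    def add_document(self, tenant_id, doc_id, terms):
        self.doc_tenant[doc_id] = tenant_id
        for term in terms:
            self.postings[self._key(tenant_id, term)].add(doc_id)

    def add_link(self, source_doc, target_doc):
        """Store the link only when source and target belong to the same tenant."""
        source_tenant = self.doc_tenant.get(source_doc)
        target_tenant = self.doc_tenant.get(target_doc)
        if source_tenant is not None and source_tenant == target_tenant:
            self.web_graph.add((source_doc, target_doc))
            return True
        return False

    def search(self, tenant_id, term):
        """Only keys carrying this tenant's prefix can match."""
        return sorted(self.postings.get(self._key(tenant_id, term), set()))

    def resolve_link(self, source_doc, target_doc):
        """A link is resolved only if it was stored in the web graph."""
        return (source_doc, target_doc) in self.web_graph


index = MultiTenantIndex()
index.add_document("t1", "t1-doc1", ["budget", "report"])
index.add_document("t1", "t1-doc2", ["budget"])
index.add_document("t2", "t2-doc1", ["budget"])
same_tenant = index.add_link("t1-doc1", "t1-doc2")
cross_tenant = index.add_link("t1-doc1", "t2-doc1")
```

Because isolation is enforced by the key itself, a query for one tenant cannot reach another tenant's postings even though all of them live in the same index.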
-
Publication No.: US20130110828A1
Publication date: 2013-05-02
Application No.: US13287656
Filing date: 2011-11-02
Applicant: Dmitriy Meyerzon, Mihai Petriuc, Nicolai Bodd
Inventor: Dmitriy Meyerzon, Mihai Petriuc, Nicolai Bodd
IPC classification: G06F17/30
CPC classification: G06F17/30867, G06F17/30011, G06F17/30321, G06F17/3053, G06F17/30864
Abstract: This disclosure describes methods and systems for searching documents in a multi-tenant hosting environment. According to embodiments, to conserve hardware resources, a plurality of documents associated with a plurality of tenants may be mapped to the same search index in the multi-tenant hosting environment. In order to search documents associated only with a single tenant in the multi-tenant hosting environment, a tenant identifier is prepended to every key stored in the search index that is associated with the plurality of documents of the single tenant. Moreover, where one document links to another document within the multi-tenant hosting environment, the link is stored in a web graph when a source tenant identifier matches a target tenant identifier for the link. According to embodiments, when conducting a search, the link is resolved only if the link is stored in the web graph.
-
Publication No.: US08812493B2
Publication date: 2014-08-19
Application No.: US12101951
Filing date: 2008-04-11
Applicant: Vladimir Tankovich, Hang Li, Dmitriy Meyerzon, Jun Xu
Inventor: Vladimir Tankovich, Hang Li, Dmitriy Meyerzon, Jun Xu
IPC classification: G06F7/00
CPC classification: G06F17/2211, G06F17/30864
Abstract: Architecture for extracting document information from documents received as search results based on a query string, and computing an edit distance between the data string and the query string. The edit distance is employed in determining relevance of the document as part of result ranking by detecting near-matches of a whole query or part of the query. The edit distance evaluates how close the query string is to a given data stream that includes document information such as TAUC (title, anchor text, URL, clicks) information, etc. The architecture includes the index-time splitting of compound terms in the URL to allow the more effective discovery of query terms. Additionally, index-time filtering of anchor text is utilized to find the top N anchors of one or more of the document results. The TAUC information can be input to a neural network (e.g., 2-layer) to improve relevance metrics for ranking the search results.
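To make the edit-distance idea concrete, the Python sketch below computes a standard Levenshtein distance and applies it to word tokens of a query and a document title. Treating whole terms as the edit units, and using unit costs for insertion, deletion, and substitution, are assumptions; the abstract does not state the cost model used in the patent.

```python
def edit_distance(source, target):
    """Levenshtein distance between two sequences (strings or token lists)."""
    previous = list(range(len(target) + 1))
    for i, source_item in enumerate(source, start=1):
        current = [i]
        for j, target_item in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                                # deletion
                current[j - 1] + 1,                             # insertion
                previous[j - 1] + (source_item != target_item)  # substitution
            ))
        previous = current
    return previous[-1]


def term_distance(query, data_string):
    """Edit distance counted in whole terms rather than characters."""
    return edit_distance(query.lower().split(), data_string.lower().split())


query = "search ranking rules"
title_near = "Search result ranking rules"
title_far = "Quarterly budget report"
near = term_distance(query, title_near)
far = term_distance(query, title_far)
```

A small distance signals a near-match of the whole query against the title, anchor text, or URL terms, and could be fed to a ranker as one relevance feature among several.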
-
Publication No.: US08266144B2
Publication date: 2012-09-11
Application No.: US13175043
Filing date: 2011-07-01
IPC classification: G06F17/30
CPC classification: G06F17/3053
Abstract: Techniques to perform relative ranking for search results are described. An apparatus may include an enhanced search component operative to receive a search query and provide ranked search results responsive to the search query. The enhanced search component may comprise a resource search module operative to search for resources using multiple search terms from the search query, and output a set of resources having some or all of the search terms. The enhanced search component may also comprise a proximity generation module communicatively coupled to the resource search module, the proximity generation module operative to receive the set of resources, retrieve search term position information for each resource, and generate a proximity feature value based on the search term position information. The enhanced search component may further comprise a resource ranking module communicatively coupled to the resource search module and the proximity generation module, the resource ranking module to receive the proximity feature values, and rank the resources based in part on the proximity feature values. Other embodiments are described and claimed.
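One simple way to turn search term positions into a proximity feature value is to find the shortest window of the resource that contains every query term. The Python sketch below does that; the abstract does not define the feature, so the minimal-window approach and the formula (number of terms divided by window length) are assumptions for illustration only.

```python
def min_span(positions):
    """Length of the shortest window containing every term at least once.

    positions maps each query term to the word offsets where it occurs.
    Returns None when some term never occurs.
    """
    if not positions or any(not offsets for offsets in positions.values()):
        return None
    events = sorted((offset, term)
                    for term, offsets in positions.items()
                    for offset in offsets)
    needed = len(positions)
    in_window = {}
    best = None
    left = 0
    for right_offset, term in events:
        in_window[term] = in_window.get(term, 0) + 1
        # Shrink from the left while the window still covers all terms.
        while len(in_window) == needed:
            left_offset, left_term = events[left]
            span = right_offset - left_offset + 1
            if best is None or span < best:
                best = span
            in_window[left_term] -= 1
            if in_window[left_term] == 0:
                del in_window[left_term]
            left += 1
    return best


def proximity_feature(positions):
    """Assumed feature: 1.0 when the terms are adjacent, near 0 when far apart."""
    span = min_span(positions)
    if span is None:
        return 0.0
    return len(positions) / span


close = proximity_feature({"relative": [2, 40], "ranking": [3, 10]})
distant = proximity_feature({"relative": [0], "ranking": [9]})
missing = proximity_feature({"relative": [5], "ranking": []})
```

A ranking module could then sort resources partly by this value, so that resources where the query terms appear close together rank above those where the terms are scattered.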
-
Publication No.: US07974974B2
Publication date: 2011-07-05
Application No.: US12051847
Filing date: 2008-03-20
IPC classification: G06F17/30
CPC classification: G06F17/3053
Abstract: Techniques to perform relative ranking for search results are described. An apparatus may include an enhanced search component operative to receive a search query and provide ranked search results responsive to the search query. The enhanced search component may comprise a resource search module operative to search for resources using multiple search terms from the search query, and output a set of resources having some or all of the search terms. The enhanced search component may also comprise a proximity generation module communicatively coupled to the resource search module, the proximity generation module operative to receive the set of resources, retrieve search term position information for each resource, and generate a proximity feature value based on the search term position information. The enhanced search component may further comprise a resource ranking module communicatively coupled to the resource search module and the proximity generation module, the resource ranking module to receive the proximity feature values, and rank the resources based in part on the proximity feature values. Other embodiments are described and claimed.