DIRECTLY OPTIMIZING EVALUATION MEASURES IN LEARNING TO RANK
    1.
    Patent application
    Status: in force

    Publication No.: US20100082606A1

    Publication date: 2010-04-01

    Application No.: US12237293

    Filing date: 2008-09-24

    IPC classification: G06F17/30 G06F17/10

    CPC classification: G06F17/30687 G06F17/30867

    Abstract: The present invention provides methods for improving a ranking model. In one embodiment, a method includes the step of obtaining queries, documents, and document labels. The process then initializes active sets using the document labels, wherein two active sets are established for each query: a perfect active set and an imperfect active set. Then, the process optimizes an empirical loss function using the first and second active sets, whereby parameters of the ranking model are modified in accordance with the empirical loss function. The method then updates the active sets with additional ranking data, wherein the updates are configured to work in conjunction with the optimized loss function and modified ranking model. The recalculated active sets provide an indication for ranking the documents in a way that is more consistent with the document metadata.

    Learning a document ranking using a loss function with a rank pair or a query parameter
    2.
    Granted patent
    Status: in force

    Publication No.: US07593934B2

    Publication date: 2009-09-22

    Application No.: US11460838

    Filing date: 2006-07-28

    IPC classification: G06F7/00 G06F17/30 G06F15/00

    Abstract: A method and system for generating a ranking function to rank the relevance of documents to a query is provided. The ranking system learns a ranking function from training data that includes queries, resultant documents, and the relevance of each document to its query. The ranking system learns a ranking function using the training data by weighting incorrect rankings of relevant documents more heavily than incorrect rankings of non-relevant documents, so that more emphasis is placed on correctly ranking relevant documents. The ranking system may also learn a ranking function using the training data by normalizing the contribution of each query to the ranking function so that it is independent of the number of relevant documents of each query.
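
    The two ideas in this abstract can be sketched as a pairwise loss. This is only an illustration under stated assumptions: the hinge form, the weight rule in `pair_weight`, and normalization by the number of document pairs are hypothetical choices, not details taken from the patent.

```python
from itertools import combinations

def pair_weight(hi_label, lo_label):
    # Hypothetical rule: a misordered pair costs more when the document
    # that should rank higher has a higher relevance grade.
    return float(hi_label)

def query_loss(scores, labels):
    """Weighted pairwise hinge loss for one query, normalized by pair count."""
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue  # no preference between equally relevant documents
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        margin = scores[hi] - scores[lo]
        total += pair_weight(labels[hi], labels[lo]) * max(0.0, 1.0 - margin)
        n_pairs += 1
    # Normalizing keeps queries with many pairs from dominating training.
    return total / n_pairs if n_pairs else 0.0

def training_loss(queries):
    """Average of the normalized per-query losses; queries is a list of (scores, labels)."""
    return sum(query_loss(s, l) for s, l in queries) / len(queries)
```

    With this weighting, misordering a grade-2 document against a non-relevant one costs twice as much as misordering a grade-1 document by the same margin.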

    Directly optimizing evaluation measures in learning to rank
    3.
    Granted patent
    Status: in force

    Publication No.: US08478748B2

    Publication date: 2013-07-02

    Application No.: US12237293

    Filing date: 2008-09-24

    IPC classification: G06F17/30

    CPC classification: G06F17/30687 G06F17/30867

    Abstract: The present invention provides methods for improving a ranking model. In one embodiment, a method includes the step of obtaining queries, documents, and document labels. The process then initializes active sets using the document labels, wherein two active sets are established for each query: a perfect active set and an imperfect active set. Then, the process optimizes an empirical loss function using the first and second active sets, whereby parameters of the ranking model are modified in accordance with the empirical loss function. The method then updates the active sets with additional ranking data, wherein the updates are configured to work in conjunction with the optimized loss function and modified ranking model. The recalculated active sets provide an indication for ranking the documents in a way that is more consistent with the document metadata.

    LEARNING A DOCUMENT RANKING USING A LOSS FUNCTION WITH A RANK PAIR OR A QUERY PARAMETER
    4.
    Patent application
    Status: in force

    Publication No.: US20080027925A1

    Publication date: 2008-01-31

    Application No.: US11460838

    Filing date: 2006-07-28

    IPC classification: G06F17/30

    Abstract: A method and system for generating a ranking function to rank the relevance of documents to a query is provided. The ranking system learns a ranking function from training data that includes queries, resultant documents, and the relevance of each document to its query. The ranking system learns a ranking function using the training data by weighting incorrect rankings of relevant documents more heavily than incorrect rankings of non-relevant documents, so that more emphasis is placed on correctly ranking relevant documents. The ranking system may also learn a ranking function using the training data by normalizing the contribution of each query to the ranking function so that it is independent of the number of relevant documents of each query.

    Calculating a webpage importance from a web browsing graph
    5.
    Granted patent
    Status: in force

    Publication No.: US08368698B2

    Publication date: 2013-02-05

    Application No.: US12236516

    Filing date: 2008-09-24

    IPC classification: G06T11/20 G06F3/00

    CPC classification: G06F17/30882 G06F17/30864

    Abstract: A method for creating a graph representing web browsing behavior, including: receiving web browsing behavior data from one or more web browsers; adding a node on the graph for each web page listed in the web browsing behavior data; adding a first link connecting two or more nodes on the graph, wherein the first link represents a hyperlink for accessing a webpage; calculating the amount of time during which each web page is accessed; determining a number of units of time in the calculated amount of time; adding one or more virtual nodes to the graph based on the number of units of time; and adding a second link connecting two or more virtual nodes on the graph, wherein the second link represents a virtual hyperlink for accessing a webpage.
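
    The graph construction steps in this abstract can be sketched roughly as follows. The input shape (a list of `(url, seconds)` visits in browsing order), the 10-second time unit, the `#virtualN` naming, and the choice to chain virtual nodes off their page node are all assumptions made for illustration; the abstract does not specify them.

```python
def build_browsing_graph(visits, unit_seconds=10):
    """Build (nodes, links) from an ordered list of (url, seconds_on_page)."""
    nodes, links = set(), set()
    prev = None
    for url, seconds in visits:
        nodes.add(url)  # one node per visited page
        if prev is not None:
            # First kind of link: navigation from the previous page.
            links.add((prev, url, "hyperlink"))
        # Whole time units spent on the page decide how many virtual nodes to add.
        units = int(seconds // unit_seconds)
        last = url
        for k in range(1, units + 1):
            virtual = f"{url}#virtual{k}"
            nodes.add(virtual)
            # Second kind of link: virtual hyperlinks along the chain.
            links.add((last, virtual, "virtual"))
            last = virtual
        prev = url
    return nodes, links
```

    In this sketch a page viewed for 25 seconds gains two virtual nodes, so dwell time adds structure to the graph alongside ordinary hyperlinks.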

    Optimizing ranking of documents using continuous conditional random fields
    6.
    Granted patent
    Status: in force

    Publication No.: US08195669B2

    Publication date: 2012-06-05

    Application No.: US12235355

    Filing date: 2008-09-22

    IPC classification: G06F7/00

    CPC classification: G06F17/30675

    Abstract: The present invention provides an improved method for ranking documents using a ranking model. One embodiment employs Continuous Conditional Random Fields (CRF) as a model, which is a conditional probability distribution representing a mapping relationship from retrieved documents to their ranking scores. The model can naturally utilize features of the content information of documents as well as the relation information between documents for global ranking. The present invention also provides a learning algorithm for creating Continuous CRF. The invention also introduces Pseudo Relevance Feedback and Topic Distillation.

    APPROXIMATION FRAMEWORK FOR DIRECT OPTIMIZATION OF INFORMATION RETRIEVAL MEASURES
    7.
    Patent application
    Status: under examination (published)

    Publication No.: US20110302193A1

    Publication date: 2011-12-08

    Application No.: US12795628

    Filing date: 2010-06-07

    IPC classification: G06F15/18 G06F17/30

    CPC classification: G06F16/338

    Abstract: A “Ranking Optimizer” provides a framework for directly optimizing conventional information retrieval (IR) measures for use in ranking, search, and recommendation type applications. In general, the Ranking Optimizer first reformulates any conventional position-based IR measure from a conventional “indexing by position” process to an “indexing by documents” process to create a newly formulated IR measure which contains a position function and, optionally, a truncation function. Both of these functions are non-continuous and non-differentiable. Therefore, the Ranking Optimizer approximates the position function by using a smooth function of ranking scores and, if used, approximates the optional truncation function with a smooth function of positions of documents. Finally, the Ranking Optimizer optimizes the approximated functions to provide a highly accurate surrogate function for use as a surrogate IR measure.
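
    The position approximation described in this abstract can be illustrated with a short sketch. It assumes a logistic (sigmoid) smoothing with a sharpness parameter `alpha` and uses DCG as the example measure; these choices and all function names are assumptions for illustration, and the optional truncation function is left out.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def approx_positions(scores, alpha=10.0):
    """Smooth stand-in for rank position: 1 + (soft count of documents scoring higher)."""
    positions = []
    for i, s_i in enumerate(scores):
        higher = sum(sigmoid(alpha * (s_j - s_i))
                     for j, s_j in enumerate(scores) if j != i)
        positions.append(1.0 + higher)
    return positions

def approx_dcg(scores, labels, alpha=10.0):
    """DCG indexed by documents, with exact positions replaced by smooth ones."""
    return sum((2 ** label - 1) / math.log2(1.0 + pos)
               for label, pos in zip(labels, approx_positions(scores, alpha)))
```

    As `alpha` grows, the smooth positions approach the true ranks, while the function stays differentiable in the scores and can therefore be optimized with gradient methods.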

    Anti-spam tool for browser
    8.
    Granted patent
    Status: in force

    Publication No.: US07860971B2

    Publication date: 2010-12-28

    Application No.: US12035124

    Filing date: 2008-02-21

    IPC classification: G06F15/16

    CPC classification: G06F17/30899 G06F21/50

    Abstract: An anti-spam tool works with a web browser to detect spam webpages locally on a client machine. The anti-spam tool can be implemented either as a plug-in module or as an integral part of the browser, and is manifested as a toolbar. The tool can perform an anti-spam action whenever a webpage is accessed through the browser, and does not require direct involvement of a search engine. A spam detection module installed on the computing device determines whether a webpage being accessed, or a link contained in that webpage, is spam by comparing the URL of the webpage or the link with a spam list. The spam list can be downloaded from a remote search engine server, stored locally, and updated from time to time. A two-level indexing technique is also introduced to improve the efficiency of the anti-spam tool's use of the spam list.
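
    The abstract does not say how the two-level index is organized. One plausible reading, sketched here purely as an assumption, keys the first level on the host name and the second level on the path, so that most lookups end after a single dictionary probe. The class name `SpamIndex` and its methods are hypothetical.

```python
from urllib.parse import urlsplit

class SpamIndex:
    """Hypothetical two-level spam list: host -> set of spam paths on that host."""

    def __init__(self, spam_urls):
        self._by_host = {}
        for url in spam_urls:
            host, path = self._split(url)
            self._by_host.setdefault(host, set()).add(path)

    @staticmethod
    def _split(url):
        parts = urlsplit(url)
        # Host names are case-insensitive; an empty path is treated as "/".
        return parts.netloc.lower(), parts.path or "/"

    def is_spam(self, url):
        host, path = self._split(url)
        paths = self._by_host.get(host)  # level 1: host lookup
        return paths is not None and path in paths  # level 2: path lookup
```

    A browser-side check would call `is_spam` on the page URL and on each link in the page; URLs on hosts that never appear in the list are rejected at the first level.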

    Listwise ranking
    9.
    Granted patent
    Status: lapsed

    Publication No.: US07734633B2

    Publication date: 2010-06-08

    Application No.: US11874813

    Filing date: 2007-10-18

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Procedures for learning and ranking items in a listwise manner are discussed. A listwise methodology may consider a ranked list of individual items as a specific permutation of the items being ranked. In implementations, a listwise loss function may be used in ranking items. A listwise loss function may be a metric which reflects the departure or disorder from an exemplary ranking for one or more sample listwise rankings used in learning. In this manner, the loss function may approximate the exemplary ranking for the plurality of items being ranked.
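
    As an illustration of what a listwise loss can look like, the sketch below compares the whole predicted list against an exemplary list through softmax distributions and a cross-entropy. This particular form is a commonly used listwise loss and is chosen here as an assumption; the abstract does not name a specific function.

```python
import math

def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(pred_scores, ideal_scores):
    """Cross-entropy between score distributions of the ideal and predicted lists."""
    p_ideal = softmax(ideal_scores)
    p_pred = softmax(pred_scores)
    return -sum(pi * math.log(pp) for pi, pp in zip(p_ideal, p_pred))
```

    The loss is smallest when the predicted scores induce the same distribution as the exemplary scores, and it grows as the predicted list departs from the exemplary ordering.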

    OPTIMIZING RANKING OF DOCUMENTS USING CONTINUOUS CONDITIONAL RANDOM FIELDS
    10.
    Patent application
    Status: in force

    Publication No.: US20100082613A1

    Publication date: 2010-04-01

    Application No.: US12235355

    Filing date: 2008-09-22

    IPC classification: G06F17/30

    CPC classification: G06F17/30675

    Abstract: The present invention provides an improved method for ranking documents using a ranking model. One embodiment employs Continuous Conditional Random Fields (CRF) as a model, which is a conditional probability distribution representing a mapping relationship from retrieved documents to their ranking scores. The model can naturally utilize features of the content information of documents as well as the relation information between documents for global ranking. The present invention also provides a learning algorithm for creating Continuous CRF. The invention also introduces Pseudo Relevance Feedback and Topic Distillation.
