Query intent in information retrieval
    1.
    Granted invention patent
    Legal status: in force

    Publication number: US08380723B2

    Publication date: 2013-02-19

    Application number: US12784869

    Filing date: 2010-05-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06Q30/00

    Abstract: Inferring query intent in information retrieval is described. In an example, reformulations of an initial query by a user are used to create a query neighborhood. In the example, the query neighborhood is used to identify a set of possibly related queries. First and higher order reformulations of the initial query may be used to expand the query neighborhood. In an example, precision can be improved by reducing the query neighborhood to more closely related queries; for example, two queries can be connected if they are often clicked for the same document. In an example, two queries can be connected using a random walk, and all pairs of queries that are not connected by a random walk of less than a fixed threshold are removed. The connected queries can be used to form clusters, and weights can be applied in order to determine the most likely related queries.

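
The neighborhood idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the function names, the `max_order` and `min_shared` parameters, and the toy data are all assumptions made for illustration, and the co-click test stands in for "often clicked for the same document".

```python
from collections import deque


def query_neighborhood(initial, reformulations, max_order=2):
    """Collect queries reachable from `initial` in at most `max_order`
    reformulation steps (breadth-first expansion)."""
    order = {initial: 0}
    frontier = deque([initial])
    while frontier:
        query = frontier.popleft()
        if order[query] == max_order:
            continue  # do not expand beyond the allowed order
        for nxt in reformulations.get(query, ()):
            if nxt not in order:
                order[nxt] = order[query] + 1
                frontier.append(nxt)
    return set(order) - {initial}


def prune_by_coclicks(initial, neighborhood, clicks, min_shared=1):
    """Keep neighbors sharing at least `min_shared` clicked documents
    with the initial query (an assumed stand-in for 'often clicked')."""
    base = clicks.get(initial, set())
    return {q for q in neighborhood
            if len(base & clicks.get(q, set())) >= min_shared}


# Hypothetical session logs: query -> observed reformulations.
reformulations = {
    "jaguar": ["jaguar car", "jaguar animal"],
    "jaguar car": ["jaguar xf price"],
    "jaguar animal": ["big cats"],
}
# Hypothetical click logs: query -> clicked documents.
clicks = {
    "jaguar": {"jaguar.example", "wiki.example/Jaguar"},
    "jaguar car": {"jaguar.example"},
    "jaguar xf price": {"jaguar.example", "cars.example"},
    "jaguar animal": {"wiki.example/Jaguar"},
    "big cats": {"zoo.example"},
}

hood = query_neighborhood("jaguar", reformulations, max_order=2)
related = prune_by_coclicks("jaguar", hood, clicks)
```

Here the expansion step favors recall (second-order reformulations such as "big cats" enter the neighborhood), and the pruning step trades some of that recall for precision by dropping queries with no shared clicks.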

    Query Intent in Information Retrieval
    2.
    Patent application
    Legal status: in force

    Publication number: US20110289063A1

    Publication date: 2011-11-24

    Application number: US12784869

    Filing date: 2010-05-21

    IPC classification: G06F17/30 G06F17/21

    CPC classification: G06F17/30864 G06Q30/00

    Abstract: Inferring query intent in information retrieval is described. In an example, reformulations of an initial query by a user are used to create a query neighborhood. In the example, the query neighborhood is used to identify a set of possibly related queries. First and higher order reformulations of the initial query may be used to expand the query neighborhood. In an example, precision can be improved by reducing the query neighborhood to more closely related queries; for example, two queries can be connected if they are often clicked for the same document. In an example, two queries can be connected using a random walk, and all pairs of queries that are not connected by a random walk of less than a fixed threshold are removed. The connected queries can be used to form clusters, and weights can be applied in order to determine the most likely related queries.

    Information retrieval system
    3.
    Granted invention patent
    Legal status: in force

    Publication number: US08037043B2

    Publication date: 2011-10-11

    Application number: US12207315

    Filing date: 2008-09-09

    IPC classification: G06F17/30

    CPC classification: G06F17/30675 G06F17/30864

    Abstract: An information retrieval system is described for retrieving a list of documents such as web pages or other items from a document index in response to a user query. In an embodiment, a prediction engine is used to predict both explicit relevance information such as judgment labels and implicit relevance information such as click data. In an embodiment, the predicted relevance information is applied to a stored utility function that describes user satisfaction with a search session. This produces utility scores for proposed lists of documents. Using the utility scores, one of the lists of documents is selected. In this way, different sources of relevance information are combined into a single information retrieval system in a principled and effective manner, which gives improved performance.

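
The selection step described in the abstract above (score each proposed list with a utility function, then pick one) can be sketched as follows. This is only an illustration under stated assumptions: the rank-discounted utility, the weights `w_label` and `w_click`, and the predicted values are hypothetical and are not taken from the patent.

```python
def utility(ranked_docs, predicted, w_label=0.5, w_click=0.5):
    """Hypothetical utility of a ranked list: a weighted mix of predicted
    judgment label and predicted click probability, discounted by rank."""
    total = 0.0
    for rank, doc in enumerate(ranked_docs):
        label, click = predicted.get(doc, (0.0, 0.0))
        total += (w_label * label + w_click * click) / (rank + 1)
    return total


def select_list(candidate_lists, predicted):
    """Return the candidate list with the highest utility score."""
    return max(candidate_lists, key=lambda docs: utility(docs, predicted))


# Assumed output of a prediction engine: doc -> (label score, click probability).
predicted = {"a": (0.9, 0.7), "b": (0.5, 0.6), "c": (0.1, 0.2)}
candidates = [["c", "b", "a"], ["a", "b", "c"], ["b", "a", "c"]]

best = select_list(candidates, predicted)
```

Combining both signals inside one utility function is what lets explicit and implicit relevance information influence the same decision; any real system would fit the utility to observed user satisfaction rather than fix the weights by hand.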

    Information Retrieval System
    4.
    Patent application
    Legal status: in force

    Publication number: US20100076949A1

    Publication date: 2010-03-25

    Application number: US12207315

    Filing date: 2008-09-09

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F17/30675 G06F17/30864

    Abstract: An information retrieval system is described for retrieving a list of documents such as web pages or other items from a document index in response to a user query. In an embodiment, a prediction engine is used to predict both explicit relevance information such as judgment labels and implicit relevance information such as click data. In an embodiment, the predicted relevance information is applied to a stored utility function that describes user satisfaction with a search session. This produces utility scores for proposed lists of documents. Using the utility scores, one of the lists of documents is selected. In this way, different sources of relevance information are combined into a single information retrieval system in a principled and effective manner, which gives improved performance.

    Digital ink labeling
    5.
    Patent application

    Publication number: US20060098871A1

    Publication date: 2006-05-11

    Application number: US11256263

    Filing date: 2005-10-21

    Applicant: Martin Szummer

    Inventor: Martin Szummer

    IPC classification: G06K9/34

    Abstract: Digital ink strokes may be fragmented to form a training data set. A neighborhood graph may be formed as a plurality of connected nodes. Relevant features of the training data may be determined in each fragment, such as local site features, interaction features, and/or part-label interaction features. Using a conditional random field, which may include a hidden random field, modeling parameters may be developed to provide a training model to determine a posterior probability of the labels given observed data. In this manner, the training model may be used to predict a label for an observed ink stroke. The modeling parameters may be learned from only a portion of the set of ink strokes in an unsupervised way. For example, many compound objects may include compositional parts. In some cases, appropriate compositional parts may be discovered or inferred during training of the model based on the training data.

    Content-based information retrieval
    6.
    Granted invention patent
    Legal status: in force

    Publication number: US08346800B2

    Publication date: 2013-01-01

    Application number: US12417511

    Filing date: 2009-04-02

    IPC classification: G06F17/30

    Abstract: Content-based information retrieval is described. In an example, a query item such as an image, document, email or other item is presented and items with similar content are retrieved from a database of items. In an example, each time a query is presented, a classifier is formed based on that query and using a training set of items. For example, the classifier is formed in real-time and is formed in such a way that a limit on the proportion of the items in the database that will be retrieved is set. In an embodiment, the query item is analyzed to identify tokens in that item and subsets of those tokens are selected to form the classifier. For example, the subsets of tokens are combined using Boolean operators in a manner which is efficient for searching on particular types of database.

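
A rough sketch of the token-subset classifier described in the abstract above, assuming items are represented as sets of tokens. The greedy selection rule, the `max_fraction` and `subset_size` parameters, and the sample data are illustrative assumptions; the patent abstract does not specify how subsets are chosen.

```python
from itertools import combinations


def build_boolean_query(query_tokens, training_items, max_fraction=0.5, subset_size=2):
    """Greedily pick token subsets (each an AND clause) and OR them together,
    skipping any clause that would push the matched share of the training
    set above `max_fraction`."""
    clauses, matched = [], set()
    for subset in combinations(sorted(query_tokens), subset_size):
        hits = {i for i, item in enumerate(training_items) if set(subset) <= item}
        if hits and len(matched | hits) / len(training_items) <= max_fraction:
            clauses.append(subset)
            matched |= hits
    return clauses


def matches(clauses, item_tokens):
    """An item is retrieved if it satisfies at least one AND clause."""
    return any(set(clause) <= item_tokens for clause in clauses)


# Hypothetical training set: each item is a set of tokens.
training_items = [
    {"red", "car", "road"},
    {"red", "apple"},
    {"blue", "car"},
    {"green", "tree"},
]
query_tokens = {"red", "car", "road"}

pair_clauses = build_boolean_query(query_tokens, training_items)
single_clauses = build_boolean_query(query_tokens, training_items, subset_size=1)
```

With single-token clauses the limit becomes visible: the clause for "red" is skipped because adding it would match three of the four training items, which exceeds the assumed 50% cap.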

    Digital ink labeling
    7.
    Granted invention patent
    Legal status: lapsed

    Publication number: US07512273B2

    Publication date: 2009-03-31

    Application number: US11256263

    Filing date: 2005-10-21

    Applicant: Martin Szummer

    Inventor: Martin Szummer

    IPC classification: G06K9/00

    Abstract: Digital ink strokes may be fragmented to form a training data set. A neighborhood graph may be formed as a plurality of connected nodes. Relevant features of the training data may be determined in each fragment, such as local site features, interaction features, and/or part-label interaction features. Using a conditional random field, which may include a hidden random field, modeling parameters may be developed to provide a training model to determine a posterior probability of the labels given observed data. In this manner, the training model may be used to predict a label for an observed ink stroke. The modeling parameters may be learned from only a portion of the set of ink strokes in an unsupervised way. For example, many compound objects may include compositional parts. In some cases, appropriate compositional parts may be discovered or inferred during training of the model based on the training data.

    Bayesian conditional random fields
    8.
    Patent application
    Legal status: under examination (published)

    Publication number: US20060115145A1

    Publication date: 2006-06-01

    Application number: US10999880

    Filing date: 2004-11-30

    IPC classification: G06K9/62

    Abstract: A Bayesian approach to training in conditional random fields takes a prior distribution over the modeling parameters of interest. These prior distributions may be used to generate an approximate form of a posterior distribution over the parameters, which may be trained with example or training data. Automatic relevance determination (ARD) may be integrated in the training to automatically select relevant features of the training data. From the trained posterior distribution of the parameters, a posterior distribution over the parameters based on the training data and the prior distributions over parameters may be approximated to form a training model. Using the developed training model, a given image may be evaluated by integrating over the posterior distribution over parameters to obtain a marginal probability distribution over the labels given that observational data.

    Information retrieval using query-document pair information
    9.
    Granted invention patent
    Legal status: in force

    Publication number: US07877385B2

    Publication date: 2011-01-25

    Application number: US11859604

    Filing date: 2007-09-21

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: Information retrieval using query-document pair information is described. In an embodiment, a click record is accessed having information about queries and documents where user clicks have been observed for query-document pairs. A click graph is either formed or accessed. This has nodes connected by edges, each node representing either a document or a query, and each edge representing at least one observed click. Given at least one first node in the click graph, a similarity measure is determined between that first node and each of one or more second nodes. The second nodes are then ranked on the basis of the similarity measure results, and the ranking is used to retrieve information from the click record.

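
The click-graph ranking in the abstract above can be sketched as follows. The abstract does not name a specific similarity measure, so this example assumes a short click-weighted random walk as the measure; the function names, the `steps` parameter, and the click record are hypothetical.

```python
from collections import defaultdict


def build_click_graph(click_record):
    """Build a bipartite graph from (query, document, click_count) triples.
    Nodes are tagged tuples so a query and a document never collide."""
    graph = defaultdict(dict)
    for query, doc, n_clicks in click_record:
        q, d = ("q", query), ("d", doc)
        graph[q][d] = graph[q].get(d, 0) + n_clicks
        graph[d][q] = graph[d].get(q, 0) + n_clicks
    return graph


def walk_similarity(graph, start, steps=2):
    """Probability of reaching each node after `steps` click-weighted
    random-walk steps from `start` (the assumed similarity measure)."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, prob in dist.items():
            total = sum(graph[node].values())
            for neighbor, weight in graph[node].items():
                nxt[neighbor] += prob * weight / total
        dist = dict(nxt)
    return dist


def rank_similar(graph, start, steps=2):
    """Rank the nodes reached by the walk, most similar first."""
    scores = walk_similarity(graph, start, steps)
    scores.pop(start, None)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical click record.
click_record = [
    ("panda", "zoo.example/panda", 3),
    ("giant panda", "zoo.example/panda", 2),
    ("panda", "wiki.example/Panda", 1),
    ("bear", "wiki.example/Bear", 4),
]
graph = build_click_graph(click_record)
similar_queries = rank_similar(graph, ("q", "panda"), steps=2)
similar_docs = rank_similar(graph, ("q", "panda"), steps=1)
```

Because the graph is bipartite, an even number of steps from a query node lands on queries and an odd number lands on documents, so the same routine can rank either kind of second node.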

    Content-Based Information Retrieval
    10.
    Patent application
    Legal status: in force

    Publication number: US20100257202A1

    Publication date: 2010-10-07

    Application number: US12417511

    Filing date: 2009-04-02

    IPC classification: G06F17/30

    Abstract: Content-based information retrieval is described. In an example, a query item such as an image, document, email or other item is presented and items with similar content are retrieved from a database of items. In an example, each time a query is presented, a classifier is formed based on that query and using a training set of items. For example, the classifier is formed in real-time and is formed in such a way that a limit on the proportion of the items in the database that will be retrieved is set. In an embodiment, the query item is analyzed to identify tokens in that item and subsets of those tokens are selected to form the classifier. For example, the subsets of tokens are combined using Boolean operators in a manner which is efficient for searching on particular types of database.
