TRAINING A SEARCH RESULT RANKER WITH AUTOMATICALLY-GENERATED SAMPLES
    43.
    Type: Patent application
    Legal status: In force

    Publication No.: US20100082510A1

    Publication date: 2010-04-01

    Application No.: US12243359

    Filing date: 2008-10-01

    IPC classification: G06F15/18 G06F7/06 G06F17/30

    CPC classification: G06N99/005 G06F17/3053

    Abstract: A search result ranker may be trained with automatically-generated samples. In an example embodiment, user interests are inferred from user interactions with search results for a particular query so as to determine respective relevance scores associated with respective query-identifier pairs of the search results. Query-identifier-relevance score triplets are formulated from the respective relevance scores associated with the respective query-identifier pairs. The query-identifier-relevance score triplets are submitted as training samples to a search result ranker. The search result ranker is trained as a learning machine with multiple training samples of the query-identifier-relevance score triplets.
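The triplet formulation described in this abstract can be illustrated with a minimal sketch. The abstract does not say how relevance is inferred from interactions, so the click-through rate used here is an assumption, and `formulate_triplets` and the sample log are hypothetical names and data.

```python
from collections import defaultdict

def formulate_triplets(interactions):
    """Turn (query, identifier, was_clicked) records into
    (query, identifier, relevance) triplets.

    Relevance is assumed to be the click-through rate of the pair;
    the abstract only says it is inferred from user interactions.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, identifier, was_clicked in interactions:
        key = (query, identifier)
        shown[key] += 1
        if was_clicked:
            clicked[key] += 1
    # Sort the pairs so the output order is deterministic.
    return [(q, i, clicked[(q, i)] / shown[(q, i)]) for (q, i) in sorted(shown)]

# Hypothetical interaction log for one query.
log = [
    ("jaguar", "url_a", True),
    ("jaguar", "url_a", False),
    ("jaguar", "url_b", False),
    ("jaguar", "url_a", True),
]
triplets = formulate_triplets(log)
```

The resulting triplets could then be passed as training samples to any learning-to-rank model; the choice of learner is outside this sketch.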

    Finite-state model for processing web queries
    44.
    Type: Patent application
    Legal status: Lapsed

    Publication No.: US20080183673A1

    Publication date: 2008-07-31

    Application No.: US11698011

    Filing date: 2007-01-25

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: A method of creating an index of web queries is discussed. The method includes receiving a first query representative of one or more symbolic characters and assigning the first query to a first data structure. A first text string representative of the first query is created and assigned to a second data structure. The first and second data structures are stored on a tangible computer readable medium.

    Method and apparatus for distribution-based language model adaptation
    46.
    Type: Granted patent
    Legal status: In force

    Publication No.: US07254529B2

    Publication date: 2007-08-07

    Application No.: US11225543

    Filing date: 2005-09-13

    IPC classification: G06F17/27 G06F17/28 G10L15/00

    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e., the task-specific training data set) and the relative frequency of n-grams in a large training set (i.e., the out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
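As a rough illustration of the weighting idea in this abstract, the sketch below scales each n-gram count from the large set by a function of the two relative frequencies and then normalizes. The abstract does not give the weighting formula, so the power-law ratio, the `alpha` exponent, and the `floor` value are all assumptions, as are the function names and sample counts.

```python
from collections import Counter

def relative_frequencies(counts):
    """Map each n-gram to its share of the total count."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def adapt_counts(small_counts, large_counts, alpha=0.5, floor=1e-3):
    """Weight large-set counts by (small_rf / large_rf) ** alpha.

    alpha and floor are illustrative assumptions: floor keeps n-grams
    that are absent from the small set from dropping to zero.
    """
    small_rf = relative_frequencies(small_counts)
    large_rf = relative_frequencies(large_counts)
    weighted = {}
    for gram, count in large_counts.items():
        ratio = max(small_rf.get(gram, 0.0), floor) / large_rf[gram]
        weighted[gram] = (ratio ** alpha) * count
    return weighted

def to_probabilities(weighted):
    """Normalize weighted counts into a probability distribution."""
    total = sum(weighted.values())
    return {gram: w / total for gram, w in weighted.items()}

# Hypothetical bigram counts: a small in-domain set and a large out-of-domain set.
small = Counter({"book flight": 3, "flight to": 1})
large = Counter({"book flight": 2, "flight to": 2, "stock price": 6})
probs = to_probabilities(adapt_counts(small, large))
```

In this toy data, "book flight" gains probability mass relative to its share of the large set, while "stock price", which never occurs in the small set, is heavily discounted.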

    Method and apparatus for compressing asymmetric clustering language models
    47.
    Type: Granted patent
    Legal status: In force

    Publication No.: US07231349B2

    Publication date: 2007-06-12

    Application No.: US10448498

    Filing date: 2003-05-30

    Applicant: Mu Li, Jianfeng Gao

    Inventor: Mu Li, Jianfeng Gao

    IPC classification: G01L15/06

    Abstract: A method and data structure are provided for efficiently storing asymmetric clustering models. The models are stored by storing a first level record for a word identifier and two second level records, one for a word identifier and one for a cluster identifier. An index to the second level word record and an index to the second level cluster record are stored in the first level record. Many of the records in the data structure include both cluster sub-model parameters and word sub-model parameters.

    Context modeling architecture and framework
    48.
    Type: Patent application
    Legal status: In force

    Publication No.: US20070112546A1

    Publication date: 2007-05-17

    Application No.: US11253866

    Filing date: 2005-10-19

    IPC classification: G06F17/10

    CPC classification: G06N99/005 G06F9/453

    Abstract: A context modeling architecture is provided that includes a context representation portion, which is adapted to represent context as features. The features are specifiable at runtime of an application including the context representation portion.

    Processing collocation mistakes in documents
    49.
    Type: Patent application
    Legal status: In force

    Publication No.: US20070010992A1

    Publication date: 2007-01-11

    Application No.: US11177136

    Filing date: 2005-07-08

    IPC classification: G06F17/27

    Abstract: A sentence is accessed and at least one query is generated based on the sentence. The at least one query can be compared to text within a collection of documents, for example using a web search engine. Collocation errors in the sentence can be detected and/or corrected based on the comparison of the at least one query and the text within the collection of documents.
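The query-and-compare idea in this abstract can be sketched as follows. A small in-memory list of documents stands in for the web search engine mentioned in the abstract, and the phrase-count comparison, the function names, and the sample data are all assumptions made for illustration.

```python
def count_phrase(phrase, documents):
    """Count case-insensitive occurrences of a phrase across the documents."""
    needle = phrase.lower()
    return sum(doc.lower().count(needle) for doc in documents)

def check_collocation(word, head, alternatives, documents):
    """Compare "<word> <head>" against alternative collocates.

    Each candidate phrase acts as one query. Returns the candidate with
    the highest count and the full count table. Ties favor the original
    word because it is listed first.
    """
    candidates = [word] + [a for a in alternatives if a != word]
    counts = {c: count_phrase(f"{c} {head}", documents) for c in candidates}
    best = max(candidates, key=lambda c: counts[c])
    return best, counts

# Hypothetical document collection standing in for a search index.
documents = [
    "I drink strong tea every morning",
    "strong tea and strong coffee",
    "a powerful engine",
]
best, counts = check_collocation("powerful", "tea", ["strong"], documents)
```

Here the phrase "powerful tea" never appears in the collection while "strong tea" does, so the sketch would flag the original collocation and suggest "strong".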

    Method and system for retrieving confirming sentences
    50.
    Type: Patent application
    Legal status: In force

    Publication No.: US20050273318A1

    Publication date: 2005-12-08

    Application No.: US11187567

    Filing date: 2005-07-22

    CPC classification: G06F17/3069 Y10S707/99933

    Abstract: A method, computer readable medium and system are provided which retrieve confirming sentences from a sentence database in response to a query. In retrieving the confirming sentences, the search engine defines indexing units based upon the query, with the indexing units including both lemmas from the query and extended indexing units associated with the query. The search engine then retrieves a plurality of sentences from the sentence database using the defined indexing units as search parameters. A similarity between each of the plurality of retrieved sentences and the query is determined by the search engine, wherein each similarity is determined as a function of a linguistic weight of a term in the query. The search engine then ranks the plurality of retrieved sentences based upon the determined similarities.
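The retrieval steps in this abstract (define indexing units, retrieve, score, rank) can be sketched in a few lines. The abstract does not specify the lemmatizer, the expansion source, or the similarity function, so the tiny `LEMMAS` map, the `expansions` table, the additive weighted-overlap score, and all names below are assumptions.

```python
# Hypothetical lemma table; a real system would use a proper lemmatizer.
LEMMAS = {"made": "make", "took": "take"}

def lemmatize(sentence):
    """Lowercase, split on whitespace, and map tokens to lemmas."""
    return {LEMMAS.get(tok, tok) for tok in sentence.lower().split()}

def indexing_units(query_lemmas, expansions):
    """Query lemmas plus extended units associated with them."""
    units = set(query_lemmas)
    for lemma in query_lemmas:
        units.update(expansions.get(lemma, ()))
    return units

def similarity(sentence_lemmas, units, weights, default_weight=1.0):
    """Sum of linguistic weights of the indexing units found in the sentence."""
    return sum(weights.get(u, default_weight) for u in units if u in sentence_lemmas)

def retrieve(query_lemmas, sentences, expansions, weights):
    """Return matching sentences ranked by descending similarity."""
    units = indexing_units(query_lemmas, expansions)
    scored = []
    for sentence in sentences:
        score = similarity(lemmatize(sentence), units, weights)
        if score > 0:
            scored.append((score, sentence))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sentence for _, sentence in scored]

sentences = [
    "He made a decision quickly",
    "She will take a decision soon",
    "The weather is nice",
]
expansions = {"make": ["take"]}  # assumed extended indexing units
weights = {"make": 2.0, "decision": 1.0, "take": 0.5}  # assumed linguistic weights
ranked = retrieve(["make", "decision"], sentences, expansions, weights)
```

With these assumed weights, the sentence that matches the query lemmas directly ranks above the one matched only through the extended unit "take", and the unrelated sentence is not retrieved.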
