NORMALIZING QUERY WORDS IN WEB SEARCH
    11.
    Patent application
    Status: in force

    Publication No.: US20090259643A1

    Publication date: 2009-10-15

    Application No.: US12103382

    Filing date: 2008-04-15

    IPC classification: G06F17/30 G06F17/20

    Abstract: A method for normalizing query words in web search includes populating a dictionary with join and split candidates and corresponding joined and split words from an aggregate of query logs; determining a confidence score for join and split candidates, a highest confidence score for each being characterized in the dictionary as must-join and must-split, respectively; accepting queries with words amenable to being split or joined, or amenable to an addition or deletion of a hyphen or an apostrophe; generating, based on the accepted queries, split candidates obtained from the dictionary, and candidates of join, hyphen, or apostrophe algorithmically; submitting to a search engine the generated candidates characterized as must-join or must-split in the dictionary, to improve search results returned in response to the queries; and applying a language dictionary to generated candidates not characterized as must-split or must-join, to rank them, and submitting the highest-ranked of those to the search engine.
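
The flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the dictionary contents, the `MUST_THRESHOLD` cut-off, the `language_score` ranking function, and all names are assumptions made for the example, and hyphen and apostrophe candidates are left out for brevity.

```python
# Hypothetical dictionary mined from query logs: word -> (split form, confidence).
SPLIT_DICT = {"newyork": ("new york", 0.97)}
# Hypothetical confidences for joined forms seen in query logs.
JOIN_DICT = {"facebook": 0.96}
# Stand-in for the language dictionary used to rank uncertain candidates.
LANGUAGE_WORDS = {"new", "york", "hotels", "facebook", "login",
                  "data", "base", "database", "design"}
MUST_THRESHOLD = 0.9  # assumed cut-off for must-join / must-split


def language_score(candidate):
    """Fraction of the candidate's words found in the language dictionary."""
    words = candidate.split()
    return sum(w in LANGUAGE_WORDS for w in words) / len(words)


def normalize(query, top_k=1):
    """Return rewritten queries: all must-* candidates, then the best others."""
    words = query.lower().split()
    must, other = [], []
    # Split candidates are looked up in the dictionary.
    for i, word in enumerate(words):
        if word in SPLIT_DICT:
            split, conf = SPLIT_DICT[word]
            cand = " ".join(words[:i] + [split] + words[i + 1:])
            (must if conf >= MUST_THRESHOLD else other).append(cand)
    # Join candidates are generated algorithmically from adjacent word pairs.
    for i in range(len(words) - 1):
        joined = words[i] + words[i + 1]
        cand = " ".join(words[:i] + [joined] + words[i + 2:])
        conf = JOIN_DICT.get(joined, 0.0)
        (must if conf >= MUST_THRESHOLD else other).append(cand)
    # Remaining candidates are ranked with the language dictionary.
    ranked = sorted(other, key=language_score, reverse=True)
    return must + [c for c in ranked if language_score(c) > 0][:top_k]


print(normalize("face book login"))
print(normalize("data base design"))
```

Candidates whose words are all unknown to the language dictionary are dropped here; that filter is a design choice of this sketch rather than something the abstract states.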

    SYSTEM AND METHOD FOR RANKING WEB SEARCHES WITH QUANTIFIED SEMANTIC FEATURES
    12.
    Patent application
    Status: under examination (published)

    Publication No.: US20100191740A1

    Publication date: 2010-07-29

    Application No.: US12360016

    Filing date: 2009-01-26

    IPC classification: G06F17/30

    CPC classification: G06F16/9535

    Abstract: A system and method for ranking web searches with quantified semantic features. A query for a web search is received from a user. The query is segmented and tagged into one or more linguistic segments using linguistic analysis. At least some of the linguistic segments are tagged with a linguistic type. A query execution plan is generated comprising the linguistic segments and, for each of the linguistic segments tagged with a linguistic type, at least one tag attribute comprising at least one domain specific feature of the linguistic type. A search is performed for documents matching the query. Each of the documents is scored for each of the linguistic segments of the query execution plan using the tag attributes of the respective linguistic segment. The documents are ranked using a function that uses the scores of the documents. A ranked list of the documents is transmitted back to the user.

    TOPICAL RANKING IN INFORMATION RETRIEVAL
    13.
    Patent application
    Status: under examination (published)

    Publication No.: US20100185623A1

    Publication date: 2010-07-22

    Application No.: US12354533

    Filing date: 2009-01-15

    IPC classification: G06F17/30

    CPC classification: G06F16/951 G06F16/334

    Abstract: An aggregate ranking model is generated, which comprises a general ranking model and one or more topical ranking models. Each topical ranking model is associated with a topic, or topic class, and is used in ranking search result items determined to belong to the topic, or topic class. As one example, the topical ranking model is trained using a set of topical training data, e.g., training data determined to belong to the topic, or topic class, a general ranking model and a residue, or error, determined from a general ranking generated by the general ranking model for the topical training data, with the topical ranking model being trained to minimize the general ranking model's error in the aggregate ranking model.
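
One way to read the residue-based training in this abstract is as fitting a topical correction to the general model's errors. The sketch below assumes an additive aggregate (general score plus topical correction), a one-feature linear topical model, and a fixed stand-in `general_model`; none of these specifics come from the abstract.

```python
def general_model(x):
    # Stand-in for an already-trained general ranking model (assumed form).
    return 2.0 * x


def fit_topical_model(xs, ys):
    """Least-squares fit of a linear model to the general model's residuals."""
    residuals = [y - general_model(x) for x, y in zip(xs, ys)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residuals) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
    slope = cov / var_x
    intercept = mean_r - slope * mean_x
    return slope, intercept


def aggregate_score(x, topical):
    """Aggregate score = general score + topical correction (assumed additive)."""
    slope, intercept = topical
    return general_model(x) + slope * x + intercept


# Topical training data where the true relevance is 2.5 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.5, 6.0, 8.5]
topical = fit_topical_model(xs, ys)
print(topical, aggregate_score(4.0, topical))
```

Because the topical model only sees the residuals, the general model stays unchanged and can be shared across topics.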

    SELECTIVE TERM WEIGHTING FOR WEB SEARCH BASED ON AUTOMATIC SEMANTIC PARSING
    14.
    Patent application
    Status: under examination (published)

    Publication No.: US20100114878A1

    Publication date: 2010-05-06

    Application No.: US12256371

    Filing date: 2008-10-22

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F16/334

    Abstract: A method is provided for selecting relevant documents returned from a search query. When a search engine finds search terms in documents, the document score is based on the frequency of the occurrence of those terms, the category of the term, and the section of the document in which the term is found. Each (category type, document section) pair is assigned a weight that is used to modify the contribution of term frequency. The weights are determined in an offline process using historical data and human validation. Through this empirical process, the weight assignments are made to correlate high relevance scores with documents that humans would find relevant to a search query.
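
The (category type, document section) weighting can be illustrated with a small scoring function. The weight table, the category and section names, and the plain weighted sum are all assumptions for this sketch; the abstract only says that the weights modify the contribution of term frequency and are learned offline.

```python
# Hypothetical weights per (term category, document section) pair.
WEIGHTS = {
    ("product", "title"): 3.0,
    ("product", "body"): 1.0,
    ("location", "title"): 1.5,
    ("location", "body"): 0.5,
}
DEFAULT_WEIGHT = 1.0  # assumed fallback for unlisted pairs


def score_document(doc, query_terms):
    """doc maps section name -> text; query_terms maps term -> category."""
    total = 0.0
    for section, text in doc.items():
        tokens = text.lower().split()
        for term, category in query_terms.items():
            tf = tokens.count(term)  # raw term frequency in this section
            total += WEIGHTS.get((category, section), DEFAULT_WEIGHT) * tf
    return total


doc = {"title": "camera reviews", "body": "best camera shops in paris paris"}
query_terms = {"camera": "product", "paris": "location"}
print(score_document(doc, query_terms))
```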

    AUTOMATIC QUERY CONCEPTS IDENTIFICATION AND DRIFTING FOR WEB SEARCH
    15.
    Patent application
    Status: under examination (published)

    Publication No.: US20100094835A1

    Publication date: 2010-04-15

    Application No.: US12252220

    Filing date: 2008-10-15

    IPC classification: G06F7/06 G06F17/30 G06N5/02

    Abstract: Techniques are described for automatically determining which terms in a search query may be augmented by contextually similar terms such that more relevant results can be displayed to a user. Contextually similar words are determined based on training data, including a web corpus and a query log. Once contextually similar words are determined, they may be inserted into a search query and used to find more relevant results. Consequently, documents that contain helpful information but may not have exact word matches may be found more readily by a search engine.
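
As a rough illustration of finding contextually similar words from training text, the sketch below compares words by the cosine similarity of their neighbouring-word counts. The abstract does not say which similarity measure is used, so the co-occurrence window, the cosine measure, the threshold, and the toy corpus are all assumptions.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Count, for each word, the words that appear within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return dict(vectors)


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand(query, vectors, threshold=0.9):
    """Map each query term to words whose contexts are similar enough."""
    expansions = {}
    for term in query.lower().split():
        if term not in vectors:
            expansions[term] = []
            continue
        expansions[term] = sorted(
            other for other in vectors
            if other != term
            and cosine(vectors[term], vectors[other]) >= threshold
        )
    return expansions


corpus = [
    "cheap car rental", "cheap auto rental",
    "car insurance quotes", "auto insurance quotes",
    "pizza delivery nearby",
]
print(expand("car insurance", context_vectors(corpus)))
```

In this toy corpus "car" and "auto" share identical contexts, so "auto" is offered as an expansion of "car", while "insurance" gets none.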

    Search assist powered by session analysis
    16.
    Granted patent
    Status: in force

    Publication No.: US08255414B2

    Publication date: 2012-08-28

    Application No.: US12882974

    Filing date: 2010-09-15

    IPC classification: G06F17/30

    CPC classification: G06F17/3064

    Abstract: One embodiment selects from a set of query-suggestion pairs a first query and a subset of query-suggestion pairs that each has the first query as its query; computes a Log Likelihood Ratio (LLR) value for each query-suggestion pair from the subset of query-suggestion pairs; ranks the subset of query-suggestion pairs according to their respective LLR values; removes from the subset of query-suggestion pairs all query-suggestion pairs whose LLR values are below a predetermined LLR threshold; computes a Pointwise Mutual Information (PMI) value for each remaining query-suggestion pair from the subset of query-suggestion pairs; removes from the subset of query-suggestion pairs all query-suggestion pairs whose PMI values are below a predetermined PMI threshold; and constructs a ranked set of suggestions for the first query, wherein the ranked set of suggestions comprises one or more suggestions of the remaining query-suggestion pairs from the subset of query-suggestion pairs.
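
The LLR-then-PMI filtering steps can be sketched as below. The abstract does not give the formulas, so this example assumes the common 2x2 contingency-table form of the log-likelihood ratio and the standard PMI definition; the pair counts, thresholds, and function names are made up for illustration.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio of a 2x2 contingency table (G-test form)."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
             (k21, rows[1], cols[0]), (k22, rows[1], cols[1]))
    return 2.0 * sum(k * math.log(k * n / (r * c)) for k, r, c in cells if k > 0)


def pmi(k11, k12, k21, k22):
    """Pointwise mutual information of the (query, suggestion) cell."""
    n = k11 + k12 + k21 + k22
    return math.log(k11 * n / ((k11 + k12) * (k11 + k21)))


def suggestions(pair_counts, query, llr_min, pmi_min):
    """Rank suggestions for `query` by LLR, then apply both thresholds."""
    n = sum(pair_counts.values())
    q_total, s_total = Counter(), Counter()
    for (q, s), c in pair_counts.items():
        q_total[q] += c
        s_total[s] += c
    scored = []
    for (q, s), k11 in pair_counts.items():
        if q != query:
            continue
        k12 = q_total[q] - k11      # query followed by another suggestion
        k21 = s_total[s] - k11      # suggestion reached from another query
        k22 = n - k11 - k12 - k21   # neither
        table = (k11, k12, k21, k22)
        scored.append((llr(*table), s, table))
    scored.sort(key=lambda item: item[0], reverse=True)
    kept = [item for item in scored if item[0] >= llr_min]
    return [s for _, s, table in kept if pmi(*table) >= pmi_min]


# Hypothetical session counts of (query, follow-up query) pairs.
counts = {
    ("jaguar", "jaguar cars"): 40, ("jaguar", "jaguar animal"): 25,
    ("jaguar", "weather"): 2, ("python", "weather"): 30,
    ("python", "python tutorial"): 60, ("news", "weather"): 43,
}
print(suggestions(counts, "jaguar", llr_min=10.0, pmi_min=0.0))
```

In this toy data the pair ("jaguar", "weather") has a high LLR because the two co-occur far less often than chance would predict; the PMI threshold is what removes it, which suggests why the two tests are applied in sequence.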

    Social knowledge system content quality
    17.
    Granted patent
    Status: in force

    Publication No.: US07865452B2

    Publication date: 2011-01-04

    Application No.: US12491133

    Filing date: 2009-06-24

    IPC classification: G06F15/18

    CPC classification: G06N5/022

    Abstract: Techniques for automatically scoring submissions to an online question-and-answer submission system are disclosed. According to one such technique, an initial set of user submissions is scored by human operators and/or automated algorithmic mechanisms. The submissions and their accompanying scores are provided as training data to an automated machine learning mechanism. The machine learning mechanism processes the training data and automatically detects patterns in the provided submissions. The machine learning mechanism automatically correlates these patterns with the scores assigned to the submissions that match those patterns. As a result, the machine learning mechanism is trained. Thereafter, the machine learning mechanism processes unscored submissions. The machine learning mechanism automatically identifies, from among the patterns that the machine learning mechanism has already detected, one or more patterns that these submissions match. The machine learning mechanism automatically scores these submissions based on the matching patterns and the scores that are associated with those patterns.

    Method and apparatus providing hypothesis driven speech modelling for use in speech recognition
    18.
    Granted patent
    Status: lapsed

    Publication No.: US06868381B1

    Publication date: 2005-03-15

    Application No.: US09468138

    Filing date: 1999-12-21

    Abstract: A speech recognition system having an input for receiving an input signal indicative of a spoken utterance that is indicative of at least one speech element. The system further includes a first processing unit operative for processing the input signal to derive from a speech recognition dictionary a speech model associated with a given speech element that constitutes a potential match to the at least one speech element. The system further comprises a second processing unit for generating a modified version of the speech model on the basis of the input signal. The system further provides a third processing unit for processing the input signal on the basis of the modified version of the speech model to generate a recognition result indicative of whether the modified version of the at least one speech model constitutes a match to the input signal. The second processing unit allows the speech model to be modified on the basis of the recognition attempt, thereby allowing speech recognition to be effected on the basis of the modified speech model. This permits adaptation of the speech models during the recognition process. The invention further provides an apparatus, method and computer readable medium for implementing the second processing unit.

    Query expansion and weighting based on results of automatic speech recognition
    19.
    Granted patent
    Status: in force

    Publication No.: US06856957B1

    Publication date: 2005-02-15

    Application No.: US09779023

    Filing date: 2001-02-07

    Applicant: Benoit Dumoulin

    Inventor: Benoit Dumoulin

    CPC classification: G10L15/22 G10L15/1815

    Abstract: A technique for identifying one or more items from amongst a plurality of items in response to a spoken utterance is used to improve call routing and information retrieval systems which employ automatic speech recognition (ASR). An automatic speech recognizer is used to recognize the utterance, including generating a plurality of hypotheses for the utterance. A query element is then generated for use in identifying one or more items from amongst the plurality of items. The query element includes a set of values representing two or more of the hypotheses, each value corresponding to one of the words in the hypotheses. Each value in the query element is then weighted based on hypothesis confidence, word confidence, or both, as determined by the ASR process. The query element is then applied to the plurality of items to identify one or more items which satisfy the query.
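
A minimal sketch of building a weighted query element from several ASR hypotheses is shown below. The abstract allows weighting by hypothesis confidence, word confidence, or both; this example assumes the product of the two, keeps the maximum weight per word, and scores items by summing the weights of matching words. The hypotheses, confidences, and routing destinations are invented for illustration.

```python
def build_query(hypotheses):
    """hypotheses: list of (hypothesis_confidence, [(word, word_confidence)])."""
    weights = {}
    for hyp_conf, words in hypotheses:
        for word, word_conf in words:
            weight = hyp_conf * word_conf  # assumed combination rule
            weights[word] = max(weights.get(word, 0.0), weight)
    return weights


def rank_items(query, items):
    """items maps item name -> set of index terms; highest score first."""
    scores = {
        name: sum(query.get(term, 0.0) for term in terms)
        for name, terms in items.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Two hypothetical recognizer outputs for the same utterance.
hypotheses = [
    (0.6, [("billing", 0.9), ("department", 0.8)]),
    (0.3, [("building", 0.5), ("department", 0.8)]),
]
# Hypothetical call-routing destinations and their index terms.
destinations = {
    "billing": {"billing", "invoice", "department"},
    "facilities": {"building", "maintenance", "department"},
    "sales": {"sales", "orders"},
}
query = build_query(hypotheses)
print(rank_items(query, destinations))
```

Keeping words from the lower-confidence hypothesis lets a destination still match when the top hypothesis misrecognizes a word, at a reduced weight.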

    Social knowledge system content quality
    20.
    Granted patent
    Status: in force

    Publication No.: US07571145B2

    Publication date: 2009-08-04

    Application No.: US11583464

    Filing date: 2006-10-18

    IPC classification: G06F15/18

    CPC classification: G06N5/022

    Abstract: Techniques for automatically scoring submissions to an online question-and-answer submission system are disclosed. According to one such technique, an initial set of user submissions is scored by human operators and/or automated algorithmic mechanisms. The submissions and their accompanying scores are provided as training data to an automated machine learning mechanism. The machine learning mechanism processes the training data and automatically detects patterns in the provided submissions. The machine learning mechanism automatically correlates these patterns with the scores assigned to the submissions that match those patterns. As a result, the machine learning mechanism is trained. Thereafter, the machine learning mechanism processes unscored submissions. The machine learning mechanism automatically identifies, from among the patterns that the machine learning mechanism has already detected, one or more patterns that these submissions match. The machine learning mechanism automatically scores these submissions based on the matching patterns and the scores that are associated with those patterns.
