SYSTEMS AND METHODS FOR FINDING HIGH QUALITY CONTENT IN SOCIAL MEDIA
    1.
    Patent application
    Status: Under examination (published)

    Publication No.: US20100036784A1

    Publication date: 2010-02-11

    Application No.: US12187580

    Filing date: 2008-08-07

    IPC classification: G06N5/00

    Abstract: The present invention is directed towards systems and methods for identifying high quality content in a social media environment. The method according to one embodiment of the present invention comprises retrieving a content item and retrieving a plurality of quality features associated with said content item, wherein said quality features comprise intrinsic, usage and relationship features. The method then performs an analysis of said content item against said quality features and generates a quality score based on said analysis.

    Search assist powered by session analysis
    2.
    Granted patent
    Status: In force

    Publication No.: US08255414B2

    Publication date: 2012-08-28

    Application No.: US12882974

    Filing date: 2010-09-15

    IPC classification: G06F17/30

    CPC classification: G06F17/3064

    Abstract: One embodiment selects from a set of query-suggestion pairs a first query and a subset of query-suggestion pairs that each has the first query as its query; computes a Log Likelihood Ratio (LLR) value for each query-suggestion pair from the subset of query-suggestion pairs; ranks the subset of query-suggestion pairs according to their respective LLR values; removes from the subset of query-suggestion pairs all query-suggestion pairs whose LLR values are below a predetermined LLR threshold; computes a Pointwise Mutual Information (PMI) value for each remaining query-suggestion pair from the subset of query-suggestion pairs; removes from the subset of query-suggestion pairs all query-suggestion pairs whose PMI values are below a predetermined PMI threshold; and constructs a ranked set of suggestions for the first query, wherein the ranked set of suggestions comprises one or more suggestions of the remaining query-suggestion pairs from the subset of query-suggestion pairs.

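    The LLR-then-PMI filtering described in this abstract can be sketched as follows. This is a minimal illustration, assuming session logs have already been reduced to co-occurrence counts. The function names, the count tables, and the threshold values are hypothetical, and the LLR shown is Dunning's log-likelihood ratio over a 2x2 contingency table, which is one common formulation and not necessarily the one the patent uses.

```python
import math


def _xlogx(x):
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts):
    # Unnormalized entropy term used in Dunning's LLR formulation.
    return _xlogx(sum(counts)) - sum(_xlogx(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of co-occurrence counts."""
    rows = _entropy(k11 + k12, k21 + k22)
    cols = _entropy(k11 + k21, k12 + k22)
    cells = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (rows + cols - cells))


def pmi(k11, k12, k21, k22):
    """Pointwise mutual information, log P(q, s) / (P(q) * P(s))."""
    if k11 == 0:
        return float("-inf")
    total = k11 + k12 + k21 + k22
    return math.log(k11 * total / ((k11 + k12) * (k11 + k21)))


def rank_suggestions(query, pair_counts, query_counts, suggestion_counts,
                     total, llr_min, pmi_min):
    """Return suggestions for `query`, ranked by LLR, after both filters."""
    def table(suggestion, k11):
        k12 = query_counts[query] - k11            # query with other suggestions
        k21 = suggestion_counts[suggestion] - k11  # suggestion with other queries
        return k11, k12, k21, total - k11 - k12 - k21

    # Select the subset of pairs that have `query` as their query.
    tables = {s: table(s, n) for (q, s), n in pair_counts.items() if q == query}
    # Rank by LLR, then drop pairs below the LLR threshold.
    by_llr = sorted(((llr(*t), s) for s, t in tables.items()), reverse=True)
    kept = [s for score, s in by_llr if score >= llr_min]
    # Compute PMI only for the remaining pairs and drop those below threshold.
    return [s for s in kept if pmi(*tables[s]) >= pmi_min]
```

    Applying LLR first removes pairs whose association is statistically weak; PMI then removes pairs that co-occur less often than chance would predict, which LLR alone does not distinguish from positive association.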

    Abbreviation handling in web search
    3.
    Granted patent
    Status: In force

    Publication No.: US08204874B2

    Publication date: 2012-06-19

    Application No.: US12884708

    Filing date: 2010-09-17

    IPC classification: G06F17/00 G06F7/00

    CPC classification: G06F17/30672

    Abstract: A method for handling abbreviations in web queries includes building a dictionary of possible word expansions for potential abbreviations related to query terms received and anticipated to be received by a search engine; accepting a query including an abbreviation from a searching user, where a probability of finding a most probably-correct expansion in the dictionary is a first probability, and a probability that the expansion is the abbreviation itself is a second probability; determining a ratio between the first and second probabilities; expanding the abbreviation in accordance with the most probably-correct expansion when the ratio is above a first threshold value; and highlighting the abbreviation with a suggested expansion of the most probably-correct expansion for the user so that the user may accept the suggested expansion when the ratio is between a second, lower threshold value and the first threshold value.

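    The two-threshold decision in this abstract can be sketched as below. This is an illustrative reading only: the function name, the dictionary layout, and the threshold values 4.0 and 1.5 are assumptions, not values taken from the patent.

```python
def handle_abbreviation(abbrev, expansions, p_keep,
                        expand_at=4.0, suggest_at=1.5):
    """Decide what to do with an abbreviation found in a query.

    expansions: dict mapping each candidate expansion to its probability.
    p_keep:     probability that the abbreviation should stay as typed.
    Returns ("expand" | "suggest" | "keep", text).
    """
    if not expansions:
        return ("keep", abbrev)
    # First probability: the most probably-correct expansion.
    best, p_best = max(expansions.items(), key=lambda kv: kv[1])
    # Ratio between the first and second probabilities.
    ratio = p_best / p_keep if p_keep > 0 else float("inf")
    if ratio > expand_at:
        return ("expand", best)    # rewrite the query automatically
    if ratio > suggest_at:
        return ("suggest", best)   # highlight and let the user accept
    return ("keep", abbrev)
```

    The middle band between the two thresholds is what lets the search engine offer an expansion without committing to it when the evidence is only moderate.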

    Social knowledge system content quality
    4.
    Granted patent
    Status: In force

    Publication No.: US07865452B2

    Publication date: 2011-01-04

    Application No.: US12491133

    Filing date: 2009-06-24

    IPC classification: G06F15/18

    CPC classification: G06N5/022

    Abstract: Techniques for automatically scoring submissions to an online question-and-answer submission system are disclosed. According to one such technique, an initial set of user submissions are scored by human operators and/or automated algorithmic mechanisms. The submissions and their accompanying scores are provided as training data to an automated machine learning mechanism. The machine learning mechanism processes the training data and automatically detects patterns in the provided submissions. The machine learning mechanism automatically correlates these patterns with the scores assigned to the submissions that match those patterns. As a result, the machine learning mechanism is trained. Thereafter, the machine learning mechanism processes unscored submissions. The machine learning mechanism automatically identifies, from among the patterns that the machine learning mechanism has already detected, one or more patterns that these submissions match. The machine learning mechanism automatically scores these submissions based on the matching patterns and the scores that are associated with those patterns.

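    The train-then-score loop in this abstract can be illustrated with a deliberately simple stand-in for the machine learning mechanism. In this sketch a "pattern" is assumed to be a single lowercase token and the score associated with a pattern is the mean score of the training submissions containing it. Both choices, and all names, are hypothetical; the abstract does not say which learner or features are used.

```python
from collections import defaultdict


def train(scored_submissions):
    """Correlate each pattern (here: a token) with the mean training score."""
    totals = defaultdict(lambda: [0.0, 0])  # token -> [score sum, count]
    for text, score in scored_submissions:
        for token in set(text.lower().split()):
            totals[token][0] += score
            totals[token][1] += 1
    return {token: total / count for token, (total, count) in totals.items()}


def score(model, text, default=0.5):
    """Score an unscored submission from the known patterns it matches."""
    matches = [model[t] for t in set(text.lower().split()) if t in model]
    # Fall back to a neutral default when no known pattern matches.
    return sum(matches) / len(matches) if matches else default
```

    A production system would more plausibly use a trained classifier or regressor over richer features; the sketch only shows the shape of the data flow from scored training submissions to scores for new ones.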

    Method and apparatus providing hypothesis driven speech modelling for use in speech recognition
    5.
    Granted patent
    Status: Lapsed

    Publication No.: US06868381B1

    Publication date: 2005-03-15

    Application No.: US09468138

    Filing date: 1999-12-21

    Abstract: A speech recognition system having an input for receiving an input signal indicative of a spoken utterance that is indicative of at least one speech element. The system further includes a first processing unit operative for processing the input signal to derive from a speech recognition dictionary a speech model associated with a given speech element that constitutes a potential match to the at least one speech element. The system further comprises a second processing unit for generating a modified version of the speech model on the basis of the input signal. The system further provides a third processing unit for processing the input signal on the basis of the modified version of the speech model to generate a recognition result indicative of whether the modified version of the at least one speech model constitutes a match to the input signal. The second processing unit allows the speech model to be modified on the basis of the recognition attempt, thereby allowing speech recognition to be effected on the basis of the modified speech model. This permits adaptation of the speech models during the recognition process. The invention further provides an apparatus, method and computer readable medium for implementing the second processing unit.

    Query expansion and weighting based on results of automatic speech recognition
    6.
    Granted patent
    Status: In force

    Publication No.: US06856957B1

    Publication date: 2005-02-15

    Application No.: US09779023

    Filing date: 2001-02-07

    Applicant: Benoit Dumoulin

    Inventor: Benoit Dumoulin

    CPC classification: G10L15/22 G10L15/1815

    Abstract: A technique for identifying one or more items from amongst a plurality of items in response to a spoken utterance is used to improve call routing and information retrieval systems which employ automatic speech recognition (ASR). An automatic speech recognizer is used to recognize the utterance, including generating a plurality of hypotheses for the utterance. A query element is then generated for use in identifying one or more items from amongst the plurality of items. The query element includes a set of values representing two or more of the hypotheses, each value corresponding to one of the words in the hypotheses. Each value in the query element is then weighted based on hypothesis confidence, word confidence, or both, as determined by the ASR process. The query element is then applied to the plurality of items to identify one or more items which satisfy the query.

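    One way to picture the weighted query element in this abstract is as a bag of words pooled over the recognizer's hypotheses. The sketch below assumes each word's weight is the product of hypothesis confidence and word confidence, summed across hypotheses, and that items are matched by simple word overlap. The combination rule, the data layout, and the function names are all illustrative assumptions.

```python
from collections import defaultdict


def build_query(hypotheses):
    """Build a weighted query element from ASR hypotheses.

    hypotheses: list of (hypothesis_confidence, [(word, word_confidence), ...]).
    Returns a dict mapping each word to its accumulated weight.
    """
    weights = defaultdict(float)
    for hyp_conf, words in hypotheses:
        for word, word_conf in words:
            # Weight by both confidences; using either one alone also fits
            # the abstract's "hypothesis confidence, word confidence, or both".
            weights[word] += hyp_conf * word_conf
    return dict(weights)


def rank_items(query, items):
    """Apply the query to items (name -> set of words), best match first."""
    scores = {
        name: sum(w for word, w in query.items() if word in terms)
        for name, terms in items.items()
    }
    matched = [name for name, s in scores.items() if s > 0]
    return sorted(matched, key=lambda name: scores[name], reverse=True)
```

    Pooling several hypotheses means a destination can still be found when the top hypothesis misrecognizes a word, as long as a lower-ranked hypothesis contains it.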

    Semantic and text matching techniques for network search
    7.
    Granted patent
    Status: In force

    Publication No.: US08112436B2

    Publication date: 2012-02-07

    Application No.: US12563357

    Filing date: 2009-09-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: In one embodiment, access a search query comprising one or more query words, at least one of the query words representing one or more query concepts; access a network document identified for a search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; semantic-text match the search query and the network document to determine one or more negative semantic-text matches; and construct one or more negative features based on the negative semantic-text matches.

    Normalizing query words in web search
    8.
    Granted patent
    Status: In force

    Publication No.: US08010547B2

    Publication date: 2011-08-30

    Application No.: US12103382

    Filing date: 2008-04-15

    IPC classification: G06F17/30

    Abstract: A method for normalizing query words in web search includes populating a dictionary with join and split candidates and corresponding joined and split words from an aggregate of query logs; determining a confidence score for join and split candidates, a highest confidence score for each being characterized in the dictionary as must-join and must-split, respectively; accepting queries with words amenable to being split or joined, or amenable to an addition or deletion of a hyphen or an apostrophe; generating, based on the accepted queries, split candidates obtained from the dictionary, and candidates of join, hyphen, or apostrophe algorithmically; and submitting to a search engine the generated possible candidates characterized as must-join or must-split in the dictionary, to improve search results returned in response to the queries; applying a language dictionary to generated candidates not characterized as must-split or must-join, to rank them, and submitting those highest-ranked to the search engine.

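    The must-join and must-split part of this abstract can be sketched as a dictionary rewrite over the query words. The sketch assumes a dictionary that maps a surface form to a normalized form plus a label; that layout and the function name are hypothetical. Hyphen and apostrophe candidates, and the language-dictionary ranking of lower-confidence candidates, are left out.

```python
def normalize(query, dictionary):
    """Rewrite a query using must-join / must-split dictionary entries.

    dictionary: maps a surface form to (normalized_form, label), where label
    is "must-join" for two-word forms or "must-split" for one-word forms.
    """
    words = query.split()
    out = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        pair_entry = dictionary.get(pair)
        word_entry = dictionary.get(words[i])
        if i + 1 < len(words) and pair_entry and pair_entry[1] == "must-join":
            out.append(pair_entry[0])   # e.g. "note book" -> "notebook"
            i += 2
        elif word_entry and word_entry[1] == "must-split":
            out.append(word_entry[0])   # e.g. "newyork" -> "new york"
            i += 1
        else:
            out.append(words[i])        # no high-confidence rewrite
            i += 1
    return " ".join(out)
```

    Restricting automatic rewrites to the highest-confidence entries keeps the normalizer from changing queries where the original spelling may have been intended.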

    Abbreviation handling in web search
    9.
    Granted patent
    Status: In force

    Publication No.: US07809715B2

    Publication date: 2010-10-05

    Application No.: US12103126

    Filing date: 2008-04-15

    IPC classification: G06F17/00 G06F7/00

    CPC classification: G06F17/30672

    Abstract: A method for handling abbreviations in web queries includes building a dictionary of a plurality of possible word expansions for a plurality of potential abbreviations related to query terms received or anticipated to be received by a search engine; accepting a query including an abbreviation; expanding the abbreviation into one of the plurality of word expansions if a probability that the expansion is correct is above a threshold value, wherein the probability is determined by taking into consideration a context of the abbreviation within the query, wherein the context includes at least anchor text; and sending the query with the expanded abbreviation to the search engine to generate a search results page related to the query.

    NORMALIZING QUERY WORDS IN WEB SEARCH
    10.
    Patent application
    Status: In force

    Publication No.: US20090259643A1

    Publication date: 2009-10-15

    Application No.: US12103382

    Filing date: 2008-04-15

    IPC classification: G06F17/30 G06F17/20

    Abstract: A method for normalizing query words in web search includes populating a dictionary with join and split candidates and corresponding joined and split words from an aggregate of query logs; determining a confidence score for join and split candidates, a highest confidence score for each being characterized in the dictionary as must-join and must-split, respectively; accepting queries with words amenable to being split or joined, or amenable to an addition or deletion of a hyphen or an apostrophe; generating, based on the accepted queries, split candidates obtained from the dictionary, and candidates of join, hyphen, or apostrophe algorithmically; and submitting to a search engine the generated possible candidates characterized as must-join or must-split in the dictionary, to improve search results returned in response to the queries; applying a language dictionary to generated candidates not characterized as must-split or must-join, to rank them, and submitting those highest-ranked to the search engine.
