-
Publication No.: US08732151B2
Publication date: 2014-05-20
Application No.: US13078648
Filing date: 2011-04-01
Applicant: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari
Inventor: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari
IPC classification: G06F17/30
CPC classification: G06F17/30672
Abstract: Systems, methods, and computer media for identifying query rewriting replacement terms are provided. A list of related string pairs, each comprising a first string and a second string, is received. The first string of each related string pair is a user search query extracted from user click log data. For one or more of the related string pairs, the string pair is provided as input to a statistical machine translation model. The model identifies one or more pairs of corresponding terms, each pair of corresponding terms including a first term from the first string and a second term from the second string. The model also calculates a probability of relatedness for each of the one or more pairs of corresponding terms. Term pairs whose calculated probability of relatedness exceeds a threshold are characterized as query term replacements and incorporated, along with the probability of relatedness, into a query rewriting candidate database.
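The thresholding step in the abstract of US08732151B2 can be illustrated with a minimal sketch. It assumes the statistical machine translation model has already produced term pairs with a probability of relatedness; the function name `build_rewrite_candidates`, the tuple layout, the sample pairs, and the threshold value 0.5 are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def build_rewrite_candidates(term_pairs, threshold=0.5):
    """Keep term pairs whose probability of relatedness exceeds the threshold.

    term_pairs: iterable of (query_term, replacement_term, probability).
    Returns a nested dict standing in for the candidate database
    (an assumption; the patent does not specify a storage format).
    """
    database = defaultdict(dict)
    for query_term, replacement_term, probability in term_pairs:
        if probability > threshold:
            # Store the probability alongside the replacement, as the abstract describes.
            database[query_term][replacement_term] = probability
    return dict(database)


# Hypothetical model output, for illustration only.
pairs = [
    ("nyc", "new york", 0.82),
    ("nyc", "nice", 0.05),
    ("cheap", "discount", 0.61),
]
candidates = build_rewrite_candidates(pairs, threshold=0.5)
```

A rewriter could then look up `candidates.get(term, {})` for each query term and rank the replacements by the stored probability.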
-
Publication No.: US20120254218A1
Publication date: 2012-10-04
Application No.: US13078648
Filing date: 2011-04-01
Applicant: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari
Inventor: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari
IPC classification: G06F17/30
CPC classification: G06F17/30672
Abstract: Systems, methods, and computer media for identifying query rewriting replacement terms are provided. A list of related string pairs, each comprising a first string and a second string, is received. The first string of each related string pair is a user search query extracted from user click log data. For one or more of the related string pairs, the string pair is provided as input to a statistical machine translation model. The model identifies one or more pairs of corresponding terms, each pair of corresponding terms including a first term from the first string and a second term from the second string. The model also calculates a probability of relatedness for each of the one or more pairs of corresponding terms. Term pairs whose calculated probability of relatedness exceeds a threshold are characterized as query term replacements and incorporated, along with the probability of relatedness, into a query rewriting candidate database.
-
Publication No.: US09507861B2
Publication date: 2016-11-29
Application No.: US13078553
Filing date: 2011-04-01
Applicant: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari
Inventor: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30672
Abstract: Systems, methods, and computer media for identifying related strings for search query rewriting are provided. Session data for a user search query session in accessed click log data is identified. It is determined whether a first additional search query in the session data is related to a first user search query based on at least one of: dwell time; a number of search result links clicked on; and similarity between web page titles or uniform resource locators (URLs). When related, the first additional search query is incorporated into a list of strings related to the first user search query. One or more supplemental strings that are related to the first user search query are also identified. The identified supplemental strings are also included in the list of strings related to the first user search query.
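The session-based relatedness test in the abstract of US09507861B2 can be sketched roughly as follows. The abstract names the three signals but gives no thresholds or direction for them, so everything concrete here is an assumption: the `QueryEvent` record, the Jaccard overlap on clicked URLs as the similarity measure, the default threshold values, and treating any single signal as sufficient.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryEvent:
    """One query in a session (hypothetical record layout)."""
    query: str
    dwell_seconds: float = 0.0
    clicked_urls: List[str] = field(default_factory=list)


def jaccard(a, b):
    """Set overlap between two URL collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_related(first, later, min_dwell=30.0, min_clicks=2, min_overlap=0.3):
    """Treat a later query as related if at least one signal passes its threshold."""
    return (
        later.dwell_seconds >= min_dwell
        or len(later.clicked_urls) >= min_clicks
        or jaccard(first.clicked_urls, later.clicked_urls) >= min_overlap
    )


def related_queries(first, later_events):
    """Collect the later queries in the session judged related to the first."""
    return [event.query for event in later_events if is_related(first, event)]


# Hypothetical session, for illustration only.
first = QueryEvent("nyc hotels", 5.0, ["a.com/nyc"])
session = [
    QueryEvent("new york hotels", 45.0, ["b.com"]),
    QueryEvent("weather", 3.0, []),
]
related = related_queries(first, session)
```

The supplemental strings mentioned in the abstract would be appended to the same list from another source; that step is not modeled here.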
-
Publication No.: US20120254217A1
Publication date: 2012-10-04
Application No.: US13078553
Filing date: 2011-04-01
Applicant: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari
Inventor: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30672
Abstract: Systems, methods, and computer media for identifying related strings for search query rewriting are provided. Session data for a user search query session in accessed click log data is identified. It is determined whether a first additional search query in the session data is related to a first user search query based on at least one of: dwell time; a number of search result links clicked on; and similarity between web page titles or uniform resource locators (URLs). When related, the first additional search query is incorporated into a list of strings related to the first user search query. One or more supplemental strings that are related to the first user search query are also identified. The identified supplemental strings are also included in the list of strings related to the first user search query.
-
Publication No.: US20120131031A1
Publication date: 2012-05-24
Application No.: US12951068
Filing date: 2010-11-22
Applicant: Shasha Xie, Xiaodong He, Jianfeng Gao
Inventor: Shasha Xie, Xiaodong He, Jianfeng Gao
IPC classification: G06F17/30
CPC classification: G06F17/30967, G06F17/30672
Abstract: An alteration candidate for a query can be scored. The scoring may include computing one or more query-dependent feature scores and/or one or more intra-candidate dependent feature scores. The computation of the query-dependent feature score(s) can be based on dependencies to multiple query terms from each of one or more alteration terms (i.e., for each of the one or more alteration terms, there can be dependencies to multiple query terms that form at least a portion of the basis for the query-dependent feature score(s)). The computation of the intra-candidate dependent feature score(s) can be based on dependencies between different terms in the alteration candidate. A candidate score can be computed using the query-dependent feature score(s) and/or the intra-candidate dependent feature score(s). Additionally, the candidate score can be used in determining whether to select the candidate to expand the query. If selected, the candidate can be used to expand the query.
-
Publication No.: US08060358B2
Publication date: 2011-11-15
Application No.: US12147807
Filing date: 2008-06-27
Applicant: Xiaodong He, Mei Yang, Jianfeng Gao, Patrick Nguyen
Inventor: Xiaodong He, Mei Yang, Jianfeng Gao, Patrick Nguyen
IPC classification: G06F17/28
CPC classification: G06F17/2827, G06F17/2818
Abstract: A computing system configured to produce an optimized translation hypothesis of text input into the computing system. The computing system includes a plurality of translation machines. Each of the translation machines is configured to produce its own translation hypothesis from the same text. An optimization machine is connected to the plurality of translation machines. The optimization machine is configured to receive the translation hypotheses from the translation machines. The optimization machine is further configured to align, word-to-word, the hypotheses in the plurality of hypotheses by using a hidden Markov model.
-
Publication No.: US08521672B2
Publication date: 2013-08-27
Application No.: US12951068
Filing date: 2010-11-22
Applicant: Shasha Xie, Xiaodong He, Jianfeng Gao
Inventor: Shasha Xie, Xiaodong He, Jianfeng Gao
CPC classification: G06F17/30967, G06F17/30672
Abstract: An alteration candidate for a query can be scored. The scoring may include computing one or more query-dependent feature scores and/or one or more intra-candidate dependent feature scores. The computation of the query-dependent feature score(s) can be based on dependencies to multiple query terms from each of one or more alteration terms (i.e., for each of the one or more alteration terms, there can be dependencies to multiple query terms that form at least a portion of the basis for the query-dependent feature score(s)). The computation of the intra-candidate dependent feature score(s) can be based on dependencies between different terms in the alteration candidate. A candidate score can be computed using the query-dependent feature score(s) and/or the intra-candidate dependent feature score(s). Additionally, the candidate score can be used in determining whether to select the candidate to expand the query. If selected, the candidate can be used to expand the query.
-
Publication No.: US08473486B2
Publication date: 2013-06-25
Application No.: US12962751
Filing date: 2010-12-08
Applicant: Xiaodong He, Jianfeng Gao, Jennifer Gillenwater
Inventor: Xiaodong He, Jianfeng Gao, Jennifer Gillenwater
CPC classification: G06F17/30864
Abstract: A supervised technique uses relevance judgments to train a dependency parser such that it approximately optimizes Normalized Discounted Cumulative Gain (NDCG) in information retrieval. A weighted tree edit distance between the parse tree for a query and the parse tree for a document is added to a ranking function, where the edit distance weights are parameters from the parser. Using parser parameters in the ranking function enables approximate optimization of the parser's parameters for NDCG by adding some constraints to the objective function.
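For readers unfamiliar with the metric named in the abstract of US08473486B2, NDCG can be computed as in the sketch below. It uses one common formulation (gain 2^rel - 1, discount log2(rank + 1) with ranks starting at 1); the abstract does not say which variant the patent uses, so this choice is an assumption, and the parser training itself is not modeled.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of graded relevance labels in ranked order."""
    # enumerate() is 0-based, so rank + 2 gives log2(position + 1) for 1-based positions.
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for two rankings of the same three documents.
perfect_order = ndcg([3, 2, 1])
swapped_order = ndcg([1, 2, 3])
```

A ranking already sorted by relevance scores 1.0, and any other ordering of the same labels scores lower, which is what makes NDCG usable as a training objective for a ranking function.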
-
Publication No.: US20120203539A1
Publication date: 2012-08-09
Application No.: US13022633
Filing date: 2011-02-08
Applicant: Amittai Axelrod, Jianfeng Gao, Xiaodong He
Inventor: Amittai Axelrod, Jianfeng Gao, Xiaodong He
IPC classification: G06F17/28
CPC classification: G06F17/2809
Abstract: Architecture that provides the capability to subselect the most relevant data from an out-domain corpus to use either in isolation or in combination with in-domain data. The architecture is a domain adaptation for machine translation that selects the most relevant sentences from a larger general-domain corpus of parallel translated sentences. The methods for selecting the data include monolingual cross-entropy measure, monolingual cross-entropy difference, bilingual cross-entropy, and bilingual cross-entropy difference. A translation model is trained on both the in-domain data and an out-domain subset, and the models can be interpolated together to boost performance on in-domain translation tasks.
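The monolingual cross-entropy difference criterion listed in the abstract of US20120203539A1 can be illustrated with a toy sketch. It scores each general-domain sentence by its cross-entropy under an in-domain language model minus its cross-entropy under a general-domain model, and keeps the lowest-scoring sentences. The add-one-smoothed unigram models, the function names, and the sample sentences are assumptions made for illustration; a real system would use stronger language models, and sentences are assumed to be non-empty.

```python
import math
from collections import Counter


def unigram_model(sentences, vocab):
    """Add-one-smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    return {word: (counts[word] + 1) / (total + len(vocab)) for word in vocab}


def cross_entropy(sentence, model):
    """Per-word cross-entropy (in bits) of a non-empty sentence under the model."""
    words = sentence.split()
    return -sum(math.log2(model[word]) for word in words) / len(words)


def select_by_ce_difference(in_domain, general, k):
    """Return the k general-domain sentences with the lowest H_in(s) - H_general(s)."""
    vocab = {word for s in in_domain + general for word in s.split()}
    lm_in = unigram_model(in_domain, vocab)
    lm_general = unigram_model(general, vocab)
    ranked = sorted(
        general,
        key=lambda s: cross_entropy(s, lm_in) - cross_entropy(s, lm_general),
    )
    return ranked[:k]


# Hypothetical corpora, for illustration only.
in_domain = ["the patient received a dose", "the dose was reduced"]
general = [
    "the patient was given a dose",
    "the stock market fell today",
    "a dose of medicine",
]
selected = select_by_ce_difference(in_domain, general, k=2)
```

The bilingual variants named in the abstract would sum the same quantity over the source and target sides of each sentence pair.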
-
Publication No.: US20120150836A1
Publication date: 2012-06-14
Application No.: US12962751
Filing date: 2010-12-08
Applicant: Xiaodong He, Jianfeng Gao, Jennifer Gillenwater
Inventor: Xiaodong He, Jianfeng Gao, Jennifer Gillenwater
IPC classification: G06F17/30
CPC classification: G06F17/30864
Abstract: A supervised technique uses relevance judgments to train a dependency parser such that it approximately optimizes Normalized Discounted Cumulative Gain (NDCG) in information retrieval. A weighted tree edit distance between the parse tree for a query and the parse tree for a document is added to a ranking function, where the edit distance weights are parameters from the parser. Using parser parameters in the ranking function enables approximate optimization of the parser's parameters for NDCG by adding some constraints to the objective function.