Patent search: ap:("Jianfeng Gao" OR "Mingjing Li") AND inv:"Jianfeng Gao", page 1

1.

Granted patent
Method and apparatus for adapting a class entity dictionary used with language models (in force)

Publication number: US07124080B2

Publication date: 2006-10-17

Application number: US10008432

Filing date: 2001-11-13

Applicants: Zheng Chen, Jianfeng Gao, Mingjing Li, Feng Zhang

Inventors: Zheng Chen, Jianfeng Gao, Mingjing Li, Feng Zhang

IPC classification: G10L15/06, G10L15/00

CPC classification: G06F17/2715, G06F17/2775

Abstract: A method and apparatus are provided for augmenting a language model with a class entity dictionary based on corrections made by a user. Under the method and apparatus, a user corrects an output that is based in part on the language model by replacing an output segment with a correct segment. The correct segment is added to a class of segments in the class entity dictionary and a probability of the correct segment given the class is estimated based on an n-gram probability associated with the output segment and an n-gram probability associated with the class. This estimated probability is then used to generate further outputs.

2.

Granted patent
System and method for joint optimization of language model performance and size (in force)

Publication number: US07275029B1

Publication date: 2007-09-25

Application number: US09607786

Filing date: 2000-06-30

Applicants: Jianfeng Gao, Kai-Fu Lee, Mingjing Li, Hai-Feng Wang, Dong-Feng Cai, Lee-Feng Chien

Inventors: Jianfeng Gao, Kai-Fu Lee, Mingjing Li, Hai-Feng Wang, Dong-Feng Cai, Lee-Feng Chien

IPC classification: G06F17/27

CPC classification: G06F17/2735, G06F17/274, G06F17/2818

Abstract: A method for the joint optimization of language model performance and size is presented comprising developing a language model from a tuning set of information, segmenting at least a subset of a received textual corpus and calculating a perplexity value for each segment and refining the language model with one or more segments of the received corpus based, at least in part, on the calculated perplexity value for the one or more segments.
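
The perplexity step in this abstract can be illustrated with a minimal sketch. It is not the patented method: the add-one smoothed unigram model, the toy data, and the rule "keep segments whose perplexity is below `threshold`" are all assumptions made for illustration, since the abstract does not state how the perplexity value is used to pick segments.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return an add-one smoothed unigram probability function (assumed model)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab)

    return prob


def perplexity(prob, tokens):
    """Perplexity of a token sequence: 2 ** (average negative log2 probability)."""
    log_sum = sum(math.log2(prob(t)) for t in tokens)
    return 2 ** (-log_sum / len(tokens))


# Hypothetical tuning set and corpus segments.
tuning = "the cat sat on the mat".split()
segments = [["the", "cat", "sat"], ["quantum", "flux", "drive"]]

model = train_unigram(tuning)
scores = [perplexity(model, seg) for seg in segments]

# Assumed selection rule: keep segments the tuning model finds unsurprising.
threshold = 10.0
selected = [seg for seg, ppl in zip(segments, scores) if ppl < threshold]
```

Here the first segment shares vocabulary with the tuning set and scores a lower perplexity than the second, so only the first would be passed on to refine the model under the assumed rule.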

3.

Granted patent
Method and apparatus for distribution-based language model adaptation (in force)

Publication number: US07254529B2

Publication date: 2007-08-07

Application number: US11225543

Filing date: 2005-09-13

Applicants: Jianfeng Gao, Mingjing Li

Inventors: Jianfeng Gao, Mingjing Li

IPC classification: G06F17/27, G06F17/28, G10L15/00

CPC classification: G06F17/2715, G10L15/065, G10L15/1815

Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
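
As a rough sketch of the idea, the snippet below reweights bigram counts from a large corpus using the ratio of relative frequencies in a small in-domain corpus and the large corpus, then normalizes per history. The abstract does not give the weighting formula, so the form `(rf_small / rf_large) ** alpha`, the `floor` value for bigrams unseen in the small set, and the toy corpora are all assumptions, not the patented computation.

```python
from collections import Counter


def bigrams(tokens):
    """List of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def relative_freq(counts):
    """Map each n-gram to its share of all n-gram occurrences."""
    total = sum(counts.values())
    return {ng: c / total for ng, c in counts.items()}


def adapt(small_tokens, large_tokens, alpha=0.5, floor=1e-3):
    """Weight large-corpus bigram counts by an assumed frequency-ratio weight."""
    small = Counter(bigrams(small_tokens))
    large = Counter(bigrams(large_tokens))
    rf_small = relative_freq(small)
    rf_large = relative_freq(large)

    weighted = {}
    for ng, count in large.items():
        # Assumed weight: damped ratio of in-domain to out-of-domain frequency.
        ratio = rf_small.get(ng, floor) / rf_large[ng]
        weighted[ng] = count * ratio ** alpha

    # Turn weighted counts into conditional probabilities P(w2 | w1).
    history_totals = Counter()
    for (w1, _), wc in weighted.items():
        history_totals[w1] += wc
    return {ng: wc / history_totals[ng[0]] for ng, wc in weighted.items()}


# Hypothetical corpora: the small set only ever follows "a" with "b".
large_corpus = "a b a c a b a c".split()
small_corpus = "a b a b".split()
probs = adapt(small_corpus, large_corpus)
```

In the large corpus alone, "b" and "c" are equally likely after "a"; after weighting, `probs[("a", "b")]` is far larger than `probs[("a", "c")]`, which is the qualitative effect the abstract describes.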

4.

Patent application
Method and apparatus for distribution-based language model adaptation (in force)

Publication number: US20060009965A1

Publication date: 2006-01-12

Application number: US11225543

Filing date: 2005-09-13

Applicants: Jianfeng Gao, Mingjing Li

Inventors: Jianfeng Gao, Mingjing Li

IPC classification: G06F17/27

CPC classification: G06F17/2715, G10L15/065, G10L15/1815

Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.

5.

Granted patent
Language input system for mobile devices (in force)

Publication number: US07277732B2

Publication date: 2007-10-02

Application number: US09843358

Filing date: 2001-04-24

Applicants: Zheng Chen, Mingjing Li, Feng Zhang, Rui Yang, Jianfeng Gao

Inventors: Zheng Chen, Mingjing Li, Feng Zhang, Rui Yang, Jianfeng Gao

IPC classification: H04B1/38

CPC classification: G06F3/0236, G06F3/018, G06F3/0237, H04M1/72519, H04M2250/58, H04M2250/70

Abstract: A language system facilitates entry of an input string into a mobile device using discrete keys on a keypad, such as a 10-key keypad. The numeric keys have associated letters of an alphabet. The key input is representative of one or more Chinese phonetic characters. Based on this input string, the language system derives the most likely Chinese corresponding language characters intended by the user. The language system uses multiple different search engines and language models to aid in deriving the most probable Chinese language characters. When the language system recognizes possible Chinese language characters, the mobile device displays the possible Chinese language characters for user selection of the possible Chinese language characters and/or further input of one or more Chinese phonetic characters. In this manner, the language system adopts a modeless entry methodology that eliminates conventional mode switching between input and selection operations.
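
The first step described here, mapping an ambiguous digit sequence to the phonetic strings it could stand for, can be sketched as follows. The letter layout is the usual telephone keypad; the tiny `SYLLABLES` list and the function names are hypothetical, and the search engines and language models that rank the resulting characters are not modeled.

```python
# Standard telephone keypad letter assignment.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}

# Hypothetical miniature pinyin syllable list, for illustration only.
SYLLABLES = ["ni", "hao", "ma", "na", "mi"]


def to_digits(syllable):
    """Digit sequence a user would press to type the syllable."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in syllable)


def pinyin_candidates(digits, syllables=SYLLABLES):
    """All known syllables whose key sequence equals the pressed digits."""
    return [s for s in syllables if to_digits(s) == digits]


print(pinyin_candidates("64"))   # ['ni', 'mi']
print(pinyin_candidates("426"))  # ['hao']
```

Because one key sequence can match several syllables ("64" fits both "ni" and "mi"), a real system needs a ranking stage, which is where the language models mentioned in the abstract would come in.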

6.

Granted patent
Method and apparatus for distribution-based language model adaptation (lapsed)

Publication number: US07043422B2

Publication date: 2006-05-09

Application number: US09945930

Filing date: 2001-09-04

Applicants: Jianfeng Gao, Mingjing Li

Inventors: Jianfeng Gao, Mingjing Li

IPC classification: G06F17/27

CPC classification: G06F17/2715, G10L15/065, G10L15/1815

Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.

7.

Granted patent
Structured cross-lingual relevance feedback for enhancing search results (in force)

Publication number: US08645289B2

Publication date: 2014-02-04

Application number: US12970879

Filing date: 2010-12-16

Applicants: Paul Nathan Bennett, Jianfeng Gao, Jagadeesh Jagarlamudi, Kristen Patricia Parton

Inventors: Paul Nathan Bennett, Jianfeng Gao, Jagadeesh Jagarlamudi, Kristen Patricia Parton

IPC classification: G06F15/18

CPC classification: G06F17/30669, G06F17/30675

Abstract: A “Cross-Lingual Unified Relevance Model” provides a feedback model that improves a machine-learned ranker for a language with few training resources, using feedback from a more complete ranker for a language that has more training resources. The model focuses on linguistically non-local queries, such as “world cup” (English language/U.S. market) and “copa mundial” (Spanish language/Mexican market), that have similar user intent in different languages and markets or regions, thus allowing the low-resource ranker to receive direct relevance feedback from the high-resource ranker. Among other things, the Cross-Lingual Unified Relevance Model differs from conventional relevancy-based techniques by incorporating both query- and document-level features. More specifically, the Cross-Lingual Unified Relevance Model generalizes existing cross-lingual feedback models, incorporating both query expansion and document re-ranking to further amplify the signal from the high-resource ranker to enable a learning to rank approach based on appropriately labeled training data.

8.

Patent application
Enhanced Query Rewriting Through Statistical Machine Translation (in force)

Publication number: US20120254218A1

Publication date: 2012-10-04

Application number: US13078648

Filing date: 2011-04-01

Applicants: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari

Inventors: Alnur Ali, Jianfeng Gao, Xiaodong He, Bodo von Billerbeck, Sanaz Ahari

IPC classification: G06F17/30

CPC classification: G06F17/30672

Abstract: Systems, methods, and computer media for identifying query rewriting replacement terms are provided. A list of related string pairs each comprising a first string and second string is received. The first string of each related string pair is a user search query extracted from user click log data. For one or more of the related string pairs, the string pair is provided as inputs to a statistical machine translation model. The model identifies one or more pairs of corresponding terms, each pair of corresponding terms including a first term from the first string and a second term from the second string. The model also calculates a probability of relatedness for each of the one or more pairs of corresponding terms. Term pairs whose calculated probability of relatedness exceeds a threshold are characterized as query term replacements and incorporated, along with the probability of relatedness, into a query rewriting candidate database.
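
The final filtering step in this abstract can be sketched in a few lines. The translation model itself is not implemented: the `(first_term, second_term, probability)` triples are made-up stand-ins for its output, and the dictionary used as the "candidate database" and the threshold value are assumptions for illustration.

```python
def build_rewrite_candidates(term_pairs, threshold):
    """Keep term pairs whose relatedness probability exceeds the threshold.

    term_pairs: iterable of (first_term, second_term, probability) triples,
    assumed to come from some statistical translation model.
    Returns a dict mapping each first term to its (replacement, probability) list.
    """
    candidates = {}
    for first, second, prob in term_pairs:
        if prob > threshold:
            candidates.setdefault(first, []).append((second, prob))
    return candidates


# Hypothetical model output for illustration.
model_output = [
    ("nyc", "new york", 0.82),
    ("nyc", "city", 0.10),
    ("cheap", "affordable", 0.67),
]
database = build_rewrite_candidates(model_output, threshold=0.5)
print(database)
# {'nyc': [('new york', 0.82)], 'cheap': [('affordable', 0.67)]}
```

Storing the probability next to each replacement mirrors the abstract's point that the relatedness score is kept in the database, so a later rewriting stage could rank or re-filter candidates.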

9.

Patent application
DEPENDENCY-BASED QUERY EXPANSION ALTERATION CANDIDATE SCORING (in force)

Publication number: US20120131031A1

Publication date: 2012-05-24

Application number: US12951068

Filing date: 2010-11-22

Applicants: Shasha Xie, Xiaodong He, Jianfeng Gao

Inventors: Shasha Xie, Xiaodong He, Jianfeng Gao

IPC classification: G06F17/30

CPC classification: G06F17/30967, G06F17/30672

Abstract: An alteration candidate for a query can be scored. The scoring may include computing one or more query-dependent feature scores and/or one or more intra-candidate dependent feature scores. The computation of the query-dependent feature score(s) can be based on dependencies to multiple query terms from each of one or more alteration terms (i.e., for each of the one or more alteration terms, there can be dependencies to multiple query terms that form at least a portion of the basis for the query-dependent feature score(s)). The computation of the intra-candidate dependent feature score(s) can be based on dependencies between different terms in the alteration candidate. A candidate score can be computed using the query dependent feature score(s) and/or the intra-candidate dependent feature score(s). Additionally, the candidate score can be used in determining whether to select the candidate to expand the query. If selected, the candidate can be used to expand the query.

10.

Granted patent
HMM alignment for combining translation systems (in force)

Publication number: US08060358B2

Publication date: 2011-11-15

Application number: US12147807

Filing date: 2008-06-27

Applicants: Xiaodong He, Mei Yang, Jianfeng Gao, Patrick Nguyen

Inventors: Xiaodong He, Mei Yang, Jianfeng Gao, Patrick Nguyen

IPC classification: G06F17/28

CPC classification: G06F17/2827, G06F17/2818

Abstract: A computing system configured to produce an optimized translation hypothesis of text input into the computing system. The computing system includes a plurality of translation machines. Each of the translation machines is configured to produce their own translation hypothesis from the same text. An optimization machine is connected to the plurality of translation machines. The optimization machine is configured to receive the translation hypotheses from the translation machines. The optimization machine is further configured to align, word-to-word, the hypotheses in the plurality of hypotheses by using a hidden Markov model.
