Sequence classification for machine translation

Invention Grant

US07783473B2 Sequence classification for machine translation (lapsed)

Patent Title: Sequence classification for machine translation
Application No.: US11647080

Application Date: 2006-12-28
Publication No.: US07783473B2

Publication Date: 2010-08-24
Inventor: Srinivas Bangalore, Patrick Haffner, Stephan Kanthak
Applicant: Srinivas Bangalore, Patrick Haffner, Stephan Kanthak
Applicant Address: US NV Reno
Assignee: AT&T Intellectual Property II, L.P.
Current Assignee: AT&T Intellectual Property II, L.P.
Current Assignee Address: US NV Reno
Agent: Ronald D. Slusky
Main IPC: G06F17/28
IPC: G06F17/28; G10L21/00

Abstract:

Classification of sequences, such as the translation of natural language sentences, is carried out using an independence assumption. The independence assumption is an assumption that the probability of a correct translation of a source sentence word into a particular target sentence word is independent of the translation of other words in the sentence. Although this assumption is not a correct one, a high level of word translation accuracy is nonetheless achieved. In particular, discriminative training is used to develop models for each target vocabulary word based on a set of features of the corresponding source word in training sentences, with at least one of those features relating to the context of the source word. Each model comprises a weight vector for the corresponding target vocabulary word. The weights comprising the vectors are associated with respective ones of the features; each weight is a measure of the extent to which the presence of that feature for the source word makes it more probable that the target word in question is the correct one.
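The scheme in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not name a training algorithm, so a simple multiclass perceptron stands in for the discriminative training, and it assumes a one-to-one word alignment between source and target sentences. The function names (`features`, `score`, `predict`, `train`, `translate`), the feature templates, and the toy English-to-German word-for-word gloss data in `PAIRS` are all hypothetical. Each target vocabulary word gets its own weight vector over source-word features, one of which is context (the neighbouring source words), and each source word is translated independently of the others.

```python
from collections import defaultdict

def features(words, i):
    """Features of source word i; prev/next are the context features."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"word={words[i]}", f"prev={prev_w}", f"next={next_w}"]

def score(weights, target, feats):
    """Sum the weights that this target word assigns to the active features."""
    w = weights[target]
    return sum(w[f] for f in feats)

def predict(weights, vocab, feats):
    """Pick the best target word for one source word, ignoring all other
    translation decisions in the sentence (the independence assumption)."""
    return max(vocab, key=lambda t: score(weights, t, feats))

def train(pairs, epochs=10):
    """Learn one weight vector per target vocabulary word.
    Assumption: a perceptron update stands in for the discriminative training."""
    vocab = sorted({t for _, tgt in pairs for t in tgt})
    weights = {t: defaultdict(float) for t in vocab}
    for _ in range(epochs):
        for src, tgt in pairs:
            for i, gold in enumerate(tgt):  # assumes src[i] aligns with tgt[i]
                feats = features(src, i)
                guess = predict(weights, vocab, feats)
                if guess != gold:
                    for f in feats:
                        weights[gold][f] += 1.0   # feature supports the gold word
                        weights[guess][f] -= 1.0  # and counts against the wrong guess
    return weights, vocab

def translate(weights, vocab, src):
    """Translate a sentence word by word, each word classified independently."""
    return [predict(weights, vocab, features(src, i)) for i in range(len(src))]

# Hypothetical toy data: word-for-word glosses, not fluent translations.
PAIRS = [
    (["river", "bank"], ["Fluss", "Ufer"]),
    (["bank", "loan"], ["Bank", "Kredit"]),
]

if __name__ == "__main__":
    weights, vocab = train(PAIRS)
    print(translate(weights, vocab, ["river", "bank"]))
    print(translate(weights, vocab, ["bank", "loan"]))
    # A positive weight means the feature makes this target word more likely.
    print(weights["Ufer"]["prev=river"])
```

In this sketch the source word "bank" is ambiguous, and only the context features separate its two target words, which is the role the abstract assigns to the context-related feature.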

Public/Granted literature

US20080162111A1 Sequence classification for machine translation Public/Granted day: 2008-07-03
