Patent application
- Title: Apparatus and methods for aligning words in bilingual sentences
- Title (Chinese): 双语句子对齐词的装置和方法
- Application number: US11137590
- Filing date: 2005-05-26
- Publication number: US20060190241A1
- Publication date: 2006-08-24
- Inventors: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
- Applicants: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
- Assignee: Xerox Corporation
- Current assignee: Xerox Corporation
- Main classification: G06F17/28
- IPC classification: G06F17/28
Abstract:
Methods are disclosed for performing proper word alignment that satisfies the constraints of coverage and transitive closure. Initially, a translation matrix is computed that defines word association measures between the source and target words of a corpus of bilingual translations of source and target sentences. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words; the resulting matrix factors may then optionally be multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded and then closed by transitivity to produce an alignment matrix, which may then optionally be factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications to identify words that are properly aligned.
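
To make the second method concrete, the following is a minimal Python sketch of thresholding a translation matrix and closing the result by transitivity. It is an illustration under stated assumptions, not the procedure claimed in the application: the function names `threshold` and `close_by_transitivity`, the cutoff value, and the example matrix are all hypothetical, and the sketch does not enforce the coverage constraint (a word whose associations all fall below the cutoff stays unaligned). Transitivity is read here as: if source word i and source word i' share a target word, then every target word aligned to i' is also aligned to i.

```python
def threshold(matrix, cutoff):
    """Binarize association measures: 1 where the measure reaches the cutoff.

    `cutoff` is a hypothetical parameter chosen for illustration.
    """
    return [[1 if value >= cutoff else 0 for value in row] for row in matrix]


def close_by_transitivity(align):
    """Add links until the binary alignment matrix is transitively closed.

    A link (i, j) is added whenever there exist i2, j2 such that
    (i, j2), (i2, j2) and (i2, j) are already links. Repeats to a fixpoint.
    """
    n_src = len(align)
    n_tgt = len(align[0]) if align else 0
    closed = [row[:] for row in align]  # work on a copy
    changed = True
    while changed:
        changed = False
        for i in range(n_src):
            for j in range(n_tgt):
                if closed[i][j]:
                    continue
                if any(
                    closed[i][j2] and closed[i2][j2] and closed[i2][j]
                    for i2 in range(n_src)
                    for j2 in range(n_tgt)
                ):
                    closed[i][j] = 1
                    changed = True
    return closed


# Hypothetical association measures: rows are source words, columns are target words.
example = [
    [0.9, 0.6, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.1, 0.8],
]
binary = threshold(example, 0.5)        # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
closed = close_by_transitivity(binary)  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```

In the closed matrix, the first two source words and the first two target words form one fully connected block, and the third source and target words form another. Each such block plays the role of a cept in this sketch.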