Invention patent application
- Patent title: LANGUAGE SEGMENTATION OF MULTILINGUAL TEXTS
- Patent title (Chinese): 多媒体语言语言分段
- Application number: US14073036
- Filing date: 2013-11-06
- Publication number: US20140067365A1
- Publication date: 2014-03-06
- Inventor: Anthony Aue
- Applicant: Anthony Aue
- Applicant address: US WA Redmond
- Assignee: MICROSOFT CORPORATION
- Current assignee: MICROSOFT CORPORATION
- Current assignee address: US WA Redmond
- Main classification: G06F17/28
- IPC classification: G06F17/28
Abstract:
The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model. Further, web documents are annotated at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
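The steps in the abstract (per-sentence language probabilities, learned language-transition probabilities, and a highest-probability language sequence) resemble decoding a hidden Markov model. The sketch below is only an illustration under that assumption and is not taken from the patent: the function names `estimate_transitions` and `best_language_sequence`, the soft-count transition estimate, and the `smoothing` parameter are all hypothetical choices, and the per-sentence probabilities are assumed to come from some separate initial language-identification model.

```python
import math


def estimate_transitions(emissions, languages, smoothing=1e-3):
    """Estimate P(next language | previous language) from soft counts.

    ASSUMPTION: the abstract does not say how transitions are learned.
    Here each adjacent sentence pair contributes the product of its two
    per-sentence probabilities, and each row is then normalized.
    """
    counts = {(a, b): smoothing for a in languages for b in languages}
    for prev, cur in zip(emissions, emissions[1:]):
        for a in languages:
            for b in languages:
                counts[(a, b)] += prev[a] * cur[b]
    transitions = {}
    for a in languages:
        row_total = sum(counts[(a, b)] for b in languages)
        for b in languages:
            transitions[(a, b)] = counts[(a, b)] / row_total
    return transitions


def _log(p):
    # Log-space avoids underflow on long documents; zero maps to -inf.
    return math.log(p) if p > 0 else float("-inf")


def best_language_sequence(emissions, transitions, languages):
    """Viterbi decoding: the most probable language label per sentence.

    emissions:   one dict per sentence mapping language -> probability.
    transitions: dict mapping (previous, current) -> probability.
    """
    if not emissions:
        return []
    scores = {lang: _log(emissions[0][lang]) for lang in languages}
    back_pointers = []
    for dist in emissions[1:]:
        new_scores, pointers = {}, {}
        for cur in languages:
            best_prev = max(
                languages,
                key=lambda prev: scores[prev] + _log(transitions[(prev, cur)]),
            )
            new_scores[cur] = (
                scores[best_prev]
                + _log(transitions[(best_prev, cur)])
                + _log(dist[cur])
            )
            pointers[cur] = best_prev
        scores = new_scores
        back_pointers.append(pointers)
    # Trace back from the best final language to recover the full path.
    path = [max(languages, key=lambda lang: scores[lang])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    langs = ["en", "fr"]
    # Hypothetical per-sentence probabilities from an initial model.
    sentence_probs = [
        {"en": 0.9, "fr": 0.1},
        {"en": 0.8, "fr": 0.2},
        {"en": 0.45, "fr": 0.55},  # ambiguous sentence
        {"en": 0.9, "fr": 0.1},
    ]
    learned = estimate_transitions(sentence_probs, langs)
    print(best_language_sequence(sentence_probs, learned, langs))
```

The point of combining the two sources of evidence is that a sentence whose own probabilities are ambiguous can be labeled according to its neighbors when the transition probabilities make language switches unlikely. Whether that happens depends on the transition estimate, which here is deliberately simple.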
Published/granted documents
- US09400787B2 Language segmentation of multilingual texts, publication/grant date: 2016-07-26