Patent application
US20140067365A1 LANGUAGE SEGMENTATION OF MULTILINGUAL TEXTS (in force)

  • Title: LANGUAGE SEGMENTATION OF MULTILINGUAL TEXTS
  • Title (Chinese): 多媒体语言语言分段
  • Application number: US14073036
    Filing date: 2013-11-06
  • Publication number: US20140067365A1
    Publication date: 2014-03-06
  • Inventor: Anthony Aue
  • Applicant: Anthony Aue
  • Applicant address: US WA Redmond
  • Assignee: MICROSOFT CORPORATION
  • Current assignee: MICROSOFT CORPORATION
  • Current assignee address: US WA Redmond
  • Main classification: G06F17/28
  • IPC classification: G06F17/28
Abstract:
The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model. Further, web documents are annotated at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
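The steps in the abstract resemble decoding a hidden Markov model over sentences, so a minimal sketch may help make them concrete. This is an illustration under stated assumptions, not the patented implementation: the function names `estimate_transitions` and `viterbi_languages`, the soft-count estimate of transition probabilities, the smoothing constant, and the example probabilities are all hypothetical. The abstract does not say how transitions are learned or how the two probabilities are combined; here they are multiplied (added in log space) and decoded with the standard Viterbi algorithm.

```python
import math

def estimate_transitions(sentence_probs, languages, smoothing=1e-3):
    """Estimate P(next language | current language) from per-sentence
    distributions using soft counts over adjacent sentence pairs.
    Assumption: the abstract does not specify the learning procedure."""
    counts = {(a, b): smoothing for a in languages for b in languages}
    for cur, nxt in zip(sentence_probs, sentence_probs[1:]):
        for a in languages:
            for b in languages:
                counts[(a, b)] += cur.get(a, 0.0) * nxt.get(b, 0.0)
    transitions = {}
    for a in languages:
        total = sum(counts[(a, b)] for b in languages)
        for b in languages:
            transitions[(a, b)] = counts[(a, b)] / total
    return transitions

def viterbi_languages(sentence_probs, transitions, languages):
    """Return the highest-probability language sequence, one label per
    sentence, combining per-sentence and transition probabilities."""
    if not sentence_probs:
        return []

    def log(p):
        # Floor the probability so that log(0) never occurs.
        return math.log(max(p, 1e-12))

    # Best log score of any path ending in each language at sentence 0.
    scores = {lang: log(sentence_probs[0].get(lang, 0.0)) for lang in languages}
    backpointers = []
    for probs in sentence_probs[1:]:
        new_scores, pointers = {}, {}
        for lang in languages:
            best_prev = max(
                languages,
                key=lambda prev: scores[prev] + log(transitions.get((prev, lang), 0.0)),
            )
            new_scores[lang] = (
                scores[best_prev]
                + log(transitions.get((best_prev, lang), 0.0))
                + log(probs.get(lang, 0.0))
            )
            pointers[lang] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final language to recover the full path.
    path = [max(languages, key=lambda lang: scores[lang])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical per-sentence distributions from an initial language model.
languages = ["en", "fr"]
sentence_probs = [
    {"en": 0.9, "fr": 0.1},
    {"en": 0.45, "fr": 0.55},  # ambiguous sentence
    {"en": 0.9, "fr": 0.1},
]
learned = estimate_transitions(sentence_probs, languages)
labels = viterbi_languages(sentence_probs, learned, languages)
```

The resulting `labels` list holds one language per sentence, which corresponds to the sentence-level annotation the abstract describes. When the transition probabilities strongly favor staying in the same language, an ambiguous sentence tends to take the label of its neighbors rather than its own slightly more likely language.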