Granted invention patent
US09460708B2 Automated data cleanup by substitution of words of the same pronunciation and different spelling in speech recognition
Status: in force
- Patent title: Automated data cleanup by substitution of words of the same pronunciation and different spelling in speech recognition
- Application number: US12561521
- Filing date: 2009-09-17
- Publication number: US09460708B2
- Publication date: 2016-10-04
- Inventors: Geoffrey Zweig, Yun-Cheng Ju
- Applicants: Geoffrey Zweig, Yun-Cheng Ju
- Applicant address: Redmond, WA, US
- Assignee: Microsoft Technology Licensing, LLC
- Current assignee: Microsoft Technology Licensing, LLC
- Current assignee address: Redmond, WA, US
- Agents: Alin Corie; Sandy Swain; Micky Minhas
- Main classification: G06F17/20
- IPC classifications: G06F17/20; G06F17/27; G10L15/06; G10L15/187
Abstract:
The described implementations relate to automated data cleanup. One system includes a language model generated from language model seed text and a dictionary of possible data substitutions. This system also includes a transducer configured to cleanse a corpus utilizing the language model and the dictionary. The transducer can process speech recognition data in some cases by substituting a second word for a first word which shares pronunciation with the first word but is spelled differently. In some cases, this can be accomplished by establishing corresponding probabilities of the first word and second word based on a third word that appears in sequence with the first word.
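The substitution step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a plain bigram model with add-one smoothing, uses only the preceding word as the neighboring "third word", and stands in a greedy left-to-right pass for the transducer. The seed text, the substitution dictionary, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical seed text and homophone dictionary, for illustration only.
SEED_TEXT = "whole wheat flour bakery fresh flour mill rose flower shop"
SUBSTITUTIONS = {"flower": ["flour"], "flour": ["flower"]}


def train_bigram_counts(seed_text):
    """Count unigrams and bigrams in whitespace-tokenized seed text."""
    tokens = seed_text.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one smoothing (an assumed choice)."""
    vocab_size = len(unigrams) or 1
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def cleanse(tokens, substitutions, unigrams, bigrams):
    """Replace each token with its most probable same-sounding spelling.

    Candidates are the token itself plus its dictionary alternatives.
    Ties keep the original token because it is listed first.
    """
    cleaned = []
    for token in tokens:
        candidates = [token] + substitutions.get(token, [])
        if cleaned:
            prev = cleaned[-1]
            best = max(
                candidates,
                key=lambda w: bigram_prob(prev, w, unigrams, bigrams),
            )
        else:
            # No left context: fall back to raw unigram frequency.
            best = max(candidates, key=lambda w: unigrams[w])
        cleaned.append(best)
    return cleaned


unigrams, bigrams = train_bigram_counts(SEED_TEXT)
print(" ".join(cleanse("wheat flower mill".split(), SUBSTITUTIONS, unigrams, bigrams)))
# prints "wheat flour mill"
```

In this sketch "flower" after "wheat" is rewritten to "flour" because the seed text contains "wheat flour", while "rose flower shop" is left unchanged. A fuller system would score whole sequences rather than deciding one word at a time.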
Publication/grant documents
- US20100076752A1 Automated Data Cleanup; publication/grant date: 2010-03-25