Invention patent (granted)
US09460708B2 Automated data cleanup by substitution of words of the same pronunciation and different spelling in speech recognition (status: in force)

Abstract:
The described implementations relate to automated data cleanup. One system includes a language model generated from language model seed text and a dictionary of possible data substitutions. This system also includes a transducer configured to cleanse a corpus utilizing the language model and the dictionary. The transducer can process speech recognition data in some cases by substituting a second word for a first word which shares pronunciation with the first word but is spelled differently. In some cases, this can be accomplished by establishing corresponding probabilities of the first word and second word based on a third word that appears in sequence with the first word.
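The abstract's central step is choosing between two words that sound alike but are spelled differently, using the probability of each word given a neighboring word in sequence. The sketch below illustrates that idea under stated assumptions and does not reproduce the patented method. It assumes an add-one-smoothed bigram language model trained on whitespace-tokenized seed text and a small hand-written homophone dictionary. A greedy left-to-right rescoring pass stands in for the transducer. `train_bigram_model`, `cleanse`, and `HOMOPHONES` are hypothetical names.

```python
from collections import Counter

def train_bigram_model(seed_text):
    """Hypothetical helper: build an add-one (Laplace) smoothed bigram model from seed text."""
    tokens = seed_text.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(prev, word):
        # P(word | prev) with add-one smoothing; unseen pairs still get nonzero mass.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    return prob

# Hypothetical substitution dictionary: each word maps to alternatives with the
# same pronunciation but a different spelling.
HOMOPHONES = {
    "their": ["there", "they're"],
    "there": ["their", "they're"],
    "they're": ["their", "there"],
    "to": ["two", "too"],
    "two": ["to", "too"],
    "too": ["to", "two"],
}

def cleanse(tokens, prob, substitutions):
    """Greedily replace each word with the homophone best supported by its neighbors."""
    cleaned = list(tokens)
    for i, word in enumerate(tokens):
        # The original word comes first, so it is kept on ties (max returns the first maximum).
        candidates = [word] + substitutions.get(word, [])
        prev_word = cleaned[i - 1] if i > 0 else None
        next_word = tokens[i + 1] if i + 1 < len(tokens) else None

        def score(candidate):
            s = 1.0
            if prev_word is not None:
                s *= prob(prev_word, candidate)  # P(candidate | preceding word)
            if next_word is not None:
                s *= prob(candidate, next_word)  # P(following word | candidate)
            return s

        cleaned[i] = max(candidates, key=score)
    return cleaned

seed = "i want to go there . they parked their car . we bought two apples"
model = train_bigram_model(seed)
print(cleanse("they parked there car".split(), model, HOMOPHONES))
```

With this seed text, the recognized sequence `they parked there car` becomes `they parked their car`. The seed text contains "parked their car", so "their" outscores "there" given the neighbors "parked" and "car". The original spelling survives whenever no alternative scores strictly higher.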