Sampling training data for an automatic speech recognition system based on a benchmark classification distribution

Granted invention patent

US08374865B1 Sampling training data for an automatic speech recognition system based on a benchmark classification distribution (status: active)

Patent title: Sampling training data for an automatic speech recognition system based on a benchmark classification distribution
Application number: US13456671
Filing date: 2012-04-26
Publication number: US08374865B1
Publication date: 2013-02-12
Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar, Kaisuke Nakajima, Daniel Martin Bikel
Applicants: Fadi Biadsy, Pedro J. Moreno Mengibar, Kaisuke Nakajima, Daniel Martin Bikel
Applicant address: Mountain View, CA, US
Assignee: Google Inc.
Current assignee: Google Inc.
Current assignee address: Mountain View, CA, US
Patent agency: McDonnell Boehnen Hulbert & Berghoff LLP
Main classification: G10L15/06
IPC classifications: G10L15/06; G10L15/08

Abstract:

A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system.
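The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract only says the training distribution is "based on" the benchmark distribution, and the sketch assumes the simplest reading, proportional matching. The function names, the `size` and `seed` parameters, and the toy punctuation-based classifier are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def classification_distribution(labels):
    """Return each label's share of the total, as a dict of fractions."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def sample_training_corpus(corpus, classify, benchmark_dist, size, seed=0):
    """Sample up to `size` strings whose label shares follow benchmark_dist.

    Assumption: "based on" is read as proportional matching. If the corpus
    has too few strings for a label, that label's quota is capped at what
    is available, so the result may be smaller than `size`.
    """
    rng = random.Random(seed)

    # Group the corpus by the classification of each text string.
    by_label = defaultdict(list)
    for text in corpus:
        by_label[classify(text)].append(text)

    # Draw a quota per label proportional to the benchmark share.
    sample = []
    for label, share in benchmark_dist.items():
        pool = by_label.get(label, [])
        quota = min(round(share * size), len(pool))
        sample.extend(rng.sample(pool, quota))
    return sample


def toy_classify(text):
    # Hypothetical stand-in classifier; the abstract does not specify one.
    return "question" if text.endswith("?") else "statement"
```

In use, one would classify the benchmark transcripts with `toy_classify` (or any other classifier), pass the resulting labels to `classification_distribution`, and hand that distribution to `sample_training_corpus` together with the larger corpus. Per-label rounding means the sampled shares only approximate the benchmark shares for most sizes.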

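IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/06	. Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)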