Sampling training data for an automatic speech recognition system based on a benchmark classification distribution
    Granted invention patent (status: in force)

    Publication number: US09202461B2

    Publication date: 2015-12-01

    Application number: US13745295

    Filing date: 2013-01-18

    Applicant: Google Inc.

    CPC classification number: G10L15/063 G10L15/183

    Abstract: A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system.
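
The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the classifier `toy_classify`, its labels, the helper names, and the per-class quota rule are all assumptions made for illustration. The sketch classifies the benchmark strings, computes their classification distribution, and then draws from a larger corpus so that the sampled training strings approximate that distribution.

```python
import random
from collections import Counter, defaultdict


def classification_distribution(texts, classify):
    """Return the share of each classification among the given text strings."""
    counts = Counter(classify(t) for t in texts)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def sample_training_corpus(corpus, classify, benchmark_dist, size, seed=0):
    """Sample about `size` strings whose class shares follow benchmark_dist.

    Assumption: each class gets a quota of round(share * size), capped by
    how many corpus strings of that class are actually available.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text in corpus:
        by_label[classify(text)].append(text)

    training = []
    for label, share in benchmark_dist.items():
        pool = by_label.get(label, [])
        quota = min(len(pool), round(share * size))
        training.extend(rng.sample(pool, quota))
    return training


def toy_classify(text):
    """Hypothetical stand-in classifier keyed on the first word."""
    first = text.split()[0]
    if first == "call":
        return "dial"
    if first == "play":
        return "media"
    return "other"


benchmark = ["call mom", "call dad", "play jazz", "what time is it"]
corpus = [
    "call the office", "call a taxi", "call home",
    "play some rock", "play the news", "play my playlist",
    "weather tomorrow", "nearest pharmacy", "set an alarm", "how tall is k2",
]

benchmark_dist = classification_distribution(benchmark, toy_classify)
training = sample_training_corpus(corpus, toy_classify, benchmark_dist, size=4)
training_dist = classification_distribution(training, toy_classify)
```

With these toy inputs the per-class quotas divide evenly, so `training_dist` equals `benchmark_dist`. In general, rounding and small class pools mean the match is only approximate, which is consistent with the abstract's wording that the training distribution is "based on" the benchmark distribution.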

