CONTEXT-DEPENDENT MODELING OF PHONEMES
    1.
    Patent application
    CONTEXT-DEPENDENT MODELING OF PHONEMES (in force)

    Publication No.: US20160372118A1

    Publication date: 2016-12-22

    Application No.: US14877673

    Filing date: 2015-10-07

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.

    Context-dependent modeling of phonemes

    Publication No.: US09818409B2

    Publication date: 2017-11-14

    Application No.: US14877673

    Filing date: 2015-10-07

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
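
The per-time-step pipeline described in the abstract (recurrent layers, then a softmax output layer, then a representation derived from the scores) can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not in the abstract: the recurrent layers are plain tanh cells, the weights are random and untrained, the context-dependent phoneme labels are hypothetical triphone-style names, and the final representation is chosen by taking the highest-scoring label at each time step. The abstract does not specify the cell type or the decoding procedure.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Random weights; a real model would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyRecurrentLayer:
    """Plain tanh recurrent layer (an assumed stand-in for the patent's layers)."""

    def __init__(self, n_in, n_hidden, rng):
        self.w_in = rand_matrix(n_hidden, n_in, rng)
        self.w_rec = rand_matrix(n_hidden, n_hidden, rng)
        self.state = [0.0] * n_hidden

    def step(self, x):
        # New state depends on the current input and the previous state.
        from_input = matvec(self.w_in, x)
        from_state = matvec(self.w_rec, self.state)
        self.state = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
        return self.state


def score_sequence(acoustic_sequence, layers, w_out):
    """For each time step: recurrent layers, then softmax over CD phonemes."""
    all_scores = []
    for features in acoustic_sequence:
        out = features
        for layer in layers:
            out = layer.step(out)  # recurrent output for this time step
        all_scores.append(softmax(matvec(w_out, out)))
    return all_scores


def greedy_decode(all_scores, labels):
    """Assumed decoding: pick the highest-scoring label at each time step."""
    return [labels[max(range(len(s)), key=s.__getitem__)] for s in all_scores]


rng = random.Random(0)
labels = ["sil", "sil-k+ae", "k-ae+t", "ae-t+sil"]  # hypothetical CD phonemes
n_features, n_hidden, n_steps = 3, 5, 6

layers = [
    TinyRecurrentLayer(n_features, n_hidden, rng),
    TinyRecurrentLayer(n_hidden, n_hidden, rng),
]
w_out = rand_matrix(len(labels), n_hidden, rng)

# Stand-in acoustic features: one small vector per time step.
acoustic_sequence = [
    [rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_steps)
]

scores = score_sequence(acoustic_sequence, layers, w_out)
decoded = greedy_decode(scores, labels)
print(decoded)
```

Because the weights are random, the printed labels carry no meaning; the sketch only shows how one score per context-dependent phoneme is produced at every time step and how a label sequence can be read off those scores.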
