Context-dependent modeling of phonemes

    Publication No.: US09818409B2

    Publication date: 2017-11-14

    Application No.: US14877673

    Filing date: 2015-10-07

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
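The per-time-step scoring described in this abstract can be illustrated with a minimal sketch. It assumes the recurrent layers have already produced one vector of unnormalized outputs per time step; the label names, the numbers, and the choice to collapse consecutive repeated labels are illustrative assumptions, not details taken from the patent.

```python
import math


def softmax(logits):
    """Turn unnormalized outputs into scores that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_cd_phonemes(recurrent_outputs, labels):
    """Pick the best-scoring label at each time step.

    Collapsing consecutive repeats is an assumption made for this sketch.
    """
    sequence = []
    for logits in recurrent_outputs:
        scores = softmax(logits)
        best_index = max(range(len(scores)), key=scores.__getitem__)
        best = labels[best_index]
        if not sequence or sequence[-1] != best:
            sequence.append(best)
    return sequence


# Hypothetical context-dependent labels (left-center+right notation).
LABELS = ["sil", "k-ae+t", "ae-t+sil"]

# Hypothetical recurrent outputs for four time steps.
OUTPUTS = [
    [2.0, 0.1, 0.1],
    [0.2, 3.0, 0.5],
    [0.1, 2.5, 0.4],
    [0.0, 0.3, 2.2],
]

print(decode_cd_phonemes(OUTPUTS, LABELS))
```

A real system would more likely pass the per-step scores to a decoder rather than take a greedy maximum; the greedy choice here only keeps the example short.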

    Speech transcription including written text
    2.
    Granted patent
    Speech transcription including written text (active)

    Publication No.: US09594744B2

    Publication date: 2017-03-14

    Application No.: US13829482

    Filing date: 2013-03-14

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transcribing utterances into written text are disclosed. The methods, systems, and apparatus include actions of obtaining a lexicon model that maps phones to spoken text and obtaining a language model that assigns probabilities to written text. Further includes generating a transducer that maps the written text to the spoken text, the transducer mapping multiple items of the written text to an item of the spoken text. Additionally, the actions include constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model.

    Written-domain language modeling with decomposition
    3.
    Granted patent
    Written-domain language modeling with decomposition (active)

    Publication No.: US09460088B1

    Publication date: 2016-10-04

    Application No.: US13906654

    Filing date: 2013-05-31

    Applicant: Google Inc.

    CPC classification number: G06F17/2881 G06F17/2765 G10L15/19

    Abstract: An automatic speech recognition system and method are provided for written-domain language modeling. According to one implementation, a process includes accessing decomposed training data that results from applying rewrite grammar rules to original training data, the decomposed training data comprising (i) regular words from the original training data that have not been rewritten using the set of rewrite grammar rules, and (ii) decomposed segments that result from rewriting non-lexical entities from the original training data using the rewrite grammar rules, generating a restriction model that (i) maps language model paths for regular words to themselves, and (ii) restricts language model paths for decomposed segments for non-lexical entities, training an n-gram language model over the training data, composing the restriction model and the language model to obtain a restricted language model, and constructing a decoding network by composing a context dependency model and a pronunciation lexicon with the restricted language model.

    IDENTIFYING THE LANGUAGE OF A SPOKEN UTTERANCE
    4.
    Patent application
    IDENTIFYING THE LANGUAGE OF A SPOKEN UTTERANCE (pending, published)

    Publication No.: US20160035344A1

    Publication date: 2016-02-04

    Application No.: US14817302

    Filing date: 2015-08-04

    Applicant: Google Inc.

    CPC classification number: G10L15/005 G06N3/0445 G06N3/084 G10L15/16

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving a plurality of audio frames that collectively represent at least a portion of a spoken utterance; processing the plurality of audio frames using a long short term memory (LSTM) neural network to generate a respective language score for each of a plurality of languages, wherein the respective language score for each of the plurality of languages represents a likelihood that the spoken utterance was spoken in the language; and classifying the spoken utterance as being spoken in one of the plurality of languages using the language scores.
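The final classification step in this abstract can be sketched as follows. The sketch assumes the LSTM has already produced one score per language for each audio frame and that the frame scores are averaged into a single language score; the averaging rule, the language codes, and the numbers are assumptions for illustration, since the abstract does not say how frame-level outputs are combined.

```python
def classify_language(frame_scores, languages):
    """Average per-frame scores and return the best language with all scores."""
    totals = [0.0] * len(languages)
    for scores in frame_scores:
        for index, value in enumerate(scores):
            totals[index] += value
    frame_count = len(frame_scores)
    language_scores = {
        lang: total / frame_count for lang, total in zip(languages, totals)
    }
    best = max(language_scores, key=language_scores.get)
    return best, language_scores


# Hypothetical language set and per-frame scores from the network.
LANGUAGES = ["en", "es", "fr"]
FRAME_SCORES = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
]

best_language, scores = classify_language(FRAME_SCORES, LANGUAGES)
print(best_language, scores)
```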

    LANGUAGE MODELING OF COMPLETE LANGUAGE SEQUENCES
    5.
    Patent application
    LANGUAGE MODELING OF COMPLETE LANGUAGE SEQUENCES (active)

    Publication No.: US20140278407A1

    Publication date: 2014-09-18

    Application No.: US13875406

    Filing date: 2013-05-02

    Applicant: Google Inc.

    CPC classification number: G10L15/063 G10L15/197

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling of complete language sequences. Training data indicating language sequences is accessed, and counts for a number of times each language sequence occurs in the training data are determined. A proper subset of the language sequences is selected, and a first component of a language model is trained. The first component includes first probability data for assigning scores to the selected language sequences. A second component of the language model is trained based on the training data, where the second component includes second probability data for assigning scores to language sequences that are not included in the selected language sequences. Adjustment data that normalizes the second probability data with respect to the first probability data is generated, and the first component, the second component, and the adjustment data are stored.
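One way to read the two-component model in this abstract is sketched below. The first component scores frequently seen whole sequences by relative frequency; the second is a simple unigram model with an end-of-sequence token; the adjustment factor rescales the second component so that it only distributes the probability mass the first component leaves over. The selection threshold, the unigram model, and the sample queries are all assumptions for illustration, not the patent's actual choices.

```python
from collections import Counter


def train(sequences, min_count=2):
    """Return (first component, second component, adjustment factor)."""
    counts = Counter(sequences)
    total = sum(counts.values())

    # First component: whole-sequence probabilities for the selected subset.
    selected = {s: c / total for s, c in counts.items() if c >= min_count}

    # Second component: unigram model with an end-of-sequence token.
    word_counts = Counter()
    for sequence in sequences:
        word_counts.update(sequence.split() + ["</s>"])
    word_total = sum(word_counts.values())
    unigram = {w: c / word_total for w, c in word_counts.items()}

    def second_component(sequence):
        probability = 1.0
        for word in sequence.split() + ["</s>"]:
            probability *= unigram.get(word, 0.0)
        return probability

    # Adjustment: mass left over by the first component, divided by the mass
    # the second component assigns to sequences outside the selected subset.
    leftover = 1.0 - sum(selected.values())
    covered = sum(second_component(s) for s in selected)
    adjustment = leftover / (1.0 - covered)
    return selected, second_component, adjustment


def score(sequence, model):
    """Score a sequence with whichever component is responsible for it."""
    selected, second_component, adjustment = model
    if sequence in selected:
        return selected[sequence]
    return adjustment * second_component(sequence)


# Hypothetical training data.
QUERIES = ["weather today"] * 3 + ["weather tomorrow", "news today"]
MODEL = train(QUERIES)
print(score("weather today", MODEL), score("news today", MODEL))
```

With this normalization, the scores of the selected sequences and of all other sequences together still sum to one, which is the role the abstract assigns to the adjustment data.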

    Speech recognition with acoustic models

    Publication No.: US09818410B2

    Publication date: 2017-11-14

    Application No.: US14983315

    Filing date: 2015-12-29

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
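The frame stacking and subsampling steps named in this abstract can be sketched in a few lines. The stack size, the subsampling step, the padding rule at the end of the utterance, and the one-dimensional toy frames are assumptions for illustration; the abstract does not fix any of them.

```python
def stack_frames(frames, stack_size):
    """Concatenate each frame with the frames that follow it.

    Near the end of the sequence the last frame is repeated as padding,
    which is an assumption of this sketch.
    """
    stacked = []
    last = len(frames) - 1
    for t in range(len(frames)):
        window = []
        for offset in range(stack_size):
            window.extend(frames[min(t + offset, last)])
        stacked.append(window)
    return stacked


def subsample(frames, step):
    """Keep every step-th frame, starting with the first."""
    return frames[::step]


# Six hypothetical one-dimensional acoustic frames.
FRAMES = [[float(t)] for t in range(6)]

STACKED = stack_frames(FRAMES, stack_size=3)
REDUCED = subsample(STACKED, step=3)
print(REDUCED)
```

Stacking followed by subsampling lets the network see the same acoustic context while running on fewer time steps, here two modified frames instead of six.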

    SPEECH RECOGNITION WITH ACOUSTIC MODELS
    8.
    Patent application
    SPEECH RECOGNITION WITH ACOUSTIC MODELS (active)

    Publication No.: US20160372119A1

    Publication date: 2016-12-22

    Application No.: US14983315

    Filing date: 2015-12-29

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.

    GENERATING REPRESENTATIONS OF INPUT SEQUENCES USING NEURAL NETWORKS
    9.
    Patent application
    GENERATING REPRESENTATIONS OF INPUT SEQUENCES USING NEURAL NETWORKS (pending, published)

    Publication No.: US20150356075A1

    Publication date: 2015-12-10

    Application No.: US14728875

    Filing date: 2015-06-02

    Applicant: Google Inc.

    CPC classification number: G06N3/0445

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of input sequences. One of the methods includes receiving a grapheme sequence, the grapheme sequence comprising a plurality of graphemes arranged according to an input order; processing the sequence of graphemes using a long short-term memory (LSTM) neural network to generate an initial phoneme sequence from the grapheme sequence, the initial phoneme sequence comprising a plurality of phonemes arranged according to an output order; and generating a phoneme representation of the grapheme sequence from the initial phoneme sequence generated by the LSTM neural network, wherein generating the phoneme representation comprises removing, from the initial phoneme sequence, phonemes in one or more positions in the output order.

    RECOGNIZING SPEECH USING NEURAL NETWORKS
    10.
    Patent application
    RECOGNIZING SPEECH USING NEURAL NETWORKS (active)

    Publication No.: US20150340034A1

    Publication date: 2015-11-26

    Application No.: US14720113

    Filing date: 2015-05-22

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing speech using neural networks. One of the methods includes receiving an audio input; processing the audio input using an acoustic model to generate a respective phoneme score for each of a plurality of phoneme labels; processing one or more of the phoneme scores using an inverse pronunciation model to generate a respective grapheme score for each of a plurality of grapheme labels; and processing one or more of the grapheme scores using a language model to generate a respective text label score for each of a plurality of text labels.
