-
Publication No.: US20150073804A1
Publication Date: 2015-03-12
Application No.: US14019967
Application Date: 2013-09-06
Applicant: Google Inc.
Inventor: Andrew W. Senior, Javier Gonzalvo Fructuoso
IPC: G10L13/027
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
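A minimal Python sketch of the selection step in this abstract. It assumes the network's target features and each stored sample's features are equal-length vectors compared by Euclidean distance. The function names, sample IDs, and feature values are hypothetical, and only a target cost is computed; a practical unit-selection synthesizer would normally also score the joins between neighboring units.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length acoustic feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_unit(target: Sequence[float],
                inventory: Dict[str, List[float]]) -> Tuple[Optional[str], float]:
    """Pick the stored sample whose features are closest to the target
    features predicted by the network (target cost only, no join cost)."""
    best_id: Optional[str] = None
    best_dist = math.inf
    for sample_id, features in inventory.items():
        dist = euclidean(target, features)
        if dist < best_dist:
            best_id, best_dist = sample_id, dist
    return best_id, best_dist


# Hypothetical network prediction and a toy inventory of stored samples.
target = [1.0, 0.5, -0.2]
inventory = {
    "unit_a": [0.9, 0.4, -0.1],
    "unit_b": [2.0, 0.0, 0.3],
    "unit_c": [-1.0, 1.0, 0.0],
}
chosen, distance = select_unit(target, inventory)
print(chosen, round(distance, 3))
```

Here `unit_a` is chosen because every one of its features lies within 0.1 of the target.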
-
Publication No.: US10019985B2
Publication Date: 2018-07-10
Application No.: US14258139
Application Date: 2014-04-22
Applicant: Google Inc.
Inventor: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
IPC: G10L15/06, G10L15/16, G10L15/183
CPC classification number: G10L15/063, G06N3/0454, G10L15/16, G10L15/183
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
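The two-model scheme in this abstract can be sketched under loud assumptions: a hypothetical `ParameterServer` holds the shared parameters, each model fetches them, optimizes on its own batch, and pushes the result back, and a toy mean-squared-error model stands in for the sequence-level training criterion. The two models run one after the other here so the result is deterministic; the abstract does not say how concurrent updates are combined.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # toy (feature, target) pairs standing in for training frames


class ParameterServer:
    """Hypothetical shared store for the current model parameters [w, b]."""

    def __init__(self, w: float, b: float) -> None:
        self.params = [w, b]

    def fetch(self) -> List[float]:
        return list(self.params)

    def push(self, new_params: List[float]) -> None:
        self.params = list(new_params)


def mse(params: List[float], batch: Batch) -> float:
    """Mean squared error of the toy model y = w * x + b on one batch."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)


def optimize(params: List[float], batch: Batch,
             lr: float = 0.05, steps: int = 50) -> List[float]:
    """One model's work: start from the fetched parameters and run plain
    gradient descent on its own batch. MSE replaces the sequence-level
    criterion here purely for illustration."""
    w, b = params
    n = len(batch)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


server = ParameterServer(0.0, 0.0)
first_batch: Batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]     # samples of y = 2x + 1
second_batch: Batch = [(3.0, 7.0), (4.0, 9.0), (-1.0, -1.0)]

# The first model fetches, optimizes on its batch, and publishes its parameters.
server.push(optimize(server.fetch(), first_batch))
# The second model starts from whatever the server holds now.
server.push(optimize(server.fetch(), second_batch))
print(server.params)
```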
-
Publication No.: US20170330558A1
Publication Date: 2017-11-16
Application No.: US15664153
Application Date: 2017-07-31
Applicant: Google Inc.
Inventor: Hasim Sak, Andrew W. Senior
CPC classification number: G10L15/16, G10L15/02, G10L15/142, G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
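The per-time-step feedback loop in this abstract can be sketched with a toy network. Assumptions: the "modified input" is the concatenation of the current acoustic features and the previous step's output distribution, the feedback is all zeros at the initial time step, the network is a single hypothetical linear layer followed by a softmax, and the phoneme representation is simply the most likely phoneme at each step.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_network(weights: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for the acoustic modeling network: a single
    linear layer followed by a softmax over phoneme classes."""
    def net(x: Vector) -> Vector:
        return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])
    return net


def run_with_feedback(frames: List[Vector], net: Callable[[Vector], Vector],
                      num_classes: int) -> List[Vector]:
    outputs: List[Vector] = []
    prev = [0.0] * num_classes  # assumption: zero feedback at the initial time step
    for frame in frames:
        modified = frame + prev      # modified input = [features ; previous output]
        out = net(modified)
        outputs.append(out)
        prev = out                   # fed back at the next time step
    return outputs


PHONEMES = ["sil", "a", "b"]
# One row per phoneme; columns are 2 acoustic features followed by 3 feedback inputs.
weights = [
    [0.0, 0.0, 0.0, 0.0, 0.0],   # sil
    [1.0, 0.0, 0.0, 1.0, 0.0],   # a
    [0.0, 1.0, 0.0, 0.0, 1.0],   # b
]
net = make_toy_network(weights)
frames = [[5.0, 0.0], [0.0, 5.0]]
outputs = run_with_feedback(frames, net, len(PHONEMES))
# Assumed phoneme representation: the most likely phoneme at each time step.
phonemes = [PHONEMES[max(range(len(o)), key=o.__getitem__)] for o in outputs]
print(phonemes)
```

Because the second step sees the first step's output, its scores differ from what the same frame would produce without feedback.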
-
Publication No.: US09786270B2
Publication Date: 2017-10-10
Application No.: US15205263
Application Date: 2016-07-08
Applicant: Google Inc.
Inventor: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
IPC: G10L15/06, G10L15/16, G10L15/187
CPC classification number: G10L15/063, G10L15/16, G10L15/187
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
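A minimal sketch of training a second network on the first network's output distributions. The per-frame teacher distributions over a CTC blank plus two hypothetical phonemes are made-up numbers, not outputs of a real CTC model. The student is a single softmax layer trained by gradient descent on cross-entropy against these soft targets; for a softmax output, the gradient with respect to each logit is the predicted probability minus the target probability.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def student_predict(W: List[Vector], x: Vector) -> Vector:
    return softmax([sum(w * v for w, v in zip(row, x)) for row in W])


def cross_entropy(target: Vector, pred: Vector) -> float:
    """Cross-entropy of the student prediction against a soft teacher target."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)


def train_student(frames: List[Vector], teacher: List[Vector],
                  lr: float = 0.5, epochs: int = 200) -> List[Vector]:
    num_classes, dim = len(teacher[0]), len(frames[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, q in zip(frames, teacher):
            p = student_predict(W, x)
            for k in range(num_classes):
                g = p[k] - q[k]  # d(cross-entropy)/d(logit_k) with soft targets q
                for j in range(dim):
                    W[k][j] -= lr * g * x[j]
    return W


def mean_ce(W: List[Vector], frames: List[Vector], teacher: List[Vector]) -> float:
    return sum(cross_entropy(q, student_predict(W, x))
               for x, q in zip(frames, teacher)) / len(frames)


# Hypothetical first-network outputs over [blank, "a", "b"]; real CTC
# posteriors are usually much peakier and dominated by blank.
teacher = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W0 = [[0.0] * 3 for _ in range(3)]
W = train_student(frames, teacher)
print(round(mean_ce(W0, frames, teacher), 3), round(mean_ce(W, frames, teacher), 3))
```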
-
Publication No.: US09620108B2
Publication Date: 2017-04-11
Application No.: US14557725
Application Date: 2014-12-02
Applicant: Google Inc.
Inventor: Hasim Sak, Andrew W. Senior
CPC classification number: G10L15/16, G06N3/0445, G06N3/0454, G10L15/02, G10L15/08, G10L15/12, G10L15/142, G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
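The recurrent projection in this abstract can be illustrated with a small pure-Python LSTM in which each layer's cell output is multiplied by a projection matrix, and the projected vector is both passed to the next layer and fed back as the recurrent input. This follows the common LSTMP formulation without biases or peephole connections; the layer sizes, random weights, and three output classes are assumptions for illustration only.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class LSTMPLayer:
    """Minimal LSTM layer with a recurrent projection (no biases, no peepholes)."""

    def __init__(self, input_size: int, cell_size: int, proj_size: int,
                 rng: random.Random) -> None:
        joint = input_size + proj_size  # gates see [x_t ; r_{t-1}]
        self.W_i, self.W_f, self.W_o, self.W_g = (
            rand_matrix(cell_size, joint, rng) for _ in range(4))
        self.W_proj = rand_matrix(proj_size, cell_size, rng)  # r_t = W_proj m_t
        self.cell_size, self.proj_size = cell_size, proj_size

    def run(self, xs: List[Vector]) -> List[Vector]:
        c = [0.0] * self.cell_size
        r = [0.0] * self.proj_size
        projected = []
        for x in xs:
            z = x + r
            i = [sigmoid(v) for v in matvec(self.W_i, z)]
            f = [sigmoid(v) for v in matvec(self.W_f, z)]
            o = [sigmoid(v) for v in matvec(self.W_o, z)]
            g = [math.tanh(v) for v in matvec(self.W_g, z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            m = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            r = matvec(self.W_proj, m)  # projected output, also the recurrent state
            projected.append(r)
        return projected


def softmax(z: Vector) -> Vector:
    mx = max(z)
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]


rng = random.Random(0)
layers = [LSTMPLayer(2, 8, 4, rng), LSTMPLayer(4, 8, 4, rng)]  # two stacked layers
W_out = rand_matrix(3, 4, rng)  # output layer over 3 hypothetical phoneme classes

acoustic = [[0.1, -0.3], [0.4, 0.2], [-0.2, 0.5]]  # toy feature frames
h = acoustic
for layer in layers:
    h = layer.run(h)  # each layer consumes the previous layer's projections
scores = [softmax(matvec(W_out, r)) for r in h]  # per-time-step scores from the top layer
print([[round(p, 3) for p in s] for s in scores])
```

In practice, PyTorch exposes the same idea through the `proj_size` argument of `torch.nn.LSTM`, which is likely preferable to hand-written loops.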
-
Publication No.: US20170011738A1
Publication Date: 2017-01-12
Application No.: US15205263
Application Date: 2016-07-08
Applicant: Google Inc.
Inventor: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
CPC classification number: G10L15/063, G10L15/16, G10L15/187
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
-
Publication No.: US20150186359A1
Publication Date: 2015-07-02
Application No.: US14143627
Application Date: 2013-12-30
Applicant: Google Inc.
Inventor: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
IPC: G06F17/28
CPC classification number: G10L13/10, G06F17/289, G10L13/07, G10L13/08, G10L13/086, G10L25/30
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
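One way to give a single shared network the language of the text, as this abstract describes, is to append a one-hot language indicator to the linguistic features. The sketch below assumes that encoding, a hypothetical three-language inventory, and a made-up linear layer predicting log F0 and duration, so its numbers only show that identical linguistic features can yield language-dependent prosody.

```python
from typing import Dict, List, Sequence

LANGUAGES = ["en-US", "es-ES", "ko-KR"]  # hypothetical set of supported languages


def language_one_hot(lang: str) -> List[float]:
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    return [1.0 if code == lang else 0.0 for code in LANGUAGES]


def build_input(linguistic: Sequence[float], lang: str) -> List[float]:
    """Network input = linguistic features followed by a language indicator."""
    return list(linguistic) + language_one_hot(lang)


# Hypothetical trained weights of one shared linear layer (2 outputs x 6 inputs).
WEIGHTS = [
    [0.2, -0.1, 0.0, 0.0, 0.3, -0.1],     # log F0
    [10.0, 40.0, 5.0, 0.0, 15.0, -5.0],   # duration in ms
]
BIAS = [5.0, 80.0]


def predict_prosody(linguistic: Sequence[float], lang: str) -> Dict[str, float]:
    x = build_input(linguistic, lang)
    out = [b + sum(w * v for w, v in zip(row, x)) for row, b in zip(WEIGHTS, BIAS)]
    return {"log_f0": out[0], "duration_ms": out[1]}


features = [1.0, 0.0, 2.0]  # hypothetical linguistic feature values for one phone
for lang in LANGUAGES:
    print(lang, predict_prosody(features, lang))
```

Adding a language then means adding one input unit and retraining on multilingual speech, rather than building a separate model per language.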
-
Publication No.: US20150170640A1
Publication Date: 2015-06-18
Application No.: US14559113
Application Date: 2014-12-03
Applicant: Google Inc.
Inventor: Hasim Sak, Andrew W. Senior
IPC: G10L15/16, G10L15/187
CPC classification number: G10L15/16, G10L15/02, G10L15/142, G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
-
Publication No.: US20150039301A1
Publication Date: 2015-02-05
Application No.: US13955483
Application Date: 2013-07-31
Applicant: Google Inc.
Inventor: Andrew W. Senior, Ignacio L. Moreno
IPC: G10L15/16
CPC classification number: G10L15/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables is provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.
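A sketch of the input construction in this abstract, assuming the latent variables form a per-utterance vector (for example, an i-vector obtained by factor analysis) that is appended to every acoustic frame. The weights, the two output labels, and the merge-repeated-labels decoding step are hypothetical simplifications; a real recognizer would decode with a lexicon and a language model.

```python
import math
from typing import List, Sequence

Vector = List[float]
LABELS = ["s", "sh"]  # hypothetical output classes

# Hypothetical weights: column 0 acts on the acoustic feature, column 1 on the latent variable.
WEIGHTS = [
    [1.0, -2.0],  # s
    [1.0, 2.0],   # sh
]


def augment(frames: Sequence[Vector], latent: Vector) -> List[Vector]:
    """Append the same utterance-level latent vector to every frame."""
    return [list(f) + list(latent) for f in frames]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def candidate_transcription(frames: Sequence[Vector], latent: Vector) -> str:
    labels: List[str] = []
    for x in augment(frames, latent):
        probs = softmax([sum(w * v for w, v in zip(row, x)) for row in WEIGHTS])
        best = LABELS[probs.index(max(probs))]
        if not labels or labels[-1] != best:  # merge repeated frame labels
            labels.append(best)
    return " ".join(labels)


frames = [[0.5], [0.5]]
print(candidate_transcription(frames, [1.0]))   # latent vector for one speaker
print(candidate_transcription(frames, [-1.0]))  # latent vector for another speaker
```

The same acoustic frames map to different labels once the latent vector changes, which is the kind of speaker or channel adaptation the extra input is meant to provide.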
-
Publication No.: US09905220B2
Publication Date: 2018-02-27
Application No.: US14942300
Application Date: 2015-11-16
Applicant: Google Inc.
Inventor: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
CPC classification number: G10L13/10, G06F17/289, G10L13/07, G10L13/08, G10L13/086, G10L25/30
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.