-
Publication Number: WO2019222591A1
Publication Date: 2019-11-21
Application Number: PCT/US2019/032815
Filing Date: 2019-05-17
Applicant: GOOGLE LLC
Inventor: JIA, Ye , CHEN, Zhifeng , WU, Yonghui , SHEN, Jonathan , PANG, Ruoming , WEISS, Ron J. , MORENO, Ignacio Lopez , REN, Fei , ZHANG, Yu , WANG, Quan , NGUYEN, Patrick An Phu
IPC: G10L13/033 , G10L13/04 , G10L25/30
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
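A minimal sketch of the pipeline this abstract describes, assuming toy stand-ins for the trained components; the function names, dimensions, and random weights below are illustrative and not taken from the publication.

```python
# Toy sketch: speaker encoder -> speaker vector -> conditioned spectrogram generation.
import numpy as np

rng = np.random.default_rng(0)

def speaker_encoder(reference_audio_frames):
    """Stand-in for a trained speaker-discriminative encoder: maps
    variable-length audio features to a fixed-size speaker vector."""
    W = rng.standard_normal((reference_audio_frames.shape[1], 64))
    pooled = reference_audio_frames.mean(axis=0)          # average over time
    vec = np.tanh(pooled @ W)
    return vec / np.linalg.norm(vec)                      # unit-norm embedding

def spectrogram_generator(text_ids, speaker_vector, n_frames=50, n_mels=80):
    """Stand-in for the spectrogram generation engine: conditions the
    synthesized frames on both the input text and the speaker vector."""
    E = rng.standard_normal((256, 32))                    # toy character embeddings
    cond = np.concatenate([E[text_ids].mean(axis=0), speaker_vector])
    W = rng.standard_normal((cond.shape[0], n_mels))
    return np.tile(np.tanh(cond @ W), (n_frames, 1))      # [frames, mel bins]

# Usage: 2 s of 80-dim reference features and a short "text".
reference = rng.standard_normal((200, 80))
speaker_vec = speaker_encoder(reference)
mel = spectrogram_generator(np.array([5, 12, 7, 30]), speaker_vec)
print(speaker_vec.shape, mel.shape)   # (64,) (50, 80)
```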
-
Publication Number: WO2018183650A2
Publication Date: 2018-10-04
Application Number: PCT/US2018/025101
Filing Date: 2018-03-29
Applicant: GOOGLE LLC
Inventor: BENGIO, Samuel , WANG, Yuxuan , YANG, Zongheng , CHEN, Zhifeng , WU, Yonghui , AGIOMYRGIANNAKIS, Ioannis , WEISS, Ron J. , JAITLY, Navdeep , RIFKIN, Ryan M. , CLARK, Robert Andrew James , LE, Quoc V. , RYAN, Russell J. , XIAO, Ying
IPC: G10L13/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
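A minimal sketch of the character-to-spectrogram flow described above, assuming a toy recurrent decoder in place of the sequence-to-sequence network; all names, sizes, and weights are illustrative.

```python
# Toy sketch: characters in -> recurrent unrolling -> spectrogram frames out.
import numpy as np

rng = np.random.default_rng(1)
VOCAB, EMB, HID, N_MELS = 40, 32, 64, 80
char_emb = rng.standard_normal((VOCAB, EMB))
W_in = rng.standard_normal((EMB + HID, HID))
W_out = rng.standard_normal((HID, N_MELS))

def synthesize(char_ids, n_frames=20):
    """Encode the character sequence by embedding and averaging, then unroll
    a toy recurrent decoder that emits one mel-spectrogram frame per step."""
    context = char_emb[char_ids].mean(axis=0)     # crude encoder summary
    h = np.zeros(HID)
    frames = []
    for _ in range(n_frames):
        h = np.tanh(np.concatenate([context, h]) @ W_in)
        frames.append(h @ W_out)
    return np.stack(frames)                       # [n_frames, N_MELS]

print(synthesize(np.array([3, 17, 8, 25])).shape)  # (20, 80)
```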
-
Publication Number: WO2020242662A1
Publication Date: 2020-12-03
Application Number: PCT/US2020/029239
Filing Date: 2020-04-22
Applicant: GOOGLE LLC
Inventor: ZHANG, Yu , WEISS, Ron J. , CHUN, Byungha , WU, Yonghui , CHEN, Zhifeng , SKERRY-RYAN, Russell John Wyatt , JIA, Ye , ROSENBERG, Andrew M. , RAMABHADRAN, Bhuvana
Abstract: A method (300) includes receiving an input text sequence (114) to be synthesized into speech (150) in a first language and obtaining a speaker embedding (116a), the speaker embedding specifying specific voice characteristics of a target speaker (10) for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model (100), an output audio feature representation (119) of the input text sequence by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
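A minimal sketch of the cross-lingual conditioning described above: the speaker embedding belongs to a speaker of one language while the input text to be synthesized is in another. The embedding table, function name, shapes, and weights are illustrative assumptions.

```python
# Toy sketch: text in the first language + embedding of a speaker of the
# second language -> output audio feature representation.
import numpy as np

rng = np.random.default_rng(2)
speaker_table = {"native_es_speaker": rng.standard_normal(64)}  # toy embedding

def tts_features(text_ids, speaker_embedding, n_mels=80, n_frames=30):
    """Toy TTS model: the output audio features depend on both the input
    text sequence and the (possibly cross-lingual) speaker embedding."""
    E = rng.standard_normal((256, 32))
    cond = np.concatenate([E[text_ids].mean(axis=0), speaker_embedding])
    W = rng.standard_normal((cond.shape[0], n_mels))
    return np.tile(np.tanh(cond @ W), (n_frames, 1))

english_text_ids = np.array([12, 4, 29, 7])        # text in the first language
emb = speaker_table["native_es_speaker"]           # speaker of the second language
print(tts_features(english_text_ids, emb).shape)   # (30, 80)
```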
-
Publication Number: WO2018085577A1
Publication Date: 2018-05-11
Application Number: PCT/US2017/059776
Filing Date: 2017-11-02
Applicant: GOOGLE LLC
Inventor: CHEN, Zhifeng , SCHUSTER, Michael , JOHNSON PREMKUMAR, Melvin Jose , WU, Yonghui , LE, Quoc V. , KRIKUN, Maxim , BRANTS, Thorsten
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing machine learning tasks. One method includes receiving (i) a model input, and (ii) data identifying a first machine learning task to be performed on the model input to generate a first type of model output for the model input; augmenting the model input with an identifier for the first machine learning task to generate an augmented model input; and processing the augmented model input using a machine learning model. An exemplary system applying implicit bridging for machine learning tasks, as described in this specification, trains a machine learning model to perform certain types of machine learning tasks without requiring explicit training data for those types of machine learning tasks to be used during training.
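A minimal sketch of the input augmentation described above, assuming a string task-identifier token prepended to a tokenized input; the token names and the stand-in model are illustrative.

```python
# Toy sketch: augment the model input with a task identifier so a single
# shared model can be directed to different tasks.
def augment_with_task(task_id, model_input):
    """Prepend a task identifier token to the tokenized model input."""
    return [f"<{task_id}>"] + list(model_input)

def run_model(augmented_input):
    """Stand-in for the shared machine learning model."""
    task_token, *tokens = augmented_input
    return f"output of task {task_token} on {len(tokens)} tokens"

# Usage: the same model input routed to two different tasks.
tokens = ["hello", "world"]
print(run_model(augment_with_task("translate_en_de", tokens)))
print(run_model(augment_with_task("translate_en_ja", tokens)))
```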
-
Publication Number: WO2018039510A1
Publication Date: 2018-03-01
Application Number: PCT/US2017/048529
Filing Date: 2017-08-25
Applicant: GOOGLE LLC
Inventor: SCHUSTER, Michael , BENGIO, Samuel , JAITLY, Navdeep , CHEN, Zhifeng , SCHUURMANS, Dale Eric , NOROUZI, Mohammad , WU, Yonghui
IPC: G06N99/00
Abstract: A method includes obtaining data identifying a machine learning model to be trained to perform a machine learning task, the machine learning model being configured to receive an input example and to process the input example in accordance with current values of a plurality of model parameters to generate a model output for the input example; obtaining initial training data for training the machine learning model, the initial training data comprising a plurality of training examples and, for each training example, a ground truth output that should be generated by the machine learning model by processing the training example; generating modified training data from the initial training data; and training the machine learning model on the modified training data.
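A minimal sketch of the training-loop shape described above; the particular way the modified training data is generated here (adding noise to the ground truth targets) is purely illustrative and not taken from the abstract.

```python
# Toy sketch: initial training data -> modified training data -> train the model.
import numpy as np

rng = np.random.default_rng(3)

def generate_modified_training_data(examples, targets, noise=0.1):
    """Derive modified training data from the initial (example, ground truth) pairs."""
    jitter = rng.normal(scale=noise, size=targets.shape)   # illustrative modification
    return examples, targets + jitter

def train(examples, targets, steps=100, lr=0.1):
    """Fit a linear model's parameters on the (modified) training data."""
    w = np.zeros(examples.shape[1])
    for _ in range(steps):
        grad = examples.T @ (examples @ w - targets) / len(targets)
        w -= lr * grad
    return w

X = rng.standard_normal((64, 3))
y = X @ np.array([1.0, -2.0, 0.5])            # ground truth outputs
X_mod, y_mod = generate_modified_training_data(X, y)
print(train(X_mod, y_mod))                    # roughly recovers [1.0, -2.0, 0.5]
```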
-
Publication Number: WO2022006329A1
Publication Date: 2022-01-06
Application Number: PCT/US2021/039976
Filing Date: 2021-06-30
Applicant: GOOGLE LLC
Inventor: LEPIKHIN, Dmitry , HUANG, Yanping , FIRAT, Orhan , KRIKUN, Maxim , CHEN, Dehao , SHAZEER, Noam M. , LEE, HyoukJoong , XU, Yuanzhong , CHEN, Zhifeng
IPC: G06N3/04 , G06N3/08 , G06N3/0481 , G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes an attention neural network configured to perform the machine learning task, the attention neural network including one or more attention layers, each attention layer comprising an attention sub-layer and a feed-forward sub-layer. Some or all of the attention layers have a feed-forward sub-layer that applies conditional computation to the inputs to the sub-layer.
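A minimal sketch of a feed-forward sub-layer with conditional computation, assuming top-1 routing over a small set of experts; the sizes and the routing rule are illustrative and not taken from the claims.

```python
# Toy sketch: each token is routed to one expert, so only that expert's
# feed-forward weights are computed for it.
import numpy as np

rng = np.random.default_rng(4)
D_MODEL, D_FF, N_EXPERTS = 16, 32, 4
router_w = rng.standard_normal((D_MODEL, N_EXPERTS))
experts = [(rng.standard_normal((D_MODEL, D_FF)),
            rng.standard_normal((D_FF, D_MODEL))) for _ in range(N_EXPERTS)]

def conditional_ffn(tokens):
    """Apply the feed-forward sub-layer conditionally: a router picks one
    expert per token and only the selected expert processes that token."""
    choices = (tokens @ router_w).argmax(axis=-1)          # per-token expert index
    out = np.zeros_like(tokens)
    for e, (w1, w2) in enumerate(experts):
        mask = choices == e
        if mask.any():                                     # compute only where routed
            out[mask] = np.maximum(tokens[mask] @ w1, 0.0) @ w2
    return out

print(conditional_ffn(rng.standard_normal((8, D_MODEL))).shape)  # (8, 16)
```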
-
Publication Number: WO2020205233A1
Publication Date: 2020-10-08
Application Number: PCT/US2020/023169
Filing Date: 2020-03-17
Applicant: GOOGLE LLC
Inventor: JIA, Ye , CHEN, Zhifeng , WU, Yonghui , JOHNSON, Melvin , BIADSY, Fadi , WEISS, Ron , MACHEREY, Wolfgang
IPC: G10L13/033 , G10L13/04 , G10L21/003
Abstract: The present disclosure provides systems and methods that train and use machine-learned models such as, for example, sequence-to-sequence models, to perform direct and text-free speech-to-speech translation. In particular, aspects of the present disclosure provide an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.
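A minimal sketch of the direct spectrogram-to-spectrogram mapping described above, with a toy dot-product attention step and no intermediate text representation; dimensions, weights, and the decoder structure are illustrative.

```python
# Toy sketch: source-language spectrogram -> attention-based decoding ->
# target-language spectrogram, with no text in between.
import numpy as np

rng = np.random.default_rng(5)
N_MELS, HID = 80, 64
W_enc = rng.standard_normal((N_MELS, HID))
W_dec = rng.standard_normal((HID + HID, HID))
W_out = rng.standard_normal((HID, N_MELS))

def translate_speech(source_mel, n_out_frames=30):
    """Encode the source spectrogram frames, then decode target frames
    directly, attending over the encoded source at each step."""
    enc = np.tanh(source_mel @ W_enc)                 # [src_frames, HID]
    h = np.zeros(HID)
    outputs = []
    for _ in range(n_out_frames):
        scores = enc @ h                              # dot-product attention
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        context = weights @ enc
        h = np.tanh(np.concatenate([context, h]) @ W_dec)
        outputs.append(h @ W_out)
    return np.stack(outputs)                          # [n_out_frames, N_MELS]

print(translate_speech(rng.standard_normal((120, N_MELS))).shape)  # (30, 80)
```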
-
Publication Number: WO2018183650A3
Publication Date: 2018-10-04
Application Number: PCT/US2018/025101
Filing Date: 2018-03-29
Applicant: GOOGLE LLC
Inventor: BENGIO, Samuel , WANG, Yuxuan , YANG, Zongheng , CHEN, Zhifeng , WU, Yonghui , AGIOMYRGIANNAKIS, Ioannis , WEISS, Ron J. , JAITLY, Navdeep , RIFKIN, Ryan M. , CLARK, Robert Andrew James , LE, Quoc V. , RYAN, Russell J. , XIAO, Ying
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
-
Publication Number: WO2018058046A1
Publication Date: 2018-03-29
Application Number: PCT/US2017/053267
Filing Date: 2017-09-25
Applicant: GOOGLE LLC
Inventor: NOROUZI, Mohammad , CHEN, Zhifeng , WU, Yonghui , SCHUSTER, Michael , LE, Quoc V.
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural machine translation. One of the systems includes an encoder neural network comprising: an input forward long short-term memory (LSTM) layer configured to process each input token in the input sequence in a forward order to generate a respective forward representation of each input token, an input backward LSTM layer configured to process each input token in a backward order to generate a respective backward representation of each input token and a plurality of hidden LSTM layers configured to process a respective combined representation of each of the input tokens in the forward order to generate a respective encoded representation of each of the input tokens; and a decoder subsystem configured to receive the respective encoded representations and to process the encoded representations to generate an output sequence.
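A minimal sketch of the encoder structure described above: a forward LSTM pass, a backward LSTM pass, and a hidden LSTM layer over the combined per-token representations. The cell is a plain textbook LSTM; sizes and initialization are illustrative.

```python
# Toy sketch: forward + backward LSTM passes, combined per token, then one
# hidden LSTM layer over the combined representations.
import numpy as np

rng = np.random.default_rng(6)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_lstm(in_dim, hid):
    """Random gate weights for a minimal LSTM cell (input, forget, output, cell)."""
    return rng.standard_normal((4, in_dim + hid, hid)) * 0.1

def lstm_pass(params, xs, reverse=False):
    """Run an LSTM over a sequence of vectors, optionally in backward order,
    returning the per-step hidden states in original time order."""
    hid = params.shape[-1]
    h, c, outs = np.zeros(hid), np.zeros(hid), []
    order = reversed(range(len(xs))) if reverse else range(len(xs))
    for t in order:
        z = np.concatenate([xs[t], h])
        i, f, o = (sigmoid(z @ params[k]) for k in range(3))
        c = f * c + i * np.tanh(z @ params[3])
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs[::-1] if reverse else outs)

EMB, HID = 32, 64
tokens = rng.standard_normal((10, EMB))                     # embedded input tokens
fwd = lstm_pass(make_lstm(EMB, HID), tokens)                # forward representations
bwd = lstm_pass(make_lstm(EMB, HID), tokens, reverse=True)  # backward representations
combined = np.concatenate([fwd, bwd], axis=-1)              # per-token combination
encoded = lstm_pass(make_lstm(2 * HID, HID), combined)      # one hidden LSTM layer
print(encoded.shape)                                        # (10, 64)
```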
-