-
Publication number: US20250078809A1
Publication date: 2025-03-06
Application number: US18951397
Application date: 2024-11-18
Applicant: Google LLC
Inventor: Samuel Bengio , Yuxuan Wang , Zongheng Yang , Zhifeng Chen , Yonghui Wu , Ioannis Agiomyrgiannakis , Ron J. Weiss , Navdeep Jaitly , Ryan M. Rifkin , Robert Andrew James Clark , Quoc V. Le , Russell J. Ryan , Ying Xiao
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
-
Publication number: US20250021889A1
Publication date: 2025-01-16
Application number: US18897967
Application date: 2024-09-26
Applicant: Google LLC
Inventor: Zhifeng Chen , Michael Schuster , Melvin Jose Johnson Premkumar , Yonghui Wu , Quoc V. Le , Maxim Krikun , Thorsten Brants
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing machine learning tasks. One method includes receiving (i) a model input, and (ii) data identifying a first machine learning task to be performed on the model input to generate a first type of model output for the model input; augmenting the model input with an identifier for the first machine learning task to generate an augmented model input; and processing the augmented model input using a machine learning model, wherein the machine learning model has been trained on training data to perform a plurality of machine learning tasks including the first machine learning task, and wherein the machine learning model has been configured through training to process the augmented model input to generate a machine learning model output of the first type for the model input.
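As a rough illustration of the input-augmentation step, the sketch below prepends an identifier token to a tokenized model input before handing it to one shared multi-task model. The `<2...>` token format and the names `augment_model_input` and `run_task` are hypothetical; the abstract does not specify how the task identifier is encoded.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def augment_model_input(tokens: List[str], task_id: str) -> List[str]:
    """Prepend a task-identifier token to the model input.

    The "<2{task_id}>" format is an illustrative assumption, not the patent's.
    """
    return [f"<2{task_id}>"] + list(tokens)


def run_task(model: Callable[[List[str]], T], tokens: List[str], task_id: str) -> T:
    """Augment the input, then let a single multi-task model process it."""
    return model(augment_model_input(tokens, task_id))


def echo_model(augmented: List[str]) -> List[str]:
    """Stand-in 'model' that simply returns the augmented input it receives."""
    return augmented


print(run_task(echo_model, ["Hello", "world"], "es"))
# prints ['<2es>', 'Hello', 'world']
```

Because the task is signaled only through the input, the same trained weights can serve every task; selecting a task requires no change to the model itself.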
-
Publication number: US20240378441A1
Publication date: 2024-11-14
Application number: US18661447
Application date: 2024-05-10
Applicant: Google LLC
Inventor: Slav Petrov , Yonghui Wu , Andrew M. Dai , David Richard So , Dmitry Lepikhin , Erica Ann Moreira , Gaurav Mishra , Jonathan Hudson Clark , Maxim Krikun , Melvin Jose Johnson Premkumar , Nan Du , Orhan Firat , Rohan Anil , Siamak Shakeri , Xavier Garcia , Yanping Huang , Yong Cheng , Yuanzhong Xu , Yujing Zhang , Zachary Alexander Nado , Eric Jun Jie Ni , Kefan Xiao , Vladimir Feinberg , Jin Young Sohn , Aurko Roy
IPC: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.
-
Publication number: US12106749B2
Publication date: 2024-10-01
Application number: US17448119
Application date: 2021-09-20
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. u. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
CPC classification number: G10L15/16 , G06N3/08 , G10L15/02 , G10L15/063 , G10L15/22 , G10L25/30 , G10L2015/025 , G10L15/26
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the automatic speech recognition (ASR) system.
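The attender's role can be sketched as attention over encoder outputs: score each encoder frame against the current decoder state, normalize the scores with a softmax, and take the weighted sum of encoder frames as the context vector. Dot-product scoring and the names `softmax` and `context_vector` are assumptions for illustration; the abstract does not state which scoring function the attender uses.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(decoder_state: Vector, encoder_outputs: List[Vector]) -> Vector:
    """Dot-product attention (an assumed scoring function).

    Each encoder frame is scored by its dot product with the decoder state;
    the softmax-normalized scores weight a sum over the encoder frames.
    """
    scores = [sum(q * k for q, k in zip(decoder_state, h)) for h in encoder_outputs]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_outputs)) for d in range(dim)]


frames = [[1.0, 0.0], [0.0, 1.0]]
print(context_vector([0.0, 0.0], frames))  # prints [0.5, 0.5] (equal attention)
```

In a full recognizer, the decoder would combine this context vector with its own state at each output step to produce the speech recognition scores over word elements.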
-
Publication number: US20240104352A1
Publication date: 2024-03-28
Application number: US18012391
Application date: 2022-07-28
Applicant: Google LLC
Inventor: Yu Zhang , Yu-An Chung , Wei Han , Chung-Cheng Chiu , Weikeng Qin , Ruoming Pang , Yonghui Wu
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Provided are improved end-to-end self-supervised pre-training frameworks that leverage a combination of contrastive and masked modeling loss terms. In particular, the present disclosure provides a framework that combines contrastive learning and masked modeling, where the former trains the model to discretize input data (e.g., continuous signals such as continuous speech signals) into a finite set of discriminative tokens, and the latter trains the model to learn contextualized representations via solving a masked prediction task consuming the discretized tokens. In contrast to certain existing masked modeling-based pre-training frameworks, which rely on an iterative re-clustering and re-training process, or other existing frameworks, which concatenate two separately trained modules, the proposed framework can enable a model to be optimized in an end-to-end fashion by solving the two self-supervised tasks (the contrastive task and masked modeling) simultaneously.
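A minimal sketch of how the two self-supervised losses could be combined into one end-to-end objective. It assumes an InfoNCE-style contrastive loss (pick the true quantized target from a set of distractors), a cross-entropy masked-prediction loss over discrete token ids, and a hypothetical weight `beta`. None of these function names, the temperature value, or the weighting come from the abstract.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def logsumexp(xs: List[float]) -> float:
    """Stable log(sum(exp(x))) over a list of logits."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def contrastive_loss(context: Vector, positive: Vector,
                     distractors: List[Vector], temperature: float = 0.1) -> float:
    """InfoNCE: negative log-probability of the true target among distractors."""
    logits = [cosine(context, positive) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    return -(logits[0] - logsumexp(logits))


def masked_prediction_loss(logits: List[float], target_token_id: int) -> float:
    """Cross-entropy for predicting the discrete token at one masked position."""
    return -(logits[target_token_id] - logsumexp(logits))


def total_loss(l_contrastive: float, l_masked: float, beta: float = 1.0) -> float:
    """Joint objective so both tasks are optimized simultaneously (beta is assumed)."""
    return l_contrastive + beta * l_masked
```

Because both terms are summed into one scalar, a single backward pass updates the discretizing (contrastive) part and the contextualizing (masked-prediction) part together, which is the end-to-end property the abstract contrasts with iterative re-clustering.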
-
Publication number: US11848002B2
Publication date: 2023-12-19
Application number: US17813361
Application date: 2022-07-19
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen
CPC classification number: G10L13/04 , G10L17/04 , G10L19/00 , G06N3/08 , G10L2013/021
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
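The speaker-conditioning step could look roughly like the following. It assumes the speaker encoder emits per-frame embeddings that are averaged and L2-normalized into a single speaker vector, which is then concatenated onto every output step of the text encoder before spectrogram generation. The averaging, the concatenation, and both function names are illustrative assumptions rather than the claimed method.

```python
import math
from typing import List

Vector = List[float]


def speaker_vector(frame_embeddings: List[Vector]) -> Vector:
    """Average per-frame speaker-encoder embeddings, then L2-normalize (assumed)."""
    n = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    mean = [sum(f[d] for f in frame_embeddings) / n for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # guard against a zero vector
    return [x / norm for x in mean]


def condition_on_speaker(text_encodings: List[Vector], spk: Vector) -> List[Vector]:
    """Concatenate the same speaker vector onto every text-encoder output step."""
    return [enc + spk for enc in text_encodings]


spk = speaker_vector([[3.0, 0.0], [3.0, 0.0]])
print(spk)  # prints [1.0, 0.0]
print(condition_on_speaker([[0.1], [0.2]], spk))  # prints [[0.1, 1.0, 0.0], [0.2, 1.0, 0.0]]
```

Since the speaker vector comes from an encoder trained only to tell speakers apart, the spectrogram generator can, in principle, be conditioned on a target speaker from a short audio sample without retraining on that speaker.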
-
Publication number: US20230385543A1
Publication date: 2023-11-30
Application number: US18447186
Application date: 2023-08-09
Applicant: Google LLC
Inventor: Paul Roland Lambert , Timothy Youngjin Sohn , Jacqueline Amy Tsay , Gagan Bansal , Cole Austin Bevis , Kaushik Roy , Justin Tzi-jay Lu , Katherine Anna Evans , Tobias Bosch , Yinan Wang , Matthew Vincent Dierker , Greg Russell Bullock , Ettore Randazzo , Tobias Kaufmann , Yonghui Wu , Benjamin N. Lee , Xu Chen , Brian Strope , Yun-hsuan Sung , Do Kook Choe , Rami Eid Sammour Al-Rfou'
IPC: G06F40/274 , G06F3/04842 , G06N20/00 , G06F21/62 , G06F3/023 , G06F40/30 , G06F40/232 , G06F40/253 , G06F40/284
CPC classification number: G06F40/274 , G06F3/04842 , G06N20/00 , G06F21/6245 , G06F40/284 , G06F40/30 , G06F40/232 , G06F40/253 , G06F3/0237
Abstract: A computing system is described that includes user interface components configured to receive typed user input; and one or more processors. The one or more processors are configured to: receive, by a computing system and at a first time, a first portion of text typed by a user in an electronic message being edited; predict, based on the first portion of text, a first candidate portion of text to follow the first portion of text; output, for display, the predicted first candidate portion of text for optional selection to append to the first portion of text; determine, at a second time that is after the first time, that the electronic message is directed to a sensitive topic; and responsive to determining that the electronic message is directed to a sensitive topic, refrain from outputting subsequent candidate portions of text for optional selection to append to text in the electronic message.
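The suppression behavior can be sketched as a small stateful gate: it forwards completions from a prediction model until the message is judged sensitive, then stops offering suggestions for the rest of that message. The keyword-based `is_sensitive` check, the `predict` callable, and the class name are hypothetical stand-ins; the abstract does not say how a sensitive topic is detected.

```python
from typing import Callable, Optional, Set


class SuggestionGate:
    """Offer completions until the message is judged sensitive, then stop."""

    def __init__(self, predict: Callable[[str], str], sensitive_terms: Set[str]):
        self.predict = predict                  # hypothetical completion model
        self.sensitive_terms = sensitive_terms  # hypothetical keyword-based detector
        self.suppressed = False                 # latched once a sensitive topic is seen

    def is_sensitive(self, text: str) -> bool:
        """Crude stand-in for a topic classifier: keyword overlap."""
        words = {w.strip(".,!?").lower() for w in text.split()}
        return bool(words & self.sensitive_terms)

    def suggest(self, text_so_far: str) -> Optional[str]:
        """Return a candidate continuation, or None once suggestions are suppressed."""
        if not self.suppressed and self.is_sensitive(text_so_far):
            self.suppressed = True  # refrain from all subsequent suggestions
        if self.suppressed:
            return None
        return self.predict(text_so_far)


gate = SuggestionGate(lambda text: " see you soon", {"diagnosis"})
print(gate.suggest("Hi Sam,"))               # prints " see you soon"
print(gate.suggest("Hi Sam, my diagnosis"))  # prints None
```

Latching the flag, rather than re-checking each keystroke, matches the abstract's "refrain from outputting subsequent candidate portions": editing the sensitive word away does not bring suggestions back for that message.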
-
Publication number: US20230351149A1
Publication date: 2023-11-02
Application number: US18141340
Application date: 2023-04-28
Applicant: Google LLC
Inventor: Jiahui Yu , Zirui Wang , Vijay Vasudevan , Ho Man Yeung , Seyed Mojtaba Seyedhosseini Tarzjani , Yonghui Wu
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing multi-modal inputs using contrastive captioning neural networks.
-
Publication number: US11755834B2
Publication date: 2023-09-12
Application number: US15852916
Application date: 2017-12-22
Applicant: Google LLC
Inventor: Paul Roland Lambert , Timothy Youngjin Sohn , Jacqueline Amy Tsay , Gagan Bansal , Cole Austin Bevis , Kaushik Roy , Justin Tzi-jay Lu , Katherine Anna Evans , Tobias Bosch , Yinan Wang , Matthew Vincent Dierker , Gregory Russell Bullock , Ettore Randazzo , Tobias Kaufmann , Yonghui Wu , Benjamin N. Lee , Xu Chen , Brian Strope , Yun-hsuan Sung , Do Kook Choe , Rami Eid Sammouf Al-Rfou'
IPC: G06F40/274 , G06F3/04842 , G06N20/00 , G06F21/62 , G06F3/023 , G06F40/30 , G06F40/232 , G06F40/253 , G06F40/284
CPC classification number: G06F40/274 , G06F3/0237 , G06F3/04842 , G06F21/6245 , G06F40/232 , G06F40/253 , G06F40/284 , G06F40/30 , G06N20/00
Abstract: A computing system is described that includes user interface components configured to receive typed user input; and one or more processors. The one or more processors are configured to: receive, by a computing system and at a first time, a first portion of text typed by a user in an electronic message being edited; predict, based on the first portion of text, a first candidate portion of text to follow the first portion of text; output, for display, the predicted first candidate portion of text for optional selection to append to the first portion of text; determine, at a second time that is after the first time, that the electronic message is directed to a sensitive topic; and responsive to determining that the electronic message is directed to a sensitive topic, refrain from outputting subsequent candidate portions of text for optional selection to append to text in the electronic message.
-
Publication number: US20230252974A1
Publication date: 2023-08-10
Application number: US18010438
Application date: 2021-09-02
Applicant: Google LLC
Inventor: Byungha Chun , Mohammad Norouzi , Nanxin Chen , Ron J. Weiss , William Chan , Yu Zhang , Yonghui Wu
IPC: G10L13/08 , G10L21/0208
CPC classification number: G10L13/08 , G10L21/0208
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating waveforms conditioned on phoneme sequences. In one aspect, a method comprises: obtaining a phoneme sequence; processing the phoneme sequence using an encoder neural network to generate a hidden representation of the phoneme sequence; generating, from the hidden representation, a conditioning input; initializing a current waveform output; and generating a final waveform output that defines an utterance of the phoneme sequence by a speaker by updating the current waveform output at each of a plurality of iterations, wherein each iteration corresponds to a respective noise level, and wherein the updating comprises, at each iteration: processing (i) the current waveform output and (ii) the conditioning input using a noise estimation neural network to generate a noise output; and updating the current waveform output using the noise output and the noise level for the iteration.
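The iterative refinement loop can be sketched with a denoising-diffusion-style update. The sketch assumes a fixed noise schedule `betas`, Gaussian initialization of the current waveform, and a `noise_estimator` callable standing in for the noise estimation neural network. The specific update rule and passing `sqrt(alpha_bar)` as the noise level are assumptions; the abstract only says the waveform is updated from the noise output and the iteration's noise level.

```python
import math
import random
from typing import Callable, List, Sequence

# (current waveform, conditioning input, noise level) -> estimated noise
NoiseEstimator = Callable[[List[float], Sequence[float], float], List[float]]


def refine_waveform(conditioning: Sequence[float], noise_estimator: NoiseEstimator,
                    betas: Sequence[float], length: int, seed: int = 0) -> List[float]:
    """Refine Gaussian noise into a waveform, one iteration per noise level."""
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]
    alpha_bars: List[float] = []
    prod = 1.0
    for a in alphas:  # cumulative products of alphas
        prod *= a
        alpha_bars.append(prod)

    # Initialize the current waveform output with standard Gaussian noise.
    y = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for n in reversed(range(len(betas))):
        noise_level = math.sqrt(alpha_bars[n])  # assumed noise-level input
        eps = noise_estimator(y, conditioning, noise_level)
        coef = betas[n] / math.sqrt(1.0 - alpha_bars[n])
        # Remove the estimated noise and rescale (DDPM-style mean update).
        y = [(yi - coef * ei) / math.sqrt(alphas[n]) for yi, ei in zip(y, eps)]
        if n > 0:  # add fresh noise on every step except the last
            sigma = math.sqrt((1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n])
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y


def zero_noise(y: List[float], conditioning: Sequence[float], level: float) -> List[float]:
    """Stub estimator that ignores its inputs; a trained network would go here."""
    return [0.0] * len(y)


wave = refine_waveform([0.0], zero_noise, betas=[1e-4, 1e-3, 1e-2], length=8)
print(len(wave))  # prints 8
```

With a trained estimator, `conditioning` would be derived from the encoder's hidden representation of the phoneme sequence; the stub ignores it, so the output here is just rescaled noise.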
-