-
Publication number: US20240378441A1
Publication date: 2024-11-14
Application number: US18661447
Application date: 2024-05-10
Applicant: Google LLC
Inventor: Slav Petrov , Yonghui Wu , Andrew M. Dai , David Richard So , Dmitry Lepikhin , Erica Ann Moreira , Gaurav Mishra , Jonathan Hudson Clark , Maxim Krikun , Melvin Jose Johnson Premkumar , Nan Du , Orhan Firat , Rohan Anil , Siamak Shakeri , Xavier Garcia , Yanping Huang , Yong Cheng , Yuanzhong Xu , Yujing Zhang , Zachary Alexander Nado , Eric Jun Jie Ni , Kefan Xiao , Vladimir Feinberg , Jin Young Sohn , Aurko Roy
IPC: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.
-
Publication number: US12106749B2
Publication date: 2024-10-01
Application number: US17448119
Application date: 2021-09-20
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. u. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
CPC classification number: G10L15/16 , G06N3/08 , G10L15/02 , G10L15/063 , G10L15/22 , G10L25/30 , G10L2015/025 , G10L15/26
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
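The attender step in this abstract turns encoder outputs into a context vector for the decoder. Below is a minimal Python sketch. It assumes plain dot-product scoring followed by a softmax; the abstract does not specify the scoring function, and the names `attend` and `softmax` are illustrative, not taken from the patent.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state: Vector, encoder_outputs: List[Vector]) -> Vector:
    """Return a context vector: the attention-weighted sum of encoder frames.

    Assumption: each frame is scored by its dot product with the decoder state.
    """
    scores = [sum(d * h for d, h in zip(decoder_state, frame)) for frame in encoder_outputs]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    # Weighted sum, computed one feature dimension at a time.
    return [sum(w * frame[i] for w, frame in zip(weights, encoder_outputs)) for i in range(dim)]
```

A decoder state that aligns with one encoder frame pulls the context vector toward that frame. A zero state weights every frame equally.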
-
Publication number: US20240104352A1
Publication date: 2024-03-28
Application number: US18012391
Application date: 2022-07-28
Applicant: Google LLC
Inventor: Yu Zhang , Yu-An Chung , Wei Han , Chung-Cheng Chiu , Weikeng Qin , Ruoming Pang , Yonghui Wu
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Provided are improved end-to-end self-supervised pre-training frameworks that leverage a combination of contrastive and masked modeling loss terms. In particular, the present disclosure provides a framework that combines contrastive learning and masked modeling, where the former trains the model to discretize input data (e.g., continuous signals such as continuous speech signals) into a finite set of discriminative tokens, and the latter trains the model to learn contextualized representations via solving a masked prediction task consuming the discretized tokens. In contrast to certain existing masked modeling-based pre-training frameworks, which rely on an iterative re-clustering and re-training process, or other existing frameworks, which concatenate two separately trained modules, the proposed framework can enable a model to be optimized in an end-to-end fashion by solving the two self-supervised tasks (the contrastive task and masked modeling) simultaneously.
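Solving both tasks at once amounts to optimizing a single objective that sums the two loss terms. The Python sketch below is minimal and rests on several assumptions that the abstract does not state: an InfoNCE-style contrastive term over cosine similarities, a temperature of 0.1, cross-entropy computed only at masked positions, and a scalar `masked_weight`. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def _log_sum_exp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_loss(context, positive, negatives, temperature: float = 0.1) -> float:
    """InfoNCE (assumed form): -log softmax probability of the positive among all candidates."""
    logits = [cosine(context, positive) / temperature]
    logits += [cosine(context, n) / temperature for n in negatives]
    return _log_sum_exp(logits) - logits[0]


def masked_prediction_loss(logits_per_step: List[List[float]],
                           target_ids: List[int], mask: List[bool]) -> float:
    """Mean cross-entropy over masked positions only; unmasked steps are ignored."""
    losses = [_log_sum_exp(logits) - logits[target]
              for logits, target, is_masked in zip(logits_per_step, target_ids, mask)
              if is_masked]
    return sum(losses) / len(losses) if losses else 0.0


def total_loss(l_contrastive: float, l_masked: float, masked_weight: float = 1.0) -> float:
    """Single end-to-end objective: both self-supervised terms are optimized together."""
    return l_contrastive + masked_weight * l_masked
```

Because the discretized tokens and the contextual predictions share one objective, gradients from both terms flow through the same model in a single training run.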
-
Publication number: US11848002B2
Publication date: 2023-12-19
Application number: US17813361
Application date: 2022-07-19
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen
CPC classification number: G10L13/04 , G10L17/04 , G10L19/00 , G06N3/08 , G10L2013/021
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
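In this abstract, the spectrogram generator is conditioned on a fixed-size speaker vector derived from reference audio. The minimal Python sketch below assumes three things: the speaker vector is the L2-normalized mean of per-frame speaker-encoder embeddings, the vector is concatenated onto every encoded text frame, and both helper names are hypothetical. The patent may combine them differently.

```python
import math
from typing import List

Vector = List[float]


def speaker_vector(frame_embeddings: List[Vector]) -> Vector:
    """Average per-frame embeddings, then L2-normalize (assumed aggregation)."""
    dim = len(frame_embeddings[0])
    mean = [sum(f[i] for f in frame_embeddings) / len(frame_embeddings) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # guard against a zero vector
    return [x / norm for x in mean]


def condition_text_encoding(text_frames: List[Vector], spk: Vector) -> List[Vector]:
    """Append the same speaker vector to each encoded text frame."""
    return [frame + spk for frame in text_frames]
```

Keeping the speaker vector separate from the text encoding lets the same input text be rendered in any target voice by swapping only `spk`.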
-
Publication number: US20230385543A1
Publication date: 2023-11-30
Application number: US18447186
Application date: 2023-08-09
Applicant: Google LLC
Inventor: Paul Roland Lambert , Timothy Youngjin Sohn , Jacqueline Amy Tsay , Gagan Bansal , Cole Austin Bevis , Kaushik Roy , Justin Tzi-jay LU , Katherine Anna Evans , Tobias Bosch , Yinan Wang , Matthew Vincent Dierker , Greg Russell Bullock , Ettore Randazzo , Tobias Kaufmann , Yonghui Wu , Benjamin N. Lee , Xu Chen , Brian Strope , Yun-hsuan Sung , Do Kook Choe , Rami Eid Sammour Al-Rfou'
IPC: G06F40/274 , G06F3/04842 , G06N20/00 , G06F21/62 , G06F3/023 , G06F40/30 , G06F40/232 , G06F40/253 , G06F40/284
CPC classification number: G06F40/274 , G06F3/04842 , G06N20/00 , G06F21/6245 , G06F40/284 , G06F40/30 , G06F40/232 , G06F40/253 , G06F3/0237
Abstract: A computing system is described that includes user interface components configured to receive typed user input; and one or more processors. The one or more processors are configured to: receive, by a computing system and at a first time, a first portion of text typed by a user in an electronic message being edited; predict, based on the first portion of text, a first candidate portion of text to follow the first portion of text; output, for display, the predicted first candidate portion of text for optional selection to append to the first portion of text; determine, at a second time that is after the first time, that the electronic message is directed to a sensitive topic; and responsive to determining that the electronic message is directed to a sensitive topic, refrain from outputting subsequent candidate portions of text for optional selection to append to text in the electronic message.
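The abstract describes a suppression rule: once a message is found to address a sensitive topic, no further completions are offered. A minimal Python sketch of that gating follows. The predictor and the sensitivity classifier are hypothetical callables, and the keyword check in the usage example is only a stand-in for whatever classifier the system actually uses.

```python
from typing import Callable, Optional


class ComposeAssistant:
    """Offers text completions until the message is judged sensitive, then stops."""

    def __init__(self, predict: Callable[[str], str], is_sensitive: Callable[[str], bool]):
        self._predict = predict
        self._is_sensitive = is_sensitive
        self._suppressed = False  # latches to True once a sensitive topic is detected

    def suggest(self, text: str) -> Optional[str]:
        if self._suppressed:
            return None  # refrain from all subsequent suggestions for this message
        if self._is_sensitive(text):
            self._suppressed = True
            return None
        return self._predict(text)


# Usage with stand-in callables (hypothetical, for illustration only).
assistant = ComposeAssistant(
    predict=lambda t: " soon",
    is_sensitive=lambda t: "diagnosis" in t.lower(),
)
```

The latch matches the abstract's wording that *subsequent* candidates are withheld. A check that runs only on the current text would resume suggestions as soon as the sensitive words scrolled out of view.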
-
Publication number: US20230351149A1
Publication date: 2023-11-02
Application number: US18141340
Application date: 2023-04-28
Applicant: Google LLC
Inventor: Jiahui Yu , Zirui Wang , Vijay Vasudevan , Ho Man Yeung , Seyed Mojtaba Seyedhosseini Tarzjani , Yonghui Wu
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing multi-modal inputs using contrastive captioning neural networks.
-
Publication number: US11755834B2
Publication date: 2023-09-12
Application number: US15852916
Application date: 2017-12-22
Applicant: Google LLC
Inventor: Paul Roland Lambert , Timothy Youngjin Sohn , Jacqueline Amy Tsay , Gagan Bansal , Cole Austin Bevis , Kaushik Roy , Justin Tzi-jay Lu , Katherine Anna Evans , Tobias Bosch , Yinan Wang , Matthew Vincent Dierker , Gregory Russell Bullock , Ettore Randazzo , Tobias Kaufmann , Yonghui Wu , Benjamin N. Lee , Xu Chen , Brian Strope , Yun-hsuan Sung , Do Kook Choe , Rami Eid Sammouf Al-Rfou'
IPC: G06F40/274 , G06F3/04842 , G06N20/00 , G06F21/62 , G06F3/023 , G06F40/30 , G06F40/232 , G06F40/253 , G06F40/284
CPC classification number: G06F40/274 , G06F3/0237 , G06F3/04842 , G06F21/6245 , G06F40/232 , G06F40/253 , G06F40/284 , G06F40/30 , G06N20/00
Abstract: A computing system is described that includes user interface components configured to receive typed user input; and one or more processors. The one or more processors are configured to: receive, by a computing system and at a first time, a first portion of text typed by a user in an electronic message being edited; predict, based on the first portion of text, a first candidate portion of text to follow the first portion of text; output, for display, the predicted first candidate portion of text for optional selection to append to the first portion of text; determine, at a second time that is after the first time, that the electronic message is directed to a sensitive topic; and responsive to determining that the electronic message is directed to a sensitive topic, refrain from outputting subsequent candidate portions of text for optional selection to append to text in the electronic message.
-
Publication number: US20230252974A1
Publication date: 2023-08-10
Application number: US18010438
Application date: 2021-09-02
Applicant: Google LLC
Inventor: Byungha Chun , Mohammad Norouzi , Nanxin Chen , Ron J. Weiss , William Chan , Yu Zhang , Yonghui Wu
IPC: G10L13/08 , G10L21/0208
CPC classification number: G10L13/08 , G10L21/0208
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating waveforms conditioned on phoneme sequences. In one aspect, a method comprises: obtaining a phoneme sequence; processing the phoneme sequence using an encoder neural network to generate a hidden representation of the phoneme sequence; generating, from the hidden representation, a conditioning input; initializing a current waveform output; and generating a final waveform output that defines an utterance of the phoneme sequence by a speaker by updating the current waveform output at each of a plurality of iterations, wherein each iteration corresponds to a respective noise level, and wherein the updating comprises, at each iteration: processing (i) the current waveform output and (ii) the conditioning input using a noise estimation neural network to generate a noise output; and updating the current waveform output using the noise output and the noise level for the iteration.
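The iterative loop in this abstract works as follows: start from noise, then at each noise level subtract a predicted noise component. The minimal Python sketch below uses a standard denoising-diffusion update. The specific schedule algebra (betas, cumulative alphas, the added-noise sigma) and the use of `sqrt(alpha_bar)` as the noise-level input are assumptions, and `estimate_noise` stands in for the noise estimation neural network.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical signature: (current waveform, conditioning input, noise level) -> noise output
NoiseEstimator = Callable[[List[float], Sequence[float], float], List[float]]


def refine_waveform(num_samples: int, conditioning: Sequence[float],
                    estimate_noise: NoiseEstimator, betas: Sequence[float],
                    seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:  # cumulative products of alphas
        running *= a
        alpha_bars.append(running)
    # Initialize the current waveform output with Gaussian noise.
    y = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    for n in reversed(range(len(betas))):  # one iteration per noise level
        noise_level = math.sqrt(alpha_bars[n])
        eps = estimate_noise(y, conditioning, noise_level)
        coef = (1.0 - alphas[n]) / math.sqrt(1.0 - alpha_bars[n])
        y = [(yi - coef * ei) / math.sqrt(alphas[n]) for yi, ei in zip(y, eps)]
        if n > 0:  # re-inject a little noise except on the final step
            sigma = math.sqrt((1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n])
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y
```

With a trained estimator the loop converges toward a waveform for the phoneme sequence. The number of iterations equals the length of the assumed `betas` schedule.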
-
Publication number: US20230118303A1
Publication date: 2023-04-20
Application number: US18082415
Application date: 2022-12-15
Applicant: Google LLC
Inventor: Jeffrey Adgate Dean , Sudip Roy , Michael Acheson Isard , Aakanksha Chowdhery , Brennan Saeta , Chandramohan Amyangot Thekkath , Daniel William Hurt , Hyeontaek Lim , Laurent El Shafey , Parker Edward Schuh , Paul Ronald Barham , Ruoming Pang , Ryan Sepassi , Sanjay Ghemawat , Yonghui Wu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators. One of the systems comprises a plurality of accelerator islands, each hardware accelerator island comprising a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators; and a respective scheduler for each of the accelerator islands that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island, wherein the system is configured to: receive data representing a machine learning workload; and assign a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island.
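The abstract describes two-level scheduling. A system-level step gives each accelerator island a portion of the workload, and each island's own scheduler then places that portion on its accelerators. The minimal Python sketch below uses round-robin at both levels purely for illustration. The round-robin policy, the task and device names, and the `AcceleratorIsland` class are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AcceleratorIsland:
    name: str
    accelerators: List[str]

    def schedule(self, portion: List[str]) -> Dict[str, List[str]]:
        """Per-island scheduler: spread this island's tasks over its accelerators."""
        plan: Dict[str, List[str]] = {acc: [] for acc in self.accelerators}
        for i, task in enumerate(portion):
            plan[self.accelerators[i % len(self.accelerators)]].append(task)
        return plan


def distribute(workload: List[str],
               islands: List[AcceleratorIsland]) -> Dict[str, Dict[str, List[str]]]:
    """System level: assign a portion of the workload to each island, then let it schedule."""
    portions: Dict[str, List[str]] = {isl.name: [] for isl in islands}
    for i, task in enumerate(workload):
        portions[islands[i % len(islands)].name].append(task)
    return {isl.name: isl.schedule(portions[isl.name]) for isl in islands}
```

Separating the two levels means the system-level assigner never needs to know how many accelerators an island has. Each island's scheduler handles that locally.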
-
Publication number: US20220130374A1
Publication date: 2022-04-28
Application number: US17572238
Application date: 2022-01-10
Applicant: Google LLC
Inventor: Zhifeng Chen , Bo Li , Eugene Weinstein , Yonghui Wu , Pedro J. Moreno Mengibar , Ron J. Weiss , Khe Chai Sim , Tara N. Sainath , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
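In cluster adaptive training, a layer's weights are commonly formed as an interpolation of shared cluster basis matrices, W = Σ_k λ_k W_k, with interpolation weights λ chosen per language or dialect. The minimal Python sketch below shows that combination and its application to a feature vector. The abstract does not say which layers are adapted or how λ is obtained, so those details, and the function names, are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def cat_adapted_weights(bases: List[Matrix], interpolation: List[float]) -> Matrix:
    """Combine cluster basis matrices with per-dialect interpolation weights."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [[sum(lam * b[r][c] for lam, b in zip(interpolation, bases)) for c in range(cols)]
            for r in range(rows)]


def apply_layer(weights: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product with the adapted weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
```

Only the small vector of interpolation weights differs between dialects, and the basis matrices are shared. This sharing is what lets one model serve several languages or dialects.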
-