-
公开(公告)号:US20240428786A1
公开(公告)日:2024-12-26
申请号:US18826655
申请日:2024-09-06
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Arun Narayanan , Ruoming Pang , Trevor Strohman
IPC: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.
-
公开(公告)号:US12175963B2
公开(公告)日:2024-12-24
申请号:US18525475
申请日:2023-11-30
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
-
公开(公告)号:US12112198B2
公开(公告)日:2024-10-08
申请号:US18082415
申请日:2022-12-15
Applicant: Google LLC
Inventor: Jeffrey Adgate Dean , Sudip Roy , Michael Acheson Isard , Aakanksha Chowdhery , Brennan Saeta , Chandramohan Amyangot Thekkath , Daniel William Hurt , Hyeontaek Lim , Laurent El Shafey , Parker Edward Schuh , Paul Ronald Barham , Ruoming Pang , Ryan Sepassi , Sanjay Ghemawat , Yonghui Wu
CPC classification number: G06F9/4881 , G06N3/063 , G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators. One of the systems comprises a plurality of accelerator islands, each hardware accelerator island comprising a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators; and a respective scheduler for each of the accelerator islands that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island, wherein the system is configured to: receive data representing a machine learning workload; and assign a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island.
-
公开(公告)号:US20240312449A1
公开(公告)日:2024-09-19
申请号:US18676743
申请日:2024-05-29
Applicant: Google LLC
Inventor: Ruoming Pang , Andros Tjandra , Yu Zhang , Shigeki Karita
IPC: G10L13/027 , G10L21/0308
CPC classification number: G10L13/027 , G10L21/0308
Abstract: A linguistic content and speaking style disentanglement model includes a content encoder, a style encoder, and a decoder. The content encoder is configured to receive input speech as input and generate a latent representation of linguistic content for the input speech output. The content encoder is trained to disentangle speaking style information from the latent representation of linguistic content. The style encoder is configured to receive the input speech as input and generate a latent representation of speaking style for the input speech as output. The style encoder is trained to disentangle linguistic content information from the latent representation of speaking style. The decoder is configured to generate output speech based on the latent representation of linguistic content for the input speech and the latent representation of speaking style for the same or different input speech.
-
公开(公告)号:US12027151B2
公开(公告)日:2024-07-02
申请号:US17455667
申请日:2021-11-18
Applicant: Google LLC
Inventor: Ruoming Pang , Andros Tjandra , Yu Zhang , Shigeki Karita
IPC: G10L13/027 , G10L21/0308
CPC classification number: G10L13/027 , G10L21/0308
Abstract: A linguistic content and speaking style disentanglement model includes a content encoder, a style encoder, and a decoder. The content encoder is configured to receive input speech as input and generate a latent representation of linguistic content for the input speech output. The content encoder is trained to disentangle speaking style information from the latent representation of linguistic content. The style encoder is configured to receive the input speech as input and generate a latent representation of speaking style for the input speech as output. The style encoder is trained to disentangle linguistic content information from the latent representation of speaking style. The decoder is configured to generate output speech based on the latent representation of linguistic content for the input speech and the latent representation of speaking style for the same or different input speech.
-
公开(公告)号:US11928574B2
公开(公告)日:2024-03-12
申请号:US18154321
申请日:2023-01-13
Applicant: Google LLC
Inventor: Mingxing Tan , Quoc Le , Bo Chen , Vijay Vasudevan , Ruoming Pang
Abstract: The present disclosure is directed to an automated neural architecture search approach for designing new neural network architectures such as, for example, resource-constrained mobile CNN models. In particular, the present disclosure provides systems and methods to perform neural architecture search using a novel factorized hierarchical search space that permits layer diversity throughout the network, thereby striking the right balance between flexibility and search space size. The resulting neural architectures are able to be run relatively faster and using relatively fewer computing resources (e.g., less processing power, less memory usage, less power consumption, etc.), all while remaining competitive with or even exceeding the performance (e.g., accuracy) of current state-of-the-art mobile-optimized models.
-
公开(公告)号:US20220357985A1
公开(公告)日:2022-11-10
申请号:US17738909
申请日:2022-05-06
Applicant: Google LLC
Inventor: Jeffrey Adgate Dean , Sudip Roy , Michael Acheson Isard , Aakanksha Chowdhery , Brennan Saeta , Chandramohan Amyangot Thekkath , Daniel William Hurt , Hyeontaek Lim , Laurent El Shafey , Parker Edward Schuh , Paul Ronald Barham , Ruoming Pang , Ryan Sepassi , Sanjay Ghemawat , Yonghui Wu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators. One of the systems comprises a plurality of accelerator islands, each hardware accelerator island comprising a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators; and a respective scheduler for each of the accelerator islands that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island, wherein the system is configured to: receive data representing a machine learning workload; and assign a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island.
-
公开(公告)号:US20220351713A1
公开(公告)日:2022-11-03
申请号:US17813361
申请日:2022-07-19
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
-
公开(公告)号:US11488575B2
公开(公告)日:2022-11-01
申请号:US17055951
申请日:2019-05-17
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
-
公开(公告)号:US20220343894A1
公开(公告)日:2022-10-27
申请号:US17348118
申请日:2021-06-15
Applicant: Google LLC
Inventor: Thibault Doutre , Wei Han , Min Ma , Zhiyun Lu , Chung-Cheng Chiu , Ruoming Pang , Arun Narayanan , Ananya Misra , Yu Zhang , Liangliang Cao
Abstract: A method for training a streaming automatic speech recognition student model includes receiving a plurality of unlabeled student training utterances. The method also includes, for each unlabeled student training utterance, generating a transcription corresponding to the respective unlabeled student training utterance using a plurality of non-streaming automated speech recognition (ASR) teacher models. The method further includes distilling a streaming ASR student model from the plurality of non-streaming ASR teacher models by training the streaming ASR student model using the plurality of unlabeled student training utterances paired with the corresponding transcriptions generated by the plurality of non-streaming ASR teacher models.
-
-
-
-
-
-
-
-
-