Patent search ap:("Google LLC") AND inv:"Ruoming Pang" Page 6

51.

发明申请
SYNTHESIS OF SPEECH FROM TEXT IN A VOICE OF A TARGET SPEAKER USING NEURAL NETWORKS 有权

公开(公告)号：US20250095630A1

公开(公告)日：2025-03-20

申请号：US18966088

申请日：2024-12-02

Applicant: Google LLC

Inventor： Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen

IPC: G10L13/04 , G06N3/08 , G10L13/02 , G10L17/04 , G10L19/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.

52.

发明授权
Language agnostic multilingual end-to-end streaming on-device ASR system 有权

公开(公告)号：US12183322B2

公开(公告)日：2024-12-31

申请号：US17934555

申请日：2022-09-22

Applicant: Google LLC

Inventor： Bo Li , Tara N. Sainath , Ruoming Pang , Shuo-yiin Chang , Qiumin Xu , Trevor Strohman , Vince Chen , Qiao Liang , Heguang Liu , Yanzhang He , Parisa Haghani , Sameer Bidichandani

IPC: G10L15/00 , G10L15/06 , G10L15/22 , G10L15/30

Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.

53.

发明申请
TWO-PASS END TO END SPEECH RECOGNITION 有权

公开(公告)号：US20240420687A1

公开(公告)日：2024-12-19

申请号：US18815537

申请日：2024-08-26

Applicant: GOOGLE LLC

Inventor： Tara N. Sainath , Yanzhang He , Bo Li , Arun Narayanan , Ruoming Pang , Antoine Jean Bruguier , Shuo-yiin Chang , Wei Li

IPC: G10L15/16 , G06N3/08 , G10L15/05 , G10L15/06 , G10L15/22

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

54.

发明授权
Cascaded encoders for simplified streaming and non-streaming ASR 有权

公开(公告)号：US12154581B2

公开(公告)日：2024-11-26

申请号：US17237021

申请日：2021-04-21

Applicant: Google LLC

Inventor： Arun Narayanan , Tara Sainath , Chung-Cheng Chiu , Ruoming Pang , Rohit Prabhavalkar , Jiahui Yu , Ehsan Variani , Trevor Strohman

IPC: G10L19/16 , G06N3/08 , G10L15/00 , G10L15/16 , G10L15/32 , G10L25/30

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

55.

发明公开
Contrastive Learning and Masked Modeling for End-To-End Self-Supervised Pre-Training 审中-公开

公开(公告)号：US20240104352A1

公开(公告)日：2024-03-28

申请号：US18012391

申请日：2022-07-28

Applicant: Google LLC

Inventor： Yu Zhang , Yu-An Chung , Wei Han , Chung-Cheng Chiu , Weikeng Qin , Ruoming Pang , Yonghui Wu

IPC: G06N3/0455

CPC classification number: G06N3/0455

Abstract: Provided are improved end-to-end self-supervised pre-training frameworks that leverage a combination of contrastive and masked modeling loss terms. In particular, the present disclosure provides framework that combines contrastive learning and masked modeling, where the former trains the model to discretize input data (e.g., continuous signals such as continuous speech signals) into a finite set of discriminative tokens, and the latter trains the model to learn contextualized representations via solving a masked prediction task consuming the discretized tokens. In contrast to certain existing masked modeling-based pre-training frameworks which rely on an iterative re-clustering and re-training process or other existing frameworks which concatenate two separately trained modules, the proposed framework can enable a model to be optimized in an end-to-end fashion by solving the two self-supervised tasks (the contrastive task and masked modeling) simultaneously.

56.

发明授权
Deliberation model-based two-pass end-to-end speech recognition 有权

公开(公告)号：US11908461B2

公开(公告)日：2024-02-20

申请号：US17149018

申请日：2021-01-14

Applicant: Google LLC

Inventor： Ke Hu , Tara N. Sainath , Ruoming Pang , Rohit Prakash Prabhavalkar

IPC: G10L15/18 , G06N3/049 , G10L15/06 , G10L15/16 , G10L15/187 , G10L19/00

CPC classification number: G10L15/1815 , G06N3/049 , G10L15/063 , G10L15/16 , G10L15/187 , G10L19/0018

Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.

57.

发明授权
Synthesis of speech from text in a voice of a target speaker using neural networks 有权

公开(公告)号：US11848002B2

公开(公告)日：2023-12-19

申请号：US17813361

申请日：2022-07-19

Applicant: Google LLC

Inventor： Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen

IPC: G10L13/04 , G10L17/04 , G10L19/00 , G06N3/08 , G10L13/02

CPC classification number: G10L13/04 , G10L17/04 , G10L19/00 , G06N3/08 , G10L2013/021

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.

58.

发明授权
Streaming automatic speech recognition with non-streaming model distillation 有权

公开(公告)号：US11804212B2

公开(公告)日：2023-10-31

申请号：US17348118

申请日：2021-06-15

Applicant: Google LLC

Inventor： Thibault Doutre , Wei Han , Min Ma , Zhiyun Lu , Chung-Cheng Chiu , Ruoming Pang , Arun Narayanan , Ananya Misra , Yu Zhang , Liangliang Cao

IPC: G10L15/06 , G10L15/08 , G10L15/18 , G06N3/04 , G06N3/045

CPC classification number: G10L15/063 , G06N3/045 , G10L15/083 , G10L15/18

Abstract: A method for training a streaming automatic speech recognition student model includes receiving a plurality of unlabeled student training utterances. The method also includes, for each unlabeled student training utterance, generating a transcription corresponding to the respective unlabeled student training utterance using a plurality of non-streaming automated speech recognition (ASR) teacher models. The method further includes distilling a streaming ASR student model from the plurality of non-streaming ASR teacher models by training the streaming ASR student model using the plurality of unlabeled student training utterances paired with the corresponding transcriptions generated by the plurality of non-streaming ASR teacher models.

59.

发明公开
Neural Architecture Search with Factorized Hierarchical Search Space 审中-公开

公开(公告)号：US20230244904A1

公开(公告)日：2023-08-03

申请号：US18154321

申请日：2023-01-13

Applicant: Google LLC

Inventor： Mingxing Tan , Quoc Le , Bo Chen , Vijay Vasudevan , Ruoming Pang

IPC: G06N3/04 , G06N20/10 , G06F17/15 , G06N3/084 , G06N3/044

CPC classification number: G06N3/04 , G06N20/10 , G06F17/15 , G06N3/084 , G06N3/044 , G06N3/045

Abstract: The present disclosure is directed to an automated neural architecture search approach for designing new neural network architectures such as, for example, resource-constrained mobile CNN models. In particular, the present disclosure provides systems and methods to perform neural architecture search using a novel factorized hierarchical search space that permits layer diversity throughout the network, thereby striking the right balance between flexibility and search space size. The resulting neural architectures are able to be run relatively faster and using relatively fewer computing resources (e.g., less processing power, less memory usage, less power consumption, etc.), all while remaining competitive with or even exceeding the performance (e.g., accuracy) of current state-of-the-art mobile-optimized models.

60.

发明授权
Efficient streaming non-recurrent on-device end-to-end model 有权

公开(公告)号：US11715458B2

公开(公告)日：2023-08-01

申请号：US17316198

申请日：2021-05-10

Applicant: Google LLC

Inventor： Tara Sainath , Arun Narayanan , Rami Botros , Yanzhang He , Ehsan Variani , Cyril Allauzen , David Rybach , Ruoming Pang , Trevor Strohman

IPC: G10L15/00 , G10L15/06 , G10L15/02 , G10L15/22 , G10L15/30

CPC classification number: G10L15/063 , G10L15/02 , G10L15/22 , G10L15/30

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification