Patent search ap:("Google LLC") AND inv:"Chung-Cheng Chiu" Page 4

31.

Granted invention patent
Recurrent neural networks for online sequence generation (Active)

Publication (announcement) No.: US10656605B1

Publication (announcement) date: 2020-05-19

Application No.: US16401791

Application date: 2019-05-02

Applicant: Google LLC

Inventor: Chung-Cheng Chiu , Navdeep Jaitly , Ilya Sutskever , Yuping Luo

IPC: G05B13/02 , G10L15/16 , G06N3/04 , G06F40/44

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence from a source sequence. In one aspect, the system includes a recurrent neural network configured to, at each time step, receive an input for the time step and process the input to generate a progress score and a set of output scores; and a subsystem configured to, at each time step, generate the recurrent neural network input and provide the input to the recurrent neural network; determine, from the progress score, whether or not to emit a new output at the time step; and, in response to determining to emit a new output, select an output using the output scores and emit the selected output as the output at a next position in the output order.
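
Illustrative sketch (not part of the patent record): the abstract describes a loop in which each time step yields a progress score and a set of output scores, and an output is emitted only when the progress score says so. The Python below is a minimal sketch of that control flow under stated assumptions. The names `online_decode`, `toy_step`, and `StepFn`, the fixed threshold, and the greedy selection are all hypothetical; the abstract does not say how the progress score is turned into a decision or how an output is chosen from the scores.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature standing in for the recurrent network:
# (step_input, previous_output, state) -> (progress_score, output_scores, new_state)
StepFn = Callable[[float, int, float], Tuple[float, List[float], float]]


def online_decode(inputs: Sequence[float], step_fn: StepFn,
                  threshold: float = 0.5, start_token: int = 0) -> List[int]:
    """Emit an output only at time steps whose progress score exceeds `threshold`."""
    outputs: List[int] = []
    prev, state = start_token, 0.0
    for x in inputs:
        progress, scores, state = step_fn(x, prev, state)
        if progress > threshold:  # assumed decision rule, not taken from the abstract
            prev = max(range(len(scores)), key=scores.__getitem__)  # greedy pick
            outputs.append(prev)
    return outputs


def toy_step(x: float, prev: int, state: float) -> Tuple[float, List[float], float]:
    """Toy stand-in: accumulate input, signal progress once the total reaches 1.0."""
    state += x
    ready = state >= 1.0
    scores = [0.0, 0.0, 0.0]
    scores[(prev + 1) % 3] = 1.0  # always favour the token after the previous one
    return (1.0 if ready else 0.0), scores, (0.0 if ready else state)


print(online_decode([0.5, 0.5, 0.5, 0.5, 1.0], toy_step))
```

With the toy step function, five input steps produce only three outputs, which shows the point of the progress score: the number of emitted outputs is decoupled from the number of time steps.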

32.

Granted invention patent
Cascaded encoders for simplified streaming and non-streaming ASR (Active)

Publication (announcement) No.: US12154581B2

Publication (announcement) date: 2024-11-26

Application No.: US17237021

Application date: 2021-04-21

Applicant: Google LLC

Inventor: Arun Narayanan , Tara Sainath , Chung-Cheng Chiu , Ruoming Pang , Rohit Prabhavalkar , Jiahui Yu , Ehsan Variani , Trevor Strohman

IPC: G10L19/16 , G06N3/08 , G10L15/00 , G10L15/16 , G10L15/32 , G10L25/30

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

33.

Granted invention patent
Joint acoustic echo cancelation, speech enhancement, and voice separation for automatic speech recognition (Active)

Publication (announcement) No.: US12119014B2

Publication (announcement) date: 2024-10-15

Application No.: US17644108

Application date: 2021-12-14

Applicant: Google LLC

Inventor: Arun Narayanan , Tom O'malley , Quan Wang , Alex Park , James Walker , Nathan David Howard , Yanzhang He , Chung-Cheng Chiu

IPC: G10L21/0216 , G06N3/04 , G10L15/06 , G10L21/0208 , H04R3/04

CPC classification number: G10L21/0216 , G06N3/04 , G10L15/063 , H04R3/04 , G10L2021/02082

Abstract: A method for automatic speech recognition using joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving, at a contextual frontend processing model, input speech features corresponding to a target utterance. The method also includes receiving, at the contextual frontend processing model, at least one of a reference audio signal, a contextual noise signal including noise prior to the target utterance, or a speaker embedding including voice characteristics of a target speaker that spoke the target utterance. The method further includes processing, using the contextual frontend processing model, the input speech features and the at least one of the reference audio signal, the contextual noise signal, or the speaker embedding vector to generate enhanced speech features.

34.

Published invention application
Contrastive Learning and Masked Modeling for End-To-End Self-Supervised Pre-Training (Under examination, published)

Publication (announcement) No.: US20240104352A1

Publication (announcement) date: 2024-03-28

Application No.: US18012391

Application date: 2022-07-28

Applicant: Google LLC

Inventor: Yu Zhang , Yu-An Chung , Wei Han , Chung-Cheng Chiu , Weikeng Qin , Ruoming Pang , Yonghui Wu

IPC: G06N3/0455

CPC classification number: G06N3/0455

Abstract: Provided are improved end-to-end self-supervised pre-training frameworks that leverage a combination of contrastive and masked modeling loss terms. In particular, the present disclosure provides a framework that combines contrastive learning and masked modeling, where the former trains the model to discretize input data (e.g., continuous signals such as continuous speech signals) into a finite set of discriminative tokens, and the latter trains the model to learn contextualized representations via solving a masked prediction task consuming the discretized tokens. In contrast to certain existing masked modeling-based pre-training frameworks which rely on an iterative re-clustering and re-training process or other existing frameworks which concatenate two separately trained modules, the proposed framework can enable a model to be optimized in an end-to-end fashion by solving the two self-supervised tasks (the contrastive task and masked modeling) simultaneously.

35.

Granted invention patent
Augmentation of audiographic images for improved machine learning (Active)

Publication (announcement) No.: US11816577B2

Publication (announcement) date: 2023-11-14

Application No.: US17487548

Application date: 2021-09-28

Applicant: Google LLC

Inventor: Daniel Sung-Joon Park , Quoc Le , William Chan , Ekin Dogus Cubuk , Barret Zoph , Yu Zhang , Chung-Cheng Chiu

IPC: G10L15/06 , G10L15/12 , G06N3/084 , G10L15/16 , G10L15/28 , G06N20/00 , G06F18/214 , G06V10/774 , G06V10/82

CPC classification number: G06N3/084 , G06F18/2148 , G06N20/00 , G06V10/7747 , G06V10/82 , G10L15/063 , G10L15/12 , G10L15/16 , G10L15/28

Abstract: Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.
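
Illustrative sketch (not part of the patent record): the abstract says augmentation is applied directly to the audiographic image, such as a spectrogram, instead of to the raw audio, but it does not name the operations. The Python below is a minimal sketch assuming two simple masking operations chosen purely for illustration. The function names `mask_frequency_band` and `mask_time_span`, the list-of-lists layout (rows are time frames, columns are frequency bins), and the fill value are all hypothetical.

```python
from typing import List

# Assumed layout: spec[t][f] is the value at time frame t and frequency bin f.
Spectrogram = List[List[float]]


def mask_frequency_band(spec: Spectrogram, start: int, width: int,
                        fill: float = 0.0) -> Spectrogram:
    """Return a copy with frequency bins [start, start + width) overwritten in every frame."""
    return [
        [fill if start <= f < start + width else value
         for f, value in enumerate(frame)]
        for frame in spec
    ]


def mask_time_span(spec: Spectrogram, start: int, width: int,
                   fill: float = 0.0) -> Spectrogram:
    """Return a copy with time frames [start, start + width) overwritten entirely."""
    return [
        [fill] * len(frame) if start <= t < start + width else list(frame)
        for t, frame in enumerate(spec)
    ]


spec = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
augmented = mask_time_span(mask_frequency_band(spec, start=1, width=1), start=0, width=1)
print(augmented)
```

Both functions return new lists and leave the input untouched, so one source spectrogram can yield many augmented training examples by varying `start` and `width`.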

36.

Granted invention patent
Streaming automatic speech recognition with non-streaming model distillation (Active)

Publication (announcement) No.: US11804212B2

Publication (announcement) date: 2023-10-31

Application No.: US17348118

Application date: 2021-06-15

Applicant: Google LLC

Inventor: Thibault Doutre , Wei Han , Min Ma , Zhiyun Lu , Chung-Cheng Chiu , Ruoming Pang , Arun Narayanan , Ananya Misra , Yu Zhang , Liangliang Cao

IPC: G10L15/06 , G10L15/08 , G10L15/18 , G06N3/04 , G06N3/045

CPC classification number: G10L15/063 , G06N3/045 , G10L15/083 , G10L15/18

Abstract: A method for training a streaming automatic speech recognition student model includes receiving a plurality of unlabeled student training utterances. The method also includes, for each unlabeled student training utterance, generating a transcription corresponding to the respective unlabeled student training utterance using a plurality of non-streaming automated speech recognition (ASR) teacher models. The method further includes distilling a streaming ASR student model from the plurality of non-streaming ASR teacher models by training the streaming ASR student model using the plurality of unlabeled student training utterances paired with the corresponding transcriptions generated by the plurality of non-streaming ASR teacher models.

37.

Granted invention patent
Recurrent neural networks for online sequence generation (Active)

Publication (announcement) No.: US11625572B2

Publication (announcement) date: 2023-04-11

Application No.: US16610466

Application date: 2018-05-03

Applicant: GOOGLE LLC

Inventor: Chung-Cheng Chiu , Navdeep Jaitly , John Dieterich Lawson , George Jay Tucker

IPC: G06N3/04 , G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence from a source sequence. In one aspect, the system includes a recurrent neural network configured to, at each time step, receive an input for the time step and process the input to generate a progress score and a set of output scores; and a subsystem configured to, at each time step, generate the recurrent neural network input and provide the input to the recurrent neural network; determine, from the progress score, whether or not to emit a new output at the time step; and, in response to determining to emit a new output, select an output using the output scores and emit the selected output as the output at a next position in the output order.

38.

Granted invention patent
Attention-based joint acoustic and text on-device end-to-end model (Active)

Publication (announcement) No.: US11594212B2

Publication (announcement) date: 2023-02-28

Application No.: US17155010

Application date: 2021-01-21

Applicant: Google LLC

Inventor: Tara N. Sainath , Ruoming Pang , Ron Weiss , Yanzhang He , Chung-Cheng Chiu , Trevor Strohman

IPC: G10L15/06 , G06N3/08 , G10L15/16 , G10L15/197

Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.

39.

Granted invention patent
Speech recognition with sequence-to-sequence models (Active)

Publication (announcement) No.: US11145293B2

Publication (announcement) date: 2021-10-12

Application No.: US16516390

Application date: 2019-07-19

Applicant: Google LLC

Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen

IPC: G10L15/00 , G10L15/16 , G10L15/22 , G10L15/02 , G06N3/08 , G10L15/06 , G10L25/30 , G10L15/26

Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
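
Illustrative sketch (not part of the patent record): the abstract describes a training process in which each decoder input is selected, with a predetermined probability, between data based on the known value from the training example and data based on the decoder's own output. The Python below is a minimal sketch of that selection step only. The function name `choose_decoder_inputs`, the parameter `sample_prob`, and the use of integer token ids are hypothetical, and the decoder's own predictions are passed in as a ready-made list instead of being computed.

```python
import random
from typing import List, Sequence


def choose_decoder_inputs(ground_truth: Sequence[int], predicted: Sequence[int],
                          sample_prob: float, rng: random.Random) -> List[int]:
    """For each position, feed the decoder's own prediction with probability
    `sample_prob`, otherwise feed the known value from the training example."""
    if len(ground_truth) != len(predicted):
        raise ValueError("ground_truth and predicted must have the same length")
    return [
        pred if rng.random() < sample_prob else truth
        for truth, pred in zip(ground_truth, predicted)
    ]


rng = random.Random(0)  # seeded so repeated runs make the same choices
truth = [11, 12, 13, 14]
model_out = [21, 22, 23, 24]
print(choose_decoder_inputs(truth, model_out, sample_prob=0.3, rng=rng))
```

Setting `sample_prob` to 0.0 always feeds the known values, and setting it to 1.0 always feeds the decoder's own outputs; values in between mix the two.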

40.

Granted invention patent
Augmentation of audiographic images for improved machine learning (Active)

Publication (announcement) No.: US11138471B2

Publication (announcement) date: 2021-10-05

Application No.: US16416888

Application date: 2019-05-20

Applicant: Google LLC

Inventor: Daniel Sung-Joon Park , Quoc Le , William Chan , Ekin Dogus Cubuk , Barret Zoph , Yu Zhang , Chung-Cheng Chiu

IPC: G06N20/00 , G10L15/06 , G10L15/12 , G06K9/62 , G10L15/16 , G10L15/28

Abstract: Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.
