Patent search ap:("Google LLC") AND inv:"Chung-Cheng Chiu"

1.

Invention publication
Streaming Automatic Speech Recognition With Non-Streaming Model Distillation (pending, published)

Publication (announcement) number: US20240029716A1

Publication (announcement) date: 2024-01-25

Application number: US18480827

Application date: 2023-10-04

Applicant: Google LLC

Inventor: Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

IPC: G10L15/06, G10L15/08, G10L15/18, G06N3/045

CPC classification number: G10L15/063, G10L15/083, G10L15/18, G06N3/045

Abstract: A method for training a streaming automatic speech recognition student model includes receiving a plurality of unlabeled student training utterances. The method also includes, for each unlabeled student training utterance, generating a transcription corresponding to the respective unlabeled student training utterance using a plurality of non-streaming automated speech recognition (ASR) teacher models. The method further includes distilling a streaming ASR student model from the plurality of non-streaming ASR teacher models by training the streaming ASR student model using the plurality of unlabeled student training utterances paired with the corresponding transcriptions generated by the plurality of non-streaming ASR teacher models.
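The pseudo-labeling step in this abstract can be sketched roughly as follows. The abstract does not say how the outputs of several teachers are combined, so the majority vote here is an assumption, and the `Teacher` type and `pseudo_label` function are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical teacher interface: maps an utterance (here just an ID) to text.
Teacher = Callable[[str], str]


def pseudo_label(utterances: Sequence[str],
                 teachers: Sequence[Teacher]) -> List[Tuple[str, str]]:
    """Pair each unlabeled utterance with a teacher-generated transcript."""
    pairs = []
    for utt in utterances:
        votes = Counter(teacher(utt) for teacher in teachers)
        # Assumption: majority vote; ties go to the transcript seen first.
        transcript, _ = votes.most_common(1)[0]
        pairs.append((utt, transcript))
    return pairs


if __name__ == "__main__":
    toy_teachers = [lambda u: "hello world", lambda u: "hello word",
                    lambda u: "hello world"]
    print(pseudo_label(["utt-1"], toy_teachers))
```

The resulting (utterance, transcript) pairs would then serve as training data for the streaming student model.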

2.

Invention publication
MINIMUM WORD ERROR RATE TRAINING FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS (pending, published)

Publication (announcement) number: US20230237995A1

Publication (announcement) date: 2023-07-27

Application number: US18194586

Application date: 2023-03-31

Applicant: Google LLC

Inventor: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan

IPC: G10L15/197, G10L15/16, G10L15/06, G10L15/02, G10L15/22

CPC classification number: G10L15/197, G10L15/16, G10L15/063, G10L15/02, G10L15/22, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses a set of speech recognition hypothesis samples, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
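As a rough illustration of a loss computed over a set of hypothesis samples, the sketch below computes an expected word-error count, in the spirit of the title's minimum word error rate training. Renormalizing the scores over the sampled set and subtracting the mean error are assumptions made for this example; the abstract does not specify them, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def word_errors(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,               # drop hypothesis word
                           cur[j - 1] + 1,            # add reference word
                           prev[j - 1] + (h != r)))   # substitute or match
        prev = cur
    return prev[-1]


def mwer_loss(hyps: List[List[str]], log_probs: List[float],
              ref: List[str]) -> float:
    """Expected word errors over sampled hypotheses, relative to their mean."""
    # Assumption: renormalize model scores over the sampled set only.
    peak = max(log_probs)
    weights = [math.exp(lp - peak) for lp in log_probs]
    total = sum(weights)
    probs = [w / total for w in weights]
    errors = [word_errors(h, ref) for h in hyps]
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))
```

Under these assumptions the loss decreases as probability mass shifts toward hypotheses with fewer word errors than the sample average.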

3.

Invention publication
Systems and Methods for Training Dual-Mode Machine-Learned Speech Recognition Models (pending, published)

Publication (announcement) number: US20230237993A1

Publication (announcement) date: 2023-07-27

Application number: US18011571

Application date: 2021-10-01

Applicant: Google LLC

Inventor: Jiahui Yu, Ruoming Pang, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Wu

IPC: G10L15/16, G10L15/32, G10L15/22

CPC classification number: G10L15/16, G10L15/32, G10L15/22

Abstract: Systems and methods of the present disclosure are directed to a computing system, including one or more processors and a machine-learned multi-mode speech recognition model configured to operate in a streaming recognition mode or a contextual recognition mode. The computing system can perform operations including obtaining speech data and a ground truth label and processing the speech data using the contextual recognition mode to obtain contextual prediction data. The operations can include evaluating a difference between the contextual prediction data and the ground truth label and processing the speech data using the streaming recognition mode to obtain streaming prediction data. The operations can include evaluating a difference between the streaming prediction data and the ground truth label and the contextual and streaming prediction data. The operations can include adjusting parameters of the speech recognition model.
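The three differences named in this abstract could be combined into a single training objective along the following lines. This is a minimal sketch: the use of cross-entropy against the label, a KL term from the contextual to the streaming prediction, and the `distill_weight` parameter are all assumptions, since the abstract does not name specific loss functions.

```python
import math
from typing import Sequence


def cross_entropy(pred: Sequence[float], label: int) -> float:
    """Negative log-probability assigned to the ground-truth class."""
    return -math.log(pred[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_mode_loss(contextual: Sequence[float], streaming: Sequence[float],
                   label: int, distill_weight: float = 1.0) -> float:
    """Sum of both modes' label losses plus a contextual-to-streaming term."""
    label_loss = cross_entropy(contextual, label) + cross_entropy(streaming, label)
    # Assumption: the contextual prediction acts as the target distribution.
    mode_gap = kl_divergence(contextual, streaming)
    return label_loss + distill_weight * mode_gap


if __name__ == "__main__":
    print(dual_mode_loss([0.9, 0.1], [0.6, 0.4], label=0))
```

Because both modes share one set of parameters, a single gradient step on this combined value would adjust the model for streaming and contextual recognition together.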

4.

Invention application
Convolution-Augmented Transformer Models (in force)

Publication (announcement) number: US20220207321A1

Publication (announcement) date: 2022-06-30

Application number: US17139525

Application date: 2020-12-31

Applicant: Google LLC

Inventor: Anmol Gulati, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang, Weikeng Qin, Zhengdong Zhang

IPC: G06N3/04, G10L15/16, G06N20/00

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.

5.

Invention grant
Speech recognition with sequence-to-sequence models (in force)

Publication (announcement) number: US11335333B2

Publication (announcement) date: 2022-05-17

Application number: US16717746

Application date: 2019-12-17

Applicant: Google LLC

Inventor: Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Patrick Nguyen, Sergey Kishchenko

IPC: G10L15/00, G10L15/16, G10L15/04, G10L15/06, G10L15/22, G10L15/187, G10L15/26, G10L15/02

Abstract: A method includes obtaining audio data for a long-form utterance and segmenting the audio data for the long-form utterance into a plurality of overlapping segments. The method also includes, for each overlapping segment of the plurality of overlapping segments: providing features indicative of acoustic characteristics of the long-form utterance represented by the corresponding overlapping segment as input to an encoder neural network; processing an output of the encoder neural network using an attender neural network to generate a context vector; and generating word elements using the context vector and a decoder neural network. The method also includes generating a transcription for the long-form utterance by merging the word elements from the plurality of overlapping segments and providing the transcription as an output of the automated speech recognition system.
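The segment-then-merge flow in this abstract might look like the sketch below. The fixed segment length and hop, and the exact-match rule for merging overlapping word runs, are assumptions for illustration only; the abstract does not describe how the merge is performed, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_bounds(num_samples: int, segment_len: int,
                   hop: int) -> List[Tuple[int, int]]:
    """Return (start, end) pairs; neighbors overlap by segment_len - hop."""
    if not 0 < hop < segment_len:
        raise ValueError("hop must be positive and smaller than segment_len")
    bounds = []
    start = 0
    while True:
        end = min(start + segment_len, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break
        start += hop
    return bounds


def merge_words(segments: Sequence[Sequence[str]]) -> List[str]:
    """Join per-segment word lists, dropping words repeated in the overlap."""
    merged: List[str] = []
    for words in segments:
        overlap = 0
        # Assumption: the longest exact suffix/prefix match marks the overlap.
        for k in range(min(len(merged), len(words)), 0, -1):
            if merged[-k:] == list(words[:k]):
                overlap = k
                break
        merged.extend(words[overlap:])
    return merged
```

In this sketch each segment would be decoded independently by the encoder, attender, and decoder before `merge_words` stitches the results together.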

6.

Invention grant
Enhanced attention mechanisms (in force)

Publication (announcement) number: US11210475B2

Publication (announcement) date: 2021-12-28

Application number: US16518518

Application date: 2019-07-22

Applicant: Google LLC

Inventor: Chung-Cheng Chiu, Colin Abraham Raffel

IPC: G06F17/00, G06F40/40, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for enhanced attention mechanisms. In some implementations, data indicating an input sequence is received. The data is processed using an encoder neural network to generate a sequence of encodings. A series of attention outputs is determined using one or more attender modules. Determining each attention output can include (i) selecting an encoding from the sequence of encodings and (ii) determining attention over a proper subset of the sequence of encodings, where the proper subset of encodings is determined based on a position of the selected encoding in the sequence of encodings. The selections of encodings are also monotonic through the sequence of encodings. An output sequence is generated by processing the attention outputs using a decoder neural network. An output is provided that indicates a language sequence determined from the output sequence.
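Step (ii) of this abstract, attention over a proper subset chosen by the position of the selected encoding, can be illustrated with a small sketch. Treating the subset as a fixed-size window that ends at the selected position, and using a softmax over raw energies, are assumptions for this example; `chunk_attention` is a hypothetical name.

```python
import math
from typing import List, Sequence


def chunk_attention(energies: Sequence[float], selected: int,
                    chunk_size: int) -> List[float]:
    """Softmax attention restricted to the window ending at `selected`."""
    # Assumption: the subset is the chunk_size encodings up to `selected`.
    start = max(0, selected - chunk_size + 1)
    chunk = energies[start:selected + 1]
    peak = max(chunk)
    exps = [math.exp(e - peak) for e in chunk]
    total = sum(exps)
    weights = [0.0] * len(energies)  # positions outside the chunk get zero
    for offset, value in enumerate(exps):
        weights[start + offset] = value / total
    return weights


if __name__ == "__main__":
    # Selections that never move backward satisfy the monotonic constraint.
    selections = [1, 3, 3, 4]
    assert all(a <= b for a, b in zip(selections, selections[1:]))
    for pos in selections:
        print(pos, chunk_attention([0.0] * 5, pos, chunk_size=2))
```

Each weight vector would then be used to form an attention output from the encodings before it is passed to the decoder.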

7.

Invention application
Augmentation of Audiographic Images for Improved Machine Learning (pending, published)

Publication (announcement) number: US20190354808A1

Publication (announcement) date: 2019-11-21

Application number: US16416888

Application date: 2019-05-20

Applicant: Google LLC

Inventor: Daniel Sung-Joon Park, Quoc Le, William Chan, Ekin Dogus Cubuk, Barret Zoph, Yu Zhang, Chung-Cheng Chiu

IPC: G06K9/62, G10L15/16, G06N20/00, G10L15/06, G10L15/12, G10L15/28

Abstract: Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.
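To make the idea of augmenting the image rather than the raw audio concrete, the sketch below zeroes one frequency band and one time span of a spectrogram. The abstract does not name its augmentation operations, so masking is an assumption chosen for illustration, and `mask_spectrogram` and its parameters are hypothetical.

```python
import random
from typing import List


def mask_spectrogram(spec: List[List[float]], max_freq_width: int,
                     max_time_width: int,
                     rng: random.Random) -> List[List[float]]:
    """Return a copy of spec (frames x bins) with one band and one span zeroed."""
    num_frames, num_bins = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # leave the input untouched
    # Assumption: mask widths and positions are drawn uniformly at random.
    f_width = rng.randint(0, min(max_freq_width, num_bins))
    f_start = rng.randint(0, num_bins - f_width)
    t_width = rng.randint(0, min(max_time_width, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for frame in out:
        for f in range(f_start, f_start + f_width):
            frame[f] = 0.0
    for t in range(t_start, t_start + t_width):
        out[t] = [0.0] * num_bins
    return out


if __name__ == "__main__":
    toy = [[1.0] * 8 for _ in range(6)]
    for row in mask_spectrogram(toy, 3, 2, random.Random(0)):
        print(row)
```

Applying a fresh random mask each time an example is drawn yields many augmented variants from one recorded utterance.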

8.

Invention application
SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS (in force)

Publication (announcement) number: US20240420686A1

Publication (announcement) date: 2024-12-19

Application number: US18815200

Application date: 2024-08-26

Applicant: Google LLC

Inventor: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen

IPC: G10L15/16, G06N3/08, G10L15/02, G10L15/06, G10L15/22, G10L15/26, G10L25/30

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

9.

Invention publication
Augmentation of Audiographic Images for Improved Machine Learning (pending, published)

Publication (announcement) number: US20230359898A1

Publication (announcement) date: 2023-11-09

Application number: US18350464

Application date: 2023-07-11

Applicant: Google LLC

Inventor: Daniel Sung-Joon Park, Quoc Le, William Chan, Ekin Dogus Cubuk, Barret Zoph, Yu Zhang, Chung-Cheng Chiu

IPC: G06V10/774, G06N20/00, G10L15/16, G10L15/06, G10L15/12, G10L15/28, G06V10/82

CPC classification number: G06N3/084, G06N20/00, G10L15/16, G10L15/063, G10L15/12, G06V10/7747, G10L15/28, G06V10/82, G06F18/2148

Abstract: Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.

10.

Invention publication
Attention-Based Joint Acoustic and Text On-Device End-to-End Model (pending, published)

Publication (announcement) number: US20230186901A1

Publication (announcement) date: 2023-06-15

Application number: US18167454

Application date: 2023-02-10

Applicant: Google LLC

Inventor: Tara N. Sainath, Ruoming Pang, Ron Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman

IPC: G10L15/06, G06N3/08, G10L15/16, G10L15/197

CPC classification number: G10L15/063, G06N3/08, G10L15/16, G10L15/197, G10L2015/0635

Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.
