Patent search ap:("Google LLC") AND inv:"Anmol Gulati" Page 1

1.

Invention application
Optimizing Inference Performance for Conformer (Status: in force)

Publication (announcement) number: US20230130634A1

Publication (announcement) date: 2023-04-27

Application number: US17936547

Application date: 2022-09-29

Applicant: Google LLC

Inventor: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

IPC: G10L15/16, G10L15/22, G10L15/06

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.
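
The linear attention mentioned in this abstract can be computed frame by frame with a running state, which is what makes it behave like a recurrent network. The following is a minimal sketch under stated assumptions, not the patented module: the function names are hypothetical, and the `elu(x) + 1` feature map is a simple positive stand-in for the Performer feature map, which the abstract does not specify.

```python
import math


def feature_map(x):
    # Assumption: elu(x) + 1, a common positive feature map used as a
    # stand-in here. The abstract does not say which map is used.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(queries, keys, values):
    """Causal linear attention over a sequence of frames.

    Keeps a running sum of phi(k) v^T and of phi(k), so each step costs
    the same regardless of how many frames came before.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # sum of phi(k_i) v_i^T
    norm = [0.0] * d_k                         # sum of phi(k_i)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fq, fk = feature_map(q), feature_map(k)
        # Update the recurrent state with the current frame only.
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        # Read out: phi(q)^T state, normalized by phi(q)^T norm.
        denom = sum(fq[i] * norm[i] for i in range(d_k))
        outputs.append(
            [sum(fq[i] * state[i][j] for i in range(d_k)) / denom
             for j in range(d_v)]
        )
    return outputs
```

Because the state only accumulates past and current frames, the output for a frame never depends on later frames, which matches the causal encoder the abstract describes.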

2.

Invention application
Fast Emit Low-latency Streaming ASR with Sequence-level Emission Regularization (Status: in force)

Publication (announcement) number: US20220122586A1

Publication (announcement) date: 2022-04-21

Application number: US17447285

Application date: 2021-09-09

Applicant: Google LLC

Inventor: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang

IPC: G10L15/06, G10L15/22, G10L15/30, G10L15/16

Abstract: A computer-implemented method of training a streaming speech recognition model that includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
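
The sequence-level alignment probability in this abstract can be illustrated with the standard transducer forward and backward recursions over a (frame, label) lattice. The sketch below is a toy under stated assumptions, not the claimed method: all function names are hypothetical, the lattice layout is a common textbook convention, and `weighted_node_probability` shows only one plausible way a tuning parameter could favor the label branch.

```python
def forward_backward(label_p, blank_p):
    """Forward (alpha) and backward (beta) probabilities on a T x (U+1) lattice.

    label_p[t][u]: probability of emitting the next label at node (t, u).
    blank_p[t][u]: probability of emitting blank at node (t, u).
    """
    T, U = len(blank_p), len(blank_p[0]) - 1
    alpha = [[0.0] * (U + 1) for _ in range(T)]
    beta = [[0.0] * (U + 1) for _ in range(T)]
    alpha[0][0] = 1.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:  # arrived by emitting blank at the previous frame
                alpha[t][u] += alpha[t - 1][u] * blank_p[t - 1][u]
            if u > 0:  # arrived by emitting a label at the same frame
                alpha[t][u] += alpha[t][u - 1] * label_p[t][u - 1]
    for t in reversed(range(T)):
        for u in reversed(range(U + 1)):
            if t == T - 1 and u == U:
                beta[t][u] = blank_p[t][u]  # final blank ends the alignment
                continue
            if t < T - 1:
                beta[t][u] += blank_p[t][u] * beta[t + 1][u]
            if u < U:
                beta[t][u] += label_p[t][u] * beta[t][u + 1]
    return alpha, beta


def branch_masses(alpha, beta, label_p, blank_p, t, u):
    """Split the probability of all alignments through node (t, u)
    into the part that emits a label next and the part that emits blank."""
    T, U = len(blank_p), len(blank_p[0]) - 1
    label_mass = alpha[t][u] * label_p[t][u] * beta[t][u + 1] if u < U else 0.0
    if t < T - 1:
        blank_mass = alpha[t][u] * blank_p[t][u] * beta[t + 1][u]
    elif u == U:
        blank_mass = alpha[t][u] * blank_p[t][u]
    else:
        blank_mass = 0.0
    return label_mass, blank_mass


def weighted_node_probability(label_mass, blank_mass, tuning):
    # Hypothetical weighting: tuning > 0 rewards the label branch,
    # tuning == 0 recovers the unregularized probability.
    return (1.0 + tuning) * label_mass + blank_mass
```

With `tuning` set to zero the weighted value equals the plain alignment probability through the node, so the parameter acts purely as a knob on how strongly label emission is encouraged over blank emission.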

3.

Invention grant
Optimizing inference performance for conformer (Status: in force)

Publication (announcement) number: US12190869B2

Publication (announcement) date: 2025-01-07

Application number: US17936547

Application date: 2022-09-29

Applicant: Google LLC

Inventor: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

IPC: G10L15/16, G10L15/06, G10L15/22

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.

4.

Invention grant
Fast emit low-latency streaming ASR with sequence-level emission regularization utilizing forward and backward probabilities between nodes of an alignment lattice (Status: in force)

Publication (announcement) number: US12094453B2

Publication (announcement) date: 2024-09-17

Application number: US17447285

Application date: 2021-09-09

Applicant: Google LLC

Inventor: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang

IPC: G10L15/06, G10L15/16, G10L15/187, G10L15/22, G10L15/30

CPC classification number: G10L15/063, G10L15/16, G10L15/22, G10L15/30, G10L15/187

Abstract: A computer-implemented method of training a streaming speech recognition model that includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.

5.

Invention publication
Systems and Methods for Training Dual-Mode Machine-Learned Speech Recognition Models (Status: under examination, published)

Publication (announcement) number: US20230237993A1

Publication (announcement) date: 2023-07-27

Application number: US18011571

Application date: 2021-10-01

Applicant: Google LLC

Inventor: Jiahui Yu, Ruoming Pang, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Hu

IPC: G10L15/16, G10L15/32, G10L15/22

CPC classification number: G10L15/16, G10L15/32, G10L15/22

Abstract: Systems and methods of the present disclosure are directed to a computing system, including one or more processors and a machine-learned multi-mode speech recognition model configured to operate in a streaming recognition mode or a contextual recognition mode. The computing system can perform operations including obtaining speech data and a ground truth label and processing the speech data using the contextual recognition mode to obtain contextual prediction data. The operations can include evaluating a difference between the contextual prediction data and the ground truth label and processing the speech data using the streaming recognition mode to obtain streaming prediction data. The operations can include evaluating a difference between the streaming prediction data and the ground truth label and the contextual and streaming prediction data. The operations can include adjusting parameters of the speech recognition model.
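
The three differences evaluated in this abstract (contextual prediction against the label, streaming prediction against the label, and contextual against streaming prediction) can be combined into a single training loss. The sketch below is one possible reading, not the disclosed formulation: the choice of cross-entropy and KL divergence, the function names, and the `distill_weight` parameter are all assumptions made for illustration.

```python
import math


def cross_entropy(probs, label):
    # Difference between a predicted distribution and the ground truth label.
    return -math.log(probs[label])


def kl_divergence(p, q):
    # Difference between two predicted distributions (assumed measure).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_mode_loss(contextual_probs, streaming_probs, label, distill_weight=1.0):
    """Combine the losses of both modes of one shared model.

    The contextual mode is compared with the label only. The streaming
    mode is compared with the label and, through the KL term, with the
    contextual prediction.
    """
    contextual_loss = cross_entropy(contextual_probs, label)
    streaming_loss = cross_entropy(streaming_probs, label)
    agreement_loss = kl_divergence(contextual_probs, streaming_probs)
    return contextual_loss + streaming_loss + distill_weight * agreement_loss


# Usage: the streaming mode is less confident than the contextual mode,
# so the agreement term adds a positive penalty.
loss = dual_mode_loss([0.9, 0.1], [0.6, 0.4], label=0)
```

When both modes produce the same distribution the agreement term vanishes and the loss reduces to the two label losses, which is the behavior a parameter update would push toward.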

6.

Invention application
Convolution-Augmented Transformer Models (Status: in force)

Publication (announcement) number: US20220207321A1

Publication (announcement) date: 2022-06-30

Application number: US17139525

Application date: 2020-12-31

Applicant: Google LLC

Inventor: Anmol Gulati, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang, Weikeng Qin, Zhengdong Zhang

IPC: G06N3/04, G10L15/16, G06N20/00

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.
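
How feed-forward, self-attention, and convolution blocks can be combined is sketched below with toy stand-ins operating on one scalar per frame. This is an illustration under assumptions, not the claimed architecture: the block ordering and the half-weight feed-forward residuals follow the published Conformer paper rather than this abstract, and every function name here is hypothetical.

```python
def residual(x, fx, weight=1.0):
    # Residual connection: add a (possibly down-weighted) block output.
    return [a + weight * b for a, b in zip(x, fx)]


def toy_feed_forward(x):
    # Stand-in for a feed-forward block: a pointwise nonlinearity.
    return [max(0.0, v) for v in x]


def toy_global_mixing(x):
    # Stand-in for self-attention: every frame sees the whole sequence.
    mean = sum(x) / len(x)
    return [mean] * len(x)


def toy_local_conv(x, kernel=(0.25, 0.5, 0.25)):
    # Stand-in for the convolution block: each frame sees only its
    # immediate neighbours (zero padding at the edges).
    padded = [0.0] + list(x) + [0.0]
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(x))
    ]


def conformer_block(x, final_norm=lambda v: v):
    """Feed-forward, global mixing, local convolution, feed-forward."""
    x = residual(x, toy_feed_forward(x), weight=0.5)
    x = residual(x, toy_global_mixing(x))  # global interactions
    x = residual(x, toy_local_conv(x))     # local correlations
    x = residual(x, toy_feed_forward(x), weight=0.5)
    return final_norm(x)
```

The global-mixing step lets distant frames influence each other, while the convolution step depends only on relative offsets between neighbouring frames, mirroring the two kinds of structure the abstract says the model learns.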

7.

Invention publication
Convolution-Augmented Transformer Models (Status: under examination, published)

Publication (announcement) number: US20240362453A1

Publication (announcement) date: 2024-10-31

Application number: US18766038

Application date: 2024-07-08

Applicant: Google LLC

Inventor: Anmol Gulati, Weikeng Qin, Zhengdong Zhang, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang

IPC: G06N3/04, G06N20/00, G10L15/16

CPC classification number: G06N3/04, G06N20/00, G10L15/16

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.

8.

Invention grant
Convolution-augmented transformer models (Status: in force)

Publication (announcement) number: US12079703B2

Publication (announcement) date: 2024-09-03

Application number: US17139525

Application date: 2020-12-31

Applicant: Google LLC

Inventor: Anmol Gulati, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang, Weikeng Qin, Zhengdong Zhang

IPC: G06N3/04, G06N20/00, G10L15/16

CPC classification number: G06N3/04, G06N20/00, G10L15/16

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.
