Patent search ap:("Google LLC") AND inv:"Tara Sainath" Page 1

1.

发明申请
Fast Emit Low-latency Streaming ASR with Sequence-level Emission Regularization 有权

公开(公告)号：US20220122586A1

公开(公告)日：2022-04-21

申请号：US17447285

申请日：2021-09-09

Applicant: Google LLC

Inventor： Jiahui Yu , Chung-cheng Chiu , Bo Li , Shuo-yiin Chang , Tara Sainath , Wei Han , Anmol Gulati , Yanzhang He , Arun Narayanan , Yonghui Wu , Ruoming Pang

IPC: G10L15/06 , G10L15/22 , G10L15/30 , G10L15/16

Abstract: A computer-implemented method of training a streaming speech recognition model that includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.

2.

发明授权
Cascaded encoders for simplified streaming and non-streaming ASR 有权

公开(公告)号：US12154581B2

公开(公告)日：2024-11-26

申请号：US17237021

申请日：2021-04-21

Applicant: Google LLC

Inventor： Arun Narayanan , Tara Sainath , Chung-Cheng Chiu , Ruoming Pang , Rohit Prabhavalkar , Jiahui Yu , Ehsan Variani , Trevor Strohman

IPC: G10L19/16 , G06N3/08 , G10L15/00 , G10L15/16 , G10L15/32 , G10L25/30

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

3.

发明授权
Tied and reduced RNN-T 有权

公开(公告)号：US11727920B2

公开(公告)日：2023-08-15

申请号：US17330446

申请日：2021-05-26

Applicant: Google LLC

Inventor： Rami Botros , Tara Sainath

IPC: G10L15/16 , G10L15/08

CPC classification number: G10L15/16 , G10L15/083

Abstract: A RNN-T model includes a prediction network configured to, at each of a plurality of times steps subsequent to an initial time step, receive a sequence of non-blank symbols. For each non-blank symbol the prediction network is also configured to generate, using a shared embedding matrix, an embedding of the corresponding non-blank symbol, assign a respective position vector to the corresponding non-blank symbol, and weight the embedding proportional to a similarity between the embedding and the respective position vector. The prediction network is also configured to generate a single embedding vector at the corresponding time step. The RNN-T model also includes a joint network configured to, at each of the plurality of time steps subsequent to the initial time step, receive the single embedding vector generated as output from the prediction network at the corresponding time step and generate a probability distribution over possible speech recognition hypotheses.

4.

发明授权
Efficient streaming non-recurrent on-device end-to-end model 有权

公开(公告)号：US11715458B2

公开(公告)日：2023-08-01

申请号：US17316198

申请日：2021-05-10

Applicant: Google LLC

Inventor： Tara Sainath , Arun Narayanan , Rami Botros , Yanzhang He , Ehsan Variani , Cyril Allauzen , David Rybach , Ruoming Pang , Trevor Strohman

IPC: G10L15/00 , G10L15/06 , G10L15/02 , G10L15/22 , G10L15/30

CPC classification number: G10L15/063 , G10L15/02 , G10L15/22 , G10L15/30

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

5.

发明授权
Learning word-level confidence for subword end-to-end automatic speech recognition 有权

公开(公告)号：US11610586B2

公开(公告)日：2023-03-21

申请号：US17182592

申请日：2021-02-23

Applicant: Google LLC

Inventor： David Qiu , Qiujia Li , Yanzhang He , Yu Zhang , Bo Li , Liangliang Cao , Rohit Prabhavalkar , Deepti Bhatia , Wei Li , Ke Hu , Tara Sainath , Ian Mcgraw

IPC: G10L15/22 , G10L15/08 , G06N3/08 , G10L25/30

Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic feature vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.

6.

发明申请
Tied and Reduced RNN-T 有权

公开(公告)号：US20220310071A1

公开(公告)日：2022-09-29

申请号：US17330446

申请日：2021-05-26

Applicant: Google LLC

Inventor： Rami Botros , Tara Sainath

IPC: G10L15/16 , G10L15/08

Abstract: A RNN-T model includes a prediction network configured to, at each of a plurality of times steps subsequent to an initial time step, receive a sequence of non-blank symbols. For each non-blank symbol the prediction network is also configured to generate, using a shared embedding matrix, an embedding of the corresponding non-blank symbol, assign a respective position vector to the corresponding non-blank symbol, and weight the embedding proportional to a similarity between the embedding and the respective position vector. The prediction network is also configured to generate a single embedding vector at the corresponding time step. The RNN-T model also includes a joint network configured to, at each of the plurality of time steps subsequent to the initial time step, receive the single embedding vector generated as output from the prediction network at the corresponding time step and generate a probability distribution over possible speech recognition hypotheses.

7.

发明申请
TIED AND REDUCED RNN-T 有权

公开(公告)号：US20240379094A1

公开(公告)日：2024-11-14

申请号：US18779894

申请日：2024-07-22

Applicant: Google LLC

Inventor： Rami Botros , Tara Sainath

IPC: G10L15/16 , G10L15/08

Abstract: A RNN-T model includes a prediction network configured to, at each of a plurality of times steps subsequent to an initial time step, receive a sequence of non-blank symbols. For each non-blank symbol the prediction network is also configured to generate, using a shared embedding matrix, an embedding of the corresponding non-blank symbol, assign a respective position vector to the corresponding non-blank symbol, and weight the embedding proportional to a similarity between the embedding and the respective position vector. The prediction network is also configured to generate a single embedding vector at the corresponding time step. The RNN-T model also includes a joint network configured to, at each of the plurality of time steps subsequent to the initial time step, receive the single embedding vector generated as output from the prediction network at the corresponding time step and generate a probability distribution over possible speech recognition hypotheses.

8.

发明授权
Tied and reduced RNN-T 有权

公开(公告)号：US12062363B2

公开(公告)日：2024-08-13

申请号：US18347842

申请日：2023-07-06

Applicant: Google LLC

Inventor： Rami Botros , Tara Sainath

IPC: G10L15/16 , G10L15/08

CPC classification number: G10L15/16 , G10L15/083

Abstract: A recurrent neural network-transducer (RNN-T) model improves speech recognition by processing sequential non-blank symbols at each time step after an initial one. The model's prediction network receives a sequence of symbols from a final Softmax layer and employs a shared embedding matrix to create and map embeddings to each symbol, associating them with unique position vectors. These embeddings are weighted according to their similarity to their matching position vector. Subsequently, a joint network of the RNN-T model uses these weighted embeddings to output a probability distribution for potential speech recognition hypotheses at each time step, enabling more accurate transcriptions of spoken language.

9.

发明公开
EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL 审中-公开

公开(公告)号：US20230343328A1

公开(公告)日：2023-10-26

申请号：US18336211

申请日：2023-06-16

Applicant: Google LLC

Inventor： Tara Sainath , Arun Narayanan , Rami Botros , Yanzhang He , Ehsan Variani , Cyril Allauzen , David Rybach , Ruoming Pang , Trevor Strohman

IPC: G10L15/06 , G10L15/02 , G10L15/22 , G10L15/30

CPC classification number: G10L15/063 , G10L15/02 , G10L15/22 , G10L15/30

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

10.

发明申请
Learning Word-Level Confidence for Subword End-To-End Automatic Speech Recognition 有权

公开(公告)号：US20220270597A1

公开(公告)日：2022-08-25

申请号：US17182592

申请日：2021-02-23

Applicant: Google LLC

Inventor： David Qiu , Qiujia Li , Yanzhang He , Yu Zhang , Bo Li , Liangliang Cao , Rohit Prabhavalkar , Deepti Bhatia , Wei Li , Ke Hu , Tara Sainath , Ian Mcgraw

IPC: G10L15/22 , G10L15/08 , G10L25/30 , G06N3/08

Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic feature vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification