Patent search ap:("GOOGLE LLC") AND inv:"Rohit Prabhavalkar" Page 1

1.

发明授权
Cascaded encoders for simplified streaming and non-streaming ASR 有权

公开(公告)号：US12154581B2

公开(公告)日：2024-11-26

申请号：US17237021

申请日：2021-04-21

Applicant: Google LLC

Inventor： Arun Narayanan , Tara Sainath , Chung-Cheng Chiu , Ruoming Pang , Rohit Prabhavalkar , Jiahui Yu , Ehsan Variani , Trevor Strohman

IPC: G10L19/16 , G06N3/08 , G10L15/00 , G10L15/16 , G10L15/32 , G10L25/30

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

2.

发明授权
Learning word-level confidence for subword end-to-end automatic speech recognition 有权

公开(公告)号：US11610586B2

公开(公告)日：2023-03-21

申请号：US17182592

申请日：2021-02-23

Applicant: Google LLC

Inventor： David Qiu , Qiujia Li , Yanzhang He , Yu Zhang , Bo Li , Liangliang Cao , Rohit Prabhavalkar , Deepti Bhatia , Wei Li , Ke Hu , Tara Sainath , Ian Mcgraw

IPC: G10L15/22 , G10L15/08 , G06N3/08 , G10L25/30

Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic feature vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.

3.

发明申请
TWO-PASS END TO END SPEECH RECOGNITION 有权

公开(公告)号：US20220310072A1

公开(公告)日：2022-09-29

申请号：US17616129

申请日：2020-06-03

Applicant: GOOGLE LLC

Inventor： Tara N. Sainath , Ruoming Pang , David Rybach , Yanzhang He , Rohit Prabhavalkar , Wei Li , Mirkó Visontai , Qiao Liang , Trevor Strohman , Yonghui Wu , Ian C. McGraw , Chung-Cheng Chiu

IPC: G10L15/16 , G10L15/32 , G10L15/05

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

4.

发明公开
IDENTIFYING AND CORRECTING AUTOMATIC SPEECH RECOGNITION (ASR) MISRECOGNITIONS IN A DECENTRALIZED MANNER 审中-公开

公开(公告)号：US20240112673A1

公开(公告)日：2024-04-04

申请号：US17958887

申请日：2022-10-03

Applicant: GOOGLE LLC

Inventor： Rajiv Mathews , Rohit Prabhavalkar , Giovanni Motta , Mingqing Chen , Lillian Zhou , Dhruv Guliani , Harry Zhang , Trevor Strohman , Françoise Beaufays

IPC: G10L15/197 , G10L15/06 , G10L15/22 , G10L15/30

CPC classification number: G10L15/197 , G10L15/063 , G10L15/22 , G10L15/30 , G10L2015/0635

Abstract: Implementations described herein identify and correct automatic speech recognition (ASR) misrecognitions. For example, on-device processor(s) of a client device may generate a predicted textual segment that is predicted to correspond to spoken utterance of a user of the client device, and may receive further input that modifies the predicted textual segment to an alternate textual segment. Further, the on-device processor(s) may store these textual segments in on-device storage as a candidate correction pair, and transmit the candidate correction pair to a remote system. Moreover, remote processor(s) of the remote system may determine that the candidate correction pair is an actual correction pair, and may cause client devices to generate updates for a global ASR model for the candidate correction pair. Additionally, the remote processor(s) may distribute the global ASR model to the client devices and/or additional client devices.

5.

发明公开
Deliberation by Text-Only and Semi-Supervised Training 审中-公开

公开(公告)号：US20230298563A1

公开(公告)日：2023-09-21

申请号：US18186157

申请日：2023-03-18

Applicant: Google LLC

Inventor： Ke Hu , Tara N. Sainath , Yanzhang He , Rohit Prabhavalkar , Sepand Mavandadi , Weiran Wang , Trevor Strohman

IPC: G10L13/08 , G10L15/16 , G10L15/06

CPC classification number: G10L13/08 , G10L15/16 , G10L15/063

Abstract: A method of text-only and semi-supervised training for deliberation includes receiving training data including unspoken textual utterances that are each not paired with any corresponding spoken utterance of non-synthetic speech, and training a deliberation model that includes a text encoder and a deliberation decoder on the unspoken textual utterances. The method also includes receiving, at the trained deliberation model, first-pass hypotheses and non-causal acoustic embeddings. The first-pass hypotheses is generated by a recurrent neural network-transducer (RNN-T) decoder for the non-causal acoustic embeddings encoded by a non-causal encoder. The method also includes encoding, using the text encoder, the first-pass hypotheses generated by the RNN-T decoder, and generating, using the deliberation decoder attending to both the first-pass hypotheses and the non-causal acoustic embeddings, second-pass hypotheses.

6.

发明申请
Learning Word-Level Confidence for Subword End-To-End Automatic Speech Recognition 有权

公开(公告)号：US20220270597A1

公开(公告)日：2022-08-25

申请号：US17182592

申请日：2021-02-23

Applicant: Google LLC

Inventor： David Qiu , Qiujia Li , Yanzhang He , Yu Zhang , Bo Li , Liangliang Cao , Rohit Prabhavalkar , Deepti Bhatia , Wei Li , Ke Hu , Tara Sainath , Ian Mcgraw

IPC: G10L15/22 , G10L15/08 , G10L25/30 , G06N3/08

Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic feature vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.

7.

发明申请
Cascaded Encoders for Simplified Streaming and Non-Streaming ASR 有权

公开(公告)号：US20220122622A1

公开(公告)日：2022-04-21

申请号：US17237021

申请日：2021-04-21

Applicant: Google LLC

Inventor： Arun Narayanan , Tara Sainath , Chung-Cheng Chiu , Ruoming Pang , Rohit Prabhavalkar , Jiahui Yu , Ehsan Variani , Trevor Strohman

IPC: G10L19/16 , G10L15/00 , G10L25/30 , G06N3/08

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification