Zero-Shot Task Expansion of ASR Models Using Task Vectors

    Publication No.: US20250078813A1

    Publication Date: 2025-03-06

    Application No.: US18817181

    Filing Date: 2024-08-27

    Applicant: Google LLC

    Abstract: A method includes training, using an unsupervised learning technique, an auxiliary ASR model based on a first set of un-transcribed source task speech utterances to determine a first task vector; training, using the unsupervised learning technique, the auxiliary ASR model based on a second set of un-transcribed speech utterances to determine a second task vector; and training, using the unsupervised learning technique, the auxiliary ASR model based on un-transcribed target task speech utterances to determine a target task vector. The method also includes determining a first correlation between the first and target task vectors, determining a second correlation between the second and target task vectors, and adapting parameters of a trained primary ASR model based on the first and second task vectors and the first and second correlations to teach the primary ASR model to recognize speech associated with the target task.
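    The abstract describes the method in prose only; the patent does not publish reference code. The following is a minimal sketch of the parameter-adaptation step, assuming (as is common in the task-vector literature) that a task vector is the element-wise delta between fine-tuned and base parameters, and that the "correlation" is cosine similarity. All function names and the normalization scheme are illustrative.

    ```python
    import numpy as np

    def task_vector(base, finetuned):
        # A task vector as the parameter delta produced by unsupervised fine-tuning.
        return finetuned - base

    def cosine(a, b):
        # Illustrative stand-in for the patent's "correlation" between task vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def adapt(base_params, source_vectors, target_vector):
        # Weight each source task vector by its correlation with the target task
        # vector, then add the normalized combination to the primary model's
        # parameters. Normalization by the weight sum is an assumption.
        weights = [cosine(v, target_vector) for v in source_vectors]
        total = sum(weights) or 1.0
        combined = sum(w * v for w, v in zip(weights, source_vectors)) / total
        return base_params + combined
    ```

    Under this sketch, a source task perfectly correlated with the target contributes its full task vector, while weakly correlated tasks are down-weighted.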

    Mixture Model Attention for Flexible Streaming and Non-Streaming Automatic Speech Recognition

    Publication No.: US20220310073A1

    Publication Date: 2022-09-29

    Application No.: US17644343

    Filing Date: 2021-12-15

    Applicant: Google LLC

    Abstract: A method for an automatic speech recognition (ASR) model that unifies streaming and non-streaming speech recognition includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.
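    The core computation here, an attention PDF formed from a mixture of softmax components over a context window, can be sketched briefly. This is not the patented implementation; the shapes, the single mixture-weight vector, and the function names are assumptions for illustration.

    ```python
    import numpy as np

    def softmax(x):
        # Numerically stable softmax along the last axis.
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def mimo_attention_pdf(scores, mix_logits):
        """scores: (K, W) per-component attention logits over a context window
        of W frames; mix_logits: (K,) mixture weights. Returns a (W,) PDF."""
        mix = softmax(mix_logits)                  # mixture weights sum to 1
        comps = softmax(scores)                    # each component is a softmax over the window
        return (mix[:, None] * comps).sum(axis=0)  # convex combination -> valid PDF
    ```

    Because each component is itself a probability distribution and the mixture weights are convex, the result is always a valid attention PDF regardless of the window size, which is what lets the same mechanism serve streaming (short window) and non-streaming (full window) modes.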

    Mixture model attention for flexible streaming and non-streaming automatic speech recognition

    Publication No.: US12136415B2

    Publication Date: 2024-11-05

    Application No.: US17644343

    Filing Date: 2021-12-15

    Applicant: Google LLC

    Abstract: A method for an automatic speech recognition (ASR) model that unifies streaming and non-streaming speech recognition includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.

    Mixture Model Attention for Flexible Streaming and Non-Streaming Automatic Speech Recognition

    Publication No.: US20250022458A1

    Publication Date: 2025-01-16

    Application No.: US18896830

    Filing Date: 2024-09-25

    Applicant: Google LLC

    Abstract: A method for an automatic speech recognition (ASR) model that unifies streaming and non-streaming speech recognition includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.

    Mixture Model Attention for Flexible Streaming and Non-Streaming Automatic Speech Recognition

    Publication No.: US20220310074A1

    Publication Date: 2022-09-29

    Application No.: US17644344

    Filing Date: 2021-12-15

    Applicant: Google LLC

    Abstract: A method for an automatic speech recognition (ASR) model that unifies streaming and non-streaming speech recognition includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.

    Regularizing Word Segmentation

    Publication No.: US20220310061A1

    Publication Date: 2022-09-29

    Application No.: US17656225

    Filing Date: 2022-03-23

    Applicant: Google LLC

    Abstract: A method for subword segmentation includes receiving an input word to be segmented into a plurality of subword units. The method also includes executing a subword segmentation routine that segments the input word into the plurality of subword units by accessing a trained vocabulary set of subword units and selecting the subword units by greedily finding the longest subword unit of the input word that is present in the trained vocabulary set, until the end of the input word is reached.
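    The greedy longest-match routine described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the patented implementation; the single-character fallback for out-of-vocabulary spans is an assumption the abstract does not specify.

    ```python
    def segment(word, vocab):
        # Greedy longest-match: repeatedly take the longest prefix of the
        # remaining input that is present in the trained vocabulary, until
        # the end of the word is reached.
        pieces, i = [], 0
        while i < len(word):
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    pieces.append(word[i:j])
                    i = j
                    break
            else:
                pieces.append(word[i])  # assumed fallback: emit a single character
                i += 1
        return pieces
    ```

    For example, with a vocabulary containing "un", "happ", and "happily", the word "unhappily" segments as ["un", "happily"], since "happily" is the longest match after "un" is consumed.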
