Patent search ap:("Google LLC") AND inv:"Ronny Huang" Page 1

1.

Granted invention patent
Large-scale language model data selection for rare-word speech recognition (status: in force)

Publication (announcement) number: US12014725B2

Publication (announcement) date: 2024-06-18

Application number: US17643861

Application date: 2021-12-13

Applicant: Google LLC

Inventor: Ronny Huang, Tara N. Sainath

IPC: G10L15/16, G06N3/02, G10L15/06, G10L15/197, G10L15/22

CPC classification number: G10L15/063, G06N3/02, G10L15/16, G10L15/197, G10L15/22

Abstract: A method of training a language model for rare-word speech recognition includes obtaining a set of training text samples, and obtaining a set of training utterances used for training a speech recognition model. Each training utterance in the plurality of training utterances includes audio data corresponding to an utterance and a corresponding transcription of the utterance. The method also includes applying rare word filtering on the set of training text samples to identify a subset of rare-word training text samples that include words that do not appear in the transcriptions from the set of training utterances or appear in the transcriptions from the set of training utterances less than a threshold number of times. The method further includes training the external language model on the transcriptions from the set of training utterances and the identified subset of rare-word training text samples.
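The filtering step in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `select_rare_word_samples`, the whitespace tokenization, the lowercasing, and the example threshold of 2 are all assumptions made for illustration.

```python
from collections import Counter


def select_rare_word_samples(text_samples, transcriptions, threshold=2):
    """Keep text samples that contain at least one rare word.

    A word is treated as rare when it appears in the ASR training
    transcriptions fewer than `threshold` times (including zero times).
    Tokenization by whitespace is an assumption of this sketch.
    """
    counts = Counter(
        word for line in transcriptions for word in line.lower().split()
    )
    return [
        sample
        for sample in text_samples
        if any(counts[word] < threshold for word in sample.lower().split())
    ]


# Transcriptions that accompany the audio used to train the ASR model.
transcriptions = [
    "play some music",
    "play some jazz music",
    "call mom",
    "call mom now",
]
# Larger text-only corpus that is a candidate for language model training.
text_samples = [
    "play some music",
    "play bossa nova music",
    "call mom",
    "play jazz",
]

rare_subset = select_rare_word_samples(text_samples, transcriptions, threshold=2)
# The language model is then trained on the transcriptions plus the rare subset.
lm_training_text = transcriptions + rare_subset
print(rare_subset)  # ['play bossa nova music', 'play jazz']
```

Here "bossa" and "nova" never occur in the transcriptions and "jazz" occurs only once, so both samples containing them are kept, while samples made up entirely of frequent words are dropped.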

2.

Invention publication
Joint Segmenting and Automatic Speech Recognition (status: under examination, published)

Publication (announcement) number: US20230343332A1

Publication (announcement) date: 2023-10-26

Application number: US18304064

Application date: 2023-04-20

Applicant: Google LLC

Inventor: Ronny Huang, Shuo-yiin Chang, David Rybach, Rohit Prakash Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Charles Caleb Peyser, Zhiyun Lu

IPC: G10L15/04, G10L25/93, G10L15/197, G10L15/06, G10L15/22, G10L15/02

CPC classification number: G10L15/197, G10L15/02, G10L15/04, G10L15/063, G10L15/22, G10L25/93, G10L2015/025, G10L2025/932

Abstract: A joint segmenting and ASR model includes an encoder and decoder. The encoder is configured to: receive a sequence of acoustic frames characterizing one or more utterances; and generate, at each output step, a higher order feature representation for a corresponding acoustic frame. The decoder is configured to: receive the higher order feature representation and generate, at each output step: a probability distribution over possible speech recognition hypotheses, and an indication of whether the corresponding output step corresponds to an end of speech segment. The joint segmenting and ASR model is trained on a set of training samples, each training sample including: audio data characterizing a spoken utterance; and a corresponding transcription of the spoken utterance, the corresponding transcription having an end of speech segment ground truth token inserted into the corresponding transcription automatically based on a set of heuristic-based rules and exceptions applied to the training sample.
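The abstract does not list the heuristic rules or exceptions, so the following Python sketch only illustrates the general idea of automatically inserting an end-of-segment token into a transcription. The token name `<eos>`, the silence-gap rule, the 1.2 second threshold, and the hesitation-word exception are all hypothetical choices made for this example.

```python
EOS = "<eos>"  # hypothetical name for the end-of-speech-segment token


def insert_eos_tokens(words, max_gap=1.2, hesitations=("um", "uh")):
    """Insert EOS tokens into a word-level transcription.

    `words` is a list of (word, start_sec, end_sec) tuples.
    Hypothetical rule: a silence longer than `max_gap` seconds ends a segment.
    Hypothetical exception: no segment ends right after a hesitation word.
    The final word always ends a segment.
    """
    output = []
    for i, (word, _start, end) in enumerate(words):
        output.append(word)
        if i == len(words) - 1:
            output.append(EOS)
            continue
        gap = words[i + 1][1] - end
        if gap > max_gap and word not in hesitations:
            output.append(EOS)
    return output


words = [
    ("set", 0.0, 0.3),
    ("a", 0.3, 0.4),
    ("timer", 0.4, 0.9),
    ("um", 2.5, 2.7),
    ("play", 4.5, 4.8),
    ("music", 4.8, 5.3),
]
print(insert_eos_tokens(words))
# ['set', 'a', 'timer', '<eos>', 'um', 'play', 'music', '<eos>']
```

The long pause after "timer" yields a segment boundary, while the equally long pause after "um" does not, because the exception treats a hesitation as a sign that the speaker is still mid-thought.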

3.

Invention application
Lookup-Table Recurrent Language Model (status: in force)

Publication (announcement) number: US20220310067A1

Publication (announcement) date: 2022-09-29

Application number: US17650566

Application date: 2022-02-10

Applicant: Google LLC

Inventor: Ronny Huang, Tara N. Sainath, Trevor Strohman, Shankar Kumar

IPC: G10L15/08, G10L15/26, G10L15/187, G06N3/04, G10L15/16

Abstract: A computer-implemented method includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device. The method also includes processing the audio data to determine a candidate transcription that includes a sequence of tokens for the spoken utterance. For each token in the sequence of tokens, the method includes determining a token embedding for the corresponding token, determining an n-gram token embedding for a previous sequence of n-gram tokens, and concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token. The method also includes rescoring the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.
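The per-token embedding step described in this abstract can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented model: the helper names, the randomly initialized tables, the modular hash used to index the n-gram table, and the padding scheme are all hypothetical, and the recurrent network that would consume the concatenated outputs for rescoring is omitted.

```python
import random


def make_table(rows, dim, seed):
    """Build a randomly initialized embedding table (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def ngram_row(prev_tokens, table_size, vocab_size):
    """Map the previous n token ids to a row of the n-gram lookup table.

    A simple modular hash is assumed here; collisions are possible.
    """
    index = 0
    for token in prev_tokens:
        index = (index * vocab_size + token) % table_size
    return index


def concatenated_outputs(token_ids, token_table, ngram_table, n=2, pad_id=0):
    """For each token, concatenate its embedding with the n-gram embedding
    of the n tokens that precede it (left-padded with `pad_id`)."""
    vocab_size = len(token_table)
    outputs = []
    for i, token in enumerate(token_ids):
        prev = [pad_id] * max(0, n - i) + token_ids[max(0, i - n):i]
        ngram_vec = ngram_table[ngram_row(prev, len(ngram_table), vocab_size)]
        outputs.append(token_table[token] + ngram_vec)  # list concatenation
    return outputs


token_table = make_table(rows=10, dim=4, seed=0)   # vocabulary of 10 tokens
ngram_table = make_table(rows=32, dim=3, seed=1)   # 32-row n-gram lookup table
outputs = concatenated_outputs([3, 7, 2], token_table, ngram_table)
print(len(outputs), len(outputs[0]))  # 3 7
```

Because the n-gram embedding is obtained by table lookup rather than computation, the table can be made large without adding arithmetic per token; only the row index has to be computed.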

4.

Invention publication
Detecting Unintended Memorization in Language-Model-Fused ASR Systems (status: under examination, published)

Publication (announcement) number: US20230335126A1

Publication (announcement) date: 2023-10-19

Application number: US18303296

Application date: 2023-04-19

Applicant: Google LLC

Inventor: Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews

IPC: G10L15/197, G10L13/02, G10L15/01, G10L15/06, G10L15/16

CPC classification number: G10L15/197, G10L13/02, G10L15/01, G10L15/063, G10L15/16

Abstract: A method includes inserting a set of canary text samples into a corpus of training text samples and training an external language model on the corpus of training text samples and the set of canary text samples inserted into the corpus of training text samples. For each canary text sample, the method also includes generating a corresponding synthetic speech utterance and generating an initial transcription for the corresponding synthetic speech utterance. The method also includes rescoring the initial transcription generated for each corresponding synthetic speech utterance using the external language model. The method also includes determining a word error rate (WER) of the external language model based on the rescored initial transcriptions and the canary text samples and detecting memorization of the canary text samples by the external language model based on the WER of the external language model.
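The final WER-based detection step can be sketched in Python as below. The word error rate computation is the standard word-level edit distance; the decision rule, which compares the WER on inserted canaries against the WER on held-out canaries with a fixed margin, is an assumption of this sketch and is not stated in the abstract. Speech synthesis, first-pass recognition, and rescoring are represented only by their text outputs.

```python
def word_edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two strings."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words."""
    errors = sum(word_edit_distance(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def memorization_detected(inserted_wer, heldout_wer, margin=0.1):
    """Hypothetical decision rule: flag memorization when canaries that were
    inserted into the LM training text are recognized much more accurately
    than comparable canaries that were held out."""
    return heldout_wer - inserted_wer > margin


# Nonsense canaries that were inserted into the LM training corpus,
# and the rescored transcriptions of their synthetic speech.
inserted = ["zarb flim quox dran", "mib tolk wez arn"]
inserted_rescored = ["zarb flim quox dran", "mib tolk wez arn"]
# Comparable canaries that were never shown to the LM.
heldout = ["plon sike vurt hael"]
heldout_rescored = ["plan psych vert hail"]

inserted_wer = corpus_wer(inserted, inserted_rescored)
heldout_wer = corpus_wer(heldout, heldout_rescored)
print(memorization_detected(inserted_wer, heldout_wer))  # True
```

A very low WER on random word sequences is suspicious on its own, since an acoustic model alone is unlikely to transcribe them perfectly; the held-out comparison simply makes that baseline explicit.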

5.

Invention publication
Rare Word Recognition with LM-aware MWER Training (status: under examination, published)

Publication (announcement) number: US20230298570A1

Publication (announcement) date: 2023-09-21

Application number: US18187222

Application date: 2023-03-21

Applicant: Google LLC

Inventor: Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prakash Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Charles Caleb Peyser, Trevor Strohman, Yangzhang He, David Rybach

IPC: G10L15/06, G10L15/19, G10L15/22, G10L15/16, G10L15/02

CPC classification number: G10L15/063, G10L15/19, G10L15/22, G10L15/16, G10L15/02

Abstract: A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypothesis corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of the external language model.
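The abstract does not give the formula that combines the two likelihood scores, so the Python sketch below assumes one plausible form: a weighted sum of log-likelihoods with per-hypothesis weights. The `Hypothesis` class, the field names, and the fixed example weights are hypothetical; in the described system the weights would come from a learnable fusion module rather than being supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_score: float        # first likelihood score (log domain), from the decoder
    lm_score: float         # second likelihood score (log domain), from the external LM
    fusion_weights: tuple   # (w_asr, w_lm); hand-set here, learned in the real system


def fused_score(hyp):
    """Third likelihood score, assumed here to be a weighted sum of the two."""
    w_asr, w_lm = hyp.fusion_weights
    return w_asr * hyp.asr_score + w_lm * hyp.lm_score


hypotheses = [
    Hypothesis("play tailor swift", asr_score=-1.2, lm_score=-4.0, fusion_weights=(1.0, 0.5)),
    Hypothesis("play taylor swift", asr_score=-1.5, lm_score=-1.0, fusion_weights=(1.0, 0.5)),
]

best_by_asr = max(hypotheses, key=lambda h: h.asr_score)
best_by_fusion = max(hypotheses, key=fused_score)
print(best_by_asr.text)     # play tailor swift
print(best_by_fusion.text)  # play taylor swift
```

In this toy example the decoder alone prefers the misspelled name, while the fused score lets the language model's preference for the rare proper noun change the ranking.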
