Patent search ap:("Google LLC") AND inv:"Golan Pundak" Page 2

11.

Published application
ENHANCING AUDIO USING MULTIPLE RECORDING DEVICES (Pending, published)

Publication No.: US20240203456A1

Publication date: 2024-06-20

Application No.: US18590607

Application date: 2024-02-28

Applicant: Google LLC

Inventors: Dimitri Kanevsky, Golan Pundak

IPC: G11B20/10 , G06F3/16 , G10L17/00 , G10L21/0208 , G10L21/028 , G10L21/0364 , G10L25/51 , G10L25/84 , H04M3/56

CPC classification number: G11B20/10527 , G06F3/16 , G06F3/165 , G10L17/00 , G10L21/0364 , G10L25/51 , H04M3/56 , H04M3/568 , G10L21/0208 , G10L21/028 , G10L25/84 , G11B2020/10546

Abstract: Various arrangements for enhancing audio are detailed herein. An audio stream and a second audio stream can be received. From these audio streams, a first audio source and a second audio source are extracted. A conversation between the first audio source and a third audio source that occurs within the audio streams is identified. An updated audio stream is generated that enhances the first audio source and diminishes the second audio source extracted from the audio stream and the second audio stream.

12.

Published application
PROPER NOUN RECOGNITION IN END-TO-END SPEECH RECOGNITION (Pending, published)

Publication No.: US20230377564A1

Publication date: 2023-11-23

Application No.: US18362273

Application date: 2023-07-31

Applicant: Google LLC

Inventors: Charles Caleb Peyser, Tara N. Sainath, Golan Pundak

IPC: G10L15/06 , G06N3/049 , G10L15/16 , G10L15/18 , G10L15/187

CPC classification number: G10L15/063 , G06N3/049 , G10L15/16 , G10L15/1815 , G10L15/187

Abstract: A method for training a speech recognition model with a minimum word error rate loss function includes receiving a training example comprising a proper noun and generating a plurality of hypotheses corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criterion. The penalty criterion indicates that the corresponding probability satisfies a probability threshold and that the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty to the minimum word error rate loss function.
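The penalized loss described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed method. It assumes an N-best list whose probabilities are renormalized with a softmax, a proper noun represented as a single token, and a penalty implemented by adding a fixed amount (the hypothetical `penalty` parameter) to the word-error count of any hypothesis that meets the criterion. The threshold of 0.3 is likewise an arbitrary choice.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list[str]
    log_prob: float  # model log-probability of the full hypothesis


def word_errors(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (h != r))
        prev = cur
    return prev[-1]


def mwer_loss_with_penalty(hyps: list[Hypothesis], ref: list[str], proper_noun: str,
                           prob_threshold: float = 0.3, penalty: float = 1.0) -> float:
    # Renormalize the N-best probabilities so they sum to 1 (stable softmax).
    max_lp = max(h.log_prob for h in hyps)
    weights = [math.exp(h.log_prob - max_lp) for h in hyps]
    total = sum(weights)
    probs = [w / total for w in weights]

    errors = []
    for h, p in zip(hyps, probs):
        e = float(word_errors(h.words, ref))
        # Assumed penalty criterion: probable enough AND the proper noun is missing.
        if p >= prob_threshold and proper_noun not in h.words:
            e += penalty
        errors.append(e)

    # Standard MWER: expected word errors relative to the N-best mean.
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))
```

With a reference of "call Pundak" and a likely hypothesis "call pun dock", the penalty raises that hypothesis's error count. This increases the loss, which pushes probability mass toward hypotheses that spell the proper noun correctly.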

13.

Published application
Contextual Biasing for Speech Recognition (Pending, published)

Publication No.: US20230274736A1

Publication date: 2023-08-31

Application No.: US18311964

Application date: 2023-05-04

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier

IPC: G10L15/187 , G06N20/10 , G10L19/04

CPC classification number: G10L15/187 , G06N20/10 , G10L19/04 , G10L2015/088

Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
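This abstract has the model consume both grapheme data and phoneme data for each biasing phrase. The sketch below shows one way such paired inputs might be prepared. The toy lexicon, the lowercasing, and the `<unk>` fallback for out-of-lexicon words are all assumptions made for illustration. The downstream model that consumes these sequences is not shown.

```python
# Hypothetical toy lexicon; a real system would use a pronunciation
# dictionary or a grapheme-to-phoneme model instead.
TOY_LEXICON = {
    "call": ["K", "AO", "L"],
    "golan": ["G", "OW", "L", "AA", "N"],
}


def biasing_inputs(phrases, lexicon, unk="<unk>"):
    """Return grapheme and phoneme sequences for each biasing phrase."""
    out = []
    for phrase in phrases:
        words = phrase.lower().split()
        # Grapheme data: the phrase's characters, with spaces as word boundaries.
        graphemes = list(" ".join(words))
        # Phoneme data: concatenated word pronunciations; unknown words map to `unk`.
        phonemes = []
        for word in words:
            phonemes.extend(lexicon.get(word, [unk]))
        out.append({"phrase": phrase, "graphemes": graphemes, "phonemes": phonemes})
    return out
```

Supplying both views helps in a specific case: a phrase whose spelling is a poor guide to its pronunciation can still be matched through its phoneme sequence.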

14.

Granted patent
Contextual biasing for speech recognition (In force)

Publication No.: US11664021B2

Publication date: 2023-05-30

Application No.: US17643423

Application date: 2021-12-09

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier

IPC: G10L15/187 , G06N20/10 , G10L19/04 , G10L15/08

CPC classification number: G10L15/187 , G06N20/10 , G10L19/04 , G10L2015/088

Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.

15.

Granted patent
Enhancing audio using multiple recording devices (In force)

Publication No.: US11443769B2

Publication date: 2022-09-13

Application No.: US17194827

Application date: 2021-03-08

Applicant: Google LLC

Inventors: Dimitri Kanevsky, Golan Pundak

IPC: G11B20/10 , G06F3/16 , G10L17/00 , G10L21/0364 , H04M3/56 , G10L25/51 , G10L21/0208 , G10L21/028 , G10L25/84

Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for identifying that a first audio stream includes first, second, and third sources of audio. A computing system identifies that a second audio stream includes the first, second, and third sources of audio. The computing system determines that the first and second sources of audio are part of a first conversation. The computing system generates a third audio stream that combines the first source of audio from the first audio stream, the first source of audio from the second audio stream, the second source of audio from the first audio stream, and the second source of audio from the second audio stream, and diminishes the third source of audio from the first audio stream, and the third source of audio from the second audio stream.
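The sketch below covers only the final remixing step. It assumes four things are already done: the sources have been separated from each stream, time-aligned, and labeled with identifiers that are consistent across streams, and the conversation detection has already decided which sources to keep. The gain values and the averaging across streams are illustrative choices, not details taken from the abstract.

```python
def remix(streams, keep, diminish, keep_gain=1.0, diminish_gain=0.1):
    """Combine separated sources from several recordings into one output stream.

    streams: list of dicts mapping a source id to its samples (list of floats).
    All sample lists are assumed to be time-aligned and of equal length.
    Sources in neither `keep` nor `diminish` are dropped.
    """
    n = len(next(iter(streams[0].values())))
    out = [0.0] * n
    for sources in streams:
        for sid, samples in sources.items():
            if sid in keep:
                gain = keep_gain        # enhance conversation participants
            elif sid in diminish:
                gain = diminish_gain    # attenuate the unrelated source
            else:
                continue
            for i, x in enumerate(samples):
                out[i] += gain * x
    # Average across streams so combining two recordings does not double the level.
    return [x / len(streams) for x in out]
```

For example, with sources `a` and `b` in the conversation and `c` unrelated, `remix(streams, keep={"a", "b"}, diminish={"c"})` sums `a` and `b` from every recording and keeps only a tenth of `c`.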

16.

Published application
ENABLING LARGE LANGUAGE MODEL-BASED SPOKEN LANGUAGE UNDERSTANDING (SLU) SYSTEMS TO LEVERAGE BOTH AUDIO DATA AND TEXTUAL DATA IN PROCESSING SPOKEN UTTERANCES (Pending, published)

Publication No.: US20240203404A1

Publication date: 2024-06-20

Application No.: US18081569

Application date: 2022-12-14

Applicant: GOOGLE LLC

Inventors: Nir Shabat, Volodymyr Polosukhin, Shlomo Fruchter, Golan Pundak, Roy Atsmon

IPC: G10L15/18 , G10L13/027 , G10L15/26

CPC classification number: G10L15/1815 , G10L13/027 , G10L15/26

Abstract: In various implementations, a method implemented by one or more processors of a computing device can comprise receiving audio data that captures a spoken utterance of a user; processing the audio data using an automatic speech recognition (ASR) model to generate textual data corresponding to the spoken utterance; generating a semantic representation corresponding to the spoken utterance of the user based on applying both the audio data and the textual data as input across a large language model (LLM); and causing the semantic representation corresponding to the spoken utterance of the user to be utilized in fulfilling the spoken utterance.
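The three steps in this abstract (ASR, an LLM over both audio and text, then fulfillment) can be wired together as below. The abstract names no concrete APIs, so the ASR model, the LLM, and the fulfillment handler are injected as plain callables. Every name in the sketch, including `SluResult`, is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SluResult:
    transcript: str
    semantics: dict


def understand(audio: Sequence[float],
               asr: Callable[[Sequence[float]], str],
               llm: Callable[[Sequence[float], str], dict],
               fulfill: Callable[[dict], None]) -> SluResult:
    # Step 1: the ASR model turns the captured audio into text.
    text = asr(audio)
    # Step 2: the LLM receives BOTH the raw audio and the ASR text, so it can
    # recover cues (e.g. prosody or misrecognized words) that text alone loses.
    semantics = llm(audio, text)
    # Step 3: the semantic representation drives fulfillment of the request.
    fulfill(semantics)
    return SluResult(text, semantics)
```

Passing the components in as callables keeps the sketch independent of any particular model library, and makes it easy to test with stand-in lambdas.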

17.

Granted patent
Phoneme-based contextualization for cross-lingual speech recognition in end-to-end models (In force)

Publication No.: US11942076B2

Publication date: 2024-03-26

Application No.: US17651315

Application date: 2022-02-16

Applicant: Google LLC

Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath

IPC: G10L15/30 , G10L15/02 , G10L15/06 , G10L15/187 , G10L15/193 , G10L15/28 , G10L15/32 , G10L25/30

CPC classification number: G10L15/063 , G10L15/02 , G10L15/187 , G10L15/193 , G10L15/285 , G10L15/32 , G10L25/30 , G10L2015/025

Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
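The rescoring and decoding steps can be approximated as below. The sketch is a simplification: it replaces the decoding graph with a flat list of competing candidates, each either a wordpiece-path candidate (no phonemes) or a phoneme-path candidate. The `to_l1_phonemes` pronunciation mapping and the fixed `bonus` added to matching phoneme sequences are assumptions, not details from the patent.

```python
from typing import Callable, Optional, Sequence

# (output text, path score, first-language phonemes or None for wordpiece paths)
Candidate = tuple[str, float, Optional[tuple[str, ...]]]


def rescore_and_decode(candidates: Sequence[Candidate],
                       biasing_terms: Sequence[str],
                       to_l1_phonemes: Callable[[str], Sequence[str]],
                       bonus: float = 2.0) -> str:
    # Pronunciations of the second-language biasing terms, expressed in
    # first-language phonemes (the mapping itself is assumed to be given).
    biased = {tuple(to_l1_phonemes(term)) for term in biasing_terms}
    best_text, best_score = "", float("-inf")
    for text, score, phonemes in candidates:
        # Rescoring: boost phoneme-path candidates that match a biasing term.
        if phonemes is not None and phonemes in biased:
            score += bonus
        # Flat stand-in for the decoding graph: keep the best-scoring path.
        if score > best_score:
            best_text, best_score = text, score
    return best_text
```

As an example, suppose a native English speaker says the French place name "Créteil". The wordpiece path may prefer an English word. When "Créteil" is in the biasing list, the boosted phoneme path can outscore it.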

18.

Application
CONTEXTUAL BIASING FOR SPEECH RECOGNITION (In force)

Publication No.: US20220366897A1

Publication date: 2022-11-17

Application No.: US17815049

Application date: 2022-07-26

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath

IPC: G10L15/16 , G10L15/26

Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
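The bias attention module can be illustrated with plain dot-product attention of a decoder state over encoded bias phrases. The sketch assumes the bias encoder has already produced one key vector and one value vector per phrase. It also uses unscaled dot-product scoring. The patent text above does not specify the scoring function, so treat both choices as assumptions.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def bias_attention(query, bias_keys, bias_values):
    """Attend from a decoder state over encoded bias phrases.

    Returns the attention weights (one per phrase) and the bias context vector,
    i.e. the weight-averaged bias phrase values.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in bias_keys]
    weights = softmax(scores)
    dim = len(bias_values[0])
    context = [sum(w * v[d] for w, v in zip(weights, bias_values)) for d in range(dim)]
    return weights, context
```

The returned bias context would then be combined with the output of the first attention module before it reaches the decoder. Concatenation is assumed here as one way to do that.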

19.

Granted patent
Phoneme-based contextualization for cross-lingual speech recognition in end-to-end models (In force)

Publication No.: US11270687B2

Publication date: 2022-03-08

Application No.: US16861190

Application date: 2020-04-28

Applicant: Google LLC

Inventors: Ke Hu, Antoine Jean Bruguier, Tara N. Sainath, Rohit Prakash Prabhavalkar, Golan Pundak

IPC: G10L15/30 , G10L15/06 , G10L15/02 , G10L15/187 , G10L15/193 , G10L15/28 , G10L15/32 , G10L25/30

Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.

20.

Granted patent
Enhancing audio using multiple recording devices (In force)

Publication No.: US10943619B2

Publication date: 2021-03-09

Application No.: US16812760

Application date: 2020-03-09

Applicant: Google LLC

Inventors: Dimitri Kanevsky, Golan Pundak

IPC: G11B20/10 , G06F3/16 , G10L17/00 , G10L21/0364 , H04M3/56 , G10L25/51 , G10L21/0208 , G10L21/028 , G10L25/84

Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for identifying that a first audio stream includes first, second, and third sources of audio. A computing system identifies that a second audio stream includes the first, second, and third sources of audio. The computing system determines that the first and second sources of audio are part of a first conversation. The computing system generates a third audio stream that combines the first source of audio from the first audio stream, the first source of audio from the second audio stream, the second source of audio from the first audio stream, and the second source of audio from the second audio stream, and diminishes the third source of audio from the first audio stream, and the third source of audio from the second audio stream.
