Patent search ap:("Google LLC") AND inv:"Petar Aleksic" Page 1

1.

Granted invention patent
Voice recognition system
Patent validity: Active

Publication (announcement) No.: US11996103B2

Publication (announcement) date: 2024-05-28

Application No.: US17811605

Application date: 2022-07-11

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G10L15/00, G06F16/632, G10L15/04, G10L15/19, G10L15/197, G10L15/22, G10L15/26, G10L15/08, G10L15/183

CPC classification number: G10L15/26, G06F16/632, G10L15/04, G10L15/19, G10L15/197, G10L2015/085, G10L15/183, G10L15/22

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
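
The segment-by-segment flow in this abstract can be illustrated with a toy sketch. It is not the patented method: the context vocabulary, the candidate scores, the `boost` value, and the names `contexts_for` and `transcribe` are all hypothetical, chosen only to show how a context weight raised by one segment can change the choice for the next.

```python
from collections import defaultdict

# Hypothetical context vocabulary: context name -> words that suggest it.
CONTEXT_WORDS = {
    "sports": {"tennis", "player", "score"},
    "cooking": {"recipe", "bake", "oven"},
}


def contexts_for(candidate):
    """Return the contexts whose vocabulary overlaps the candidate's words."""
    words = set(candidate.lower().split())
    return {name for name, vocab in CONTEXT_WORDS.items() if words & vocab}


def transcribe(segments, boost=0.5):
    """Pick one candidate per segment, biasing later picks by context weights.

    Each segment is a dict mapping candidate text to a toy recognizer score.
    """
    weights = defaultdict(float)
    chosen = []
    for candidates in segments:
        def biased(item):
            text, base = item
            # Base score plus the current weight of every matching context.
            return base + sum(weights[c] for c in contexts_for(text))

        best, _ = max(candidates.items(), key=biased)
        chosen.append(best)
        # Adjust the weight of each context associated with the chosen text.
        for ctx in contexts_for(best):
            weights[ctx] += boost
    return " ".join(chosen)
```

With these toy values, "score" loses to "store" in isolation but wins once an earlier segment has raised the weight of the hypothetical "sports" context.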

2.

Granted invention patent
Alphanumeric sequence biasing for automatic speech recognition using a grammar and a speller finite state transducer
Patent validity: Active

Publication (announcement) No.: US11942091B2

Publication (announcement) date: 2024-03-26

Application No.: US17251465

Application date: 2020-01-17

Applicant: Google LLC

Inventor: Benjamin Haynor, Petar Aleksic

IPC: G10L15/26, G10L15/16, G10L15/193, G10L15/22, G10L15/30

CPC classification number: G10L15/26, G10L15/16, G10L15/193, G10L15/22, G10L15/30

Abstract: Speech processing techniques are disclosed that enable determining a text representation of alphanumeric sequences in captured audio data. Various implementations include determining a contextual biasing finite state transducer (FST) based on contextual information corresponding to the captured audio data. Additional or alternative implementations include modifying probabilities of one or more candidate recognitions of the alphanumeric sequence using the contextual biasing FST, where the FST further comprises a grammar as well as a speller finite state transducer.

3.

Granted invention patent
Speech recognition with selective use of dynamic language models
Patent validity: Active

Publication (announcement) No.: US11810568B2

Publication (announcement) date: 2023-11-07

Application No.: US17118232

Application date: 2020-12-10

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G10L15/26, G10L15/18, G10L15/183, G10L15/07, G10L15/197, G10L15/30, G10L15/22, G10L15/06, G10L15/08

CPC classification number: G10L15/26, G10L15/07, G10L15/183, G10L15/1815, G10L15/197, G10L15/30, G10L2015/0635, G10L2015/088, G10L2015/228

Abstract: A computer-implemented method for transcribing an utterance includes receiving, at a computing system, speech data that characterizes an utterance of a user. A first set of candidate transcriptions of the utterance can be generated using a static class-based language model that includes a plurality of classes that are each populated with class-based terms selected independently of the utterance or the user. The computing system can then determine whether the first set of candidate transcriptions includes class-based terms. Based on whether the first set of candidate transcriptions includes class-based terms, the computing system can determine whether to generate a dynamic class-based language model that includes at least one class that is populated with class-based terms selected based on a context associated with at least one of the utterance and the user.
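
The decision step in this abstract might look roughly like the following sketch. The class name `$CONTACT`, the helper names, and the shape of the user context are hypothetical, and the direction of the decision (build the dynamic model only when a class-based term appears) is an assumption; the abstract says only that the decision depends on whether such terms are present.

```python
# Hypothetical static classes, populated independently of any user.
STATIC_CLASSES = {"$CONTACT": {"john", "mary"}}


def contains_class_term(candidates, static_classes):
    """True if any first-pass candidate contains a term from a static class."""
    all_terms = set().union(*static_classes.values())
    return any(set(c.lower().split()) & all_terms for c in candidates)


def maybe_build_dynamic_classes(candidates, static_classes, user_context):
    """Return context-populated classes, or None when none are needed.

    Assumption: a dynamic model is only worth building when the first-pass
    candidates already contain class-based terms.
    """
    if not contains_class_term(candidates, static_classes):
        return None
    return {name: set(user_context.get(name, ())) for name in static_classes}
```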

4.

Granted invention patent
Allowing spelling of arbitrary words
Patent validity: Active

Publication (announcement) No.: US11797763B2

Publication (announcement) date: 2023-10-24

Application No.: US17443330

Application date: 2021-07-24

Applicant: Google LLC

Inventor: Evgeny A. Cherepanov, Gleb Skobeltsyn, Jakob Nicolaus Foerster, Petar Aleksic, Assaf Avner Hurwitz Michaely

IPC: G10L15/22, G10L15/01, G10L15/24, G06F40/232, G10L15/32, G10L15/26, G10L15/197, G10L15/187, G06F3/16, G10L15/19, G10L15/30, G10L15/08

CPC classification number: G06F40/232, G06F3/167, G10L15/187, G10L15/19, G10L15/197, G10L15/22, G10L15/26, G10L15/30, G10L15/32, G10L2015/086, G10L2015/223

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a first voice input from a user device; generating a first recognition output; receiving a user selection of one or more terms in the first recognition output; receiving a second voice input spelling a correction of the user selection; determining a corrected recognition output for the selected portion; and providing a second recognition output that merges the first recognition output and the corrected recognition output.
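
The final merge step in this abstract can be sketched as below. This is a simplified illustration that assumes the second voice input has already been recognized as a list of letters and that the user's selection is a token index range; `merge_correction` and its parameters are hypothetical names.

```python
def merge_correction(tokens, start, end, spelled_letters):
    """Replace tokens[start:end] with the word spelled by the second input.

    tokens: words of the first recognition output.
    start, end: the user's selection as a half-open token index range.
    spelled_letters: letters recognized from the spelled correction.
    """
    corrected = "".join(spelled_letters)
    # Merge: keep everything outside the selection, swap in the correction.
    return tokens[:start] + [corrected] + tokens[end:]
```

For example, selecting the second word of "call kyle now" and spelling "k a i" would yield "call kai now".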

5.

Granted invention patent
Language model biasing system
Patent validity: Active

Publication (announcement) No.: US11682383B2

Publication (announcement) date: 2023-06-20

Application No.: US17337400

Application date: 2021-06-02

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G10L15/187, G10L15/07, G10L15/18, G10L15/197, G10L15/30, G10L15/01

CPC classification number: G10L15/07, G10L15/187, G10L15/1815, G10L15/197, G10L15/01, G10L15/30

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.
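
Two of the steps in this abstract, expanding the n-gram set and adjusting candidate scores, can be sketched as follows. The `related` lookup table, the `bonus` value, and both function names are hypothetical, and the sketch leaves out the step of adjusting the language model itself.

```python
def expand_ngrams(initial, related):
    """Expand the initial n-grams with related n-grams from a lookup table."""
    expanded = set(initial)
    for ngram in initial:
        expanded |= related.get(ngram, set())
    return expanded


def rescore(candidates, expanded, bonus=0.2):
    """Add a bonus to the score of every candidate in the expanded set."""
    return {
        text: (score + bonus) if text in expanded else score
        for text, score in candidates.items()
    }
```

With context n-gram "italian" expanded to include "pasta", a candidate "pasta" scored 0.4 would overtake "faster" scored 0.5 after rescoring.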

6.

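Invention patent application
CROSS-LINGUAL SPEECH RECOGNITION
Patent validity: Active

Publication (announcement) No.: US20220383862A1

Publication (announcement) date: 2022-12-01

Application No.: US17817176

Application date: 2022-08-03

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J Moreno Mengibar

IPC: G10L15/187, G10L15/02, G10L15/22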

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross-lingual speech recognition are disclosed. In one aspect, a method includes the actions of determining a context of a second computing device. The actions further include identifying, by a first computing device, an additional pronunciation for a term of multiple terms. The actions further include including the additional pronunciation for the term in the lexicon. The actions further include receiving audio data of an utterance. The actions further include generating a transcription of the utterance by using the lexicon that includes the multiple terms and the pronunciation for each of the multiple terms and the additional pronunciation for the term. The actions further include after generating the transcription of the utterance, removing the additional pronunciation for the term from the lexicon. The actions further include providing, for output, the transcription.

7.

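Granted invention patent
Cross-lingual speech recognition
Patent validity: Active

Publication (announcement) No.: US11437025B2

Publication (announcement) date: 2022-09-06

Application No.: US16593564

Application date: 2019-10-04

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G10L15/187, G10L15/02, G10L15/22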

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross-lingual speech recognition are disclosed. In one aspect, a method includes the actions of determining a context of a second computing device. The actions further include identifying, by a first computing device, an additional pronunciation for a term of multiple terms. The actions further include including the additional pronunciation for the term in the lexicon. The actions further include receiving audio data of an utterance. The actions further include generating a transcription of the utterance by using the lexicon that includes the multiple terms and the pronunciation for each of the multiple terms and the additional pronunciation for the term. The actions further include after generating the transcription of the utterance, removing the additional pronunciation for the term from the lexicon. The actions further include providing, for output, the transcription.
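
The add, transcribe, then remove lifecycle in this abstract maps naturally onto a context manager. The sketch below is an illustration only: the lexicon is a plain dict of phone strings, and `temporary_pronunciation` and the toy `transcribe` lookup are hypothetical stand-ins for a real recognizer.

```python
from contextlib import contextmanager
from typing import Optional


@contextmanager
def temporary_pronunciation(lexicon, term, pronunciation):
    """Add a pronunciation for `term`, then remove it when the block exits."""
    lexicon.setdefault(term, []).append(pronunciation)
    try:
        yield lexicon
    finally:
        # Remove the extra pronunciation even if transcription fails.
        lexicon[term].remove(pronunciation)


def transcribe(phones: str, lexicon) -> Optional[str]:
    """Toy recognizer: return the term whose pronunciation matches exactly."""
    for term, pronunciations in lexicon.items():
        if phones in pronunciations:
            return term
    return None
```

The `finally` clause ensures the lexicon returns to its original state after the transcription, which is the removal step the abstract describes.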

8.

Invention patent application
WORD LATTICE AUGMENTATION FOR AUTOMATIC SPEECH RECOGNITION
Patent validity: Active

Publication (announcement) No.: US20220229992A1

Publication (announcement) date: 2022-07-21

Application No.: US17589186

Application date: 2022-01-31

Applicant: GOOGLE LLC

Inventor: Leonid Velikovich, Petar Aleksic, Pedro Moreno

IPC: G06F40/295, G06F40/30, G10L15/06, G10L15/187, G10L15/22

Abstract: Speech processing techniques are disclosed that enable determining a text representation of named entities in captured audio data. Various implementations include determining the location of a carrier phrase in a word lattice representation of the captured audio data, where the carrier phrase provides an indication of a named entity. Additional or alternative implementations include matching a candidate named entity with the portion of the word lattice, and augmenting the word lattice with the matched candidate named entity.
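
A much simplified sketch of the idea in this abstract follows. It assumes the lattice is just a list of word paths and swaps in string similarity from Python's standard `difflib` for whatever matching the patent actually uses; `augment_lattice` and its parameters are hypothetical.

```python
from difflib import get_close_matches


def augment_lattice(paths, carrier, entities):
    """Add paths where the words after the carrier phrase match an entity.

    paths: list of word lists, standing in for a real word lattice.
    carrier: a single carrier word (e.g. "call") that signals a named entity.
    entities: candidate named entities to match against.
    """
    augmented = [list(p) for p in paths]
    for path in paths:
        if carrier not in path:
            continue
        start = path.index(carrier) + 1
        heard = " ".join(path[start:])
        # Assumption: plain string similarity approximates the matching step.
        for match in get_close_matches(heard, entities, n=1):
            new_path = path[:start] + match.split()
            if new_path not in augmented:
                augmented.append(new_path)
    return augmented
```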

9.

Granted invention patent
Speech recognition using two language models
Patent validity: Active

Publication (announcement) No.: US11341972B2

Publication (announcement) date: 2022-05-24

Application No.: US17078030

Application date: 2020-10-22

Applicant: Google LLC

Inventor: Alexander H. Gruenstein, Petar Aleksic

IPC: G10L15/26, G10L15/30, G10L15/18, G10L15/22, G10L15/32, G10L15/193, G10L15/197

Abstract: In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.
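
The selection rule in this abstract can be sketched in a few lines, assuming both transcriptions are already available as strings. The name `choose_transcription` is hypothetical, and falling back to the second transcription when no predefined term is found is an assumption; the abstract does not describe that case.

```python
def choose_transcription(personal, general, trigger_terms):
    """Return the user-specific transcription when the general one has a trigger.

    personal: output of the recognizer using the user-specific language model.
    general: output of the recognizer using the user-independent model.
    trigger_terms: the predefined set of lowercase terms (e.g. {"call"}).
    """
    general_words = set(general.lower().split())
    if general_words & trigger_terms:
        return personal
    # Assumption: otherwise fall back to the general transcription.
    return general
```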

10.

Granted invention patent
Contextual denormalization for automatic speech recognition
Patent validity: Active

Publication (announcement) No.: US11282525B2

Publication (announcement) date: 2022-03-22

Application No.: US17009494

Application date: 2020-09-01

Applicant: Google LLC

Inventor: Assaf Hurwitz Michaely, Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G10L15/26, G06F40/56, G10L15/22

Abstract: A method includes receiving a speech input from a user and obtaining context metadata associated with the speech input. The method also includes generating a raw speech recognition result corresponding to the speech input and selecting a list of one or more denormalizers to apply to the generated raw speech recognition result based on the context metadata associated with the speech input. The generated raw speech recognition result includes normalized text. The method also includes denormalizing the generated raw speech recognition result into denormalized text by applying the list of the one or more denormalizers in sequence to the generated raw speech recognition result.
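
Selecting denormalizers from context metadata and applying them in sequence can be sketched as below. The two denormalizers, the `destination` metadata key, and the selection rule are hypothetical examples, not taken from the patent.

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def digits(text):
    """Denormalizer: replace spelled-out digits with numerals."""
    return " ".join(DIGITS.get(word, word) for word in text.split())


def capitalize_first(text):
    """Denormalizer: capitalize the first character."""
    return text[:1].upper() + text[1:]


def select_denormalizers(context):
    """Choose an ordered list of denormalizers from the context metadata."""
    chain = [digits]
    # Hypothetical rule: dictated text also gets sentence capitalization.
    if context.get("destination") == "dictation":
        chain.append(capitalize_first)
    return chain


def denormalize(raw, context):
    """Apply the selected denormalizers in sequence to the raw result."""
    for denormalizer in select_denormalizers(context):
        raw = denormalizer(raw)
    return raw
```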
