Patent search ap:("Google LLC") AND inv:"Ehsan Variani" Page 1

1.

发明授权
Multilingual re-scoring models for automatic speech recognition 有权

公开(公告)号：US12254875B2

公开(公告)日：2025-03-18

申请号：US18589220

申请日：2024-02-27

Applicant: Google LLC

Inventor： Neeraj Gaur , Tongzhou Chen , Ehsan Variani , Bhuvana Ramabhadran , Parisa Haghani , Pedro J. Moreno Mengibar

IPC: G10L15/197 , G10L15/00 , G10L15/16 , G10L15/22

Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.

2.

发明公开
Rare Word Recognition with LM-aware MWER Training 审中-公开

公开(公告)号：US20230298570A1

公开(公告)日：2023-09-21

申请号：US18187222

申请日：2023-03-21

Applicant: Google LLC

Inventor： Weiran Wang , Tongzhou Chen , Tara N. Sainath , Ehsan Variani , Rohit Prakash Prabhavalkar , Ronny Huang , Bhuvana Ramabhadran , Neeraj Gaur , Sepand Mavandadi , Charles Caleb Peyser , Trevor Strohman , Yangzhang He , David Rybach

IPC: G10L15/06 , G10L15/19 , G10L15/22 , G10L15/16 , G10L15/02

CPC classification number: G10L15/063 , G10L15/19 , G10L15/22 , G10L15/16 , G10L15/02

Abstract: A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypotheses corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of the external language model.

3.

发明公开
EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL 审中-公开

公开(公告)号：US20230343328A1

公开(公告)日：2023-10-26

申请号：US18336211

申请日：2023-06-16

Applicant: Google LLC

Inventor： Tara Sainath , Arun Narayanan , Rami Botros , Yanzhang He , Ehsan Variani , Cyril Allauzen , David Rybach , Ruoming Pang , Trevor Strohman

IPC: G10L15/06 , G10L15/02 , G10L15/22 , G10L15/30

CPC classification number: G10L15/063 , G10L15/02 , G10L15/22 , G10L15/30

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

4.

发明申请
Multilingual Re-Scoring Models for Automatic Speech Recognition 有权

公开(公告)号：US20220310081A1

公开(公告)日：2022-09-29

申请号：US17701635

申请日：2022-03-22

Applicant: Google LLC

Inventor： Neeraj Gaur , Tongzhou Chen , Ehsan Variani , Bhuvana Ramabhadran , Parisa Haghani , Pedro J. Moreno Mengibar

IPC: G10L15/197 , G10L15/16 , G10L15/22 , G10L15/00

Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis, and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.

5.

发明申请
Cascaded Encoders for Simplified Streaming and Non-Streaming ASR 有权

公开(公告)号：US20220122622A1

公开(公告)日：2022-04-21

申请号：US17237021

申请日：2021-04-21

Applicant: Google LLC

Inventor： Arun Narayanan , Tara Sainath , Chung-Cheng Chiu , Ruoming Pang , Rohit Prabhavalkar , Jiahui Yu , Ehsan Variani , Trevor Strohman

IPC: G10L19/16 , G10L15/00 , G10L25/30 , G06N3/08

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

6.

发明申请
COMPLEX LINEAR PROJECTION FOR ACOUSTIC MODELING 审中-公开

公开(公告)号：US20200286468A1

公开(公告)日：2020-09-10

申请号：US16879322

申请日：2020-05-20

Applicant: Google LLC

Inventor： Samuel Bengio , Mirko Visontai , Christopher Walter George Thornton , Tara N. Sainath , Ehsan Variani , Izhak Shafran , Michiel A.u. Bacchiani

IPC: G10L15/16 , G10L15/02 , G10H1/00 , G10L19/02

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.

7.

发明申请
COMPLEX LINEAR PROJECTION FOR ACOUSTIC MODELING 审中-公开

公开(公告)号：US20190115013A1

公开(公告)日：2019-04-18

申请号：US16171629

申请日：2018-10-26

Applicant: Google LLC

Inventor： Samuel Bengio , Mirko Visontai , Christopher Walter George Thornton , Michiel A.U. Bacchiani , Tara N. Sainath , Ehsan Variani , Izhak Shafran

IPC: G10L15/16 , G10L19/02 , G10L15/02 , G10H1/00

CPC classification number: G10L15/16 , G10H1/00 , G10H2210/036 , G10H2210/046 , G10H2250/235 , G10H2250/311 , G10L15/02 , G10L17/18 , G10L19/0212

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.

8.

发明申请
Multilingual Re-Scoring Models for Automatic Speech Recognition 有权

公开(公告)号：US20240420692A1

公开(公告)日：2024-12-19

申请号：US18818010

申请日：2024-08-28

Applicant: Google LLC

Inventor： Neeraj Gaur , Tongzhou Chen , Ehsan Variani , Bhuvana Ramabhadran , Parisa Haghani , Pedro J. Moreno Mengibar

IPC: G10L15/197 , G10L15/00 , G10L15/16 , G10L15/22

Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.

9.

发明授权
Enhanced multi-channel acoustic models 有权

公开(公告)号：US11783849B2

公开(公告)日：2023-10-10

申请号：US17303822

申请日：2021-06-08

Applicant: Google LLC

Inventor： Ehsan Variani , Kevin William Wilson , Ron J. Weiss , Tara N. Sainath , Arun Narayanan

IPC: G10L15/16 , G10L25/30 , G10L21/028 , G10L21/0388 , G10L19/008 , G10L15/20 , G10L21/0208 , G10L21/0216

CPC classification number: G10L25/30 , G10L15/16 , G10L15/20 , G10L19/008 , G10L21/028 , G10L21/0388 , G10L2021/02087 , G10L2021/02166

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

10.

发明申请
ENHANCED MULTI-CHANNEL ACOUSTIC MODELS 审中-公开

公开(公告)号：US20190259409A1

公开(公告)日：2019-08-22

申请号：US16278830

申请日：2019-02-19

Applicant: Google LLC

Inventor： Ehsan Variani , Kevin William Wilson , Ron J. Weiss , Tara N. Sainath , Arun Narayanan

IPC: G10L25/30 , G10L21/028 , G10L19/008 , G10L15/20 , G10L15/16 , G10L21/0388

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification