Patent search ap:("GOOGLE LLC") AND inv:"Arun Narayanan" Page 3

21.

发明申请
Streaming Automatic Speech Recognition With Non-Streaming Model Distillation 有权

公开(公告)号：US20220343894A1

公开(公告)日：2022-10-27

申请号：US17348118

申请日：2021-06-15

Applicant: Google LLC

Inventor： Thibault Doutre , Wei Han , Min Ma , Zhiyun Lu , Chung-Cheng Chiu , Ruoming Pang , Arun Narayanan , Ananya Misra , Yu Zhang , Liangliang Cao

IPC: G10L15/06 , G06N3/04 , G10L15/18 , G10L15/08

Abstract: A method for training a streaming automatic speech recognition student model includes receiving a plurality of unlabeled student training utterances. The method also includes, for each unlabeled student training utterance, generating a transcription corresponding to the respective unlabeled student training utterance using a plurality of non-streaming automated speech recognition (ASR) teacher models. The method further includes distilling a streaming ASR student model from the plurality of non-streaming ASR teacher models by training the streaming ASR student model using the plurality of unlabeled student training utterances paired with the corresponding transcriptions generated by the plurality of non-streaming ASR teacher models.

22.

发明申请
TWO-PASS END TO END SPEECH RECOGNITION 有权

公开(公告)号：US20240420687A1

公开(公告)日：2024-12-19

申请号：US18815537

申请日：2024-08-26

Applicant: GOOGLE LLC

Inventor： Tara N. Sainath , Yanzhang He , Bo Li , Arun Narayanan , Ruoming Pang , Antoine Jean Bruguier , Shuo-yiin Chang , Wei Li

IPC: G10L15/16 , G06N3/08 , G10L15/05 , G10L15/06 , G10L15/22

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

23.

发明授权
Cascaded encoders for simplified streaming and non-streaming ASR 有权

公开(公告)号：US12154581B2

公开(公告)日：2024-11-26

申请号：US17237021

申请日：2021-04-21

Applicant: Google LLC

Inventor： Arun Narayanan , Tara Sainath , Chung-Cheng Chiu , Ruoming Pang , Rohit Prabhavalkar , Jiahui Yu , Ehsan Variani , Trevor Strohman

IPC: G10L19/16 , G06N3/08 , G10L15/00 , G10L15/16 , G10L15/32 , G10L25/30

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

24.

发明授权
Joint acoustic echo cancelation, speech enhancement, and voice separation for automatic speech recognition 有权

公开(公告)号：US12119014B2

公开(公告)日：2024-10-15

申请号：US17644108

申请日：2021-12-14

Applicant: Google LLC

Inventor： Arun Narayanan , Tom O'malley , Quan Wang , Alex Park , James Walker , Nathan David Howard , Yanzhang He , Chung-Cheng Chiu

IPC: G10L21/0216 , G06N3/04 , G10L15/06 , G10L21/0208 , H04R3/04

CPC classification number: G10L21/0216 , G06N3/04 , G10L15/063 , H04R3/04 , G10L2021/02082

Abstract: A method for automatic speech recognition using joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving, at a contextual frontend processing model, input speech features corresponding to a target utterance. The method also includes receiving, at the contextual frontend processing model, at least one of a reference audio signal, a contextual noise signal including noise prior to the target utterance, or a speaker embedding including voice characteristics of a target speaker that spoke the target utterance. The method further includes processing, using the contextual frontend processing model, the input speech features and the at least one of the reference audio signal, the contextual noise signal, or the speaker embedding vector to generate enhanced speech features.

25.

发明授权
Streaming automatic speech recognition with non-streaming model distillation 有权

公开(公告)号：US11804212B2

公开(公告)日：2023-10-31

申请号：US17348118

申请日：2021-06-15

Applicant: Google LLC

Inventor： Thibault Doutre , Wei Han , Min Ma , Zhiyun Lu , Chung-Cheng Chiu , Ruoming Pang , Arun Narayanan , Ananya Misra , Yu Zhang , Liangliang Cao

IPC: G10L15/06 , G10L15/08 , G10L15/18 , G06N3/04 , G06N3/045

CPC classification number: G10L15/063 , G06N3/045 , G10L15/083 , G10L15/18

Abstract: A method for training a streaming automatic speech recognition student model includes receiving a plurality of unlabeled student training utterances. The method also includes, for each unlabeled student training utterance, generating a transcription corresponding to the respective unlabeled student training utterance using a plurality of non-streaming automated speech recognition (ASR) teacher models. The method further includes distilling a streaming ASR student model from the plurality of non-streaming ASR teacher models by training the streaming ASR student model using the plurality of unlabeled student training utterances paired with the corresponding transcriptions generated by the plurality of non-streaming ASR teacher models.

26.

发明授权
Context aware beamforming of audio data 有权

公开(公告)号：US11798533B2

公开(公告)日：2023-10-24

申请号：US17221220

申请日：2021-04-02

Applicant: Google LLC

Inventor： Joseph Caroselli, Jr. , Yiteng Huang , Arun Narayanan

IPC: G10L15/08 , G10L21/0216 , G06N20/00 , G10L15/05

CPC classification number: G10L15/083 , G06N20/00 , G10L15/05 , G10L21/0216 , G10L2015/088 , G10L2021/02166

Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user, determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.

27.

发明授权
Efficient streaming non-recurrent on-device end-to-end model 有权

公开(公告)号：US11715458B2

公开(公告)日：2023-08-01

申请号：US17316198

申请日：2021-05-10

Applicant: Google LLC

Inventor： Tara Sainath , Arun Narayanan , Rami Botros , Yanzhang He , Ehsan Variani , Cyril Allauzen , David Rybach , Ruoming Pang , Trevor Strohman

IPC: G10L15/00 , G10L15/06 , G10L15/02 , G10L15/22 , G10L15/30

CPC classification number: G10L15/063 , G10L15/02 , G10L15/22 , G10L15/30

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

28.

发明申请
ENHANCED MULTI-CHANNEL ACOUSTIC MODELS 有权

公开(公告)号：US20210295859A1

公开(公告)日：2021-09-23

申请号：US17303822

申请日：2021-06-08

Applicant: Google LLC

Inventor： Ehsan Variani , Kevin William Wilson , Ron J. Weiss , Tara N. Sainath , Arun Narayanan

IPC: G10L25/30 , G10L21/028 , G10L21/0388 , G10L15/16 , G10L19/008 , G10L15/20

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

29.

发明申请
AUTOMATED CALLING SYSTEM 有权

公开(公告)号：US20210090570A1

公开(公告)日：2021-03-25

申请号：US16580726

申请日：2019-09-24

Applicant: Google LLC

Inventor： Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A.U. Bacchiani

IPC: G10L15/26 , H04M3/428 , H04M1/02 , G10L15/32 , H04M3/51 , H04M1/663

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.

30.

发明授权
Adaptive multichannel dereverberation for automatic speech recognition 有权

公开(公告)号：US10762914B2

公开(公告)日：2020-09-01

申请号：US16032996

申请日：2018-07-11

Applicant: Google LLC

Inventor： Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose

IPC: G10L21/00 , G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G06F3/16 , G06N3/02 , G06F17/14 , G10L15/06 , G10L21/0216

Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification