-
Publication No.: US20190115013A1
Publication date: 2019-04-18
Application No.: US16171629
Application date: 2018-10-26
Applicant: Google LLC
Inventor: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Michiel A.U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
CPC classification number: G10L15/16, G10H1/00, G10H2210/036, G10H2210/046, G10H2250/235, G10H2250/311, G10L15/02, G10L17/18, G10L19/0212
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
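The processing chain in this abstract (audio, then frequency domain data, then complex linear projection, then an acoustic model) can be illustrated with a minimal sketch. This is not the patented implementation: the naive DFT, the random complex weight matrix, the log-magnitude compression, and the names `dft`, `complex_linear_projection`, and `random_weights` are all assumptions made for illustration. In practice the projection weights would presumably be learned together with the acoustic model.

```python
import cmath
import math
import random


def dft(frame):
    """Naive DFT of a real frame; returns the non-redundant half of the spectrum."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def complex_linear_projection(spectrum, weights):
    """Multiply the complex spectrum by a complex weight matrix.

    Assumption: each projected value is reduced to a real feature by taking
    the log of its magnitude, so it can feed a real-valued acoustic model.
    """
    features = []
    for row in weights:
        projected = sum(w * s for w, s in zip(row, spectrum))
        features.append(math.log(abs(projected) + 1e-6))
    return features


def random_weights(num_filters, num_bins, seed=0):
    """Hypothetical stand-in for learned projection weights."""
    rng = random.Random(seed)
    return [
        [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_bins)]
        for _ in range(num_filters)
    ]


# One 16-sample frame containing a sinusoid at frequency bin 3.
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
spectrum = dft(frame)                      # frequency domain data
weights = random_weights(4, len(spectrum))
features = complex_linear_projection(spectrum, weights)  # acoustic-model input
```

The point of the sketch is that the projection operates on the complex spectrum directly, so phase information is still available when the weights are applied, unlike a filterbank applied to the magnitude spectrum.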
-
Publication No.: US20240420701A1
Publication date: 2024-12-19
Application No.: US18744592
Application date: 2024-06-14
Applicant: Google LLC
Inventor: Laurent El Shafey, Hagen Soltau, Izhak Shafran
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio data using neural networks.
-
Publication No.: US20220199094A1
Publication date: 2022-06-23
Application No.: US17601662
Application date: 2020-04-06
Applicant: Google LLC
Inventor: Laurent El Shafey, Hagen Soltau, Izhak Shafran
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio data using neural networks.
-
Publication No.: US11069344B2
Publication date: 2021-07-20
Application No.: US16710005
Application date: 2019-12-11
Applicant: Google LLC
Inventor: Izhak Shafran, Thomas E. Bagby, Russell John Wyatt Skerry-Ryan
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex evolution recurrent neural networks. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A first vector sequence comprising audio features determined from the audio data is generated. A second vector sequence is generated, as output of a first recurrent neural network in response to receiving the first vector sequence as input, where the first recurrent neural network has a transition matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary. An output vector sequence of a second recurrent neural network is generated. A transcription for the utterance is generated based on the output vector sequence generated by the second recurrent neural network. The transcription for the utterance is provided.
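The transition matrix described in this abstract, a cascade of complex-valued unitary operators combined with one or more non-unitary operators, can be sketched as follows. The specific operators chosen here (a diagonal phase rotation, a permutation, a Householder-style reflection, and a real diagonal scaling) and every function name are assumptions for illustration; the abstract does not say which operators the patent uses.

```python
import cmath
import math


def phase_diag(thetas):
    """Unitary: rotate each component by exp(i * theta)."""
    return lambda v: [cmath.exp(1j * t) * x for t, x in zip(thetas, v)]


def permutation(order):
    """Unitary: reorder the components."""
    return lambda v: [v[i] for i in order]


def reflection(u):
    """Unitary: complex reflection v - 2 u (u^H v) / (u^H u)."""
    norm_sq = sum(abs(c) ** 2 for c in u)

    def apply(v):
        inner = sum(c.conjugate() * x for c, x in zip(u, v))
        return [x - 2 * c * inner / norm_sq for c, x in zip(u, v)]

    return apply


def scaling(gains):
    """Non-unitary: real diagonal gains that can shrink or grow the state."""
    return lambda v: [g * x for g, x in zip(gains, v)]


def cascade(ops):
    """Apply the operators in order, i.e. a product of linear maps."""
    def apply(v):
        for op in ops:
            v = op(v)
        return v

    return apply


def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))


unitary_part = cascade([
    phase_diag([0.1, 0.7, -1.2]),
    permutation([2, 0, 1]),
    reflection([1 + 1j, 0.5, -2j]),
])
transition = cascade([unitary_part, scaling([0.9, 1.0, 0.5])])

h = [1 + 0j, 2 - 1j, 0.5j]          # hypothetical hidden state
h_unitary = unitary_part(h)         # norm is preserved
h_next = transition(h)              # norm may change
```

The unitary operators leave the norm of the hidden state unchanged, while the scaling step is the only place where the state can be attenuated or amplified.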
-
Publication No.: US10762914B2
Publication date: 2020-09-01
Application No.: US16032996
Application date: 2018-07-11
Applicant: Google LLC
Inventor: Joseph Caroselli, Arun Narayanan, Izhak Shafran, Richard Rose
IPC: G10L21/00, G10L21/0208, G10L15/20, G10L15/22, G10L15/065, G06F3/16, G06N3/02, G06F17/14, G10L15/06, G10L21/0216
Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).
-
Publication No.: US10714078B2
Publication date: 2020-07-14
Application No.: US16171629
Application date: 2018-10-26
Applicant: Google LLC
Inventor: Samuel Bengio, Mirkó Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
-
Publication No.: US20190272840A1
Publication date: 2019-09-05
Application No.: US16032996
Application date: 2018-07-11
Applicant: Google LLC
Inventor: Joseph Caroselli, Arun Narayanan, Izhak Shafran, Richard Rose
IPC: G10L21/0208, G10L15/20, G10L15/22, G10L15/065, G10L15/06, G06F3/16, G06N3/02, G06F17/14
Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).