-
Publication No.: US11699453B2
Publication Date: 2023-07-11
Application No.: US17005823
Application Date: 2020-08-28
Applicant: Google LLC
Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose
IPC: G10L21/00 , G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G06F3/16 , G06N3/02 , G06F17/14 , G10L15/06 , G10L21/0216
CPC classification number: G10L21/0208 , G06F3/167 , G06F17/142 , G06N3/02 , G10L15/063 , G10L15/065 , G10L15/20 , G10L15/22 , G10L2015/223 , G10L2021/02082 , G10L2021/02166
Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).
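The abstract describes a dereverberation filter that is online, causal, and input-dependent, and that is updated from several microphone signals in the frequency domain. Recursive-least-squares (RLS) online weighted prediction error (WPE) dereverberation is one well-known technique with those properties. The sketch below applies it to a single STFT frequency bin. It is offered as an illustrative assumption, not as the claimed method. The function names, tap count, prediction delay, forgetting factor `alpha`, smoothed power weighting (`beta`), and toy two-microphone signal are all hypothetical.

```python
import random


def online_wpe(frames, taps=3, delay=2, alpha=0.99, beta=0.9, eps=1e-10):
    """Causal, frame-by-frame RLS dereverberation of ONE STFT frequency bin.

    frames: list of T frames, each a list of M complex values (one per mic).
    Returns T dereverberated frames with M values each (one output per mic).
    """
    m = len(frames[0])                 # number of microphones
    k = m * taps                       # length of the stacked past-frame vector
    # RLS state: inverse weighted correlation matrix P (k x k), filter G (k x m).
    p = [[1.0 + 0j if i == j else 0j for j in range(k)] for i in range(k)]
    g = [[0j] * m for _ in range(k)]   # zero filter: output starts equal to input
    power = None
    outputs = []
    for t, y in enumerate(frames):
        # Stack y[t-delay], ..., y[t-delay-taps+1]; frames before t=0 are zeros.
        x = []
        for tau in range(taps):
            idx = t - delay - tau
            x.extend(frames[idx] if idx >= 0 else [0j] * m)
        # Subtract the predicted late reverberation: d = y - G^H x.
        d = [y[c] - sum(g[i][c].conjugate() * x[i] for i in range(k))
             for c in range(m)]
        outputs.append(d)
        # Assumed weighting: recursively smoothed mean power across mics.
        inst = max(sum(abs(v) ** 2 for v in y) / m, eps)
        power = inst if power is None else beta * power + (1 - beta) * inst
        # Gain vector: kg = P x / (alpha * power + x^H P x).
        px = [sum(p[i][j] * x[j] for j in range(k)) for i in range(k)]
        denom = alpha * power + sum(x[i].conjugate() * px[i] for i in range(k))
        kg = [v / denom for v in px]
        # Inverse-correlation update: P <- (P - kg (x^H P)) / alpha.
        xhp = [sum(x[i].conjugate() * p[i][j] for i in range(k)) for j in range(k)]
        p = [[(p[i][j] - kg[i] * xhp[j]) / alpha for j in range(k)]
             for i in range(k)]
        # Filter update with the a priori error: G <- G + kg d^H.
        for i in range(k):
            for c in range(m):
                g[i][c] += kg[i] * d[c].conjugate()
    return outputs


def synthetic_reverb(n_frames, delay=2, seed=0):
    """Toy 2-mic bin: a shared source, per-mic noise, and a decaying echo."""
    rng = random.Random(seed)

    def cgauss():
        # Unit-variance circular complex Gaussian sample.
        return complex(rng.gauss(0, 1), rng.gauss(0, 1)) / 2 ** 0.5

    frames = []
    for t in range(n_frames):
        s, n = cgauss(), cgauss()
        prev = frames[t - delay] if t >= delay else [0j, 0j]
        frames.append([s + 0.7 * prev[0], 0.5 * s + 0.5 * n + 0.6 * prev[1]])
    return frames


if __name__ == "__main__":
    frames = synthetic_reverb(2000)
    out = online_wpe(frames)
    half = len(frames) // 2
    p_in = sum(abs(v) ** 2 for f in frames[half:] for v in f)
    p_out = sum(abs(v) ** 2 for f in out[half:] for v in f)
    print(f"output/input power after adaptation: {p_out / p_in:.2f}")
```

A full pipeline would presumably run this update independently for every frequency bin. Each microphone's dereverberated output would then be passed on, for example to an ASR component.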
-
Publication No.: US20230298612A1
Publication Date: 2023-09-21
Application No.: US18171411
Application Date: 2023-02-20
Applicant: Google LLC
Inventor: Joseph Caroselli , Arun Narayanan , Tom O'malley
CPC classification number: G10L21/0232 , G10L25/30 , H04S3/008 , G10L15/22 , G10L15/063 , G10L15/16 , G10L25/18 , H04S2400/01 , G10L2021/02082
Abstract: A multichannel neural frontend speech enhancement model for speech recognition includes a speech cleaner, a stack of self-attention blocks each having a multi-headed self-attention mechanism, and a masking layer. The speech cleaner receives, as input, a multichannel noisy input signal and a multichannel contextual noise signal, and generates, as output, a single channel cleaned input signal. The stack of self-attention blocks receives, as input, at an initial block of the stack of self-attention blocks, a stacked input including the single channel cleaned input signal and a single channel noisy input signal, and generates, as output, from a final block of the stack of self-attention blocks, an un-masked output. The masking layer receives, as input, the single channel noisy input signal and the un-masked output, and generates, as output, enhanced input speech features corresponding to a target utterance.
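A toy, weight-free sketch can trace the data flow in this abstract: stack the cleaned and noisy single-channel features, pass them through self-attention blocks, then mask the noisy features. Everything beyond that flow is a hypothetical stand-in. The speech cleaner is not modeled, so its output is taken as given. Each block is single-head scaled dot-product attention with identity projections and a residual connection, rather than trained multi-headed attention. Summing the two halves of the last block's output replaces a learned projection back to the feature size. The mask is assumed to be an elementwise sigmoid applied to the noisy features.

```python
import math


def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def stack_inputs(cleaned, noisy):
    # Frame-wise concatenation: T x F and T x F -> T x 2F.
    return [c + n for c, n in zip(cleaned, noisy)]


def self_attention(frames):
    # Toy single-head scaled dot-product self-attention over all frames,
    # with identity query/key/value projections (no learned weights).
    d = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, frames)) / z for j in range(d)])
    return out


def mask_layer(noisy, unmasked):
    # Assumed sigmoid mask: each noisy feature is scaled by a value in [0, 1].
    return [[n * sigmoid(u) for n, u in zip(nf, uf)] for nf, uf in zip(noisy, unmasked)]


def frontend(cleaned, noisy, num_blocks=2):
    h = stack_inputs(cleaned, noisy)
    for _ in range(num_blocks):
        attn = self_attention(h)
        h = [[a + b for a, b in zip(hr, ar)] for hr, ar in zip(h, attn)]  # residual
    f = len(noisy[0])
    # Hypothetical stand-in for a learned projection from 2F back to F features.
    unmasked = [[row[j] + row[j + f] for j in range(f)] for row in h]
    return mask_layer(noisy, unmasked)
```

The mask only scales the noisy features, so each enhanced value stays between zero and the corresponding noisy value. A trained model would learn the attention weights and the output projection instead of using fixed ones.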
-
Publication No.: US10762914B2
Publication Date: 2020-09-01
Application No.: US16032996
Application Date: 2018-07-11
Applicant: Google LLC
Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose
IPC: G10L21/00 , G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G06F3/16 , G06N3/02 , G06F17/14 , G10L15/06 , G10L21/0216
Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).
-
Publication No.: US20190272840A1
Publication Date: 2019-09-05
Application No.: US16032996
Application Date: 2018-07-11
Applicant: Google LLC
Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose
IPC: G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G10L15/06 , G06F3/16 , G06N3/02 , G06F17/14
Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).