ADAPTIVE MULTICHANNEL DEREVERBERATION FOR AUTOMATIC SPEECH RECOGNITION

    Publication number: US20190272840A1

    Publication date: 2019-09-05

    Application number: US16032996

    Filing date: 2018-07-11

    Applicant: Google LLC

    Abstract: An adaptive multichannel technique is used to mitigate reverberation present in received audio signals before corresponding audio data is provided to one or more additional components, such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive” in that the filter used for reverberation mitigation is online, causal, and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel” in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating the filter. Dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) before the audio data is provided to ASR component(s) and/or other component(s).
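As a rough illustration of what an online, causal, multichannel filter can look like, the sketch below processes one frequency bin at a time. It is not the patented method: the abstract does not name an update rule, so a normalized LMS (NLMS) update is swapped in for brevity, and the class name `OnlineDereverberator` and the parameters `taps`, `delay`, `step`, and `eps` are all hypothetical.

```python
from collections import deque


class OnlineDereverberator:
    """Toy adaptive multichannel dereverberator for a single frequency bin.

    Assumption: late reverberation is predicted from delayed past frames of
    all channels and subtracted; the filters adapt with NLMS (chosen here for
    brevity, not taken from the patent).
    """

    def __init__(self, num_channels, taps=4, delay=2, step=0.5, eps=1e-8):
        self.num_channels = num_channels
        self.taps = taps
        self.step = step
        self.eps = eps
        # Past frames, oldest first. The oldest `taps` entries are the frames
        # observed between `delay` and `delay + taps - 1` steps ago.
        size = delay + taps - 1
        self.history = deque(
            [[0j] * num_channels for _ in range(size)], maxlen=size
        )
        # One prediction filter per output channel, spanning all channels.
        self.filters = [[0j] * (taps * num_channels) for _ in range(num_channels)]

    def process(self, frame):
        """Dereverberate one multichannel frame, then adapt the filters."""
        past = [x for old in list(self.history)[: self.taps] for x in old]
        power = sum(abs(x) ** 2 for x in past) + self.eps
        output = []
        for c in range(self.num_channels):
            g = self.filters[c]
            predicted = sum(gi.conjugate() * xi for gi, xi in zip(g, past))
            error = frame[c] - predicted  # dereverberated sample, channel c
            # Causal update: only the current and past frames are used.
            scale = self.step * error.conjugate() / power
            self.filters[c] = [gi + scale * xi for gi, xi in zip(g, past)]
            output.append(error)
        self.history.append(list(frame))
        return output
```

A caller would keep one such object per frequency bin and pass each new frame of per-microphone values to `process`, forwarding the returned values to the ASR front end. Every channel's filter sees the past frames of all microphones, which is the sense in which the update is multichannel.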

    EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL

    Publication number: US20240371363A1

    Publication date: 2024-11-07

    Application number: US18772263

    Filing date: 2024-07-15

    Applicant: Google LLC

    Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.
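The data flow in this abstract (first encoder, second encoder, decoder, language-model rescoring) can be sketched as a chain of callables. The sketch below is only an assumption about how the stages compose: the function names `recognize` and `rescore`, the `lm_weight` parameter, and the log-linear rescoring rule are hypothetical, since the abstract does not say how the language model combines its scores with the decoder's distribution.

```python
import math
from typing import Callable, Dict, List, Sequence

Frame = List[float]
Distribution = Dict[str, float]


def rescore(
    dist: Distribution,
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> Distribution:
    """Combine decoder and LM scores log-linearly (an assumed rule), then renormalize."""
    scores = {
        hyp: math.log(p) + lm_weight * lm_logprob(hyp)
        for hyp, p in dist.items()
        if p > 0
    }
    log_norm = math.log(sum(math.exp(s) for s in scores.values()))
    return {hyp: math.exp(s - log_norm) for hyp, s in scores.items()}


def recognize(
    frames: Sequence[Frame],
    first_encoder: Callable[[Frame], Frame],
    second_encoder: Callable[[Frame], Frame],
    decoder: Callable[[List[Frame]], Distribution],
    lm_logprob: Callable[[str], float],
) -> Distribution:
    """Run the cascade: encoder 1 -> encoder 2 -> decoder -> LM rescoring."""
    first = [first_encoder(f) for f in frames]    # first higher order features
    second = [second_encoder(f) for f in first]   # second higher order features
    return rescore(decoder(second), lm_logprob)


if __name__ == "__main__":
    # Toy stand-ins for the trained components (hypothetical).
    result = recognize(
        frames=[[0.1, 0.2], [0.3, 0.4]],
        first_encoder=lambda f: [sum(f)],
        second_encoder=lambda f: [2.0 * f[0]],
        decoder=lambda feats: {"hello": 0.6, "hollow": 0.4},
        lm_logprob=lambda hyp: 0.0 if hyp == "hello" else -1.0,
    )
    print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular model architecture; in a real system each callable would wrap a trained network.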

    STFT-based echo muter
    Granted invention patent

    Publication number: US12051434B2

    Publication date: 2024-07-30

    Application number: US17643825

    Filing date: 2021-12-11

    Applicant: Google LLC

    CPC classification number: G10L21/0224 G10L21/0232 G10L2021/02082

    Abstract: A method for Short-Time Fourier Transform-based echo muting includes receiving a microphone signal including acoustic echo captured by a microphone and corresponding to audio content from an acoustic speaker, and receiving a reference signal including a sequence of frames representing the audio content. For each frame in the sequence of frames, the method includes processing the respective frame using an acoustic echo canceler to generate a respective output signal frame that cancels the acoustic echo from the respective frame, and determining, using a Double-talk Detector (DTD), based on the respective frame and the respective output signal frame, whether the respective frame is a double-talk frame or an echo-only frame. The method also includes muting the respective output signal frame for each echo-only frame, and performing speech processing on the respective output signal frame for each double-talk frame.
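The per-frame control flow described here (cancel echo, classify the frame, then mute or pass through) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it works on plain time-domain sample lists rather than STFT frames, and `toy_aec`, `toy_dtd`, the fixed `gain`, and the energy-ratio `threshold` are hypothetical stand-ins for a real echo canceler and double-talk detector.

```python
def energy(frame):
    """Sum of squared samples in a frame."""
    return sum(x * x for x in frame)


def toy_aec(mic, ref, gain=1.0):
    """Stand-in echo canceler: subtract the reference scaled by a fixed gain."""
    return [m - gain * r for m, r in zip(mic, ref)]


def toy_dtd(mic, residual, threshold=0.1):
    """Stand-in double-talk detector (assumed rule).

    Declares double-talk when a meaningful share of the microphone energy
    survives echo cancellation.
    """
    return energy(residual) > threshold * energy(mic)


def echo_muter(mic_frames, ref_frames, cancel_echo=toy_aec, is_double_talk=toy_dtd):
    """Mute echo-only frames; pass double-talk frames on for speech processing."""
    output = []
    for mic, ref in zip(mic_frames, ref_frames):
        residual = cancel_echo(mic, ref)
        if is_double_talk(mic, residual):
            output.append(residual)                # near-end speech present
        else:
            output.append([0.0] * len(residual))   # echo only: mute
    return output
```

Muting echo-only frames removes whatever residual echo the canceler leaves behind, so downstream speech processing only sees frames that the detector believes contain near-end speech.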
