CONTEXT AWARE BEAMFORMING OF AUDIO DATA

    Publication number: US20220319498A1

    Publication date: 2022-10-06

    Application number: US17221220

    Filing date: 2021-04-02

    Applicant: Google LLC

    Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
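
As a rough illustration of the estimation and initialization steps in this abstract, the Python sketch below computes a sample spatial correlation matrix for a single frequency bin and derives beamformer weights from it. The abstract does not say how the two segments are combined or which beamformer is used. Pooling the frames of both segments and taking the principal eigenvector by power iteration are assumptions made here for illustration only, and all function names are hypothetical.

```python
def spatial_correlation(frames):
    """Sample spatial correlation matrix R = (1/T) * sum_t x_t x_t^H.

    `frames` is a list of T multi-channel observations for one frequency
    bin; each observation is a list of complex values, one per microphone.
    """
    n = len(frames[0])
    r = [[0j] * n for _ in range(n)]
    for x in frames:
        for i in range(n):
            for j in range(n):
                r[i][j] += x[i] * x[j].conjugate()
    t = len(frames)
    return [[value / t for value in row] for row in r]


def principal_eigenvector(r, iterations=100):
    """Dominant eigenvector of a Hermitian matrix via power iteration."""
    n = len(r)
    w = [1 + 0j] * n
    for _ in range(iterations):
        w = [sum(r[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(v) ** 2 for v in w) ** 0.5
        if norm == 0:
            raise ValueError("correlation matrix has no usable energy")
        w = [v / norm for v in w]
    return w


def apply_beamformer(weights, x):
    """Beamformer output y = w^H x for one multi-channel observation."""
    return sum(w.conjugate() * xi for w, xi in zip(weights, x))


# Toy two-microphone data (assumed): every frame is a scaled copy of the
# same spatial signature [1, 1j].
preceding_segment = [[1 + 0j, 1j], [2 + 0j, 2j]]
hotword_segment = [[-1 + 0j, -1j]]

# Assumption: frames from both segments are pooled into one estimate.
R = spatial_correlation(preceding_segment + hotword_segment)
weights = principal_eigenvector(R)
output = apply_beamformer(weights, [1 + 0j, 1j])
```

In this toy case the weights align with the shared spatial signature, so a later frame arriving from the same direction passes through with its channels added coherently.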

    Small Footprint Multi-Channel Keyword Spotting

    Publication number: US20240347051A1

    Publication date: 2024-10-17

    Application number: US18754462

    Filing date: 2024-06-26

    Applicant: Google LLC

    CPC classification number: G10L15/16 G10L15/285 H04R3/005 G10L2015/088

    Abstract: A method to detect a hotword in a spoken utterance includes receiving a sequence of input frames characterizing streaming multi-channel audio. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a memorized neural network, the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation based on a concatenation of the respective audio features. The method also includes generating, using sequentially-stacked SVDF layers, a probability score indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device.

    Context aware beamforming of audio data

    Publication number: US11798533B2

    Publication date: 2023-10-24

    Application number: US17221220

    Filing date: 2021-04-02

    Applicant: Google LLC

    Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.

    Selective adaptation and utilization of noise reduction technique in invocation phrase detection

    Publication number: US11417324B2

    Publication date: 2022-08-16

    Application number: US16886139

    Filing date: 2020-05-28

    Applicant: Google LLC

    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.

    Small footprint multi-channel keyword spotting

    Publication number: US12051406B2

    Publication date: 2024-07-30

    Application number: US17757260

    Filing date: 2020-01-15

    Applicant: Google LLC

    CPC classification number: G10L15/16 G10L15/285 H04R3/005 G10L2015/088

    Abstract: A method (800) to detect a hotword in a spoken utterance (120) includes receiving a sequence of input frames (210) characterizing streaming multi-channel audio (118). Each channel (119) of the streaming multi-channel audio includes respective audio features (510) captured by a separate dedicated microphone (107). For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer (302) of a memorized neural network (300), the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation (420) based on a concatenation of the respective audio features (344). The method also includes generating, using sequentially-stacked SVDF layers (350), a probability score (360) indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device (102).

    Cascade Architecture for Noise-Robust Keyword Spotting

    Publication number: US20240242728A1

    Publication date: 2024-07-18

    Application number: US18619608

    Filing date: 2024-03-28

    Applicant: Google LLC

    Abstract: A method includes receiving, at a first processor of a user device, streaming multi-channel audio captured by an array of microphones, each channel including respective audio features. For each channel, the method also includes processing, by the first processor, using a first stage hotword detector, the respective audio features to determine whether a hotword is detected. When the first stage hotword detector detects the hotword, the method also includes the first processor providing chomped raw audio data to a second processor that processes, using a first noise cleaning algorithm, the chomped raw audio data to generate a clean monophonic audio chomp. The method also includes processing, by the second processor using a second stage hotword detector, the clean monophonic audio chomp to detect the hotword.
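
The control flow of this cascade can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not specify the detectors or the noise cleaning algorithm, so the energy-threshold detector and the channel-averaging cleaner below are hypothetical stand-ins, and triggering the second stage when any single channel fires is likewise an assumption.

```python
def cascade_hotword(channels, first_stage, clean, second_stage):
    """Return True only if both stages report the hotword."""
    # Stage 1: run the lightweight detector on each channel independently.
    # Assumption: one triggering channel is enough to wake stage 2.
    if not any(first_stage(channel) for channel in channels):
        return False  # the cleaner and the second stage never run
    # Noise cleaning reduces the multi-channel chunk to one mono signal.
    mono = clean(channels)
    # Stage 2: confirm the detection on the cleaned mono audio.
    return second_stage(mono)


def energy_detector(threshold):
    """Hypothetical stand-in detector: mean signal energy >= threshold."""
    def detect(samples):
        energy = sum(s * s for s in samples) / len(samples)
        return energy >= threshold
    return detect


def average_channels(channels):
    """Hypothetical stand-in cleaner: average the channels sample by sample."""
    return [sum(column) / len(column) for column in zip(*channels)]


stage_one = energy_detector(0.5)
stage_two = energy_detector(0.7)

# Both microphones carry the same signal, so it survives the averaging.
coherent = [[0.9, -0.9, 0.9, -0.9], [0.9, -0.9, 0.9, -0.9]]
# The microphones are out of phase, so averaging cancels the signal.
incoherent = [[0.9, -0.9, 0.9, -0.9], [-0.9, 0.9, -0.9, 0.9]]
# Too quiet for the first stage to fire on any channel.
quiet = [[0.1, -0.1, 0.1, -0.1], [0.1, -0.1, 0.1, -0.1]]

accepted = cascade_hotword(coherent, stage_one, average_channels, stage_two)
rejected = cascade_hotword(incoherent, stage_one, average_channels, stage_two)
skipped = cascade_hotword(quiet, stage_one, average_channels, stage_two)
```

The point of the structure is that the more expensive cleaning and second-stage detection run only after the first stage fires, and the second stage can still veto a first-stage trigger.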

    Selective adaptation and utilization of noise reduction technique in invocation phrase detection

    Publication number: US10706842B2

    Publication date: 2020-07-07

    Application number: US16609619

    Filing date: 2019-01-14

    Applicant: Google LLC

    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. Various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.

    SELECTIVE ADAPTATION AND UTILIZATION OF NOISE REDUCTION TECHNIQUE IN INVOCATION PHRASE DETECTION

    Publication number: US20200066263A1

    Publication date: 2020-02-27

    Application number: US16609619

    Filing date: 2019-01-14

    Applicant: Google LLC

    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
