-
1.
Publication No.: US20220319498A1
Publication Date: 2022-10-06
Application No.: US17221220
Application Date: 2021-04-02
Applicant: Google LLC
Inventor: Joseph Caroselli, Jr. , Yiteng Huang , Arun Narayanan
IPC: G10L15/08 , G10L21/0216 , G10L15/05 , G06N20/00
Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user, determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
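The abstract above does not say how the spatial correlation matrix is turned into beamformer weights. The following is only a minimal sketch under stated assumptions: the preceding segment is treated as mostly noise, one correlation matrix is estimated per segment, and the weights are taken as the dominant eigenvector of their difference. The function names `spatial_correlation`, `initialize_beamformer`, and `apply_beamformer` are hypothetical and do not come from the patent.

```python
def spatial_correlation(frames):
    """Average the outer product x * x^H over multi-channel snapshots."""
    n = len(frames[0])
    acc = [[0j] * n for _ in range(n)]
    for x in frames:
        for i in range(n):
            for j in range(n):
                acc[i][j] += x[i] * x[j].conjugate()
    return [[value / len(frames) for value in row] for row in acc]


def initialize_beamformer(hotword_frames, preceding_frames, iterations=50):
    """Return beamformer weights from the two segments (illustrative choice)."""
    r_hot = spatial_correlation(hotword_frames)
    r_pre = spatial_correlation(preceding_frames)
    n = len(r_hot)
    # Assumption: the preceding segment is mostly noise, so the difference
    # approximates the correlation of the speech alone.
    r_speech = [[r_hot[i][j] - r_pre[i][j] for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector of r_speech.
    w = [1 + 0j] * n
    for _ in range(iterations):
        v = [sum(r_speech[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(c) ** 2 for c in v) ** 0.5
        if norm == 0:
            break  # no speech energy; keep the previous weights
        w = [c / norm for c in v]
    return w


def apply_beamformer(weights, frame):
    """Beamformer output y = w^H x for one snapshot."""
    return sum(w.conjugate() * x for w, x in zip(weights, frame))
```

Each frame here is one multi-channel snapshot (for example, one frequency bin across all microphones). Weights returned by `initialize_beamformer` for the hotword segment would then be passed to `apply_beamformer` for frames of the following segment.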
-
2.
Publication No.: US20240347051A1
Publication Date: 2024-10-17
Application No.: US18754462
Application Date: 2024-06-26
Applicant: Google LLC
Inventor: Jilong Wu , Yiteng Huang
CPC classification number: G10L15/16 , G10L15/285 , H04R3/005 , G10L2015/088
Abstract: A method to detect a hotword in a spoken utterance includes receiving a sequence of input frames characterizing streaming multi-channel audio. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a memorized neural network, the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation based on a concatenation of the respective audio features. The method also includes generating, using sequentially-stacked SVDF layers, a probability score indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device.
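As a rough illustration of the per-channel processing and concatenation described in the abstract above, the sketch below models a rank-1 SVDF node as a feature filter followed by a time filter over a short memory of past outputs. This is an assumption-laden toy, not the patented network: the class `SvdfNode`, the function `multichannel_input_layer`, the ReLU activation, and all weights are hypothetical.

```python
from collections import deque


class SvdfNode:
    """Rank-1 SVDF node: feature filter, then time filter over a memory."""

    def __init__(self, feature_filter, time_filter):
        self.feature_filter = feature_filter
        self.time_filter = time_filter
        # Most recent feature-filter outputs, oldest first.
        self.memory = deque([0.0] * len(time_filter), maxlen=len(time_filter))

    def step(self, features):
        # Project the current frame's features to a single scalar.
        projected = sum(w * x for w, x in zip(self.feature_filter, features))
        self.memory.append(projected)
        # Weight the remembered scalars along the time axis.
        activation = sum(w * m for w, m in zip(self.time_filter, self.memory))
        return max(0.0, activation)  # ReLU, an assumed activation


def multichannel_input_layer(nodes_per_channel, frame):
    """Run each channel through its own nodes and concatenate the outputs."""
    outputs = []
    for nodes, features in zip(nodes_per_channel, frame):
        outputs.extend(node.step(features) for node in nodes)
    return outputs
```

Here `frame` holds one feature vector per microphone channel, and the returned list is the concatenated representation that later layers would consume.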
-
3.
Publication No.: US11798533B2
Publication Date: 2023-10-24
Application No.: US17221220
Application Date: 2021-04-02
Applicant: Google LLC
Inventor: Joseph Caroselli, Jr. , Yiteng Huang , Arun Narayanan
IPC: G10L15/08 , G10L21/0216 , G06N20/00 , G10L15/05
CPC classification number: G10L15/083 , G06N20/00 , G10L15/05 , G10L21/0216 , G10L2015/088 , G10L2021/02166
Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user, determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
-
4.
Publication No.: US11984117B2
Publication Date: 2024-05-14
Application No.: US17886726
Application Date: 2022-08-12
Applicant: Google LLC
Inventor: Christopher Hughes , Yiteng Huang , Turaj Zakizadeh Shabestary , Taylor Applebaum
IPC: G10L15/20 , G10L15/02 , G10L15/08 , G10L15/22 , G10L21/0232 , G10L25/84 , G10L21/0216
CPC classification number: G10L15/20 , G10L15/02 , G10L15/08 , G10L15/22 , G10L21/0232 , G10L25/84 , G10L2015/025 , G10L2015/088 , G10L2015/223 , G10L2021/02166
Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
-
5.
Publication No.: US11417324B2
Publication Date: 2022-08-16
Application No.: US16886139
Application Date: 2020-05-28
Applicant: Google LLC
Inventor: Christopher Hughes , Yiteng Huang , Turaj Zakizadeh Shabestary , Taylor Applebaum
IPC: G10L15/20 , G10L15/02 , G10L15/08 , G10L15/22 , G10L21/0232 , G10L25/84 , G10L21/0216
Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
-
6.
Publication No.: US20240304187A1
Publication Date: 2024-09-12
Application No.: US18662334
Application Date: 2024-05-13
Applicant: Google LLC
Inventor: Christopher Hughes , Yiteng Huang , Turaj Zakizadeh Shabestary , Taylor Applebaum
IPC: G10L15/20 , G10L15/02 , G10L15/08 , G10L15/22 , G10L21/0216 , G10L21/0232 , G10L25/84
CPC classification number: G10L15/20 , G10L15/02 , G10L15/08 , G10L15/22 , G10L21/0232 , G10L25/84 , G10L2015/025 , G10L2015/088 , G10L2015/223 , G10L2021/02166
Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
-
7.
Publication No.: US12051406B2
Publication Date: 2024-07-30
Application No.: US17757260
Application Date: 2020-01-15
Applicant: Google LLC
Inventor: Jilong Wu , Yiteng Huang
CPC classification number: G10L15/16 , G10L15/285 , H04R3/005 , G10L2015/088
Abstract: A method (800) to detect a hotword in a spoken utterance (120) includes receiving a sequence of input frames (210) characterizing streaming multi-channel audio (118). Each channel (119) of the streaming multi-channel audio includes respective audio features (510) captured by a separate dedicated microphone (107). For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer (302) of a memorized neural network (300), the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation (420) based on a concatenation of the respective audio features (344). The method also includes generating, using sequentially-stacked SVDF layers (350), a probability score (360) indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device (102).
-
8.
Publication No.: US20240242728A1
Publication Date: 2024-07-18
Application No.: US18619608
Application Date: 2024-03-28
Applicant: Google LLC
Inventor: Yiteng Huang , Alexander H. Gruenstein
IPC: G10L21/0216 , G10L15/08 , G10L15/22
CPC classification number: G10L21/0216 , G10L15/08 , G10L15/22 , G10L2015/088 , G10L2021/02166
Abstract: A method includes receiving, at a first processor of a user device, streaming multi-channel audio captured by an array of microphones, each channel including respective audio features. For each channel, the method also includes processing, by the first processor, using a first stage hotword detector, the respective audio features to determine whether a hotword is detected. When the first stage hotword detector detects the hotword, the method also includes the first processor providing chomped raw audio data to a second processor that processes, using a first noise cleaning algorithm, the chomped raw audio data to generate a clean monophonic audio chomp. The method also includes processing, by the second processor using a second stage hotword detector, the clean monophonic audio chomp to detect the hotword.
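The control flow of the two-stage cascade in the abstract above can be sketched as follows. This is a hedged outline only: the detectors and the noise cleaning step are passed in as plain callables, `average_channels` is a stand-in for a real noise cleaning algorithm, and every name and parameter here (`cascade_detect`, `chomp_len`, and so on) is hypothetical.

```python
def average_channels(chomp):
    """Stand-in for noise cleaning: average aligned samples into one channel."""
    return [sum(samples) / len(samples) for samples in zip(*chomp)]


def cascade_detect(channels, first_stage, second_stage, clean, chomp_len):
    """Return True only if both detection stages fire."""
    # Stage 1: run the cheap detector on each channel independently.
    if not any(first_stage(channel) for channel in channels):
        return False
    # Keep only the most recent raw samples of every channel (the "chomp").
    chomp = [channel[-chomp_len:] for channel in channels]
    # Produce a single clean channel, then confirm with the second detector.
    mono = clean(chomp)
    return bool(second_stage(mono))
```

In this arrangement the more expensive cleaning and second-stage detection run only after the first stage has fired on at least one channel.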
-
9.
Publication No.: US10706842B2
Publication Date: 2020-07-07
Application No.: US16609619
Application Date: 2019-01-14
Applicant: Google LLC
Inventor: Christopher Hughes , Yiteng Huang , Turaj Zakizadeh Shabestary , Taylor Applebaum
IPC: G10L15/20 , G10L15/02 , G10L15/08 , G10L15/22 , G10L21/0232 , G10L25/84 , G10L21/0216
Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. Various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
-
10.
Publication No.: US20200066263A1
Publication Date: 2020-02-27
Application No.: US16609619
Application Date: 2019-01-14
Applicant: Google LLC
Inventor: Christopher Hughes , Yiteng Huang , Turaj Zakizadeh Shabestary , Taylor Applebaum
Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.