-
公开(公告)号:US12051406B2
公开(公告)日:2024-07-30
申请号:US17757260
申请日:2020-01-15
Applicant: Google LLC
Inventor: Jilong Wu , Yiteng Huang
CPC classification number: G10L15/16 , G10L15/285 , H04R3/005 , G10L2015/088
Abstract: A method (800) to detect a hotword in a spoken utterance (120) includes receiving a sequence of input frames (210) characterizing streaming multi-channel audio (118). Each channel (119) of the streaming multi-channel audio includes respective audio features (510) captured by a separate dedicated microphone (107). For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer (302) of a memorized neural network (300), the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation (420) based on a concatenation of the respective audio features (344). The method also includes generating, using sequentially-stacked SVDF layers (350), a probability score (360) indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device (102).
-
公开(公告)号:US20240347051A1
公开(公告)日:2024-10-17
申请号:US18754462
申请日:2024-06-26
Applicant: Google LLC
Inventor: Jilong Wu , Yiteng Huang
CPC classification number: G10L15/16 , G10L15/285 , H04R3/005 , G10L2015/088
Abstract: A method to detect a hotword in a spoken utterance includes receiving a sequence of input frames characterizing streaming multi-channel audio. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a memorized neural network, the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation based on a concatenation of the respective audio features. The method also includes generating, using sequentially-stacked SVDF layers, a probability score indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device.
-
公开(公告)号:US20230022800A1
公开(公告)日:2023-01-26
申请号:US17757260
申请日:2020-01-15
Applicant: Google LLC
Inventor: Jilong Wu , Yiteng Huang
Abstract: A method (800) to detect a hotword in a spoken utterance (120) includes receiving a sequence of input frames (210) characterizing streaming multi-channel audio (118). Each channel (119) of the streaming multi-channel audio includes respective audio features (510) captured by a separate dedicated microphone (107). For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer (302) of a memorized neural network (300), the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation (420) based on a concatenation of the respective audio features (344). The method also includes generating, using sequentially-stacked SVDF layers (350), a probability score (360) indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device (102).
-
-