Small footprint multi-channel keyword spotting

    Publication number: US12051406B2

    Publication date: 2024-07-30

    Application number: US17757260

    Filing date: 2020-01-15

    Applicant: Google LLC

    CPC classification number: G10L15/16 G10L15/285 H04R3/005 G10L2015/088

    Abstract: A method (800) to detect a hotword in a spoken utterance (120) includes receiving a sequence of input frames (210) characterizing streaming multi-channel audio (118). Each channel (119) of the streaming multi-channel audio includes respective audio features (510) captured by a separate dedicated microphone (107). For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer (302) of a memorized neural network (300), the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation (420) based on a concatenation of the respective audio features (344). The method also includes generating, using sequentially-stacked SVDF layers (350), a probability score (360) indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device (102).
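The per-channel processing and concatenation described in the abstract can be illustrated with a minimal sketch. It assumes the common rank-1 SVDF formulation, in which each node applies a feature filter to the current frame, stores the result in a small fixed-length memory, and then applies a time filter across that memory. The class names `SVDFNode` and `MultiChannelSVDFInput`, the filter values, and the two-channel setup are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
from collections import deque


class SVDFNode:
    """One rank-1 SVDF node (assumed formulation): feature filter, then time filter."""

    def __init__(self, feature_filter, time_filter):
        self.feature_filter = feature_filter
        self.time_filter = time_filter
        # Fixed-length memory of past projections, oldest first.
        self.memory = deque([0.0] * len(time_filter), maxlen=len(time_filter))

    def step(self, features):
        # Stage 1: project the current frame's features onto a scalar.
        projected = sum(w * x for w, x in zip(self.feature_filter, features))
        self.memory.append(projected)  # the oldest entry is dropped automatically
        # Stage 2: filter across time over the stored projections.
        return sum(w * m for w, m in zip(self.time_filter, self.memory))


class MultiChannelSVDFInput:
    """Hypothetical input layer: each channel has its own nodes; outputs are concatenated."""

    def __init__(self, per_channel_nodes):
        self.per_channel_nodes = per_channel_nodes  # one list of SVDFNode per channel

    def step(self, frame):
        # frame holds one feature vector per microphone channel.
        representation = []
        for nodes, features in zip(self.per_channel_nodes, frame):
            representation.extend(node.step(features) for node in nodes)
        return representation


# Two microphones, one node per channel, memory of two frames.
layer = MultiChannelSVDFInput([
    [SVDFNode([1.0, 0.0], [0.5, 0.5])],
    [SVDFNode([1.0, 0.0], [0.5, 0.5])],
])
first = layer.step([[2.0, 9.0], [4.0, 9.0]])
second = layer.step([[4.0, 0.0], [0.0, 0.0]])
```

Keeping only a short memory of scalar projections per node, rather than a full recurrent state, is what keeps the footprint of this kind of layer small in the sketch above.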

    Small Footprint Multi-Channel Keyword Spotting

    Publication number: US20240347051A1

    Publication date: 2024-10-17

    Application number: US18754462

    Filing date: 2024-06-26

    Applicant: Google LLC

    CPC classification number: G10L15/16 G10L15/285 H04R3/005 G10L2015/088

    Abstract: A method to detect a hotword in a spoken utterance includes receiving a sequence of input frames characterizing streaming multi-channel audio. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a memorized neural network, the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation based on a concatenation of the respective audio features. The method also includes generating, using sequentially-stacked SVDF layers, a probability score indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device.
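The final step of the abstract, comparing the probability score against a threshold and initiating a wake-up process, can be sketched as follows. The function name `detect_hotword`, the `on_wake` callback, and the choice of "greater than or equal to" as the meaning of "satisfies" are assumptions made for illustration; the abstract does not specify them.

```python
def detect_hotword(scores, threshold, on_wake):
    """Scan per-frame probability scores and trigger wake-up on the first hit.

    Returns the index of the triggering frame, or None if no score
    satisfies the threshold (assumed here to mean score >= threshold).
    """
    for index, score in enumerate(scores):
        if score >= threshold:
            on_wake(index)  # hypothetical hook standing in for the wake-up process
            return index
    return None


# Usage: record which frame triggered the wake-up.
triggered = []
hit = detect_hotword([0.10, 0.40, 0.90, 0.95], threshold=0.8, on_wake=triggered.append)
```

Returning after the first satisfying frame means the wake-up hook fires once per detection in this sketch, rather than once for every frame above the threshold.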

    Small Footprint Multi-Channel Keyword Spotting

    Publication number: US20230022800A1

    Publication date: 2023-01-26

    Application number: US17757260

    Filing date: 2020-01-15

    Applicant: Google LLC

    Abstract: A method (800) to detect a hotword in a spoken utterance (120) includes receiving a sequence of input frames (210) characterizing streaming multi-channel audio (118). Each channel (119) of the streaming multi-channel audio includes respective audio features (510) captured by a separate dedicated microphone (107). For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer (302) of a memorized neural network (300), the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation (420) based on a concatenation of the respective audio features (344). The method also includes generating, using sequentially-stacked SVDF layers (350), a probability score (360) indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device (102).
