End-to-end streaming keyword spotting

    公开(公告)号:US11967310B2

    公开(公告)日:2024-04-23

    申请号:US18322207

    申请日:2023-05-23

    Applicant: Google LLC

    Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

    End-to-End Streaming Keyword Spotting
    12.
    发明公开

    公开(公告)号:US20230162729A1

    公开(公告)日:2023-05-25

    申请号:US18151540

    申请日:2023-01-09

    Applicant: Google LLC

    Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

    End-to-End Streaming Keyword Spotting

    公开(公告)号:US20210312913A1

    公开(公告)日:2021-10-07

    申请号:US17348422

    申请日:2021-06-15

    Applicant: Google LLC

    Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

    End-to-End Streaming Keyword Spotting
    14.
    发明公开

    公开(公告)号:US20230298576A1

    公开(公告)日:2023-09-21

    申请号:US18322207

    申请日:2023-05-23

    Applicant: Google LLC

    Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

    Mixing Heterogeneous Loss Types to Improve Accuracy of Keyword Spotting

    公开(公告)号:US20230274731A1

    公开(公告)日:2023-08-31

    申请号:US17652801

    申请日:2022-02-28

    Applicant: Google LLC

    Abstract: A method for training a neural network includes receiving a training input audio sequence including a sequence of input frames defining a hotword that initiates a wake-up process on a user device. The method further includes obtaining a first label and a second label for the training input audio sequence. The method includes generating, using a memorized neural network and the training input audio sequence, an output indicating a likelihood the training input audio sequence includes the hotword. The method further includes determining a first loss based on the first label and the output. The method includes determining a second loss based on the second label and the output. The method further includes optimizing the memorized neural network based on the first loss and the second loss associated with the training input audio sequence.

    End-to-End Streaming Keyword Spotting

    公开(公告)号:US20210142790A1

    公开(公告)日:2021-05-13

    申请号:US17155068

    申请日:2021-01-21

    Applicant: Google LLC

    Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

    End-to-end streaming keyword spotting

    公开(公告)号:US10930269B2

    公开(公告)日:2021-02-23

    申请号:US16439897

    申请日:2019-06-13

    Applicant: Google LLC

    Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

    End-to-End Streaming Keyword Spotting
    18.
    发明申请

    公开(公告)号:US20200020322A1

    公开(公告)日:2020-01-16

    申请号:US16439897

    申请日:2019-06-13

    Applicant: Google LLC

    Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

Patent Agency Ranking