Contextual suppression of assistant command(s)

    公开(公告)号:US11557293B2

    公开(公告)日:2023-01-17

    申请号:US17321994

    申请日:2021-05-17

    申请人: GOOGLE LLC

    摘要: Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.

    Training Keyword Spotters
    5.
    发明申请

    公开(公告)号:US20220262345A1

    公开(公告)日:2022-08-18

    申请号:US17662021

    申请日:2022-05-04

    申请人: Google LLC

    摘要: A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.

    Training keyword spotters
    6.
    发明授权

    公开(公告)号:US11341954B2

    公开(公告)日:2022-05-24

    申请号:US16717518

    申请日:2019-12-17

    申请人: Google LLC

    摘要: A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.

    CONTEXTUAL SUPPRESSION OF ASSISTANT COMMAND(S)

    公开(公告)号:US20220366903A1

    公开(公告)日:2022-11-17

    申请号:US17321994

    申请日:2021-05-17

    申请人: GOOGLE LLC

    摘要: Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.

    Training Keyword Spotters
    10.
    发明申请

    公开(公告)号:US20210183367A1

    公开(公告)日:2021-06-17

    申请号:US16717518

    申请日:2019-12-17

    申请人: Google LLC

    摘要: A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.