-
公开(公告)号:US20240347060A1
公开(公告)日:2024-10-17
申请号:US18750663
申请日:2024-06-21
申请人: GOOGLE LLC
发明人: Victor Carbune , Matthew Sharifi , Ondrej Skopek , Justin Lu , Daniel Valcarce , Kevin Kilgour , Mohamad Hassan Rom , Nicolo D'Ercole , Michael Golikov
CPC分类号: G10L15/22 , G10L15/05 , G10L15/1815 , G10L25/78 , G10L2015/088 , G10L2015/223
摘要: Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
-
2.
公开(公告)号:US20190102458A1
公开(公告)日:2019-04-04
申请号:US16148338
申请日:2018-10-01
申请人: Google LLC
发明人: Dominik Roblek , Blaise Aguera-Arcas , Tom Hume , Marvin Ritter , Brandon Barbello , Kevin Kilgour , Mihajlo Velimirovic , Christopher Walter George Thornton , Gabriel Taubman , James David Lyon , Jan Athaus , Katsiaryna Naliuka , Julian Odell , Matthew Sharifi , Beat Gfeller
摘要: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.
-
公开(公告)号:US20190102144A1
公开(公告)日:2019-04-04
申请号:US16148401
申请日:2018-10-01
申请人: Google LLC
发明人: Dominik Roblek , Blaise Aguera-Arcas , Tom Hume , Marvin Ritter , Brandon Barbello , Kevin Kilgour , Mihajlo Velimirovic , Christopher Walter George Thornton , Gabriel Taubman , James David Lyon , Jan Althaus , Katsiaryna Naliuka , Julian Odell , Matthew Sharifi , Beat Gfeller
摘要: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for indicating a reference song. A computing device stores reference song characterization data that identifies a plurality of audio characteristics for each reference song in a plurality of reference songs. The computing device receives digital audio data that represents audio recorded by a microphone, converts the digital audio data from time-domain format into frequency-domain format, and uses the digital audio data in the frequency-domain format in a music-characterization process. In response to determining that characterization values for the digital audio data are most relevant to characterization values for a particular reference song, the computing device outputs an indication of the particular reference song.
-
公开(公告)号:US11557293B2
公开(公告)日:2023-01-17
申请号:US17321994
申请日:2021-05-17
申请人: GOOGLE LLC
发明人: Victor Carbune , Matthew Sharifi , Ondrej Skopek , Justin Lu , Daniel Valcarce , Kevin Kilgour , Mohamad Hassan Rom , Nicolo D'Ercole , Michael Golikov
摘要: Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
-
公开(公告)号:US20220262345A1
公开(公告)日:2022-08-18
申请号:US17662021
申请日:2022-05-04
申请人: Google LLC
发明人: Matthew Sharifi , Kevin Kilgour , Dominik Roblek , James Lin
摘要: A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.
-
公开(公告)号:US11341954B2
公开(公告)日:2022-05-24
申请号:US16717518
申请日:2019-12-17
申请人: Google LLC
发明人: Matthew Sharifi , Kevin Kilgour , Dominik Roblek , James Lin
摘要: A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.
-
公开(公告)号:US12057119B2
公开(公告)日:2024-08-06
申请号:US18092883
申请日:2023-01-03
申请人: GOOGLE LLC
发明人: Victor Carbune , Matthew Sharifi , Ondrej Skopek , Justin Lu , Daniel Valcarce , Kevin Kilgour , Mohamad Hassan Rom , Nicolo D'Ercole , Michael Golikov
CPC分类号: G10L15/22 , G10L15/05 , G10L15/1815 , G10L25/78 , G10L2015/088 , G10L2015/223
摘要: Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
-
公开(公告)号:US20230143177A1
公开(公告)日:2023-05-11
申请号:US18092883
申请日:2023-01-03
申请人: GOOGLE LLC
发明人: Victor Carbune , Matthew Sharifi , Ondrej Skopek , Justin Lu , Daniel Valcarce , Kevin Kilgour , Mohamad Hassan Rom , Nicolo D'Ercole , Michael Golikov
CPC分类号: G10L15/22 , G10L15/05 , G10L15/1815 , G10L25/78 , G10L2015/088
摘要: Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
-
公开(公告)号:US20220366903A1
公开(公告)日:2022-11-17
申请号:US17321994
申请日:2021-05-17
申请人: GOOGLE LLC
发明人: Victor Carbune , Matthew Sharifi , Ondrej Skopek , Justin Lu , Daniel Valcarce , Kevin Kilgour , Mohamad Hassan Rom , Nicolo D'Ercole , Michael Golikov
摘要: Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
-
公开(公告)号:US20210183367A1
公开(公告)日:2021-06-17
申请号:US16717518
申请日:2019-12-17
申请人: Google LLC
发明人: Matthew Sharifi , Kevin Kilgour , Dominik Roblek , James Lin
摘要: A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.
-
-
-
-
-
-
-
-
-