-
Publication number: US11670287B2
Publication date: 2023-06-06
Application number: US17222939
Filing date: 2021-04-05
Applicant: Google LLC
Inventor: Aleksandar Kracun, Richard Cameron Rose
CPC classification number: G10L15/08, G10L15/22, G10L17/00, G10L2015/088, G10L2015/223, G10L2015/228, H04M3/568, H04M2250/74
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker diarization are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The actions further include determining that the audio data includes an utterance of a predefined hotword spoken by a first speaker. The actions further include identifying a first portion of the audio data that includes speech from the first speaker. The actions further include identifying a second portion of the audio data that includes speech from a second, different speaker. The actions further include transmitting the first portion of the audio data that includes speech from the first speaker and suppressing transmission of the second portion of the audio data that includes speech from the second, different speaker.
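The transmit/suppress step in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Segment` type, the `has_hotword` flag, and the `select_for_transmission` function are hypothetical names, and the speaker labels are assumed to come from an upstream diarization step that the sketch does not model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str               # hypothetical label from an upstream diarizer
    text: str                  # stand-in for the audio samples of this portion
    has_hotword: bool = False  # assumed output of a hotword detector


def select_for_transmission(segments: List[Segment]) -> List[Segment]:
    """Keep portions spoken by the hotword speaker; drop everything else."""
    # The "first speaker" is whoever spoke the portion containing the hotword.
    hotword_speaker = next((s.speaker for s in segments if s.has_hotword), None)
    if hotword_speaker is None:
        return []  # no hotword detected, so nothing is transmitted
    return [s for s in segments if s.speaker == hotword_speaker]


segments = [
    Segment("A", "ok computer", has_hotword=True),
    Segment("A", "call mom"),
    Segment("B", "hi mom"),
]
kept = select_for_transmission(segments)
```

In this example the two portions from speaker A are kept and the portion from speaker B is suppressed.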
-
Publication number: US20220189466A1
Publication date: 2022-06-16
Application number: US17120033
Filing date: 2020-12-11
Applicant: Google LLC
Inventor: Matthew Sharifi, Aleksandar Kracun
Abstract: A method for optimizing speech recognition includes receiving a first acoustic segment characterizing a hotword detected by a hotword detector in streaming audio captured by a user device, extracting one or more hotword attributes from the first acoustic segment, and adjusting, based on the one or more hotword attributes extracted from the first acoustic segment, one or more speech recognition parameters of an automated speech recognition (ASR) model. After adjusting the speech recognition parameters of the ASR model, the method also includes processing, using the ASR model, a second acoustic segment to generate a speech recognition result. The second acoustic segment characterizes a spoken query/command that follows the first acoustic segment in the streaming audio captured by the user device.
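As a rough illustration of the flow described in this abstract (extract attributes from the hotword segment, then adjust recognition parameters before processing the query), here is a minimal Python sketch. The abstract does not say which attributes or parameters are used, so everything here is an assumption: the duration and RMS level attributes, the `end_of_speech_timeout_s` and `input_gain` parameters, and the reference values `typical_duration_s` and `target_rms` are all hypothetical.

```python
import math
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class HotwordAttributes:
    duration_s: float  # assumed attribute: how long the hotword took to say
    rms_level: float   # assumed attribute: loudness of the hotword


@dataclass(frozen=True)
class AsrParams:
    end_of_speech_timeout_s: float = 0.5  # hypothetical ASR parameter
    input_gain: float = 1.0               # hypothetical ASR parameter


def extract_attributes(samples: Sequence[float], sample_rate: int) -> HotwordAttributes:
    """Measure duration and RMS level of the first acoustic segment."""
    duration = len(samples) / sample_rate
    rms = math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0
    return HotwordAttributes(duration, rms)


def adjust_params(params: AsrParams, attrs: HotwordAttributes,
                  typical_duration_s: float = 0.6,
                  target_rms: float = 0.1) -> AsrParams:
    """Lengthen the timeout for slow speakers and normalize input level."""
    pace = attrs.duration_s / typical_duration_s
    timeout = params.end_of_speech_timeout_s * max(1.0, pace)
    gain = target_rms / attrs.rms_level if attrs.rms_level > 0 else params.input_gain
    return replace(params, end_of_speech_timeout_s=timeout, input_gain=gain)


# A quiet, slowly spoken hotword: 1.2 s at 16 kHz with constant amplitude 0.05.
attrs = extract_attributes([0.05] * 19200, sample_rate=16000)
tuned = adjust_params(AsrParams(), attrs)
```

The adjusted `tuned` parameters would then be used when recognizing the second acoustic segment, the query that follows the hotword.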
-
Publication number: US20200098374A1
Publication date: 2020-03-26
Application number: US16552244
Filing date: 2019-08-27
Applicant: Google LLC
Inventor: Aleksandar Kracun, Richard Cameron Rose
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker diarization are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The actions further include determining that the audio data includes an utterance of a predefined hotword spoken by a first speaker. The actions further include identifying a first portion of the audio data that includes speech from the first speaker. The actions further include identifying a second portion of the audio data that includes speech from a second, different speaker. The actions further include transmitting the first portion of the audio data that includes speech from the first speaker and suppressing transmission of the second portion of the audio data that includes speech from the second, different speaker.
-
Publication number: US20190287528A1
Publication date: 2019-09-19
Application number: US16362831
Filing date: 2019-03-25
Applicant: Google LLC
Inventor: Christopher Thaddeus Hughes, Ignacio Lopez Moreno, Aleksandar Kracun
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextual hotwords are disclosed. In one aspect, a method, during a boot process of a computing device, includes the actions of determining, by a computing device, a context associated with the computing device. The actions further include, based on the context associated with the computing device, determining a hotword. The actions further include, after determining the hotword, receiving audio data that corresponds to an utterance. The actions further include determining that the audio data includes the hotword. The actions further include, in response to determining that the audio data includes the hotword, performing an operation associated with the hotword.
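The sequence in this abstract (determine a context, derive the active hotword from it, detect the hotword, perform the associated operation) can be sketched in Python as follows. This is only an illustration under stated assumptions: the context names, hotwords, and operation names in `HOTWORDS_BY_CONTEXT` are invented, and a text transcript stands in for the audio data and the hotword detector.

```python
from typing import Dict, Optional

# Hypothetical mapping from device context to {hotword: operation name}.
HOTWORDS_BY_CONTEXT: Dict[str, Dict[str, str]] = {
    "music_playing": {"next": "skip_track", "pause": "pause_playback"},
    "home_screen": {"ok computer": "open_assistant"},
}


def active_hotwords(context: str) -> Dict[str, str]:
    """Return the hotwords that are active for the given context."""
    return HOTWORDS_BY_CONTEXT.get(context, {})


def handle_utterance(context: str, transcript: str) -> Optional[str]:
    """Return the operation for the first active hotword found, if any."""
    spoken = transcript.lower()
    for hotword, operation in active_hotwords(context).items():
        # Plain substring matching stands in for a real hotword detector.
        if hotword in spoken:
            return operation
    return None


op_music = handle_utterance("music_playing", "Next")
op_home = handle_utterance("home_screen", "next")
```

The same utterance triggers an operation in one context and is ignored in another, which is the point of making the hotword set depend on context.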
-
Publication number: US10276161B2
Publication date: 2019-04-30
Application number: US15391358
Filing date: 2016-12-27
Applicant: Google LLC
Inventor: Christopher Thaddeus Hughes, Ignacio Lopez Moreno, Aleksandar Kracun
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextual hotwords are disclosed. In one aspect, a method, during a boot process of a computing device, includes the actions of determining, by a computing device, a context associated with the computing device. The actions further include, based on the context associated with the computing device, determining a hotword. The actions further include, after determining the hotword, receiving audio data that corresponds to an utterance. The actions further include determining that the audio data includes the hotword. The actions further include, in response to determining that the audio data includes the hotword, performing an operation associated with the hotword.
-
Publication number: US20190115029A1
Publication date: 2019-04-18
Application number: US15785751
Filing date: 2017-10-17
Applicant: Google LLC
Inventor: Aleksandar Kracun, Richard Cameron Rose
CPC classification number: G10L17/005, G10L15/08, G10L15/22, G10L17/00, G10L2015/088, G10L2015/223, G10L2015/228, H04M3/568, H04M2250/74
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker diarization are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The actions further include determining that the audio data includes an utterance of a predefined hotword spoken by a first speaker. The actions further include identifying a first portion of the audio data that includes speech from the first speaker. The actions further include identifying a second portion of the audio data that includes speech from a second, different speaker. The actions further include transmitting the first portion of the audio data that includes speech from the first speaker and suppressing transmission of the second portion of the audio data that includes speech from the second, different speaker.