-
Publication/Grant No.: US11521604B2
Publication/Grant Date: 2022-12-06
Application No.: US17011612
Filing Date: 2020-09-03
Applicant: Google LLC
Inventor: Aleks Kracun , Niranjan Subrahmanya , Aishanee Shah
IPC: G10L15/22 , G10L15/197 , G10L15/06 , G10L15/08
Abstract: Techniques are described herein for improving performance of machine learning model(s) and thresholds utilized in determining whether automated assistant function(s) are to be initiated. A method includes: receiving, via one or more microphones of a client device, audio data that captures a spoken utterance of a user; processing the audio data using a machine learning model to generate a predicted output that indicates a probability of one or more hotwords being present in the audio data; determining that the predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in the audio data than is a primary threshold; in response to determining that the predicted output satisfies the secondary threshold, prompting the user to indicate whether or not the spoken utterance includes a hotword; receiving, from the user, a response to the prompting; and adjusting the primary threshold based on the response.
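The abstract describes a two-threshold decision flow but gives no concrete values or update rule. Below is a minimal Python sketch. It assumes the model emits a single hotword probability, that "satisfies" means greater than or equal, and that a confirmed near miss lowers the primary threshold by a fixed step. `HotwordThresholds`, `handle_prediction`, the numeric defaults, and the step-size policy are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HotwordThresholds:
    # Hypothetical values; the abstract does not give concrete numbers.
    primary: float = 0.85    # at or above this, assistant functions are initiated
    secondary: float = 0.60  # lower bar that flags "near miss" utterances
    step: float = 0.02       # hypothetical size of each primary-threshold adjustment
    floor: float = 0.70      # never lower the primary threshold below this


def handle_prediction(prob: float, th: HotwordThresholds,
                      ask_user: Callable[[], bool]) -> str:
    """Decide what to do with one hotword probability from the model."""
    if prob >= th.primary:
        return "initiate"
    if prob >= th.secondary:
        # Near miss: prompt the user to confirm whether a hotword was spoken.
        if ask_user():
            # Confirmed false rejection: make the primary threshold easier to meet,
            # but never below the secondary threshold or the floor.
            th.primary = max(th.floor, th.secondary, th.primary - th.step)
            return "prompted_yes"
        # Confirmed non-hotword: this sketch leaves the primary threshold unchanged.
        return "prompted_no"
    return "ignore"
```

A real system might aggregate many responses before moving the threshold, or raise it after repeated "no" answers. This sketch only shows where the user's response feeds back into the primary threshold.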
-
Publication/Grant No.: US20230101572A1
Publication/Grant Date: 2023-03-30
Application No.: US18074691
Filing Date: 2022-12-05
Applicant: GOOGLE LLC
Inventor: Aleks Kracun , Niranjan Subrahmanya , Aishanee Shah
IPC: G10L15/197 , G10L15/06 , G10L15/22
Abstract: Techniques are described herein for improving performance of machine learning model(s) and thresholds utilized in determining whether automated assistant function(s) are to be initiated. A method includes: receiving, via one or more microphones of a client device, audio data that captures a spoken utterance of a user; processing the audio data using a machine learning model to generate a predicted output that indicates a probability of one or more hotwords being present in the audio data; determining that the predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in the audio data than is a primary threshold; in response to determining that the predicted output satisfies the secondary threshold, prompting the user to indicate whether or not the spoken utterance includes a hotword; receiving, from the user, a response to the prompting; and adjusting the primary threshold based on the response.
-
Publication/Grant No.: US20220284891A1
Publication/Grant Date: 2022-09-08
Application No.: US17190779
Filing Date: 2021-03-03
Applicant: GOOGLE LLC
Inventor: Hyun Jin Park , Pai Zhu , Ignacio Lopez Moreno , Niranjan Subrahmanya
IPC: G10L15/22 , G10L15/06 , G10L15/08 , G06K9/62 , G10L21/0208
Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
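The time and frequency masking mentioned in the abstract resembles SpecAugment-style spectral augmentation. The sketch below is a stdlib-only illustration that rests on several assumptions. The base audio is taken to be already converted to a spectrogram stored as `[time_frame][frequency_bin]`. Masked cells are set to zero. The mask widths and the mask count (`num_masks`) are hypothetical knobs. `spec_augment` is an illustrative name and not an API from the patent.

```python
import random
from typing import List

Spectrogram = List[List[float]]  # indexed as [time_frame][frequency_bin]


def spec_augment(spec: Spectrogram, max_time_width: int, max_freq_width: int,
                 rng: random.Random, num_masks: int = 2) -> Spectrogram:
    """Return a masked copy of `spec`; the base instance is left unchanged."""
    num_frames, num_bins = len(spec), len(spec[0])
    out = [row[:] for row in spec]
    for _ in range(num_masks):
        # Time mask: zero out a contiguous run of frames (width may be 0).
        t = rng.randint(0, min(max_time_width, num_frames))
        t0 = rng.randint(0, num_frames - t)
        for i in range(t0, t0 + t):
            out[i] = [0.0] * num_bins
        # Frequency mask: zero out a contiguous band of bins in every frame.
        f = rng.randint(0, min(max_freq_width, num_bins))
        f0 = rng.randint(0, num_bins - f)
        for row in out:
            for j in range(f0, f0 + f):
                row[j] = 0.0
    return out
```

Raising `num_masks` or the maximum widths makes the augmentation more aggressive. Passing a seeded `random.Random` keeps the augmented instance reproducible.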
-
Publication/Grant No.: US20240355324A1
Publication/Grant Date: 2024-10-24
Application No.: US18761117
Filing Date: 2024-07-01
Applicant: GOOGLE LLC
Inventor: Aleks Kracun , Niranjan Subrahmanya , Aishanee Shah
IPC: G10L15/197 , G10L15/06 , G10L15/08 , G10L15/22
CPC classification number: G10L15/197 , G10L15/063 , G10L15/22 , G10L2015/088 , G10L2015/223
Abstract: Techniques are described herein for improving performance of machine learning model(s) and thresholds utilized in determining whether automated assistant function(s) are to be initiated. A method includes: receiving, via one or more microphones of a client device, audio data that captures a spoken utterance of a user; processing the audio data using a machine learning model to generate a predicted output that indicates a probability of one or more hotwords being present in the audio data; determining that the predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in the audio data than is a primary threshold; in response to determining that the predicted output satisfies the secondary threshold, prompting the user to indicate whether or not the spoken utterance includes a hotword; receiving, from the user, a response to the prompting; and adjusting the primary threshold based on the response.
-
Publication/Grant No.: US20220068268A1
Publication/Grant Date: 2022-03-03
Application No.: US17011612
Filing Date: 2020-09-03
Applicant: Google LLC
Inventor: Aleks Kracun , Niranjan Subrahmanya , Aishanee Shah
IPC: G10L15/197 , G10L15/06 , G10L15/22
Abstract: Techniques are described herein for improving performance of machine learning model(s) and thresholds utilized in determining whether automated assistant function(s) are to be initiated. A method includes: receiving, via one or more microphones of a client device, audio data that captures a spoken utterance of a user; processing the audio data using a machine learning model to generate a predicted output that indicates a probability of one or more hotwords being present in the audio data; determining that the predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in the audio data than is a primary threshold; in response to determining that the predicted output satisfies the secondary threshold, prompting the user to indicate whether or not the spoken utterance includes a hotword; receiving, from the user, a response to the prompting; and adjusting the primary threshold based on the response.
-
Publication/Grant No.: US12027162B2
Publication/Grant Date: 2024-07-02
Application No.: US17190779
Filing Date: 2021-03-03
Applicant: GOOGLE LLC
Inventor: Hyun Jin Park , Pai Zhu , Ignacio Lopez Moreno , Niranjan Subrahmanya
IPC: G10L15/22 , G06F18/24 , G10L15/06 , G10L15/08 , G10L21/0208
CPC classification number: G10L15/22 , G06F18/24 , G10L15/063 , G10L15/08 , G10L21/0208 , G10L2015/088 , G10L2015/223 , G10L2021/02082 , G10L2021/02087
Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
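The teacher-student update can be illustrated with a toy linear softmax student. In this sketch, `features` stands in for the augmented audio features and `teacher_logits` for the KWS teacher model's output on that same augmented input. Several choices here are assumptions about one common way to do knowledge distillation and are not stated in the abstract: the temperature-scaled softmax soft label, the cross-entropy comparison, and the plain SGD step. `softmax`, `distillation_loss`, and `student_step` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(soft_label: Sequence[float],
                      student_probs: Sequence[float]) -> float:
    # Cross-entropy between the teacher's soft label and the student's prediction.
    return -sum(p * math.log(q + 1e-12) for p, q in zip(soft_label, student_probs))


def student_step(weights: List[List[float]], features: Sequence[float],
                 teacher_logits: Sequence[float], lr: float = 0.1,
                 temperature: float = 2.0) -> float:
    """One SGD step on a linear softmax student; weights[class][feature] is updated in place."""
    soft_label = softmax(teacher_logits, temperature)
    student_logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(student_logits, temperature)
    loss = distillation_loss(soft_label, probs)  # loss before this update
    for k, row in enumerate(weights):
        # d(loss)/d(logit_k) = (q_k - p_k) / T for temperature-scaled softmax cross-entropy.
        g = (probs[k] - soft_label[k]) / temperature
        for j, x in enumerate(features):
            row[j] -= lr * g * x
    return loss
```

Calling `student_step` repeatedly on the same example pulls the student's temperature-scaled distribution toward the teacher's soft label. In practice the student would be a neural network trained over many augmented examples.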
-
Publication/Grant No.: US12027160B2
Publication/Grant Date: 2024-07-02
Application No.: US18074691
Filing Date: 2022-12-05
Applicant: GOOGLE LLC
Inventor: Aleks Kracun , Niranjan Subrahmanya , Aishanee Shah
IPC: G10L15/22 , G10L15/06 , G10L15/197 , G10L15/08
CPC classification number: G10L15/197 , G10L15/063 , G10L15/22 , G10L2015/088 , G10L2015/223
Abstract: Techniques are described herein for improving performance of machine learning model(s) and thresholds utilized in determining whether automated assistant function(s) are to be initiated. A method includes: receiving, via one or more microphones of a client device, audio data that captures a spoken utterance of a user; processing the audio data using a machine learning model to generate a predicted output that indicates a probability of one or more hotwords being present in the audio data; determining that the predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in the audio data than is a primary threshold; in response to determining that the predicted output satisfies the secondary threshold, prompting the user to indicate whether or not the spoken utterance includes a hotword; receiving, from the user, a response to the prompting; and adjusting the primary threshold based on the response.
-