-
公开(公告)号:US20230335117A1
公开(公告)日:2023-10-19
申请号:US18186872
申请日:2023-03-20
Applicant: Google LLC
Inventor: Shuo-yiin Chang , Guru Prakash Arumugam , Zelin Wu , Tara N. Sainath , Bo LI , Qiao Liang , Adam Stambler , Shyam Upadhyay , Manaal Faruqui , Trevor Strohman
CPC classification number: G10L15/16 , G10L15/22 , G10L15/063 , G10L2015/223
Abstract: A method includes receiving, as input to a speech recognition model, audio data corresponding to a spoken utterance. The method also includes performing, using the speech recognition model, speech recognition on the audio data by, at each of a plurality of time steps, encoding, using an audio encoder, the audio data corresponding to the spoken utterance into a corresponding audio encoding, and decoding, using a speech recognition joint network, the corresponding audio encoding into a probability distribution over possible output labels. At each of the plurality of time steps, the method also includes determining, using an intended query (IQ) joint network configured to receive a label history representation associated with a sequence of non-blank symbols output by a final softmax layer, an intended query decision indicating whether or not the spoken utterance includes a query intended for a digital assistant.
-
公开(公告)号:US10929754B2
公开(公告)日:2021-02-23
申请号:US16711172
申请日:2019-12-11
Applicant: Google LLC
Inventor: Shuo-yiin Chang , Bo Li , Gabor Simko , Maria Carolina Parada San Martin , Sean Matthew Shannon
Abstract: A method for training an endpointer model includes short-form speech utterances and long-form speech utterances. The method also includes providing a short-form speech utterance as input to a shared neural network, the shared neural network configured to learn shared hidden representations suitable for both voice activity detection (VAD) and end-of-query (EOQ) detection. The method also includes generating, using a VAD classifier, a sequence of predicted VAD labels and determining a VAD loss by comparing the sequence of predicted VAD labels to a corresponding sequence of reference VAD labels. The method also includes, generating, using an EOQ classifier, a sequence of predicted EOQ labels and determining an EOQ loss by comparing the sequence of predicted EOQ labels to a corresponding sequence of reference EOQ labels. The method also includes training, using a cross-entropy criterion, the endpointer model based on the VAD loss and the EOQ loss.
-