-
Publication No.: US20200335091A1
Publication Date: 2020-10-22
Application No.: US16809403
Application Date: 2020-03-04
Applicant: Google LLC
Inventor: Shuo-yiin Chang, Rohit Prakash Prabhavalkar, Gabor Simko, Tara N. Sainath, Bo Li, Yangzhang He
Abstract: A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.
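The abstract above describes one model that emits both partial recognition results and an endpoint indication, with processing of later audio stopped once the endpoint is detected. The Python sketch below illustrates only that control flow. The frame-by-frame `step` interface, the use of words as stand-in "frames", and the `<eos>` endpoint marker are hypothetical assumptions, not the patented model.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface (not from the patent): the joint model consumes one
# frame and returns (partial transcript so far, endpoint flag).
StepFn = Callable[[str], Tuple[str, bool]]


def recognize_until_endpoint(frames: Iterable[str], step: StepFn) -> Tuple[str, List[str], int]:
    """Run the joint recognizer/endpointer frame by frame and stop at the endpoint.

    Returns the last partial result, every partial result emitted, and the
    number of frames that were actually processed.
    """
    partials: List[str] = []
    final = ""
    processed = 0
    for frame in frames:
        final, endpoint = step(frame)
        processed += 1
        partials.append(final)
        if endpoint:
            # End of utterance detected: frames after this point are never processed.
            break
    return final, partials, processed


def make_toy_step() -> StepFn:
    """Toy stand-in for the model: each frame is a word, "<eos>" marks the endpoint."""
    words: List[str] = []

    def step(frame: str) -> Tuple[str, bool]:
        if frame == "<eos>":
            return " ".join(words), True
        words.append(frame)
        return " ".join(words), False

    return step


final, partials, n = recognize_until_endpoint(
    ["hello", "world", "<eos>", "noise", "more"], make_toy_step()
)
print(final, n)  # hello world 3
```

In a real system each frame would be a feature vector rather than a word; the point is that the loop exits on the model's own endpoint signal instead of on a separate voice-activity detector.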
-
Publication No.: US20200043483A1
Publication Date: 2020-02-06
Application No.: US16529252
Application Date: 2019-08-01
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
IPC: G10L15/197, G10L15/16, G10L15/22, G10L15/06, G10L15/02
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
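The abstract says only that the loss function "uses N-best lists of decoded hypotheses". Assuming a minimum-word-error-rate (MWER) style objective, that is, the expected number of word errors over the N-best list with hypothesis probabilities renormalized over that list, its forward value can be sketched as below. The function names and the mean-error baseline are illustrative assumptions rather than details taken from the abstract, and no gradients are computed.

```python
import math
from typing import Sequence


def word_errors(hyp: str, ref: str) -> int:
    """Word-level Levenshtein distance between a hypothesis and the reference."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        cur = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cur[j] = min(
                prev[j] + 1,               # extra hypothesis word
                cur[j - 1] + 1,            # missing reference word
                prev[j - 1] + (hw != rw),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1]


def mwer_loss(nbest: Sequence[str], log_probs: Sequence[float], ref: str) -> float:
    """Expected word errors over an N-best list (forward value only, no gradients).

    Assumed MWER-style objective: hypothesis probabilities are renormalized over
    the N-best list, and the mean error count is subtracted as a baseline.
    """
    m = max(log_probs)  # shift log-probabilities for numerical stability
    weights = [math.exp(lp - m) for lp in log_probs]
    z = sum(weights)
    probs = [w / z for w in weights]
    errors = [word_errors(h, ref) for h in nbest]
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))


ref = "the cat sat"
nbest = ["the cat sat", "the cat sad", "a cat sat"]
# Putting most probability mass on the error-free hypothesis gives a negative loss.
print(mwer_loss(nbest, [0.0, -10.0, -10.0], ref))
```

Because the probabilities are renormalized over the list, the loss only rewards the model for ranking low-error hypotheses above high-error ones among those actually decoded.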
-
Publication No.: US20200027444A1
Publication Date: 2020-01-23
Application No.: US16516390
Application Date: 2019-07-19
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A.U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
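The training-time input selection described above resembles scheduled sampling: at each step the decoder is fed either the known label for the previous element or its own previous output, chosen with a fixed probability. The sketch below shows only that selection logic under this reading. `choose_decoder_inputs` and the one-argument `predict` callable are hypothetical; a real decoder step would also depend on its recurrent state and on the attender's context vector.

```python
import random
from typing import Callable, List, Sequence


def choose_decoder_inputs(
    targets: Sequence[str],
    predict: Callable[[str], str],
    sample_prob: float,
    rng: random.Random,
    start_token: str = "<sos>",
) -> List[str]:
    """Build decoder inputs for one training example, scheduled-sampling style.

    At each step the next input is the decoder's own prediction with
    probability `sample_prob`, otherwise the known (ground-truth) label.
    `predict` is a hypothetical stand-in for one decoder step.
    """
    inputs: List[str] = []
    prev = start_token
    for gold in targets:
        inputs.append(prev)
        predicted = predict(prev)  # decoder output for the current element
        prev = predicted if rng.random() < sample_prob else gold
    return inputs


def toy_predict(prev: str) -> str:
    """Toy decoder step that always predicts the same label."""
    return "x"


rng = random.Random(0)
targets = ["h", "i", "</s>"]
print(choose_decoder_inputs(targets, toy_predict, 0.0, rng))  # ['<sos>', 'h', 'i']
print(choose_decoder_inputs(targets, toy_predict, 1.0, rng))  # ['<sos>', 'x', 'x']
```

A probability of 0.0 reduces to pure teacher forcing and 1.0 to always feeding back the model's own outputs; intermediate values expose the decoder during training to the kind of inputs it will see at inference time.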
-