Publication No.: US20230237995A1
Publication Date: 2023-07-27
Application No.: US18194586
Filing Date: 2023-03-31
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Younghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan
IPC Classification: G10L15/197, G10L15/16, G10L15/06, G10L15/02, G10L15/22
CPC Classification: G10L15/197, G10L15/16, G10L15/063, G10L15/02, G10L15/22, G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses a set of speech recognition hypothesis samples, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
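
Illustrative sketch (not part of the patent record): the abstract says only that the loss function "uses a set of speech recognition hypothesis samples" and does not give its form. One loss of that general shape, assumed here purely for illustration, is the expected number of word errors over the sampled hypotheses, with the model's probabilities renormalized over the sample set. The names `word_errors` and `expected_word_error_loss`, the word-level edit distance, and the renormalization are all hypothetical choices rather than details taken from the record.

```python
import math


def word_errors(hyp, ref):
    """Word-level Levenshtein distance between a hypothesis and a reference."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(
                prev[j] + 1,             # h is an extra word (insertion error)
                cur[j - 1] + 1,          # r is missing (deletion error)
                prev[j - 1] + (h != r),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def expected_word_error_loss(samples, log_probs, reference):
    """Expected word errors over sampled hypotheses (assumed loss form).

    samples   -- list of hypotheses, each a list of words
    log_probs -- model log-probability of each hypothesis
    reference -- ground-truth word list
    """
    # Renormalize over the sample set; subtracting the max keeps exp() stable.
    peak = max(log_probs)
    weights = [math.exp(lp - peak) for lp in log_probs]
    total = sum(weights)
    probs = [w / total for w in weights]
    errors = [word_errors(s, reference) for s in samples]
    return sum(p * e for p, e in zip(probs, errors))


if __name__ == "__main__":
    reference = ["play", "some", "music"]
    samples = [["play", "some", "music"], ["play", "sum", "music"]]
    log_probs = [math.log(0.5), math.log(0.5)]
    print(expected_word_error_loss(samples, log_probs, reference))
```

In a training setup of this kind, the samples would typically come from the decoder itself (for example by beam search or random sampling), and the loss would be differentiated with respect to the hypothesis log-probabilities; neither detail is stated in the abstract.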