-
公开(公告)号:US20230298563A1
公开(公告)日:2023-09-21
申请号:US18186157
申请日:2023-03-18
申请人: Google LLC
发明人: Ke Hu , Tara N. Sainath , Yanzhang He , Rohit Prabhavalkar , Sepand Mavandadi , Weiran Wang , Trevor Strohman
CPC分类号: G10L13/08 , G10L15/16 , G10L15/063
摘要: A method of text-only and semi-supervised training for deliberation includes receiving training data including unspoken textual utterances that are each not paired with any corresponding spoken utterance of non-synthetic speech, and training a deliberation model that includes a text encoder and a deliberation decoder on the unspoken textual utterances. The method also includes receiving, at the trained deliberation model, first-pass hypotheses and non-causal acoustic embeddings. The first-pass hypotheses is generated by a recurrent neural network-transducer (RNN-T) decoder for the non-causal acoustic embeddings encoded by a non-causal encoder. The method also includes encoding, using the text encoder, the first-pass hypotheses generated by the RNN-T decoder, and generating, using the deliberation decoder attending to both the first-pass hypotheses and the non-causal acoustic embeddings, second-pass hypotheses.
-
2.
公开(公告)号:US20230306958A1
公开(公告)日:2023-09-28
申请号:US18188632
申请日:2023-03-23
申请人: Google LLC
发明人: Chao Zhang , Bo Li , Tara N. Sainath , Trevor Strohman , Sepand Mavandadi , Shuo-yiin Chang , Parisa Haghani
CPC分类号: G10L15/005 , G10L15/16 , G10L15/063
摘要: A method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. The method also includes generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a language identification (ID) predictor, a language prediction representation based on a concatenation of the first higher order feature representation and the second higher order feature representation. The method also includes generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on a concatenation of the second higher order feature representation and the language prediction representation.
-
公开(公告)号:US20230298570A1
公开(公告)日:2023-09-21
申请号:US18187222
申请日:2023-03-21
申请人: Google LLC
发明人: Weiran Wang , Tongzhou Chen , Tara N. Sainath , Ehsan Variani , Rohit Prakash Prabhavalkar , Ronny Huang , Bhuvana Ramabhadran , Neeraj Gaur , Sepand Mavandadi , Charles Caleb Peyser , Trevor Strohman , Yangzhang He , David Rybach
CPC分类号: G10L15/063 , G10L15/19 , G10L15/22 , G10L15/16 , G10L15/02
摘要: A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypotheses corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of the external language model.
-
-