专利检索 ap:("Google LLC") AND inv:"Sepand Mavandadi" 第 1 页

1.

发明公开
Deliberation by Text-Only and Semi-Supervised Training 审中-公开

公开(公告)号：US20230298563A1

公开(公告)日：2023-09-21

申请号：US18186157

申请日：2023-03-18

申请人： Google LLC

发明人： Ke Hu , Tara N. Sainath , Yanzhang He , Rohit Prabhavalkar , Sepand Mavandadi , Weiran Wang , Trevor Strohman

IPC分类号： G10L13/08 , G10L15/16 , G10L15/06

CPC分类号： G10L13/08 , G10L15/16 , G10L15/063

摘要： A method of text-only and semi-supervised training for deliberation includes receiving training data including unspoken textual utterances that are each not paired with any corresponding spoken utterance of non-synthetic speech, and training a deliberation model that includes a text encoder and a deliberation decoder on the unspoken textual utterances. The method also includes receiving, at the trained deliberation model, first-pass hypotheses and non-causal acoustic embeddings. The first-pass hypotheses is generated by a recurrent neural network-transducer (RNN-T) decoder for the non-causal acoustic embeddings encoded by a non-causal encoder. The method also includes encoding, using the text encoder, the first-pass hypotheses generated by the RNN-T decoder, and generating, using the deliberation decoder attending to both the first-pass hypotheses and the non-causal acoustic embeddings, second-pass hypotheses.

2.

发明公开
Streaming End-to-end Multilingual Speech Recognition with Joint Language Identification 审中-公开

公开(公告)号：US20230306958A1

公开(公告)日：2023-09-28

申请号：US18188632

申请日：2023-03-23

申请人： Google LLC

发明人： Chao Zhang , Bo Li , Tara N. Sainath , Trevor Strohman , Sepand Mavandadi , Shuo-yiin Chang , Parisa Haghani

IPC分类号： G10L15/00 , G10L15/16 , G10L15/06

CPC分类号： G10L15/005 , G10L15/16 , G10L15/063

摘要： A method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. The method also includes generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a language identification (ID) predictor, a language prediction representation based on a concatenation of the first higher order feature representation and the second higher order feature representation. The method also includes generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on a concatenation of the second higher order feature representation and the language prediction representation.

3.

发明公开
Rare Word Recognition with LM-aware MWER Training 审中-公开

公开(公告)号：US20230298570A1

公开(公告)日：2023-09-21

申请号：US18187222

申请日：2023-03-21

申请人： Google LLC

发明人： Weiran Wang , Tongzhou Chen , Tara N. Sainath , Ehsan Variani , Rohit Prakash Prabhavalkar , Ronny Huang , Bhuvana Ramabhadran , Neeraj Gaur , Sepand Mavandadi , Charles Caleb Peyser , Trevor Strohman , Yangzhang He , David Rybach

IPC分类号： G10L15/06 , G10L15/19 , G10L15/22 , G10L15/16 , G10L15/02

CPC分类号： G10L15/063 , G10L15/19 , G10L15/22 , G10L15/16 , G10L15/02

摘要： A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypotheses corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of the external language model.

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类