-
公开(公告)号:US20240028829A1
公开(公告)日:2024-01-25
申请号:US18346232
申请日:2023-07-01
Applicant: Google LLC
Inventor: Tara N. Sainath , Zhouyuan Huo , Zhehuai Chen , Yu Zhang , Weiran Wang , Trevor Strohman , Rohit Prakash Prabhavalkar , Bo Li , Ankur Bapna
IPC: G06F40/284 , G06F40/40
CPC classification number: G06F40/284 , G06F40/40
Abstract: A method includes receiving training data that includes a set of unspoken textual utterances. For each respective unspoken textual utterance, the method includes, tokenizing the respective textual utterance into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit tokenized from the respective unspoken textual utterance, receiving the first higher order textual feature representation generated by a text encoder, and generating a first probability distribution over possible text units. The method also includes training an encoder based on the first probability distribution over possible text units generated by a first-pass decoder for each respective unspoken textual utterance in the set of unspoken textual utterances.