-
Publication No.: US20240386885A1
Publication date: 2024-11-21
Application No.: US18662442
Application date: 2024-05-13
Applicant: Google LLC
Inventor: Michelle Dana Tadmor, Eliya Nachmani, Alon Levkovitch, Julian Salazar, Chulayuth Asawaroengchai, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad
IPC: G10L15/183, G10L13/027, G10L15/02, G10L15/06, G10L25/18
Abstract: A method includes receiving an input sequence of speech features characterizing a spoken prompt. The method also includes generating a corresponding sequence of audio encodings using an audio encoder of a spoken language model. Without applying any intermediary cross-attention to the sequence of audio encodings between the audio encoder and a language model decoder of the spoken language model, the method includes processing the sequence of audio encodings generated by the audio encoder using the language model decoder to generate an output sequence of speech features characterizing a continuation of the spoken prompt.
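Illustrative sketch (not part of the patent record): the data flow described in the abstract can be pictured with the toy Python below. All names (`audio_encoder`, `lm_decoder_step`, `continue_prompt`) and the arithmetic inside them are hypothetical stand-ins chosen for illustration; the publication does not specify them. The only point shown is that the encoder output is handed directly to the decoder as its context, with no separate cross-attention stage in between.

```python
from typing import List

Frame = List[float]  # one speech-feature frame (assumed: a short feature vector)


def audio_encoder(frames: List[Frame]) -> List[Frame]:
    """Hypothetical stand-in for a learned audio encoder: scales each frame."""
    return [[0.5 * x for x in frame] for frame in frames]


def lm_decoder_step(context: List[Frame]) -> Frame:
    """Hypothetical stand-in for one decoder step.

    A real decoder would apply self-attention over its context; here a plain
    per-dimension mean is used so the example stays runnable and deterministic.
    """
    dim = len(context[0])
    return [sum(frame[d] for frame in context) / len(context) for d in range(dim)]


def continue_prompt(prompt: List[Frame], num_frames: int) -> List[Frame]:
    """Encode the spoken prompt, then generate continuation frames one by one."""
    encodings = audio_encoder(prompt)
    # The encodings become the decoder's context directly; no intermediary
    # cross-attention module sits between encoder and decoder.
    context = list(encodings)
    output: List[Frame] = []
    for _ in range(num_frames):
        next_frame = lm_decoder_step(context)
        output.append(next_frame)
        context.append(next_frame)  # autoregressive: feed the output back in
    return output


if __name__ == "__main__":
    print(continue_prompt([[2.0, 4.0], [6.0, 8.0]], num_frames=2))
```

In this toy version the decoder treats encoder outputs and its own previous outputs as one sequence, which is one simple way to read "without intermediary cross-attention"; the actual model architecture may differ.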
-
Publication No.: US20240289563A1
Publication date: 2024-08-29
Application No.: US18589358
Application date: 2024-02-27
Applicant: GOOGLE LLC
Inventor: Michelle Tadmor Ramanovich, Eliya Nachmani, Alon Levkovitch, Byungha Chun, Yifan Ding, Nadav Bar, Chulayuth Asawaroengchai
CPC classification number: G06F40/58, G10L15/005, G10L15/063, G10L25/18, G10L2015/0635
Abstract: Training and/or utilizing a Speech-To-Speech Translation (S2ST) system that can be used to generate, based on processing source audio data that captures a spoken utterance in a source language, target audio data that includes a synthetic spoken utterance that is spoken in a target language and that corresponds, both linguistically and para-linguistically, to the spoken utterance in the source language. Implementations directed to training the S2ST system use an unsupervised approach that relies on monolingual speech data.
-