-
Publication No.: US20240428056A1
Publication Date: 2024-12-26
Application No.: US18750973
Application Date: 2024-06-21
Applicant: Google LLC
Inventor: Paul Kishan Rubenstein, Matthew Sharifi, Alexandru Tudor, Chulayuth Asawaroengchai, Duc Dung Nguyen, Marco Tagliasacchi, Neil Zeghidour, Zalán Borsos, Christian Frank, Dalia Salem Hassan Fahmy Elbadawy, Hannah Raphaelle Muckenhirn, Dirk Ryan Padfield, Damien Vincent, Evgeny Kharitonov, Michelle Dana Tadmor, Mihajlo Velimirovic, Feifan Chen, Victoria Zayats
IPC: G06N3/0475, G10L25/30
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing tasks. One of the methods includes obtaining a sequence of input tokens, where each token is selected from a vocabulary of tokens that includes text tokens and audio tokens, and wherein the sequence of input tokens includes tokens that describe a task to be performed and data for performing the task; generating a sequence of embeddings by embedding each token in the sequence of input tokens in an embedding space; and processing the sequence of embeddings using a language model neural network to generate a sequence of output tokens for the task, where each token is selected from the vocabulary.
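The three steps in this abstract (a token sequence drawn from a shared text-and-audio vocabulary, an embedding step, and a language model that emits tokens from the same vocabulary) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the vocabulary sizes, the layout that places audio token ids after text token ids, the random embedding table, and the `toy_language_model` stand-in are hypothetical and are not taken from the application.

```python
import random

# All sizes below are hypothetical; the abstract does not specify them.
NUM_TEXT_TOKENS = 8
NUM_AUDIO_TOKENS = 4
VOCAB_SIZE = NUM_TEXT_TOKENS + NUM_AUDIO_TOKENS
EMBED_DIM = 3


def is_audio_token(token_id):
    """Audio tokens occupy the ids after the text tokens (an assumed layout)."""
    return NUM_TEXT_TOKENS <= token_id < VOCAB_SIZE


def make_embedding_table(seed=0):
    """One embedding vector per vocabulary entry, text and audio alike."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
            for _ in range(VOCAB_SIZE)]


def embed(tokens, table):
    """Map each input token id to its vector in the embedding space."""
    return [table[t] for t in tokens]


def toy_language_model(embeddings, table, num_steps=3):
    """Hypothetical stand-in for the language model neural network.

    At each step it averages the vectors seen so far and picks the
    vocabulary entry whose embedding has the largest dot product with
    that average, so every output token comes from the same vocabulary.
    """
    context = [list(e) for e in embeddings]
    output_tokens = []
    for _ in range(num_steps):
        mean = [sum(e[d] for e in context) / len(context)
                for d in range(EMBED_DIM)]
        scores = [sum(m * w for m, w in zip(mean, row)) for row in table]
        next_token = max(range(VOCAB_SIZE), key=scores.__getitem__)
        output_tokens.append(next_token)
        context.append(table[next_token])
    return output_tokens


# Text tokens (task description) followed by audio tokens (task data).
table = make_embedding_table()
input_tokens = [1, 5, 2, 9, 10, 8]
embeddings = embed(input_tokens, table)
output_tokens = toy_language_model(embeddings, table)
```

The point of the sketch is only that a single embedding table and a single output vocabulary cover both modalities; a real system would replace the toy scoring loop with a trained neural network.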
-
Publication No.: US20240386885A1
Publication Date: 2024-11-21
Application No.: US18662442
Application Date: 2024-05-13
Applicant: Google LLC
Inventor: Michelle Dana Tadmor, Eliya Nachmani, Alon Levkovitch, Julian Salazar, Chulayuth Asawaroengchai, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad
IPC: G10L15/183, G10L13/027, G10L15/02, G10L15/06, G10L25/18
Abstract: A method includes receiving an input sequence of speech features characterizing a spoken prompt. The method also includes generating a corresponding sequence of audio encodings using an audio encoder of a spoken language model. Without applying any intermediary cross-attention to the sequence of audio encodings between the audio encoder and a language model decoder of the spoken language model, the method includes processing the sequence of audio encodings generated by the audio encoder using the language model decoder to generate an output sequence of speech features characterizing a continuation of the spoken prompt.
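The data flow described here, in which the decoder consumes the encoder output directly with no cross-attention stage in between, can be shown with a toy sketch. The functions `audio_encoder`, `lm_decoder`, and `continue_spoken_prompt`, the frame dimension, and the arithmetic inside them are all hypothetical placeholders chosen so the example runs; they do not represent the networks in the application.

```python
# Hypothetical frame width; the abstract does not specify feature sizes.
MODEL_DIM = 2


def audio_encoder(speech_features):
    """Toy encoder: scales each speech-feature frame (placeholder math)."""
    return [[0.5 * x for x in frame] for frame in speech_features]


def lm_decoder(prefix, num_frames):
    """Toy autoregressive decoder.

    The audio encodings are used directly as the decoder's input prefix.
    There is no separate cross-attention module between encoder and
    decoder. Each new frame is the mean of every frame seen so far.
    """
    context = [list(frame) for frame in prefix]
    output = []
    for _ in range(num_frames):
        n = len(context)
        frame = [sum(f[d] for f in context) / n for d in range(MODEL_DIM)]
        output.append(frame)
        context.append(frame)
    return output


def continue_spoken_prompt(speech_features, num_frames=2):
    """Encode the spoken prompt, then decode a continuation from it."""
    encodings = audio_encoder(speech_features)
    return lm_decoder(encodings, num_frames)


continuation = continue_spoken_prompt([[2.0, 4.0], [6.0, 8.0]], num_frames=2)
```

The sketch captures only the wiring: encoder output goes straight into the decoder's input sequence, and the decoder emits speech-feature frames rather than text.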
-
Publication No.: US20240289563A1
Publication Date: 2024-08-29
Application No.: US18589358
Application Date: 2024-02-27
Applicant: GOOGLE LLC
Inventor: Michelle Tadmor Ramanovich, Eliya Nachmani, Alon Levkovitch, Byungha Chun, Yifan Ding, Nadav Bar, Chulayuth Asawaroengchai
CPC classification number: G06F40/58, G10L15/005, G10L15/063, G10L25/18, G10L2015/0635
Abstract: Training and/or utilizing a Speech-To-Speech Translation (S2ST) system that can be used to generate, based on processing source audio data that captures a spoken utterance in a source language, target audio data that includes a synthetic spoken utterance that is spoken in a target language and that corresponds, both linguistically and para-linguistically, to the spoken utterance in the source language. Implementations that are directed to training the S2ST system utilize an unsupervised approach, with monolingual speech data, in training the S2ST system.
-