Patent search ap:("Google LLC") AND inv:"Chulayuth Asawaroengchai" Page 1

1.

Invention application
PERFORMING TASKS USING GENERATIVE NEURAL NETWORKS (legal status: in force)

Publication number: US20240428056A1

Publication date: 2024-12-26

Application number: US18750973

Filing date: 2024-06-21

Applicant: Google LLC

Inventor: Paul Kishan Rubenstein, Matthew Sharifi, Alexandru Tudor, Chulayuth Asawaroengchai, Duc Dung Nguyen, Marco Tagliasacchi, Neil Zeghidour, Zalán Borsos, Christian Frank, Dalia Salem Hassan Fahmy Elbadawy, Hannah Raphaelle Muckenhirn, Dirk Ryan Padfield, Damien Vincent, Evgeny Kharitonov, Michelle Dana Tadmor, Mihajlo Velimirovic, Feifan Chen, Victoria Zayats

IPC: G06N3/0475, G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing tasks. One of the methods includes obtaining a sequence of input tokens, where each token is selected from a vocabulary of tokens that includes text tokens and audio tokens, and wherein the sequence of input tokens includes tokens that describe a task to be performed and data for performing the task; generating a sequence of embeddings by embedding each token in the sequence of input tokens in an embedding space; and processing the sequence of embeddings using a language model neural network to generate a sequence of output tokens for the task, where each token is selected from the vocabulary.
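
The three steps named in this abstract (a shared vocabulary of text and audio tokens, an embedding step, and a language model that emits tokens from the same vocabulary) can be sketched roughly as follows. This is only an illustrative toy, not the patented implementation: the vocabulary sizes, the embedding dimension, the random embedding table, and the greedy dot-product `toy_language_model` are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random

# Hypothetical sizes; the abstract does not specify any of these.
NUM_TEXT_TOKENS = 8
NUM_AUDIO_TOKENS = 4
VOCAB_SIZE = NUM_TEXT_TOKENS + NUM_AUDIO_TOKENS  # one shared vocabulary
EMBED_DIM = 3

def is_audio_token(token_id):
    """Assumed layout: text ids come first, audio ids follow."""
    return token_id >= NUM_TEXT_TOKENS

# Random embedding table standing in for learned embeddings.
_rng = random.Random(0)
EMBEDDING_TABLE = [
    [_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]

def embed(tokens):
    """Map each input token to its vector in the embedding space."""
    return [EMBEDDING_TABLE[t] for t in tokens]

def toy_language_model(embeddings, num_output_tokens):
    """Stand-in for the language model neural network.

    Greedily picks the vocabulary entry whose embedding has the largest
    dot product with the mean of the context, then appends it to the context.
    """
    context = [list(e) for e in embeddings]
    output = []
    for _ in range(num_output_tokens):
        mean = [sum(col) / len(context) for col in zip(*context)]
        scores = [sum(m * w for m, w in zip(mean, row)) for row in EMBEDDING_TABLE]
        next_token = max(range(VOCAB_SIZE), key=scores.__getitem__)
        output.append(next_token)
        context.append(EMBEDDING_TABLE[next_token])
    return output

# Text tokens describing the task, followed by audio tokens carrying the data.
input_tokens = [1, 5, 2] + [8, 9, 11]
embeddings = embed(input_tokens)
output_tokens = toy_language_model(embeddings, num_output_tokens=4)
```

Because input and output share one vocabulary, the same loop could in principle emit text tokens, audio tokens, or a mix, depending on the task described in the prompt.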

2.

Invention application
LANGUAGE MODELS USING SPOKEN LANGUAGE MODELING (legal status: in force)

Publication number: US20240386885A1

Publication date: 2024-11-21

Application number: US18662442

Filing date: 2024-05-13

Applicant: Google LLC

Inventor: Michelle Dana Tadmor, Eliya Nachmani, Alon Levkovitch, Julian Salazar, Chulayuth Asawaroengchai, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad

IPC: G10L15/183, G10L13/027, G10L15/02, G10L15/06, G10L25/18

Abstract: A method includes receiving an input sequence of speech features characterizing a spoken prompt. The method also includes generating a corresponding sequence of audio encodings using an audio encoder of a spoken language model. Without applying any intermediary cross-attention to the sequence of audio encodings between the audio encoder and a language model decoder of the spoken language model, the method includes processing the sequence of audio encodings generated by the audio encoder using the language model decoder to generate an output sequence of speech features characterizing a continuation of the spoken prompt.
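
The data flow described in this abstract, in which the decoder consumes the encoder output directly as its input sequence rather than attending to it through a cross-attention layer, can be illustrated with a small toy. Everything here is an assumption for illustration: `audio_encoder` is a frame-wise scaling and `lm_decoder` is a moving average, neither of which is a neural network or the patented model.

```python
def audio_encoder(speech_features):
    """Toy encoder: maps each speech-feature frame to one audio encoding."""
    return [[2.0 * x for x in frame] for frame in speech_features]

def lm_decoder(prefix, num_frames):
    """Toy decoder that takes the encodings directly as its input prefix.

    There is no cross-attention step: the encodings simply form the start of
    the sequence that the decoder extends autoregressively. Each new frame is
    the average of the two most recent frames (a placeholder for a real model).
    """
    sequence = [list(frame) for frame in prefix]
    generated = []
    for _ in range(num_frames):
        window = sequence[-2:]
        next_frame = [sum(col) / len(window) for col in zip(*window)]
        sequence.append(next_frame)
        generated.append(next_frame)
    return generated

# Hypothetical 2-dimensional speech features for a 3-frame spoken prompt.
prompt = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
encodings = audio_encoder(prompt)
continuation = lm_decoder(encodings, num_frames=2)
print(continuation)  # [[3.0, 5.0], [3.5, 5.5]]
```

The point of the sketch is only the wiring: the encoder output and the generated continuation live in one sequence that the decoder processes, with no separate attention pathway between the two components.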

3.

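Invention publication
SPEECH-TO-SPEECH TRANSLATION WITH MONOLINGUAL DATA (legal status: under examination, published)

Publication number: US20240289563A1

Publication date: 2024-08-29

Application number: US18589358

Filing date: 2024-02-27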

Applicant: GOOGLE LLC

Inventor: Michelle Tadmor Ramanovich, Eliya Nachmani, Alon Levkovitch, Byungha Chun, Yifan Ding, Nadav Bar, Chulayuth Asawaroengchai

IPC: G06F40/58, G10L15/00, G10L15/06, G10L25/18

CPC classification number: G06F40/58, G10L15/005, G10L15/063, G10L25/18, G10L2015/0635

Abstract: Training and/or utilizing a Speech-To-Speech Translation (S2ST) system that can be used to generate, based on processing source audio data that captures a spoken utterance in a source language, target audio data that includes a synthetic spoken utterance that is spoken in a target language and that corresponds, both linguistically and para-linguistically, to the spoken utterance in the source language. Implementations that are directed to training the S2ST system utilize an unsupervised approach, with monolingual speech data, in training the S2ST system.
