Patent search ap:("Google LLC") AND inv:"Victoria Zayats"

1.

Invention publication
Speaker Embeddings for Improved Automatic Speech Recognition (status: under examination, published)

Publication number: US20230360632A1

Publication date: 2023-11-09

Application number: US17661832

Filing date: 2022-05-03

Applicant: Google LLC

Inventor: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats

IPC: G10L13/08, G10L25/18, G10L15/22, G10L15/26, G10L15/06, G10L13/04

CPC classification number: G10L13/08, G10L25/18, G10L15/22, G10L15/26, G10L15/063, G10L13/04

Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.
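The data flow in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract does not say how the speaker embedding network works or how the biasing is applied, so the mean pooling, the concatenation scheme, and the names `speaker_embedding` and `bias_frames` are all hypothetical.

```python
from typing import List

Frame = List[float]  # one feature vector per audio frame (assumed representation)


def speaker_embedding(reference_frames: List[Frame]) -> Frame:
    """Hypothetical stand-in for a speaker embedding network.

    Mean-pools the reference frames into one fixed-size vector.
    """
    if not reference_frames:
        raise ValueError("reference audio must contain at least one frame")
    dim = len(reference_frames[0])
    count = len(reference_frames)
    return [sum(frame[i] for frame in reference_frames) / count for i in range(dim)]


def bias_frames(input_frames: List[Frame], embedding: Frame) -> List[Frame]:
    """Assumed biasing scheme: append the speaker embedding to every input frame.

    A downstream speech conversion model would consume these conditioned frames.
    """
    return [frame + embedding for frame in input_frames]


# Toy reference speech and a toy utterance from the same target speaker.
reference = [[1.0, 3.0], [3.0, 5.0]]
utterance = [[0.5, 0.25], [0.0, 1.0], [2.0, 2.0]]

emb = speaker_embedding(reference)
conditioned = bias_frames(utterance, emb)
```

The embedding is computed once from the reference audio and reused for every later conversion request from the same speaker, which matches the order of steps in the abstract.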

2.

Invention application
PERFORMING TASKS USING GENERATIVE NEURAL NETWORKS (status: in force)

Publication number: US20240428056A1

Publication date: 2024-12-26

Application number: US18750973

Filing date: 2024-06-21

Applicant: Google LLC

Inventor: Paul Kishan Rubenstein, Matthew Sharifi, Alexandru Tudor, Chulayuth Asawaroengchai, Duc Dung Nguyen, Marco Tagliasacchi, Neil Zeghidour, Zalán Borsos, Christian Frank, Dalia Salem Hassan Fahmy Elbadawy, Hannah Raphaelle Muckenhirn, Dirk Ryan Padfield, Damien Vincent, Evgeny Kharitonov, Michelle Dana Tadmor, Mihajlo Velimirovic, Feifan Chen, Victoria Zayats

IPC: G06N3/0475, G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing tasks. One of the methods includes obtaining a sequence of input tokens, where each token is selected from a vocabulary of tokens that includes text tokens and audio tokens, and wherein the sequence of input tokens includes tokens that describe a task to be performed and data for performing the task; generating a sequence of embeddings by embedding each token in the sequence of input tokens in an embedding space; and processing the sequence of embeddings using a language model neural network to generate a sequence of output tokens for the task, where each token is selected from the vocabulary.
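The first two steps of this abstract (a shared vocabulary of text and audio tokens, then one embedding per input token) can be sketched as below. Everything concrete here is an assumption for illustration: the token strings, the `[ASR]` task tag, the vocabulary size, the embedding dimension, and the randomly initialised table are hypothetical, and the language model step is left out.

```python
import random
from typing import Dict, List

# Hypothetical shared vocabulary: text tokens (including task tags) plus audio tokens.
TEXT_TOKENS = ["[ASR]", "[TTS]", "hello", "world"]
AUDIO_TOKENS = [f"<audio_{i}>" for i in range(4)]
VOCAB: Dict[str, int] = {tok: i for i, tok in enumerate(TEXT_TOKENS + AUDIO_TOKENS)}

EMBED_DIM = 3
_rng = random.Random(0)  # fixed seed so the toy table is reproducible
EMBEDDING_TABLE = [
    [_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB
]


def embed(tokens: List[str]) -> List[List[float]]:
    """Look up one embedding vector per token, text or audio alike."""
    return [EMBEDDING_TABLE[VOCAB[tok]] for tok in tokens]


# Input sequence: a task description token followed by the data for the task.
input_tokens = ["[ASR]", "<audio_2>", "<audio_0>", "<audio_3>"]
embeddings = embed(input_tokens)
```

Because text and audio tokens share one index space, a single lookup table serves both, and the same vocabulary can be used for the output tokens the abstract describes.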

3.

Invention application
SPEAKER EMBEDDINGS FOR IMPROVED AUTOMATIC SPEECH RECOGNITION (status: in force)

Publication number: US20250037700A1

Publication date: 2025-01-30

Application number: US18919366

Filing date: 2024-10-17

Applicant: Google LLC

Inventor: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats

IPC: G10L13/08, G10L13/04, G10L15/06, G10L15/22, G10L15/26, G10L25/18

Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.

4.

Granted invention patent
Speaker embeddings for improved automatic speech recognition (status: in force)

Publication number: US12136410B2

Publication date: 2024-11-05

Application number: US17661832

Filing date: 2022-05-03

Applicant: Google LLC

Inventor: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats

IPC: G10L13/08, G10L13/04, G10L15/06, G10L15/22, G10L15/26, G10L25/18

Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.
