Patent search ap:("Google LLC") AND inv:"Fadi Biadsy" Page 3

21.

Granted invention patent
Direct speech-to-speech translation via machine learning [In force]

Publication (announcement) number: US12032920B2

Publication (announcement) date: 2024-07-09

Application number: US17056554

Application date: 2020-03-07

Applicant: Google LLC

Inventor: Ye Jia, Zhifeng Chen, Yonghui Wu, Melvin Johnson, Fadi Biadsy, Ron Weiss, Wolfgang Macherey

IPC: G06F40/47, G06F40/58

CPC classification number: G06F40/47, G06F40/58

Abstract: The present disclosure provides systems and methods that train and use machine-learned models such as, for example, sequence-to-sequence models, to perform direct and text-free speech-to-speech translation. In particular, aspects of the present disclosure provide an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.

22.

Published invention application
Sub-models for Neural Contextual Biasing with Attention and Embedding Space [Under examination, published]

Publication (announcement) number: US20240021190A1

Publication (announcement) date: 2024-01-18

Application number: US17813322

Application date: 2022-07-18

Applicant: Google LLC

Inventor: Fadi Biadsy, Pedro Jose Moreno Mengibar

IPC: G10L15/06, G10L15/16, G10L13/02, G10L15/22

CPC classification number: G10L15/063, G10L15/16, G10L13/02, G10L15/22, G10L2015/0635

Abstract: A method for training a sub-model for contextual biasing for speech recognition includes obtaining a base speech recognition model trained on non-biased data. The method includes obtaining a set of training utterances representative of a particular domain, each training utterance in the set of training utterances including audio data characterizing the training utterance and a ground truth transcription of the training utterance. The method further includes, for each corresponding training utterance in the set of training utterances, determining, using an embedding encoder, a corresponding document embedding from the ground truth transcription of the corresponding training utterance. The method includes training, using the corresponding document embeddings determined from the ground truth transcriptions of the set of training utterances, a sub-model to bias the base speech recognition model to recognize speech in the particular domain.

23.

Granted invention patent
Language models using domain-specific model components [In force]

Publication (announcement) number: US11875789B2

Publication (announcement) date: 2024-01-16

Application number: US18069070

Application date: 2022-12-20

Applicant: Google LLC

Inventor: Fadi Biadsy, Diamantino Antonio Caseiro

IPC: G10L15/18, G10L15/197, G10L15/02, G10L15/32, G10L15/08, G10L15/183, G10L15/19, G10L15/22

CPC classification number: G10L15/197, G10L15/02, G10L15/08, G10L15/32, G10L15/183, G10L15/19, G10L2015/226, G10L2015/228

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
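
The scoring flow in this abstract can be illustrated with a small sketch. It is not the patented implementation: the weight tables, the `app` context key, and the additive combination of baseline and domain weights are all assumptions made here for illustration.

```python
from collections import Counter

# Hypothetical feature weights; a real system would learn these from data.
BASELINE = {"call": 0.2, "call mom": 0.5, "play": 0.2}
DOMAIN_COMPONENTS = {
    "dialer": {"call mom": 1.0},
    "music": {"play": 1.0, "play mom": 0.3},
}

def ngram_features(words, max_order=2):
    """Count every n-gram up to max_order in a candidate transcription."""
    feats = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(words) - n + 1):
            feats[" ".join(words[i:i + n])] += 1
    return feats

def score(transcription, context):
    """Add baseline and selected domain-specific weights over n-gram features."""
    # The domain component is chosen from non-linguistic context (assumed: app name).
    domain = DOMAIN_COMPONENTS.get(context.get("app"), {})
    feats = ngram_features(transcription.split())
    return sum(
        count * (BASELINE.get(f, 0.0) + domain.get(f, 0.0))
        for f, count in feats.items()
    )

def best_transcription(candidates, context):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=lambda t: score(t, context))
```

With these made-up weights, the same pair of candidates resolves differently depending on the context: `best_transcription(["call mom", "play mom"], {"app": "music"})` picks the music-domain reading, while `{"app": "dialer"}` picks the dialer one.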

24.

Published invention application
Scalable Model Specialization Framework for Speech Model Personalization [Under examination, published]

Publication (announcement) number: US20230298574A1

Publication (announcement) date: 2023-09-21

Application number: US18184630

Application date: 2023-03-15

Applicant: Google LLC

Inventor: Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew M. Rosenberg, Pedro J. Moreno Mengibar

IPC: G10L15/16, G10L15/06, G10L15/02

CPC classification number: G10L15/16, G10L15/063, G10L15/02, G10L2015/025

Abstract: A method for speech conversion includes obtaining a speech conversion model configured to convert input utterances of human speech directly into corresponding output utterances of synthesized speech. The method further includes receiving a speech conversion request including input audio data corresponding to an utterance spoken by a target speaker associated with atypical speech and a speaker identifier uniquely identifying the target speaker. The method includes activating, using the speaker identifier, a particular sub-model for biasing the speech conversion model to recognize a type of the atypical speech associated with the target speaker identified by the speaker identifier. The method includes converting, using the speech conversion model biased by the activated particular sub-model, the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into output audio data corresponding to a synthesized canonical fluent speech representation of the utterance spoken by the target speaker.

25.

Granted invention patent
Speech recognition [In force]

Publication (announcement) number: US11580994B2

Publication (announcement) date: 2023-02-14

Application number: US17153495

Application date: 2021-01-20

Applicant: Google LLC

Inventor: Fadi Biadsy, Pedro Jose Moreno Mengibar

IPC: G10L17/22, G10L17/02, G10L17/04, G10L17/14, G10L15/22, G10L17/26

Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user that speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The method also includes analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing an alternative speech recognizer on the one or more bias terms identified in the first transcription. The method also includes receiving acoustic features of a second utterance spoken by a second user that speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more bias terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.

26.

Granted invention patent
Speech recognition using log-linear model [In force]

Publication (announcement) number: US10134394B2

Publication (announcement) date: 2018-11-20

Application number: US14708465

Application date: 2015-05-11

Applicant: Google LLC

Inventor: Diamantino Antonio Caseiro, Fadi Biadsy

IPC: G10L15/197, G06F17/27

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to generating log-linear models. In some implementations, n-gram parameter values derived from an n-gram language model are obtained. N-gram features for a log-linear language model are determined based on the n-grams corresponding to the obtained n-gram parameter values. A weight for each of the determined n-gram features is determined, where the weight is determined based on (i) an n-gram parameter value that is derived from the n-gram language model and that corresponds to a particular n-gram, and (ii) an n-gram parameter value that is derived from the n-gram language model and that corresponds to an n-gram that is a sub-sequence within the particular n-gram. A log-linear language model having the determined n-gram features is generated, where the determined n-gram features in the log-linear language model have weights that are initialized based on the determined weights.
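
One plausible reading of the weight initialization described here is sketched below. It rests on assumptions not stated in the abstract: each feature weight is taken as the log probability of the n-gram minus the log probability of its shorter-history sub-sequence, back-off weights are ignored, and the probability table is invented.

```python
import math

# Hypothetical conditional probabilities P(last word | preceding words).
# Assumes every n-gram's shorter-history suffix is also in the table.
NGRAM_PROBS = {
    ("cat",): 0.01,
    ("the", "cat"): 0.05,
    ("saw", "the", "cat"): 0.20,
}

def init_weights(ngram_probs):
    """Initialize one log-linear feature weight per n-gram."""
    weights = {}
    for ngram, prob in ngram_probs.items():
        if len(ngram) == 1:
            # A unigram has no shorter sub-sequence to subtract.
            weights[ngram] = math.log(prob)
        else:
            # (i) value for this n-gram minus (ii) value for its sub-sequence.
            weights[ngram] = math.log(prob) - math.log(ngram_probs[ngram[1:]])
    return weights

def log_linear_score(words, weights):
    """Sum the weights of all features matching a suffix of the word sequence."""
    return sum(
        weights.get(tuple(words[i:]), 0.0) for i in range(len(words))
    )
```

Under this choice the weights telescope, so the untrained log-linear model starts out reproducing the n-gram model's log probability for the longest matching n-gram; further training would then adjust the weights from that starting point.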

27.

Granted invention patent
Conformer-based speech conversion model [In force]

Publication (announcement) number: US12272348B2

Publication (announcement) date: 2025-04-08

Application number: US17655030

Application date: 2022-03-16

Applicant: Google LLC

Inventor: Bhuvana Ramabhadran, Zhehuai Chen, Fadi Biadsy, Pedro J. Moreno Mengibar

IPC: G10L13/027, G10L13/047, G10L15/16, G10L15/22, G10L25/18

Abstract: A method for speech conversion includes receiving, as input to an encoder of a speech conversion model, an input spectrogram corresponding to an utterance, the encoder including a stack of self-attention blocks. The method further includes generating, as output from the encoder, an encoded spectrogram and receiving, as input to a spectrogram decoder of the speech conversion model, the encoded spectrogram generated as output from the encoder. The method further includes generating, as output from the spectrogram decoder, an output spectrogram corresponding to a synthesized speech representation of the utterance.

28.

Invention application
SPEAKER EMBEDDINGS FOR IMPROVED AUTOMATIC SPEECH RECOGNITION [In force]

Publication (announcement) number: US20250037700A1

Publication (announcement) date: 2025-01-30

Application number: US18919366

Application date: 2024-10-17

Applicant: Google LLC

Inventor: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats

IPC: G10L13/08, G10L13/04, G10L15/06, G10L15/22, G10L15/26, G10L25/18

Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.

29.

Granted invention patent
Speaker embeddings for improved automatic speech recognition [In force]

Publication (announcement) number: US12136410B2

Publication (announcement) date: 2024-11-05

Application number: US17661832

Application date: 2022-05-03

Applicant: Google LLC

Inventor: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats

IPC: G10L13/08, G10L13/04, G10L15/06, G10L15/22, G10L15/26, G10L25/18

Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.

30.

Published invention application
Using Non-Parallel Voice Conversion for Speech Conversion Models [Under examination, published]

Publication (announcement) number: US20230298565A1

Publication (announcement) date: 2023-09-21

Application number: US17660487

Application date: 2022-04-25

Applicant: Google LLC

Inventor: Andrew M. Rosenberg, Gary Wang, Bhuvana Ramabhadran, Fadi Biadsy

IPC: G10L15/06, G10L15/197, G10L13/02, G10L19/038, G10L15/22

CPC classification number: G10L15/063, G10L15/197, G10L13/02, G10L19/038, G10L15/22, G10L2015/0635, G10L2019/0001

Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation. The method also includes determining a consistent loss term for the corresponding training utterance pair based on the first and second probability distributions and updating parameters of the speech recognition model based on the consistent loss term.
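
A loss term computed from two per-step probability distributions, as this abstract describes, could look like the sketch below. The abstract does not name the divergence, so the symmetric Kullback-Leibler form, the function names, and the toy distributions are assumptions made here for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same hypotheses."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q)
    )

def consistency_loss(real_steps, synth_steps):
    """Average symmetric KL between paired distributions at each output step.

    real_steps:  per-step distributions for the non-synthetic speech
    synth_steps: per-step distributions for the voice-converted speech
    """
    total = 0.0
    for p, q in zip(real_steps, synth_steps):
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(real_steps)

# Toy example: two output steps, three recognition hypotheses per step.
real = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
synth = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
loss = consistency_loss(real, synth)
```

The loss is zero when the recognizer predicts the same distribution for both members of a training utterance pair and grows as the predictions diverge, which is the property a consistency term needs.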
