Synthesized data augmentation using voice conversion and speech recognition models

    Publication number: US11335324B2

    Publication date: 2022-05-17

    Application number: US17008278

    Application date: 2020-08-31

    Applicant: Google LLC

    Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
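    The augmentation flow this abstract describes can be outlined as a pipeline: adapt a TTS model on the target speaker's (transcription, audio) pairs, synthesize the unspoken texts in that voice, then train the conversion model on the synthetic output. The sketch below is a minimal Python illustration of that flow only; every function name and data shape is a hypothetical stand-in, not an API from the patent or any real library.

```python
def adapt_tts(base_tts, spoken_utterances):
    """Adapt a base TTS model on (transcription, audio) pairs from the target speaker."""
    # Stub: retain the speaker's pairs so synthesis can mimic the voice.
    return {"base": base_tts, "speaker_pairs": list(spoken_utterances)}

def synthesize(adapted_tts, text):
    """Produce a synthetic speech representation in the target speaker's voice."""
    return {"text": text, "voice": "target", "synthetic": True}

def train_conversion_model(examples):
    """Train the speech conversion model on the synthetic representations."""
    return {"trained_on": len(examples)}

# Toy data: two spoken pairs from the target speaker, three unspoken texts.
spoken = [("hello world", b"\x00\x01"), ("good morning", b"\x02\x03")]
unspoken_texts = ["see you later", "thank you", "how are you"]

tts = adapt_tts("base-tts", spoken)
synthetic = [synthesize(tts, t) for t in unspoken_texts]
model = train_conversion_model(synthetic)
```

The point of the structure is that the unspoken texts never need a human recording: the adapted TTS supplies the audio side of every new training example.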

    END-TO-END SPEECH CONVERSION
    Invention application

    Publication number: US20220122579A1

    Publication date: 2022-04-21

    Application number: US17310732

    Application date: 2019-11-26

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.
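    The key property in this abstract is the model's interface: audio in, audio out, with no speech-recognition step in between. A minimal Python sketch of that interface follows; the toy model and all names are illustrative assumptions, not the patent's implementation.

```python
def convert_speech(model, first_audio):
    # The model maps audio in the user's voice directly to audio in a
    # synthesized voice; no transcription is produced at any point.
    return model(first_audio)

# Toy "model": preserves the spoken content, replaces the voice attribute.
toy_model = lambda audio: {"content": audio["content"], "voice": "synthesized"}

first_utterance = {"content": "turn on the lights", "voice": "user"}
second_audio = convert_speech(toy_model, first_utterance)
```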

    Streaming Speech-to-speech Model With Automatic Speaker Turn Detection

    Publication number: US20230395061A1

    Publication date: 2023-12-07

    Application number: US18319410

    Application date: 2023-05-17

    Applicant: Google LLC

    CPC classification number: G10L13/047 G10L15/30 G10L15/04 G10L15/16

    Abstract: A method for turn detection in a speech-to-speech model includes receiving, as input to the speech-to-speech (S2S) model, a sequence of acoustic frames corresponding to an utterance. The method further includes, at each of a plurality of output steps, generating, by an audio encoder of the S2S model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames, and determining, by a turn detector of the S2S model, based on the higher order feature representation generated by the audio encoder at the corresponding output step, whether the utterance is at a breakpoint at the corresponding output step. When the turn detector determines that the utterance is at the breakpoint, the method includes synthesizing a sequence of output audio frames output by a speech decoder of the S2S model into a time-domain audio waveform of synthesized speech representing the utterance spoken by the user.
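    The per-frame loop in this abstract (encode each frame, decode an output frame, check the turn detector, and vocode the pending frames when a breakpoint is found) can be sketched in a few lines. The stand-ins below are deliberately trivial (frames are integers, a zero frame marks a breakpoint); none of this reflects the patent's actual models.

```python
def stream_s2s(frames, encode, detect_turn, decode, vocode):
    """Process acoustic frames one at a time; emit a waveform per detected turn."""
    pending, waveforms = [], []
    for frame in frames:
        feature = encode(frame)          # audio encoder: higher-order feature
        pending.append(decode(feature))  # speech decoder: output audio frame
        if detect_turn(feature):         # turn detector: is this a breakpoint?
            waveforms.append(vocode(pending))  # synthesize pending frames
            pending = []
    return waveforms

# Toy stand-ins for the four components of the S2S model.
turns = stream_s2s(
    [1, 2, 0, 3, 0],
    encode=lambda f: f,
    detect_turn=lambda feat: feat == 0,
    decode=lambda feat: feat * 2,
    vocode=tuple,
)
```

Because synthesis is triggered by the detector rather than by end-of-input, the model can stream: each speaker turn is emitted as soon as its breakpoint is seen.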

    END-TO-END SPEECH CONVERSION
    Invention publication

    Publication number: US20230230572A1

    Publication date: 2023-07-20

    Application number: US18188524

    Application date: 2023-03-23

    Applicant: Google LLC

    CPC classification number: G10L13/02 G06N3/08 G10L21/10 G10L25/30 H04L51/02

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.

    SPEECH RECOGNITION
    Invention publication
    Status: under examination, published

    Publication number: US20230169983A1

    Publication date: 2023-06-01

    Application number: US18159601

    Application date: 2023-01-25

    Applicant: Google LLC

    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user that speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The operations also include analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing the alternative speech recognizer on the one or more bias terms identified in the first transcription. The operations also include receiving acoustic features of a second utterance spoken by a second user that speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
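    The two-recognizer flow described here (transcribe the typical speaker with a general recognizer, mine bias terms from that transcription, then bias an alternative recognizer for the atypical speaker) is sketched below. The term-extraction rule and the "recognizer" are hypothetical toys; the patent does not specify these implementations.

```python
def extract_bias_terms(transcription, vocabulary):
    # Illustrative rule: bias toward any word from a known rare-word vocabulary
    # that appears in the first transcription.
    return [w for w in transcription.split() if w in vocabulary]

def biased_recognize(hypotheses, bias_terms):
    # Toy alternative recognizer: among candidate transcriptions, prefer one
    # containing a bias term; otherwise fall back to the top hypothesis.
    for hyp in hypotheses:
        if any(term in hyp for term in bias_terms):
            return hyp
    return hypotheses[0]

rare_words = {"propranolol"}
first_transcription = "did you take the propranolol today"   # typical speaker
terms = extract_bias_terms(first_transcription, rare_words)
second = biased_recognize(["proper nold", "propranolol"], terms)  # atypical speaker
```

The design intuition: the typical speaker's context supplies rare terms the atypical speaker is likely to repeat, so the biased recognizer can recover words it would otherwise misrecognize.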

    Training Speech Synthesis to Generate Distinct Speech Sounds

    Publication number: US20230009613A1

    Publication date: 2023-01-12

    Application number: US17756995

    Application date: 2019-12-13

    Applicant: Google LLC

    Abstract: A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.
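    The phone-label loss in this abstract penalizes predicted phone labels that disagree with the reference labels after alignment. A minimal sketch, assuming a trivial one-to-one alignment and a mismatch-rate loss (the patent's actual alignment and loss are not specified here):

```python
def phone_label_loss(predicted, reference):
    """Fraction of predicted phone labels that disagree with the reference
    labels, under a toy one-to-one alignment (illustrative only)."""
    assert len(predicted) == len(reference)
    return sum(p != r for p, r in zip(predicted, reference)) / len(reference)

# "hello": predicted phones vs. reference phones, one mismatch out of four.
loss = phone_label_loss(["h", "eh", "l", "ow"], ["h", "ah", "l", "ow"])
```

Updating the TTS model on this extra loss, alongside the usual audio-feature loss, pushes it to produce acoustically distinct speech sounds rather than merely plausible audio.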

    FACTOR GRAPH FOR SEMANTIC PARSING
    Invention application

    Publication number: US20190244610A1

    Publication date: 2019-08-08

    Application number: US16257856

    Application date: 2019-01-25

    Applicant: Google LLC

    CPC classification number: G10L15/22 G10L2015/223

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating expressions associated with voice commands. The methods, systems, and apparatus include actions of obtaining segments of one or more expressions associated with a voice command. Further actions include combining the segments into a candidate expression and scoring the candidate expression using a text corpus. Additional actions include selecting the candidate expression as an expression associated with the voice command based on the scoring of the candidate expression.
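    The combine-then-score loop this abstract describes can be illustrated directly: enumerate candidate expressions from segment groups, score each against a text corpus, and keep the best. The corpus score below (count of corpus sentences containing the candidate) is a hypothetical stand-in for whatever scoring the factor graph actually uses.

```python
from itertools import product

def candidate_expressions(segment_groups):
    # Combine one segment from each group into a candidate expression.
    return [" ".join(parts) for parts in product(*segment_groups)]

def corpus_score(candidate, corpus):
    # Toy score: number of corpus sentences that contain the candidate.
    return sum(candidate in sentence for sentence in corpus)

groups = [["turn", "switch"], ["on the lights"]]
corpus = ["please turn on the lights", "turn on the lights now"]
best = max(candidate_expressions(groups), key=lambda c: corpus_score(c, corpus))
```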

    Sub-models for neural contextual biasing

    Publication number: US12230258B2

    Publication date: 2025-02-18

    Application number: US17659836

    Application date: 2022-04-19

    Applicant: Google LLC

    Abstract: A method for contextual biasing for speech recognition includes obtaining a base automatic speech recognition (ASR) model trained on non-biased data and a sub-model trained on biased data representative of a particular domain. The method includes receiving a speech recognition request including audio data characterizing an utterance captured in streaming audio. The method further includes determining whether the speech recognition request includes a contextual indicator indicating the particular domain. When the speech recognition request does not include the contextual indicator, the method includes generating, using the base ASR model, a first speech recognition result of the utterance by processing the audio data. When the speech recognition request includes the contextual indicator the method includes biasing, using the sub-model, the base ASR model toward the particular domain and generating, using the biased base ASR model, a second speech recognition result of the utterance by processing the audio data.
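    The dispatch logic in this abstract is simple to state in code: if the request carries no contextual indicator, run the base ASR model as-is; if it does, wrap the base model with the domain sub-model first. The sketch below assumes a trivial rescoring-table sub-model and a lowercasing "base ASR"; both are invented for illustration.

```python
class SubModel:
    """Domain sub-model that biases a base recognizer (illustrative only)."""
    def __init__(self, lexicon):
        self.lexicon = lexicon  # hypothetical domain-term rescoring table

    def bias(self, base_asr):
        def biased(audio):
            hypothesis = base_asr(audio)
            # Rescore toward the domain: prefer in-domain spellings.
            return self.lexicon.get(hypothesis, hypothesis)
        return biased

def recognize(request, base_asr, sub_model):
    if request.get("context") is None:
        return base_asr(request["audio"])                  # first result: unbiased
    return sub_model.bias(base_asr)(request["audio"])      # second result: biased

base = lambda audio: audio.lower()                          # toy base ASR model
medical = SubModel({"laser t k": "LASIK"})
plain_result  = recognize({"audio": "Laser T K"}, base, medical)
biased_result = recognize({"audio": "Laser T K", "context": "medical"}, base, medical)
```

Keeping the base model untouched and attaching the bias only per-request is what lets one ASR model serve both general and domain-specific traffic.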

    Using non-parallel voice conversion for speech conversion models

    Publication number: US12190862B2

    Publication date: 2025-01-07

    Application number: US17660487

    Application date: 2022-04-25

    Applicant: Google LLC

    Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation. The method also includes determining a consistent loss term for the corresponding training utterance pair based on the first and second probability distributions and updating parameters of the speech recognition model based on the consistent loss term.
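    The consistency term compares the recognizer's two hypothesis distributions: one for the non-synthetic audio and one for its voice-converted synthetic counterpart. As a minimal sketch, assuming a symmetric KL divergence as the comparison (the patent abstract does not name the exact divergence):

```python
import math

def consistency_loss(p_nonsynthetic, q_synthetic):
    """Symmetric KL divergence between the hypothesis distributions for the
    real and synthetic versions of the same utterance (illustrative choice)."""
    def kl(p, q):
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl(p_nonsynthetic, q_synthetic) + kl(q_synthetic, p_nonsynthetic)

# Identical distributions incur zero loss; divergent ones are penalized.
identical = consistency_loss([0.7, 0.3], [0.7, 0.3])
different = consistency_loss([0.9, 0.1], [0.5, 0.5])
```

Minimizing this term pushes the recognizer to treat synthetic and real speech of the same utterance the same way, which is what makes non-parallel voice-converted data useful for training.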
