-
Publication No.: US20220351713A1
Publication Date: 2022-11-03
Application No.: US17813361
Application Date: 2022-07-19
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
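The abstract above describes a three-stage pipeline: a speaker encoder maps reference audio of the target speaker to a speaker vector, and a spectrogram generation engine conditions on the input text plus that vector to produce an audio representation (e.g., a mel spectrogram). The sketch below is a minimal illustration of that data flow, not the patented architecture; the layer sizes, the fixed random stand-in weights, and the function names (speaker_encoder, spectrogram_generator) are assumptions made only to keep the shapes and interfaces concrete.

```python
# Illustrative sketch of the speaker-encoder -> spectrogram-generator flow.
# Random projections stand in for trained networks; shapes are assumptions.
import numpy as np

def speaker_encoder(reference_audio_frames: np.ndarray) -> np.ndarray:
    """Map frames of the target speaker's audio (T x n_mels) to a fixed-size,
    L2-normalized speaker vector."""
    rng = np.random.default_rng(0)                      # stand-in for learned weights
    proj = rng.standard_normal((reference_audio_frames.shape[1], 256))
    frame_embeddings = np.tanh(reference_audio_frames @ proj)
    speaker_vector = frame_embeddings.mean(axis=0)      # average over time
    return speaker_vector / np.linalg.norm(speaker_vector)

def spectrogram_generator(text: str, speaker_vector: np.ndarray) -> np.ndarray:
    """Generate a toy mel spectrogram (T_out x n_mels) conditioned on the
    input text and the speaker vector."""
    rng = np.random.default_rng(1)
    char_table = rng.standard_normal((256, 256))        # per-character embeddings
    text_embedding = np.stack([char_table[ord(c) % 256] for c in text])
    conditioned = text_embedding + speaker_vector       # broadcast speaker vector over time
    mel_proj = rng.standard_normal((256, 80))
    return conditioned @ mel_proj                        # (len(text), 80) mel frames

if __name__ == "__main__":
    reference = np.random.rand(200, 80)                  # ~2 s of target-speaker mel frames
    d_vector = speaker_encoder(reference)
    mels = spectrogram_generator("hello world", d_vector)
    print(d_vector.shape, mels.shape)                    # (256,) (11, 80)
```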
-
Publication No.: US11488575B2
Publication Date: 2022-11-01
Application No.: US17055951
Application Date: 2019-05-17
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
-
Publication No.: US11482244B2
Publication Date: 2022-10-25
Application No.: US17199347
Application Date: 2021-03-11
Applicant: Google LLC
Inventor: Quan Wang
Abstract: A method includes receiving an overlapped audio signal that includes audio spoken by a speaker that overlaps a segment of synthesized playback audio. The method also includes encoding a sequence of characters that correspond to the synthesized playback audio into a text embedding representation. For each character in the sequence of characters, the method also includes generating a respective cancelation probability using the text embedding representation. The cancelation probability indicates a likelihood that the corresponding character is associated with the segment of the synthesized playback audio overlapped by the audio spoken by the speaker in the overlapped audio signal.
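As a rough illustration of the per-character cancelation probabilities described above, the sketch below embeds the characters of the synthesized playback text, combines each character embedding with a pooled summary of the overlapped audio signal, and applies a sigmoid scoring head. The embedding sizes, the pooling, and the scoring head are assumptions for illustration, not the patented model.

```python
# Illustrative sketch: per-character cancelation probabilities from a text
# embedding of the playback text plus a summary of the overlapped audio.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cancelation_probabilities(playback_text: str,
                              overlapped_audio: np.ndarray) -> np.ndarray:
    """Return one probability per character that the character's synthesized
    audio is overlapped by the speaker's speech."""
    rng = np.random.default_rng(0)                       # stand-in for learned weights
    char_table = rng.standard_normal((256, 64))          # character embedding table
    text_embedding = np.stack([char_table[ord(c) % 256] for c in playback_text])

    audio_proj = rng.standard_normal((overlapped_audio.shape[1], 64))
    audio_summary = np.tanh(overlapped_audio @ audio_proj).mean(axis=0)

    scorer = rng.standard_normal(64)                      # stand-in scoring head
    logits = (text_embedding * audio_summary) @ scorer
    return sigmoid(logits)                                # shape: (len(playback_text),)

if __name__ == "__main__":
    audio = np.random.rand(300, 80)                       # overlapped mel frames
    probs = cancelation_probabilities("turn off the lights", audio)
    print(probs.round(2))
```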
-
Publication No.: US20220328035A1
Publication Date: 2022-10-13
Application No.: US17846287
Application Date: 2022-06-22
Applicant: Google LLC
Inventor: Li Wan , Yang Yu , Prashant Sridhar , Ignacio Lopez Moreno , Quan Wang
IPC: G10L15/00
Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.
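The contrast the abstract draws between tuple losses and traditional cross-entropy can be illustrated as follows: standard cross-entropy normalizes over all N language logits, whereas one plausible tuple loss normalizes only over a small tuple of candidate languages (for example, the languages enabled on a device). The restricted-softmax form below is an assumption for illustration; the exact loss used in the patent may differ.

```python
# Sketch contrasting full-softmax cross-entropy with a tuple-restricted loss.
import numpy as np

def cross_entropy(logits: np.ndarray, target: int) -> float:
    """Standard softmax cross-entropy over all N language logits."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]

def tuple_loss(logits: np.ndarray, target: int, candidate_tuple: list[int]) -> float:
    """Softmax cross-entropy restricted to a tuple of candidate languages."""
    assert target in candidate_tuple
    tuple_logits = logits[candidate_tuple]
    shifted = tuple_logits - tuple_logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[candidate_tuple.index(target)]

if __name__ == "__main__":
    logits = np.array([2.0, 0.5, 1.8, -1.0, 0.0])   # scores for N = 5 languages
    print(cross_entropy(logits, target=0))           # penalized against all 5 languages
    print(tuple_loss(logits, target=0, candidate_tuple=[0, 2]))  # only against the enabled competitor
```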
-
Publication No.: US20220310098A1
Publication Date: 2022-09-29
Application No.: US17211791
Application Date: 2021-03-24
Applicant: Google LLC
Inventor: Roza Chojnacka , Jason Pelecanos , Quan Wang , Ignacio Lopez Moreno
IPC: G10L17/02 , G06F16/9032
Abstract: A speaker verification method includes receiving audio data corresponding to an utterance, processing a first portion of the audio data that characterizes a predetermined hotword to generate a text-dependent evaluation vector, and generating one or more text-dependent confidence scores. When one of the text-dependent confidence scores satisfies a threshold, the method includes identifying a speaker of the utterance as a respective enrolled user associated with the text-dependent confidence score that satisfies the threshold and initiating performance of an action without performing speaker verification. When none of the text-dependent confidence scores satisfies the threshold, the method includes processing a second portion of the audio data that characterizes a query to generate a text-independent evaluation vector, generating one or more text-independent confidence scores, and determining whether the speaker of the utterance is any of the enrolled users.
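A minimal sketch of the two-stage decision flow described above, assuming cosine scoring against per-user enrolled vectors, an illustrative acceptance threshold, and an assumed 50/50 combination of the text-dependent and text-independent scores in the fallback stage; none of these specifics come from the patent itself.

```python
# Sketch of two-stage speaker verification: text-dependent first, then a
# text-independent fallback. Threshold and score fusion are assumptions.
import numpy as np

THRESHOLD = 0.75  # assumed acceptance threshold

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(td_eval: np.ndarray, ti_eval: np.ndarray,
           enrolled: dict[str, dict[str, np.ndarray]]) -> str | None:
    """Return the identified enrolled user, or None if no user matches."""
    # Stage 1: text-dependent scores on the hotword portion.
    td_scores = {user: cosine(td_eval, vecs["td"]) for user, vecs in enrolled.items()}
    best_user, best_score = max(td_scores.items(), key=lambda kv: kv[1])
    if best_score >= THRESHOLD:
        return best_user                                  # accept without the second stage

    # Stage 2: combine with text-independent scores on the query portion.
    for user, vecs in enrolled.items():
        combined = 0.5 * td_scores[user] + 0.5 * cosine(ti_eval, vecs["ti"])
        if combined >= THRESHOLD:
            return user
    return None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alice = rng.standard_normal(128)                      # enrolled reference vector
    enrolled = {"alice": {"td": alice, "ti": alice + 0.05 * rng.standard_normal(128)}}
    print(verify(alice + 0.1 * rng.standard_normal(128),
                 alice + 0.2 * rng.standard_normal(128), enrolled))
```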
-
Publication No.: US11410641B2
Publication Date: 2022-08-09
Application No.: US16959037
Application Date: 2019-11-27
Applicant: Google LLC
Inventor: Li Wan , Yang Yu , Prashant Sridhar , Ignacio Lopez Moreno , Quan Wang
IPC: G10L15/00
Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.
-
Publication No.: US20210390975A1
Publication Date: 2021-12-16
Application No.: US17199347
Application Date: 2021-03-11
Applicant: Google LLC
Inventor: Quan Wang
IPC: G10L25/93 , G10L25/30 , G10L13/00 , G10L21/0208 , G10L15/06
Abstract: A method includes receiving an overlapped audio signal that includes audio spoken by a speaker that overlaps a segment of synthesized playback audio. The method also includes encoding a sequence of characters that correspond to the synthesized playback audio into a text embedding representation. For each character in the sequence of characters, the method also includes generating a respective cancelation probability using the text embedding representation. The cancelation probability indicates a likelihood that the corresponding character is associated with the segment of the synthesized playback audio overlapped by the audio spoken by the speaker in the overlapped audio signal.
-
Publication No.: US20210256981A1
Publication Date: 2021-08-19
Application No.: US17307704
Application Date: 2021-05-04
Applicant: Google LLC
Inventor: Ignacio Lopez Moreno , Li Wan , Quan Wang
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, to facilitate language-independent speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.
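The sketch below illustrates the conditioning idea in this abstract: the on-device network receives both features derived from the audio and a language (or dialect) identifier, and the resulting speaker representation is compared against an enrolled representation to decide access. The one-hot language encoding, the cosine-style comparison, and the threshold are illustrative assumptions, not the patented network.

```python
# Sketch of language-conditioned speaker representation and enrollment check.
import numpy as np

LANGUAGES = ["en-US", "es-ES", "fr-FR", "hi-IN"]          # assumed identifier set

def speaker_representation(audio_features: np.ndarray, language: str) -> np.ndarray:
    """Toy stand-in for the trained network: concatenate pooled audio features
    with a one-hot language identifier and project to a unit-norm embedding."""
    one_hot = np.eye(len(LANGUAGES))[LANGUAGES.index(language)]
    pooled = audio_features.mean(axis=0)                   # pool frames over time
    x = np.concatenate([pooled, one_hot])
    rng = np.random.default_rng(0)                         # stand-in for learned weights
    proj = rng.standard_normal((x.shape[0], 128))
    emb = np.tanh(x @ proj)
    return emb / np.linalg.norm(emb)

def is_enrolled_user(utterance_emb: np.ndarray, enrolled_emb: np.ndarray,
                     threshold: float = 0.8) -> bool:
    """Grant access if the utterance embedding matches the enrolled one."""
    return float(utterance_emb @ enrolled_emb) >= threshold

if __name__ == "__main__":
    features = np.random.rand(100, 40)                     # log-mel features of the utterance
    enrolled = speaker_representation(features, "en-US")
    test = speaker_representation(features + 0.01, "en-US")
    print(is_enrolled_user(test, enrolled))
```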
-
Publication No.: US11017784B2
Publication Date: 2021-05-25
Application No.: US16557390
Application Date: 2019-08-30
Applicant: Google LLC
Inventor: Ignacio Lopez Moreno , Li Wan , Quan Wang
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, to facilitate language-independent speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.
-
Publication No.: US20190385619A1
Publication Date: 2019-12-19
Application No.: US16557390
Application Date: 2019-08-30
Applicant: Google LLC
Inventor: Ignacio Lopez Moreno , Li Wan , Quan Wang
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, to facilitate language-independent speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.