Hybrid multilingual text-dependent and text-independent speaker verification

    Publication Number: US11942094B2

    Publication Date: 2024-03-26

    Application Number: US17211791

    Filing Date: 2021-03-24

    Applicant: Google LLC

    CPC classification number: G10L17/02 G06F16/90332 G10L2015/088

    Abstract: A speaker verification method includes receiving audio data corresponding to an utterance, processing a first portion of the audio data that characterizes a predetermined hotword to generate a text-dependent evaluation vector, and generating one or more text-dependent confidence scores. When one of the text-dependent confidence scores satisfies a threshold, the method includes identifying a speaker of the utterance as the respective enrolled user associated with the text-dependent confidence score that satisfies the threshold and initiating performance of an action without performing further, text-independent speaker verification. When none of the text-dependent confidence scores satisfies the threshold, the method includes processing a second portion of the audio data that characterizes a query to generate a text-independent evaluation vector, generating one or more text-independent confidence scores, and determining whether the speaker of the utterance is any of the enrolled users.
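
    The abstract describes a two-stage cascade: a cheap text-dependent check on the hotword audio, with a fallback to text-independent verification on the query audio only when the first stage is inconclusive. The following Python sketch shows that control flow; the embedding models are passed in as callables, and the cosine-similarity scoring and the 0.8/0.7 thresholds are illustrative assumptions, not values from the patent.

        import numpy as np

        def cosine(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        def verify_speaker(hotword_audio, query_audio, enrolled, td_model, ti_model,
                           td_threshold=0.8, ti_threshold=0.7):
            # Stage 1: text-dependent evaluation vector from the hotword portion.
            td_vec = td_model(hotword_audio)
            td_scores = {user: cosine(td_vec, refs["td"]) for user, refs in enrolled.items()}
            user, score = max(td_scores.items(), key=lambda kv: kv[1])
            if score >= td_threshold:
                return user  # confident match: skip text-independent verification
            # Stage 2: text-independent evaluation vector from the query portion.
            ti_vec = ti_model(query_audio)
            ti_scores = {user: cosine(ti_vec, refs["ti"]) for user, refs in enrolled.items()}
            user, score = max(ti_scores.items(), key=lambda kv: kv[1])
            return user if score >= ti_threshold else None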

    Optimizing Personal VAD for On-Device Speech Recognition

    Publication Number: US20230298591A1

    Publication Date: 2023-09-21

    Application Number: US18123060

    Filing Date: 2023-03-17

    Applicant: Google LLC

    CPC classification number: G10L17/06 G10L17/22

    Abstract: A computer-implemented method includes receiving a sequence of acoustic frames corresponding to an utterance and generating a reference speaker embedding for the utterance. The method also includes receiving a target speaker embedding for a target speaker and generating feature-wise linear modulation (FiLM) parameters including a scaling vector and a shifting vector based on the target speaker embedding. The method also includes generating an affine transformation output that scales and shifts the reference speaker embedding based on the FiLM parameters. The method also includes generating a classification output indicating whether the utterance was spoken by the target speaker based on the affine transformation output.
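
    Feature-wise linear modulation (FiLM) is the concrete mechanism here: two learned projections of the target speaker embedding produce a scaling vector (gamma) and a shifting vector (beta), which are applied feature-wise to the reference embedding before classification. A minimal NumPy sketch, with randomly initialized weights standing in for trained parameters:

        import numpy as np

        rng = np.random.default_rng(0)
        DIM = 256  # embedding dimensionality; illustrative

        # Learned projections (random stand-ins for trained weights).
        W_scale, b_scale = rng.normal(size=(DIM, DIM)) * 0.01, np.ones(DIM)
        W_shift, b_shift = rng.normal(size=(DIM, DIM)) * 0.01, np.zeros(DIM)
        w_cls, b_cls = rng.normal(size=DIM) * 0.01, 0.0

        def film_parameters(target_emb):
            # FiLM parameters conditioned on the target speaker embedding.
            gamma = W_scale @ target_emb + b_scale   # scaling vector
            beta = W_shift @ target_emb + b_shift    # shifting vector
            return gamma, beta

        def classify(reference_emb, target_emb):
            gamma, beta = film_parameters(target_emb)
            modulated = gamma * reference_emb + beta  # affine transformation output
            logit = w_cls @ modulated + b_cls
            return 1.0 / (1.0 + np.exp(-logit))       # P(utterance spoken by target)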

    VOICE SHORTCUT DETECTION WITH SPEAKER VERIFICATION

    Publication Number: US20230169984A1

    Publication Date: 2023-06-01

    Application Number: US18103324

    Filing Date: 2023-01-30

    Applicant: Google LLC

    CPC classification number: G10L17/24 G10L17/06 G10L21/028

    Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.
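
    The components named in the abstract compose into a simple pipeline: separate, verify, transcribe, match. A sketch of that orchestration follows, assuming the three models are available as callables (their implementations are outside the abstract's scope), with cosine similarity and a 0.7 threshold as illustrative choices:

        import numpy as np

        def cosine(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        def handle_utterance(audio, separation_model, sid_model, asr_model,
                             enrolled_embedding, keyphrase_actions, sid_threshold=0.7):
            # 1. Isolate the human speaker's utterance from additional sounds.
            separated = separation_model(audio)
            # 2. Text-independent speaker ID: did a registered/verified user speak?
            if cosine(sid_model(separated), enrolled_embedding) < sid_threshold:
                return None
            # 3. Transcribe, then 4. check whether a configured keyphrase appears.
            text = asr_model(separated).lower()
            for phrase, action in keyphrase_actions.items():
                if phrase in text:
                    return action  # trigger the action mapped to this keyphrase
            return None

    Because keyphrase_actions is ordinary configuration rather than model weights, new keyphrases can be added without retraining any of the three models, which is the stated point of the design.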

    Speaker-Turn-Based Online Speaker Diarization with Constrained Spectral Clustering

    Publication Number: US20230089308A1

    Publication Date: 2023-03-23

    Application Number: US17644261

    Filing Date: 2021-12-14

    Applicant: Google LLC

    Abstract: A method includes receiving an input audio signal that corresponds to utterances spoken by multiple speakers. The method also includes processing the input audio signal to generate a transcription of the utterances and a sequence of speaker turn tokens, each indicating a location of a respective speaker turn. The method also includes segmenting the input audio signal into a plurality of speaker segments based on the sequence of speaker turn tokens. The method also includes extracting a speaker-discriminative embedding from each speaker segment and performing spectral clustering on the speaker-discriminative embeddings to cluster the plurality of speaker segments into k classes. The method also includes assigning, to the speaker segments clustered into each respective class, a speaker label that is different from the speaker labels assigned to the segments clustered into every other class of the k classes.
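
    Once the recognizer marks speaker turns, diarization reduces to clustering one embedding per turn segment. A sketch using scikit-learn's SpectralClustering on a cosine affinity matrix; the patent's constrained variant of spectral clustering is not reproduced here, and the turn-token format (a "<turn>" marker in the transcript) is an assumption for illustration:

        import numpy as np
        from sklearn.cluster import SpectralClustering

        def segment_by_turns(words):
            # Split a transcript into per-speaker segments at "<turn>" tokens.
            segments, current = [], []
            for w in words:
                if w == "<turn>":
                    if current:
                        segments.append(current)
                    current = []
                else:
                    current.append(w)
            if current:
                segments.append(current)
            return segments

        def diarize(segment_embeddings, k):
            # Cosine affinity between speaker-discriminative embeddings,
            # clipped to be nonnegative as spectral clustering expects.
            e = segment_embeddings / np.linalg.norm(segment_embeddings, axis=1, keepdims=True)
            affinity = np.clip(e @ e.T, 0.0, 1.0)
            # Cluster segments into k classes; each class index is a speaker label.
            return SpectralClustering(n_clusters=k, affinity="precomputed").fit_predict(affinity)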

    Attentive Scoring Function for Speaker Identification

    Publication Number: US20220366914A1

    Publication Date: 2022-11-17

    Application Number: US17302926

    Filing Date: 2021-05-16

    Applicant: Google LLC

    Abstract: A speaker verification method includes receiving audio data corresponding to an utterance and processing the audio data to generate an evaluation attentive d-vector (ad-vector) representing voice characteristics of the utterance, where the evaluation ad-vector includes nₑ style classes, each including a respective value vector concatenated with a corresponding routing vector. The method also includes generating, using a self-attention mechanism, at least one multi-condition attention score that indicates a likelihood that the evaluation ad-vector matches a respective reference ad-vector associated with a respective user. The method also includes identifying the speaker of the utterance as the respective user associated with the respective reference ad-vector based on the multi-condition attention score.
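
    One plausible reading of the attentive scoring: the routing vectors decide, via softmax attention, how much each (evaluation class, reference class) pair contributes, and the value vectors supply the similarities being weighted. The NumPy sketch below is an interpretation of the abstract, not the patent's exact formulation:

        import numpy as np

        def multi_condition_attention_score(eval_values, eval_routes, ref_values, ref_routes):
            """eval_values/eval_routes: (ne, d) arrays for the evaluation ad-vector's
            style classes; ref_values/ref_routes: (nr, d) arrays for a reference ad-vector."""
            # Softmax attention weights over all (evaluation, reference) class pairs.
            logits = eval_routes @ ref_routes.T                      # (ne, nr)
            weights = np.exp(logits - logits.max())
            weights /= weights.sum()
            # Cosine similarity between every pair of value vectors.
            en = eval_values / np.linalg.norm(eval_values, axis=1, keepdims=True)
            rn = ref_values / np.linalg.norm(ref_values, axis=1, keepdims=True)
            sims = en @ rn.T                                         # (ne, nr)
            # Score: attention-weighted aggregate similarity.
            return float((weights * sims).sum())

    Identification then reduces to computing this score against each enrolled user's reference ad-vector and taking the best-scoring user, subject to a decision threshold.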

    VOICE SHORTCUT DETECTION WITH SPEAKER VERIFICATION

    Publication Number: US20220335953A1

    Publication Date: 2022-10-20

    Application Number: US17233253

    Filing Date: 2021-04-16

    Applicant: Google LLC

    Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.
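
    This earlier publication shares its disclosure with US20230169984A1 above, so the pipeline sketch there applies here as well. As a complement, here is a sketch of just the final step, matching "at least a portion of the text representation" against the configured keyphrases; the token-level contiguous matching and the normalization are illustrative choices:

        import re

        def _tokens(text):
            return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()

        def match_keyphrase(transcript, keyphrases):
            # Return the first keyphrase whose tokens appear contiguously
            # in the transcript, else None.
            words = _tokens(transcript)
            for phrase in keyphrases:
                p = _tokens(phrase)
                if any(words[i:i + len(p)] == p for i in range(len(words) - len(p) + 1)):
                    return phrase
            return None

        # e.g. match_keyphrase("please turn on the lights", ["turn on the lights"])
        #      -> "turn on the lights"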

    Speaker identification accuracy
    Invention Grant

    Publication Number: US11468900B2

    Publication Date: 2022-10-11

    Application Number: US17071223

    Filing Date: 2020-10-15

    Applicant: Google LLC

    Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing each respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of audio slices, generating a set of candidate acoustic embeddings, where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the candidate acoustic embeddings remaining in the set after the subset has been removed.
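
    A sketch of the aggregation step: embed each slice, discard the candidates least consistent with the rest, and average what remains. The abstract does not specify the removal criterion, so distance from the centroid is an assumed stand-in, and embed_fn is a placeholder for whatever acoustic embedding model is used:

        import numpy as np

        def aggregate_embedding(audio_sample, embed_fn, num_slices=8, drop_fraction=0.25):
            # Divide the audio sample into slices and embed each one.
            slices = np.array_split(audio_sample, num_slices)
            candidates = np.stack([embed_fn(s) for s in slices])
            # Remove the subset of candidates farthest from the centroid
            # (an illustrative outlier criterion, not the patent's).
            centroid = candidates.mean(axis=0)
            dists = np.linalg.norm(candidates - centroid, axis=1)
            n_keep = max(1, int(round(num_slices * (1.0 - drop_fraction))))
            keep = np.argsort(dists)[:n_keep]
            # Aggregate the remaining candidates into one speaker representation.
            return candidates[keep].mean(axis=0)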
