Advancing the use of text and speech in ASR pretraining with consistency and contrastive losses

    Publication No.: US12272363B2

    Publication Date: 2025-04-08

    Application No.: US17722264

    Filing Date: 2022-04-15

    Applicant: Google LLC

    Abstract: A method includes receiving training data that includes unspoken text utterances, un-transcribed non-synthetic speech utterances, and transcribed non-synthetic speech utterances. Each unspoken text utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. Each transcribed non-synthetic speech utterance is paired with a corresponding transcription. The method also includes generating a corresponding synthetic speech representation for each unspoken textual utterance of the received training data using a text-to-speech model. The method also includes pre-training an audio encoder on the synthetic speech representations generated for the unspoken textual utterances, the un-transcribed non-synthetic speech utterances, and the transcribed non-synthetic speech utterances to teach the audio encoder to jointly learn shared speech and text representations.
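
    The abstract describes feeding three data sources (text-only, audio-only, and paired audio/text) into encoder pre-training, with a text-to-speech model filling in speech for the text-only utterances. The Python sketch below only illustrates that data flow; tts_model, audio_encoder, the feature dimensions, and the example utterances are hypothetical stand-ins rather than the patented implementation, and the actual pre-training objectives are omitted.

        import numpy as np

        rng = np.random.default_rng(0)

        def tts_model(text):
            """Stand-in text-to-speech: maps a text utterance to synthetic speech features."""
            n_frames = 4 * max(1, len(text))
            return rng.standard_normal((n_frames, 80))  # e.g. 80-dim log-mel frames

        enc_w = rng.standard_normal((80, 256))  # shared stand-in encoder weights

        def audio_encoder(features):
            """Stand-in audio encoder producing one 256-dim output per frame."""
            return features @ enc_w

        # The three training sources named in the abstract (contents are invented).
        unspoken_texts = ["turn on the lights", "what time is it"]            # text only
        untranscribed_speech = [rng.standard_normal((120, 80))]               # audio only
        transcribed_speech = [(rng.standard_normal((90, 80)), "play music")]  # audio + text

        # Step 1: synthesize a speech representation for each unspoken text utterance.
        synthetic_speech = [tts_model(t) for t in unspoken_texts]

        # Step 2: pre-train the audio encoder on all three sources so it learns shared
        # speech and text representations (the actual losses are omitted here).
        all_features = (synthetic_speech + untranscribed_speech
                        + [audio for audio, _ in transcribed_speech])
        for features in all_features:
            encoder_outputs = audio_encoder(features)
            # ... consistency / contrastive pre-training objectives would be applied here ...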

    PRODUCING PERSONALIZED SELECTION OF APPLICATIONS FOR PRESENTATION ON WEB-BASED INTERFACE

    Publication No.: US20180285448A1

    Publication Date: 2018-10-04

    Application No.: US15478970

    Filing Date: 2017-04-04

    Applicant: Google LLC

    Abstract: A personalized selection of applications for presentation on a web-based interface can be produced. A first vector can represent one or more first words from a first query. A second query, including the one or more first words and one or more second words, can be transmitted in response to a first determination that a measure of similarity between the first vector and a second vector, which represents the one or more second words, is greater than a threshold. The second vector can be obtained from a knowledge base. A response to the second query can include an identification of a first application. A cluster of applications, including the first application and a second application, can be generated in response to a second determination of an existence of a relationship between the first application and the second application. The personalized selection of applications can be produced based on the cluster.
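
    The abstract gates query expansion on a vector-similarity threshold and then clusters the applications returned by the expanded query. The sketch below is a minimal illustration under assumptions of my own: cosine similarity stands in for the unspecified similarity measure, and the vectors, threshold, application names, and relationship set are invented for illustration.

        import numpy as np

        def cosine_similarity(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        # Vector for the words of the first query, and a candidate expansion vector
        # obtained from a knowledge base (both vectors are invented for illustration).
        first_query_vec = np.array([0.9, 0.1, 0.3])
        kb_expansion_vec = np.array([0.8, 0.2, 0.4])

        SIMILARITY_THRESHOLD = 0.9  # hypothetical threshold value

        identified_apps = []
        if cosine_similarity(first_query_vec, kb_expansion_vec) > SIMILARITY_THRESHOLD:
            # Transmit a second query containing the first words plus the expansion
            # words; its (illustrative) response identifies applications.
            identified_apps = ["weather_app", "forecast_widget"]

        # Applications with a known relationship are grouped into a cluster, and the
        # personalized selection is produced from that cluster.
        related_pairs = {("weather_app", "forecast_widget")}  # illustrative relationship
        cluster = [app for app in identified_apps
                   if any(app in pair for pair in related_pairs)]
        personalized_selection = cluster
        print(personalized_selection)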

    Supervised and Unsupervised Training with Contrastive Loss Over Sequences

    Publication No.: US20250166614A1

    Publication Date: 2025-05-22

    Application No.: US19034304

    Filing Date: 2025-01-22

    Applicant: Google LLC

    Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples. Here, each positive audio data example includes a respective augmented copy of the received audio data. For each respective positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting the respective sequence of encoder outputs for the positive data example into a contrastive loss space. The method also includes determining an L2 distance between each corresponding encoder output in the projected sequences of encoder outputs for the positive audio data examples and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each respective positive audio data example. The method also includes updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
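
    The per-utterance consistency loss described here can be illustrated with a short sketch: two augmented copies of one utterance are encoded, projected into the contrastive loss space, compared frame by frame with an L2 distance, and the distances are averaged. Everything below (augmentation, encoder, projection, shapes) is a hypothetical stand-in, not the claimed model.

        import numpy as np

        rng = np.random.default_rng(0)

        # Shared stand-in weights so both positive examples pass through the same model.
        enc_w = rng.standard_normal((80, 256))    # "encoder"
        proj_w = rng.standard_normal((256, 128))  # projection into the contrastive loss space

        def augment(audio):
            """Stand-in augmentation producing one positive copy of the utterance."""
            return audio + 0.01 * rng.standard_normal(audio.shape)

        def encode_and_project(audio):
            """Stand-in encoder followed by projection into the contrastive loss space."""
            return (audio @ enc_w) @ proj_w

        utterance = rng.standard_normal((100, 80))             # frames x features
        pos_a, pos_b = augment(utterance), augment(utterance)  # pair of positive examples

        proj_a = encode_and_project(pos_a)
        proj_b = encode_and_project(pos_b)

        # L2 distance between each pair of corresponding projected encoder outputs,
        # averaged over the utterance to give the per-utterance consistency loss.
        l2_per_frame = np.linalg.norm(proj_a - proj_b, axis=1)
        per_utterance_consistency_loss = float(l2_per_frame.mean())
        print(per_utterance_consistency_loss)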

    Supervised and unsupervised training with contrastive loss over sequences

    Publication No.: US12230249B2

    Publication Date: 2025-02-18

    Application No.: US17655903

    Filing Date: 2022-03-22

    Applicant: Google LLC

    Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples. Here, each positive audio data example includes a respective augmented copy of the received audio data. For each respective positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting the respective sequence of encoder outputs for the positive data example into a contrastive loss space. The method also includes determining an L2 distance between each corresponding encoder output in the projected sequences of encoder outputs for the positive audio data examples and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each respective positive audio data example. The method also includes updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
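
    This granted patent shares its abstract with the publication above. As a complement to the consistency-loss sketch there, the lines below only illustrate how a supervised loss term per positive example might be combined with the per-utterance consistency loss to drive the parameter update; all numeric values and the mixing weight are placeholders, not taken from the patent.

        # Illustrative placeholder values; in practice these come from the model.
        supervised_loss_a = 2.31  # e.g. loss for the recognition result of the first positive example
        supervised_loss_b = 2.45  # ... and of the second positive example
        per_utterance_consistency_loss = 0.87
        consistency_weight = 0.1  # hypothetical mixing weight

        total_loss = (supervised_loss_a + supervised_loss_b
                      + consistency_weight * per_utterance_consistency_loss)
        # The speech recognition model's parameters would then be updated by minimizing total_loss.
        print(total_loss)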
