Patent search ap:("Google LLC") AND inv:"David Rybach" Page 2

11.

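Invention publication
Emitting Word Timings with End-to-End Models (status: under examination, published)

Publication (announcement) number: US20240321263A1

Publication (announcement) date: 2024-09-26

Application number: US18680797

Application date: 2024-05-31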

Applicant: Google LLC

Inventor: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang

IPC: G10L15/06, G10L25/30, G10L25/78

CPC classification number: G10L15/063, G10L25/30, G10L25/78

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word, and the second constrained alignment is aligned with the ground truth alignment for the end of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
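
The per-word steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the placeholder symbol "<wb>", the fixed-size toy tokenizer, the frame-index representation of alignments, and all function names are assumptions made for illustration only. The sketch inserts a placeholder before each word, splits the word into word pieces, and records a constrained alignment for the beginning piece and the ending piece.

```python
from dataclasses import dataclass

WORD_BOUNDARY = "<wb>"  # hypothetical placeholder symbol, not taken from the patent


@dataclass
class WordAlignment:
    """Ground truth alignment for one word, in frame indices (assumed unit)."""
    word: str
    start_frame: int
    end_frame: int


def toy_word_pieces(word, size=3):
    """Hypothetical tokenizer: split a word into fixed-size chunks."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def build_constrained_targets(alignments):
    """Return (tokens, constraints).

    tokens: target sequence with a placeholder before every word.
    constraints: list of (token_index, role, frame) tuples, where role is
    "begin" for the beginning word piece and "end" for the ending word piece.
    """
    tokens = []
    constraints = []
    for a in alignments:
        tokens.append(WORD_BOUNDARY)
        pieces = toy_word_pieces(a.word)
        first_idx = len(tokens)          # index of the beginning word piece
        tokens.extend(pieces)
        last_idx = len(tokens) - 1       # index of the ending word piece
        constraints.append((first_idx, "begin", a.start_frame))
        constraints.append((last_idx, "end", a.end_frame))
    return tokens, constraints


def one_hot_attention_target(frame, num_frames):
    """Assumed form of a constraint: attention mass placed on a single frame."""
    return [1.0 if t == frame else 0.0 for t in range(num_frames)]
```

For a single-piece word, the beginning and ending constraints both point at the same token index, which is why the constraints are kept as a list rather than a dictionary keyed by token.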

12.

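Granted invention patent
Efficient streaming non-recurrent on-device end-to-end model (status: in force)

Publication (announcement) number: US12051404B2

Publication (announcement) date: 2024-07-30

Application number: US18336211

Application date: 2023-06-16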

Applicant: Google LLC

Inventor: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyril Allauzen, David Rybach, Ruoming Pang, Trevor Strohman

IPC: G10L15/00, G10L15/02, G10L15/06, G10L15/22, G10L15/30

CPC classification number: G10L15/063, G10L15/02, G10L15/22, G10L15/30

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of a plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.
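
The data flow in this abstract (first encoder, second encoder, decoder, language model rescoring) can be sketched in a few lines of Python. Everything below is a toy stand-in under stated assumptions: the arithmetic inside each stage, the per-hypothesis weights, and the multiply-and-renormalize rescoring rule are hypothetical and only show how the outputs of one component feed the next at each output step.

```python
import math


def first_encoder(frames):
    """Stand-in first encoder: one feature vector per acoustic frame."""
    return [[2.0 * x for x in frame] for frame in frames]


def second_encoder(first_features):
    """Stand-in second encoder: consumes the first encoder's output per step."""
    return [[x + 1.0 for x in feat] for feat in first_features]


def softmax(scores):
    """Numerically stable softmax over a dict of hypothesis scores."""
    m = max(scores.values())
    exps = {h: math.exp(s - m) for h, s in scores.items()}
    total = sum(exps.values())
    return {h: e / total for h, e in exps.items()}


def decoder(second_features, weights):
    """Toy decoder: one probability distribution over hypotheses per step.

    weights maps each hypothesis to a made-up scalar (an assumption).
    """
    return [softmax({h: w * sum(feat) for h, w in weights.items()})
            for feat in second_features]


def language_model_rescore(dist, lm_prior):
    """Assumed rescoring rule: multiply by an LM prior, then renormalize."""
    scaled = {h: p * lm_prior[h] for h, p in dist.items()}
    total = sum(scaled.values())
    return {h: v / total for h, v in scaled.items()}


def recognize(frames, weights, lm_prior):
    """Chain the four components and return rescored distributions per step."""
    feats = second_encoder(first_encoder(frames))
    return [language_model_rescore(d, lm_prior) for d in decoder(feats, weights)]
```

A call such as `recognize([[0.1, 0.2], [0.3, 0.4]], {"a": 1.0, "b": 0.5}, {"a": 0.5, "b": 0.5})` returns one rescored distribution per input frame; with a uniform prior the rescored distribution equals the decoder's distribution.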

13.

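Invention publication
Emitting Word Timings with End-to-End Models (status: under examination, published)

Publication (announcement) number: US20230206907A1

Publication (announcement) date: 2023-06-29

Application number: US18167050

Application date: 2023-02-09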

Applicant: Google LLC

Inventor: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang

IPC: G10L15/06, G10L25/30, G10L25/78

CPC classification number: G10L15/063, G10L25/30, G10L25/78

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word, and the second constrained alignment is aligned with the ground truth alignment for the end of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

14.

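Invention application
Efficient Streaming Non-Recurrent On-Device End-to-End Model (status: in force)

Publication (announcement) number: US20220310062A1

Publication (announcement) date: 2022-09-29

Application number: US17316198

Application date: 2021-05-10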

Applicant: Google LLC

Inventor: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyril Allauzen, David Rybach, Ruoming Pang, Trevor Strohman

IPC: G10L15/06, G10L15/02, G10L15/30, G10L15/22

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of a plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.
