Speech recognition with sequence-to-sequence models

Invention Grant

US11145293B2 Speech recognition with sequence-to-sequence models (Active)

Patent Title: Speech recognition with sequence-to-sequence models
Application No.: US16516390

Application Date: 2019-07-19
Publication No.: US11145293B2

Publication Date: 2021-10-12
Inventor: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger
Main IPC: G10L15/00
IPC: G10L15/00 ; G10L15/16 ; G10L15/22 ; G10L15/02 ; G06N3/08 ; G10L15/06 ; G10L25/30 ; G10L15/26

Abstract:

Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
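
The training-time input selection described in the abstract resembles what is commonly called scheduled sampling. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `choose_decoder_input`, `build_decoder_inputs`, `predict_next`, `sampling_prob`, and the `<sos>` start token are all hypothetical, and the sketch assumes the predetermined probability is the chance of feeding back the decoder's own output rather than the known value.

```python
import random


def choose_decoder_input(known_value, decoder_output, sampling_prob, rng=random):
    """Select one decoder input during training.

    Assumption: with probability `sampling_prob` the decoder's own output is
    fed back; otherwise the known value from the training example is used.
    """
    if not 0.0 <= sampling_prob <= 1.0:
        raise ValueError("sampling_prob must be in [0, 1]")
    # rng.random() returns a float in [0.0, 1.0)
    if rng.random() < sampling_prob:
        return decoder_output
    return known_value


def build_decoder_inputs(known_values, predict_next, sampling_prob,
                         rng=random, start_token="<sos>"):
    """Build the decoder input sequence for one training example.

    `predict_next` is a hypothetical stand-in for the decoder: it receives the
    inputs chosen so far and returns the decoder's output for the current
    element. The input for the following step is then selected between that
    output and the known value for the same element.
    """
    inputs = [start_token]
    for known in known_values[:-1]:
        predicted = predict_next(inputs)
        inputs.append(choose_decoder_input(known, predicted, sampling_prob, rng))
    return inputs
```

With `sampling_prob` set to 0.0 the decoder sees only known values (often called teacher forcing); with 1.0 it sees only its own outputs. Values in between mix the two, which is one way to expose the decoder during training to the kind of errors it may make at inference time.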

Public/Granted literature

US20200027444A1 SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS Public/Granted day: 2020-01-23

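IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)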