SPEECH RECOGNITION WITH ACOUSTIC MODELS

Invention Application

US20160372119A1 SPEECH RECOGNITION WITH ACOUSTIC MODELS (In force)

Patent Title: SPEECH RECOGNITION WITH ACOUSTIC MODELS
Application No.: US14983315

Application Date: 2015-12-29
Publication No.: US20160372119A1

Publication Date: 2016-12-22
Inventor: Hasim Sak, Andrew W. Senior
Applicant: Google Inc.
Main IPC: G10L17/18
IPC: G10L17/18; G10L17/02; G10L17/04

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
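The stacking and subsampling steps named in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function names (`stack_frames`, `subsample`, `prepare_inputs`), the default stack size and subsampling factor, and the choice to pad the end of the utterance by repeating the last frame are all assumptions made for illustration, since the abstract does not specify them. The RNN layers and CTC output layer are omitted.

```python
from typing import List, Sequence

Frame = List[float]


def stack_frames(frames: Sequence[Frame], stack_size: int) -> List[Frame]:
    """Concatenate each frame with the following stack_size - 1 frames."""
    if stack_size < 1:
        raise ValueError("stack_size must be >= 1")
    stacked: List[Frame] = []
    for t in range(len(frames)):
        window: Frame = []
        for k in range(stack_size):
            # Assumption: near the end of the utterance, repeat the last
            # frame as padding so every stacked frame has the same length.
            idx = min(t + k, len(frames) - 1)
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def subsample(frames: Sequence[Frame], factor: int) -> List[Frame]:
    """Keep every factor-th frame, starting with the first one."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return list(frames[::factor])


def prepare_inputs(frames: Sequence[Frame],
                   stack_size: int = 2,
                   factor: int = 2) -> List[Frame]:
    """Stack, then subsample; the result would be fed to the acoustic model."""
    return subsample(stack_frames(frames, stack_size), factor)


if __name__ == "__main__":
    # Five one-dimensional frames standing in for real acoustic features.
    utterance = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    print(prepare_inputs(utterance, stack_size=2, factor=2))
```

With these illustrative settings the network would see fewer, wider input frames than the original sequence contains, which is the effect the abstract's "stacking" and "subsampling" steps describe.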

Granted Publication:

US09818410B2 Speech recognition with acoustic models (grant date: 2017-11-14)

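IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L17/00	Speaker identification or verification
G10L17/18	. Artificial neural networks; connectionist approaches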