Processing multi-channel audio waveforms

Invention Grant

US09697826B2 Processing multi-channel audio waveforms (In force)


Patent Title: Processing multi-channel audio waveforms
Application No.: US15205321

Application Date: 2016-07-08
Publication No.: US09697826B2

Publication Date: 2017-07-04
Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
Applicant: Google Inc.
Applicant Address: US CA Mountain View
Assignee: Google Inc.
Current Assignee: Google Inc.
Current Assignee Address: US CA Mountain View
Agency: Fish & Richardson P.C.
Main IPC: G10L15/16
IPC: G10L15/16; G10L15/06; G10L21/0216; G10L15/02

Abstract:

Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio waveform data corresponding to an utterance; convolving each of multiple filters, in the time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and a deep neural network serving as an acoustic model; combining, for each of the multiple filters, the convolution outputs for that filter across the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined from the output of the deep neural network.
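The front end described in the abstract (time-domain convolution of each filter with each channel, followed by a per-filter combination across channels) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each filter carries one kernel per channel and that "combining" means summing the per-channel outputs (a filter-and-sum arrangement). The function names `conv1d_valid` and `multichannel_filterbank` are hypothetical, the filter values are fixed here rather than learned jointly with the deep neural network, and the acoustic-model DNN itself is omitted.

```python
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Time-domain 1-D convolution with 'valid' boundaries (no padding).

    Output index t corresponds to input sample n = t + K - 1, so
    y[t] = sum_j kernel[j] * signal[t + K - 1 - j] (the kernel is flipped,
    as in true convolution rather than correlation).
    """
    n, k = len(signal), len(kernel)
    out: List[float] = []
    for t in range(n - k + 1):
        acc = 0.0
        for j in range(k):
            acc += kernel[j] * signal[t + k - 1 - j]
        out.append(acc)
    return out


def multichannel_filterbank(
    channels: Sequence[Sequence[float]],
    filters: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Convolve every filter with every channel, then sum over channels.

    channels: C waveforms of equal length N.
    filters:  P filters; filters[p][c] is the kernel (length K) that
              filter p applies to channel c (an assumption of this sketch).
    Returns P combined sequences, each of length N - K + 1; in the
    described method these would be the input to the acoustic-model DNN.
    """
    outputs: List[List[float]] = []
    for per_channel_kernels in filters:
        if len(per_channel_kernels) != len(channels):
            raise ValueError("each filter needs one kernel per channel")
        combined: List[float] = []
        for wave, kernel in zip(channels, per_channel_kernels):
            y = conv1d_valid(wave, kernel)
            # Assumed combination rule: elementwise sum across channels.
            combined = y if not combined else [a + b for a, b in zip(combined, y)]
        outputs.append(combined)
    return outputs
```

For example, with two channels `[1, 2, 3, 4]` and `[0, 1, 0, 1]` and a single filter whose per-channel kernels are `[1, 0]` and `[0, 1]`, the first kernel yields `[2, 3, 4]`, the second yields `[0, 1, 0]`, and the combined output is `[2.0, 4.0, 4.0]`. Summation is only one plausible reading of "combining"; other reductions over the channel axis would slot into the same place.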

Published/granted documents:

US20160322055A1 PROCESSING MULTI-CHANNEL AUDIO WAVEFORMS Publication date: 2016-11-03

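IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	. Speech classification or search
G10L15/16	.. using artificial neural networks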