Patent search ap:("Google Inc.") AND inv:"Yedid Hoshen" Page 1

1.

Patent application
PROCESSING MULTI-CHANNEL AUDIO WAVEFORMS (in force)

Publication number: US20160322055A1

Publication date: 2016-11-03

Application number: US15205321

Filing date: 2016-07-08

Applicant: Google Inc.

Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson , Andrew W. Senior , Arun Narayanan , Yedid Hoshen , Michiel A.U. Bacchiani

IPC: G10L19/008 , G10L15/06 , G10L19/26 , G10L25/30

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/063 , G10L2021/02166 , H04R3/005

Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
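
The filtering and combining steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that each filter holds one kernel per channel and that "combining" means summing the per-channel convolution outputs sample by sample; the abstract does not specify either detail. The names `convolve_valid` and `filter_and_combine` are hypothetical, and the jointly trained deep neural network that would consume the result is left out.

```python
from typing import List

Signal = List[float]


def convolve_valid(signal: Signal, kernel: Signal) -> Signal:
    """Time-domain convolution, keeping only fully overlapping positions."""
    n, k = len(signal), len(kernel)
    if k > n:
        return []
    flipped = kernel[::-1]  # convolution flips the kernel before sliding it
    return [
        sum(signal[t + i] * flipped[i] for i in range(k))
        for t in range(n - k + 1)
    ]


def filter_and_combine(
    channels: List[Signal], filters: List[List[Signal]]
) -> List[Signal]:
    """Return one combined output per filter.

    filters[p][c] is the kernel that filter p applies to channel c
    (an assumed layout). Combining by summation is also an assumption.
    """
    combined = []
    for per_channel_kernels in filters:
        outputs = [
            convolve_valid(x, h) for x, h in zip(channels, per_channel_kernels)
        ]
        # Sum the per-channel outputs of this filter sample by sample.
        combined.append([sum(samples) for samples in zip(*outputs)])
    return combined


# Two channels of toy waveform data and two filters with learned-looking,
# but here hand-picked, kernels.
channels = [[1, 2, 3, 4], [0, 1, 0, 1]]
filters = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, -1]],
]
features = filter_and_combine(channels, filters)
```

In a trained system the kernel values would come from the joint training process described above, and `features` would be passed on to the acoustic model.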

2.

Granted patent
Processing multi-channel audio waveforms (in force)

Publication number: US09697826B2

Publication date: 2017-07-04

Application number: US15205321

Filing date: 2016-07-08

Applicant: Google Inc.

Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson , Andrew W. Senior , Arun Narayanan , Yedid Hoshen , Michiel A. U. Bacchiani

IPC: G10L15/16 , G10L15/06 , G10L21/0216 , G10L15/02

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/063 , G10L2021/02166 , H04R3/005

Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
