-
Publication No.: US20170278513A1
Publication Date: 2017-09-28
Application No.: US15392122
Application Date: 2016-12-28
Applicant: Google Inc.
Inventor: Bo Li , Ron J. Weiss , Michiel A.U. Bacchiani , Tara N. Sainath , Kevin William Wilson
IPC: G10L15/16 , G10L21/0224
CPC classification number: G10L15/16 , G10L15/20 , G10L15/26 , G10L21/0224 , G10L2021/02166
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
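The filter-and-sum structure in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the neural network that predicts filter parameters from both channels is replaced here by a classic cross-correlation delay estimate, and the names `estimate_lag`, `predict_filters`, and `beamform` are hypothetical. The sketch only shows the data flow: both channels determine both filters, each channel is filtered, and the results are summed into a single combined channel.

```python
from typing import List, Sequence, Tuple


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal FIR filtering with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def estimate_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Return the lag d in [-max_lag, max_lag] maximizing sum_n a[n - d] * b[n].

    A positive d means channel b is a delayed copy of channel a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[n - lag] * b[n] for n in range(len(b)) if 0 <= n - lag < len(a))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def predict_filters(ch1, ch2, max_lag: int) -> Tuple[List[float], List[float]]:
    """Stand-in for the neural filter predictor (an assumption, not the patent's network).

    Both filters depend on both channels: the leading channel is delayed so the
    two copies line up, and each filter carries a gain of 0.5.
    """
    lag = estimate_lag(ch1, ch2, max_lag)
    h1 = [0.0] * (max_lag + 1)
    h2 = [0.0] * (max_lag + 1)
    if lag >= 0:
        h1[lag] = 0.5  # channel 2 lags, so delay channel 1
        h2[0] = 0.5
    else:
        h1[0] = 0.5
        h2[-lag] = 0.5  # channel 1 lags, so delay channel 2
    return h1, h2


def beamform(ch1, ch2, max_lag: int = 3) -> List[float]:
    """Filter each channel with its predicted filter and sum into one channel."""
    h1, h2 = predict_filters(ch1, ch2, max_lag)
    y1, y2 = fir_filter(ch1, h1), fir_filter(ch2, h2)
    return [a + b for a, b in zip(y1, y2)]


# Usage: channel 2 receives the same impulse two samples later than channel 1.
print(beamform([0.0, 1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]))
# → [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

In the patent family the filter parameters are produced by a trained neural network; the delay estimate here only makes the "filters computed from both channels" idea concrete.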
-
Publication No.: US20170092265A1
Publication Date: 2017-03-30
Application No.: US14987146
Application Date: 2016-01-04
Applicant: Google Inc.
Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson
IPC: G10L15/16 , G10L15/34 , G10L19/06 , G10L19/008
CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G06N3/084 , G10L15/20 , G10L15/34 , G10L21/0208 , G10L2021/02082
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal covering the same period of time; generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data; generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output; and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
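The two-stage front end in this abstract (a spatial filtering layer over the raw channels, then a spectral filtering layer over its output) can be sketched in plain Python. This is an illustrative assumption, not the patent's implementation: the filters are hand-set rather than learned, each spatial output is formed by summing per-channel FIR outputs for one hypothetical "look direction", and the rectify-and-max-pool step is one plausible choice. The additional layers that predict sub-word units are omitted, and all function names are hypothetical.

```python
from typing import List, Sequence

Signal = List[float]


def conv1d(x: Sequence[float], h: Sequence[float]) -> Signal:
    """Causal convolution truncated to len(x): y[n] = sum_k h[k] * x[n - k]."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def spatial_layer(channels: Sequence[Signal], spatial_filters) -> List[Signal]:
    """spatial_filters[p][c] is the FIR filter for channel c in look direction p.

    Each channel is filtered and the results are summed, giving one signal per
    look direction (a filter-and-sum per direction).
    """
    outputs = []
    for per_channel_filters in spatial_filters:
        filtered = [conv1d(x, h) for x, h in zip(channels, per_channel_filters)]
        outputs.append([sum(samples) for samples in zip(*filtered)])
    return outputs


def spectral_layer(spatial_outputs: Sequence[Signal], spectral_filters) -> List[float]:
    """Convolve every spatial output with every spectral filter, apply ReLU,
    and max-pool over time, giving one feature per (direction, filter) pair."""
    features = []
    for s in spatial_outputs:
        for h in spectral_filters:
            features.append(max(max(v, 0.0) for v in conv1d(s, h)))
    return features


# Usage: an impulse reaches channel 2 one sample after channel 1.
channels = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
# Direction 0 delays channel 1 by one sample (aligns the impulses);
# direction 1 delays channel 2 instead (misaligns them).
spatial_filters = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
spatial = spatial_layer(channels, spatial_filters)
print(spatial)  # → [[0.0, 2.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
print(spectral_layer(spatial, [[1.0]]))  # → [2.0, 1.0]
```

The look direction that time-aligns the two channels adds them coherently and yields the larger pooled feature, which is the intuition behind placing a spatial stage before the spectral one.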
-
Publication No.: US20160322055A1
Publication Date: 2016-11-03
Application No.: US15205321
Application Date: 2016-07-08
Applicant: Google Inc.
Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson , Andrew W. Senior , Arun Narayanan , Yedid Hoshen , Michiel A.U. Bacchiani
IPC: G10L19/008 , G10L15/06 , G10L19/26 , G10L25/30
CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/063 , G10L2021/02166 , H04R3/005
Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
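The multichannel time-domain front end in this abstract can be sketched as follows, under stated assumptions: each filter is convolved with every channel, the per-channel outputs for that filter are combined by summation (one plausible combination), and a hypothetical framing step (rectification, max-pooling over each frame, and `log1p` compression) turns the combined outputs into one feature vector per frame for the acoustic model. The filters are fixed here; in the abstract they are learned jointly with the deep neural network, which this sketch does not implement. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def conv1d(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Time-domain causal convolution truncated to len(x)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def filter_and_combine(channels, filters) -> List[List[float]]:
    """For each filter: convolve it with every channel, then sum across channels.

    Returns one combined signal per filter (shape: num_filters x num_samples).
    """
    combined = []
    for h in filters:
        per_channel = [conv1d(x, h) for x in channels]
        combined.append([sum(samples) for samples in zip(*per_channel)])
    return combined


def frame_features(combined, frame_len: int) -> List[List[float]]:
    """Hypothetical post-processing: for each frame and filter, compute
    log(1 + max over the frame of ReLU(output)). Returns frames x filters."""
    num_samples = len(combined[0])
    frames = []
    for start in range(0, num_samples, frame_len):
        frames.append([
            math.log1p(max(max(v, 0.0) for v in signal[start:start + frame_len]))
            for signal in combined
        ])
    return frames


# Usage: two identical channels, two filters (a unit impulse and a half gain).
channels = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
combined = filter_and_combine(channels, [[1.0], [0.5]])
features = frame_features(combined, frame_len=2)  # these would feed the DNN
print(combined)  # → [[2.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(features)
```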
-
Publication No.: US20160284347A1
Publication Date: 2016-09-29
Application No.: US15080927
Application Date: 2016-03-25
Applicant: Google Inc.
Inventor: Tara N. Sainath , Ron J. Weiss , Andrew W. Senior , Kevin William Wilson
CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G06N3/084 , G10L15/142 , G10L15/26
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
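The layer order in this abstract (frequency convolution over a time-frequency representation, then a memory layer, then hidden layers) can be sketched as below. Assumptions: the input is a list of per-frame feature vectors (for example log-mel energies), the frequency convolution is a single hand-set kernel followed by non-overlapping max-pooling along frequency, and a one-unit Elman-style recurrent cell stands in for the memory layer; a practical memory layer would often be an LSTM, and the hidden layers are omitted. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def freq_conv(frame: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid 1-D convolution (written as cross-correlation) along the frequency axis."""
    k = len(kernel)
    return [sum(kernel[j] * frame[i + j] for j in range(k)) for i in range(len(frame) - k + 1)]


def max_pool(values: Sequence[float], size: int) -> List[float]:
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def memory_layer(inputs: Sequence[Sequence[float]], w_in: Sequence[float], w_rec: float) -> List[float]:
    """One-unit recurrent cell: h_t = tanh(w_in . x_t + w_rec * h_{t-1}), with h_0 = 0."""
    h, outputs = 0.0, []
    for x in inputs:
        h = math.tanh(sum(w * v for w, v in zip(w_in, x)) + w_rec * h)
        outputs.append(h)
    return outputs


def acoustic_model_front(frames, kernel, pool_size, w_in, w_rec) -> List[float]:
    """Frequency convolution + pooling per frame, then the memory layer over time."""
    pooled = [max_pool(freq_conv(f, kernel), pool_size) for f in frames]
    return memory_layer(pooled, w_in, w_rec)


# Usage: two identical 6-bin frames.
frames = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]] * 2
print(max_pool(freq_conv(frames[0], [1.0, 1.0]), 2))  # → [5.0, 9.0]
print(acoustic_model_front(frames, [1.0, 1.0], 2, [0.1, 0.0], 0.5))
```

Pooling along frequency keeps the convolution output small and somewhat tolerant to small spectral shifts before the recurrent stage models the sequence over time.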
-
Publication No.: US20180068675A1
Publication Date: 2018-03-08
Application No.: US15350293
Application Date: 2016-11-14
Applicant: Google Inc.
Inventor: Ehsan Variani , Kevin William Wilson , Ron J. Weiss , Tara N. Sainath , Arun Narayanan
IPC: G10L25/30 , G10L21/028 , G10L21/0388
CPC classification number: G10L25/30 , G10L15/16 , G10L15/20 , G10L19/008 , G10L21/028 , G10L21/0388 , G10L2021/02087 , G10L2021/02166
Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
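Here the spectral filtering layer processes frequency-domain data representing the spatial filtered output. A common way to realize that, assumed here rather than taken from the patent, is to take the DFT of the spatial filtered output and multiply it bin by bin by the filter's frequency response, which is equivalent to circular convolution in the time domain. The sketch below checks that equivalence with a naive DFT; `spectral_filter_freq` and `circular_conv` are hypothetical names.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n_pts = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


def spectral_filter_freq(signal: Sequence[float], taps: Sequence[float]) -> List[complex]:
    """Frequency-domain filtering: DFT(signal) * DFT(zero-padded taps), bin by bin."""
    if len(taps) > len(signal):
        raise ValueError("filter longer than signal")
    padded = list(taps) + [0.0] * (len(signal) - len(taps))
    return [a * b for a, b in zip(dft(signal), dft(padded))]


def circular_conv(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Time-domain circular convolution, for comparison."""
    n_pts = len(signal)
    return [
        sum(taps[k] * signal[(n - k) % n_pts] for k in range(len(taps)))
        for n in range(n_pts)
    ]


# Usage: a first-difference filter applied to a short spatial filtered output.
x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, -1.0]
via_freq = [round(v.real, 9) for v in idft(spectral_filter_freq(x, h))]
print(via_freq)             # → [-3.0, 1.0, 1.0, 1.0]
print(circular_conv(x, h))  # → [-3.0, 1.0, 1.0, 1.0]
```

Multiplying spectra costs one complex product per bin, which is why frequency-domain filtering can be cheaper than long time-domain convolutions once an FFT replaces the naive DFT used here for clarity.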
-
Publication No.: US09886949B2
Publication Date: 2018-02-06
Application No.: US15392122
Application Date: 2016-12-28
Applicant: Google Inc.
Inventor: Bo Li , Ron J. Weiss , Michiel A. U. Bacchiani , Tara N. Sainath , Kevin William Wilson
IPC: G10L15/00 , G10L15/16 , G10L21/0224 , G10L21/0216 , G10L15/26
CPC classification number: G10L15/16 , G10L15/20 , G10L15/26 , G10L21/0224 , G10L2021/02166
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
-
Publication No.: US09697826B2
Publication Date: 2017-07-04
Application No.: US15205321
Application Date: 2016-07-08
Applicant: Google Inc.
Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson , Andrew W. Senior , Arun Narayanan , Yedid Hoshen , Michiel A. U. Bacchiani
IPC: G10L15/16 , G10L15/06 , G10L21/0216 , G10L15/02
CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/063 , G10L2021/02166 , H04R3/005
Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
-