-
Publication No.: US20160322055A1
Publication Date: 2016-11-03
Application No.: US15205321
Filing Date: 2016-07-08
Applicant: Google Inc.
Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson , Andrew W. Senior , Arun Narayanan , Yedid Hoshen , Michiel A.U. Bacchiani
IPC: G10L19/008 , G10L15/06 , G10L19/26 , G10L25/30
CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/063 , G10L2021/02166 , H04R3/005
Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
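The abstract describes convolving learned time-domain filters with each audio channel, summing per-filter across channels, and feeding the result to a DNN acoustic model trained jointly with the filters. A minimal NumPy sketch of that data flow follows; all shapes, the random (untrained) filter parameters, and the toy two-layer "acoustic model" are illustrative assumptions, not the patent's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

num_channels = 2   # microphone channels
num_filters = 4    # time-domain filters (learned in the patent; random here)
filter_len = 16
signal_len = 128

# Multichannel waveform for one utterance: (channels, samples).
waveform = rng.standard_normal((num_channels, signal_len))

# Filter bank with per-channel taps; in the patent these parameters are
# learned jointly with the acoustic-model DNN during training.
filters = rng.standard_normal((num_filters, num_channels, filter_len))

def filter_and_combine(waveform, filters):
    """Convolve each filter with each channel, then sum across channels."""
    n_filt, n_chan, _ = filters.shape
    outputs = []
    for f in range(n_filt):
        per_channel = [
            np.convolve(waveform[c], filters[f, c], mode="valid")
            for c in range(n_chan)
        ]
        outputs.append(np.sum(per_channel, axis=0))  # combine channels
    return np.stack(outputs)  # (filters, frames)

features = filter_and_combine(waveform, filters)

def relu(x):
    return np.maximum(x, 0.0)

# Toy stand-in for the jointly trained DNN: one hidden layer and a
# softmax over acoustic states (sizes are arbitrary assumptions).
num_states = 10
w1 = rng.standard_normal((features.size, 32)) * 0.01
w2 = rng.standard_normal((32, num_states)) * 0.01
logits = relu(features.ravel() @ w1) @ w2
posteriors = np.exp(logits) / np.exp(logits).sum()

print(features.shape)    # (4, 113): one output row per filter
print(posteriors.shape)  # (10,)
```

In a real system the transcription would then be decoded from the per-frame state posteriors; here the posteriors are only shown to close the loop from raw multichannel waveform to acoustic-model output.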
-
Publication No.: US10063965B2
Publication Date: 2018-08-28
Application No.: US15170348
Filing Date: 2016-06-01
Applicant: Google Inc.
Inventor: Chanwoo Kim , Rajeev Conrad Nongpiur , Arun Narayanan
CPC classification number: H04R3/005 , G01S5/18 , G10L25/30 , H04R5/027 , H04R2201/401 , H04R2430/20 , H04S2400/11 , H04S2400/15 , H04S2420/01
Abstract: A system for estimating the location of a stationary or moving sound source includes multiple microphones, which need not be physically aligned in a linear array or a regular geometric pattern in a given environment, an auralizer that generates auralized multi-channel signals based at least on array-related transfer functions and room impulse responses of the microphones as well as signal labels corresponding to the auralized multi-channel signals, a feature extractor that extracts features from the auralized multi-channel signals for efficient processing, and a neural network that can be trained to estimate the location of the sound source based at least on the features extracted from the auralized multi-channel signals and the corresponding signal labels.
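The pipeline in this abstract (auralizer → feature extractor → trainable localizer) can be sketched as below. The impulse-response shapes, the cross-correlation-peak features, and the two hard-coded source positions are all illustrative assumptions; the patent's auralizer uses measured array-related transfer functions and room impulse responses, and its localizer is a trained neural network rather than the feature dump shown here.

```python
import numpy as np

rng = np.random.default_rng(1)

num_mics = 3   # microphones need not form a regular array
rir_len = 32
sig_len = 256

def auralize(source, rirs):
    """Auralizer: render one dry source through per-mic impulse
    responses (standing in for transfer functions + room responses)."""
    return np.stack([np.convolve(source, h)[:len(source)] for h in rirs])

def extract_features(channels):
    """Feature extractor: inter-channel cross-correlation peak lags,
    a compact proxy for time-difference-of-arrival cues."""
    ref = channels[0]
    feats = []
    for ch in channels[1:]:
        xcorr = np.correlate(ch, ref, mode="full")
        feats.append(np.argmax(xcorr) - (len(ref) - 1))  # lag in samples
    return np.array(feats, dtype=float)

def delayed_rir(delay):
    """Idealized impulse response: a pure delay of `delay` samples."""
    h = np.zeros(rir_len)
    h[delay] = 1.0
    return h

# Two labeled source positions, simulated as different per-mic delays.
source = rng.standard_normal(sig_len)
positions = {0: [0, 2, 4], 1: [4, 2, 0]}  # label -> per-mic delays

dataset = []
for label, delays in positions.items():
    channels = auralize(source, [delayed_rir(d) for d in delays])
    dataset.append((extract_features(channels), label))

for feats, label in dataset:
    print(label, feats)
```

The (features, label) pairs are exactly what the abstract's neural network would be trained on; with these idealized delays the lag features separate the two positions cleanly.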
-
Publication No.: US20180068675A1
Publication Date: 2018-03-08
Application No.: US15350293
Filing Date: 2016-11-14
Applicant: Google Inc.
Inventor: Ehsan Variani , Kevin William Wilson , Ron J. Weiss , Tara N. Sainath , Arun Narayanan
IPC: G10L25/30 , G10L21/028 , G10L21/0388
CPC classification number: G10L25/30 , G10L15/16 , G10L15/20 , G10L19/008 , G10L21/028 , G10L21/0388 , G10L2021/02087 , G10L2021/02166
Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
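The two-stage front end in this abstract (a time-domain spatial filtering layer over both raw signals, then a spectral filtering layer on frequency-domain data) can be sketched as follows. Layer sizes, the random (untrained) weights, and the log-magnitude filterbank are assumptions for illustration; in the patent both layers are learned parts of the recognition network.

```python
import numpy as np

rng = np.random.default_rng(2)

frame_len = 64
num_spatial = 2    # spatial "look direction" filter pairs
num_spectral = 8   # spectral filters applied in the frequency domain

# Two raw audio frames recorded over the same period of time.
x1 = rng.standard_normal(frame_len)
x2 = rng.standard_normal(frame_len)

# Spatial filtering layer: one short FIR filter per channel per
# direction; the two channels' filtered outputs are summed.
taps = 5
spatial_w = rng.standard_normal((num_spatial, 2, taps)) * 0.1

def spatial_layer(x1, x2, w):
    out = []
    for p in range(w.shape[0]):
        y = (np.convolve(x1, w[p, 0], mode="same")
             + np.convolve(x2, w[p, 1], mode="same"))
        out.append(y)
    return np.stack(out)  # (num_spatial, frame_len)

# Spectral filtering layer: processes frequency-domain data derived
# from the spatial output, weighting magnitude bins with a filterbank.
num_bins = frame_len // 2 + 1
spectral_w = np.abs(rng.standard_normal((num_spectral, num_bins))) * 0.1

def spectral_layer(spatial_out, w):
    mags = np.abs(np.fft.rfft(spatial_out, axis=-1))  # (spatial, bins)
    return np.log1p(mags @ w.T)  # (num_spatial, num_spectral)

spatial_out = spatial_layer(x1, x2, spatial_w)
features = spectral_layer(spatial_out, spectral_w)

print(spatial_out.shape, features.shape)  # (2, 64) (2, 8)
```

The additional layers mentioned in the abstract would map `features` to sub-word unit predictions; they are omitted here since the abstract's novelty lies in the two filtering layers.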
-
Publication No.: US20170353789A1
Publication Date: 2017-12-07
Application No.: US15170348
Filing Date: 2016-06-01
Applicant: Google Inc.
Inventor: Chanwoo Kim , Rajeev Conrad Nongpiur , Arun Narayanan
CPC classification number: H04R3/005 , G10L25/30 , H04R5/027 , H04R2201/401 , H04R2430/20 , H04S2400/11 , H04S2400/15 , H04S2420/01
Abstract: A system for estimating the location of a stationary or moving sound source includes multiple microphones, which need not be physically aligned in a linear array or a regular geometric pattern in a given environment, an auralizer that generates auralized multi-channel signals based at least on array-related transfer functions and room impulse responses of the microphones as well as signal labels corresponding to the auralized multi-channel signals, a feature extractor that extracts features from the auralized multi-channel signals for efficient processing, and a neural network that can be trained to estimate the location of the sound source based at least on the features extracted from the auralized multi-channel signals and the corresponding signal labels.
-
Publication No.: US09697826B2
Publication Date: 2017-07-04
Application No.: US15205321
Filing Date: 2016-07-08
Applicant: Google Inc.
Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson , Andrew W. Senior , Arun Narayanan , Yedid Hoshen , Michiel A. U. Bacchiani
IPC: G10L15/16 , G10L15/06 , G10L21/0216 , G10L15/02
CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/063 , G10L2021/02166 , H04R3/005
Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
-