Patent search ap:("Google Inc.") AND inv:"Kevin William Wilson" Page 1

1.

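Invention application
ENHANCED MULTI-CHANNEL ACOUSTIC MODELS [Under examination, published]

Publication (announcement) number: US20180068675A1

Publication (announcement) date: 2018-03-08

Application number: US15350293

Application date: 2016-11-14

Applicant: Google Inc.

Inventor: Ehsan Variani , Kevin William Wilson , Ron J. Weiss , Tara N. Sainath , Arun Narayanan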

IPC: G10L25/30 , G10L21/028 , G10L21/0388

CPC classification number: G10L25/30 , G10L15/16 , G10L15/20 , G10L19/008 , G10L21/028 , G10L21/0388 , G10L2021/02087 , G10L2021/02166

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

2.

Invention application
PROCESSING AUDIO WAVEFORMS [Under examination, published]

Publication (announcement) number: US20160284347A1

Publication (announcement) date: 2016-09-29

Application number: US15080927

Application date: 2016-03-25

Applicant: Google Inc.

Inventor: Tara N. Sainath , Ron J. Weiss , Andrew W. Senior , Kevin William Wilson

IPC: G10L15/16 , G10L15/14 , G10L15/26

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G06N3/084 , G10L15/142 , G10L15/26

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.

3.

Invention application
ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION [Granted]

Publication (announcement) number: US20170278513A1

Publication (announcement) date: 2017-09-28

Application number: US15392122

Application date: 2016-12-28

Applicant: Google Inc.

Inventor: Bo Li , Ron J. Weiss , Michiel A. U. Bacchiani , Tara N. Sainath , Kevin William Wilson

IPC: G10L15/16 , G10L21/0224

CPC classification number: G10L15/16 , G10L15/20 , G10L15/26 , G10L21/0224 , G10L2021/02166

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

4.

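Invention application
MULTICHANNEL RAW-WAVEFORM NEURAL NETWORKS [Under examination, published]

Publication (announcement) number: US20170092265A1

Publication (announcement) date: 2017-03-30

Application number: US14987146

Application date: 2016-01-04

Applicant: Google Inc.

Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson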

IPC: G10L15/16 , G10L15/34 , G10L19/06 , G10L19/008

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G06N3/084 , G10L15/20 , G10L15/34 , G10L21/0208 , G10L2021/02082

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

5.

Invention application
PROCESSING MULTI-CHANNEL AUDIO WAVEFORMS [Granted]

Publication (announcement) number: US20160322055A1

Publication (announcement) date: 2016-11-03

Application number: US15205321

Application date: 2016-07-08

Applicant: Google Inc.

Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson , Andrew W. Senior , Arun Narayanan , Yedid Hoshen , Michiel A. U. Bacchiani

IPC: G10L19/008 , G10L15/06 , G10L19/26 , G10L25/30

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/063 , G10L2021/02166 , H04R3/005

Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.

6.

Invention application
LOW LATENCY VIDEO STORYBOARD DELIVERY WITH SELECTABLE RESOLUTION LEVELS [Under examination, published]

Publication (announcement) number: US20140082661A1

Publication (announcement) date: 2014-03-20

Application number: US13785913

Application date: 2013-03-05

Applicant: Google Inc.

Inventor: Nils Oliver Krahnstoever , Kevin William Wilson

IPC: H04N21/81 , H04N21/435

CPC classification number: H04N21/8126 , G06K9/00765 , G11B27/34 , H04N21/234336 , H04N21/2358 , H04N21/435 , H04N21/47202 , H04N21/8549

Abstract: A video storyboard delivery system is disclosed. The system receives, from a playback client executed on a user device, a request for a video including one or more user device parameters. The system obtains a storyboard manifest including information defining a storyboard associated with the video, wherein the information includes a plurality of storyboard resolution levels. Using the one or more user device parameters, a selection is made of one of the plurality of storyboard resolution levels from the storyboard manifest. The storyboard at the selected resolution level is delivered to the playback client.
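
The selection step in this abstract can be illustrated with a minimal sketch. It assumes the user device parameters are a screen width and height, and that the rule is "largest level that fits the screen, otherwise the smallest level"; the abstract specifies neither. `StoryboardLevel` and `select_level` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoryboardLevel:
    """One resolution level listed in a (hypothetical) storyboard manifest."""
    name: str
    width: int
    height: int


def select_level(levels: List[StoryboardLevel],
                 device_width: int,
                 device_height: int) -> StoryboardLevel:
    """Pick one level using the device parameters (assumed rule, see lead-in)."""
    if not levels:
        raise ValueError("manifest contains no resolution levels")
    # Levels whose thumbnails fit entirely within the device screen.
    fitting = [lv for lv in levels
               if lv.width <= device_width and lv.height <= device_height]
    if fitting:
        # Prefer the most detailed level that still fits.
        return max(fitting, key=lambda lv: lv.width * lv.height)
    # Nothing fits: fall back to the smallest level available.
    return min(levels, key=lambda lv: lv.width * lv.height)


manifest = [
    StoryboardLevel("L0", 80, 45),
    StoryboardLevel("L1", 160, 90),
    StoryboardLevel("L2", 320, 180),
]
chosen = select_level(manifest, device_width=200, device_height=120)
print(chosen.name)
```

A real system could equally key the choice on bandwidth or pixel density; the abstract leaves the parameters open.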

7.

Granted invention patent
Adaptive audio enhancement for multichannel speech recognition [Granted]

Publication (announcement) number: US09886949B2

Publication (announcement) date: 2018-02-06

Application number: US15392122

Application date: 2016-12-28

Applicant: Google Inc.

Inventor: Bo Li , Ron J. Weiss , Michiel A. U. Bacchiani , Tara N. Sainath , Kevin William Wilson

IPC: G10L15/00 , G10L15/16 , G10L21/0224 , G10L21/0216 , G10L15/26

CPC classification number: G10L15/16 , G10L15/20 , G10L15/26 , G10L21/0224 , G10L2021/02166

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

8.

Granted invention patent
Processing multi-channel audio waveforms [Granted]

Publication (announcement) number: US09697826B2

Publication (announcement) date: 2017-07-04

Application number: US15205321

Application date: 2016-07-08

Applicant: Google Inc.

Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson , Andrew W. Senior , Arun Narayanan , Yedid Hoshen , Michiel A. U. Bacchiani

IPC: G10L15/16 , G10L15/06 , G10L21/0216 , G10L15/02

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/063 , G10L2021/02166 , H04R3/005

Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
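
The "convolve each filter with each channel, then combine per filter" step in this abstract can be sketched as follows. This is only an illustration under stated assumptions: each filter is assumed to hold one kernel per channel, "combining" is assumed to be a plain sum, and the kernel values are hand-written rather than learned jointly with an acoustic model as the abstract describes. The function names are hypothetical.

```python
from typing import List, Sequence


def convolve_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Time-domain convolution, keeping only fully overlapping positions."""
    n, k = len(signal), len(kernel)
    return [
        # The kernel is flipped, as in a true convolution (not a correlation).
        sum(signal[t + j] * kernel[k - 1 - j] for j in range(k))
        for t in range(n - k + 1)
    ]


def filter_and_sum(channels: Sequence[Sequence[float]],
                   filters: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """For each filter, convolve every channel with its kernel and sum the results.

    filters[p][c] is the (assumed) kernel of filter p for channel c.
    """
    outputs = []
    for per_channel_kernels in filters:
        per_channel = [convolve_valid(x, h)
                       for x, h in zip(channels, per_channel_kernels)]
        # Combine across channels, sample by sample.
        outputs.append([sum(vals) for vals in zip(*per_channel)])
    return outputs


# Two channels of toy waveform data and two hand-written two-tap filters.
channels = [[1, 2, 3, 4], [0, 1, 0, 1]]
filters = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, -1]],
]
combined = filter_and_sum(channels, filters)
print(combined)
```

In the described method the combined outputs would then be fed to the deep neural network; that stage is omitted here.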
