Patent search ap:("Google Inc.") AND inv:"Michiel A.U. Bacchiani" Page 1

1.

Patent application
ASYNCHRONOUS OPTIMIZATION FOR SEQUENCE TRAINING OF NEURAL NETWORKS (Granted)

Publication number: US20150127337A1

Publication date: 2015-05-07

Application number: US14258139

Application date: 2014-04-22

Applicant: Google Inc.

Inventor: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A.U. Bacchiani

IPC: G10L15/06

CPC classification number: G10L15/063, G06N3/0454, G10L15/16, G10L15/183

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

2.

Patent application
CONTEXT-DEPENDENT STATE TYING USING A NEURAL NETWORK (Granted)

Publication number: US20150127327A1

Publication date: 2015-05-07

Application number: US14282655

Application date: 2014-05-20

Applicant: Google Inc.

Inventor: Michiel A.U. Bacchiani, David Rybach

IPC: G10L25/30, G10L19/038, G10L15/26

CPC classification number: G10L25/30, G10L15/06, G10L15/16, G10L15/183, G10L15/22, G10L15/26

Abstract: The technology described herein can be embodied in a method that includes receiving an audio signal encoding a portion of an utterance, and providing, to a first neural network, data corresponding to the audio signal. The method also includes generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network. The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.

3.

Patent application
QUERY ENDPOINTING BASED ON LIP DETECTION (Under examination, published)

Publication number: US20180268812A1

Publication date: 2018-09-20

Application number: US15458214

Application date: 2017-03-14

Applicant: Google Inc.

Inventor: Chanwoo Kim, Rajeev Conrad Nongpiur, Michiel A.U. Bacchiani

IPC: G10L15/22, G10L15/26, G10L15/25, G06K9/00

CPC classification number: G10L15/22, G06K9/00255, G10L15/04, G10L15/25, G10L15/265, G10L25/78, G10L2015/223

Abstract: Systems and methods are described for improving endpoint detection of a voice query submitted by a user. In some implementations, synchronized video data and audio data are received. A sequence of frames of the video data that includes images corresponding to lip movement on a face is determined. The audio data is endpointed based on first audio data that corresponds to a first frame of the sequence of frames and second audio data that corresponds to a last frame of the sequence of frames. A transcription of the endpointed audio data is generated by an automated speech recognizer. The generated transcription is then provided for output.
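
The endpointing step in this abstract can be illustrated with a minimal sketch. It assumes a per-frame lip-movement flag is already available from some face or lip detector (not shown), and that the audio and video start at the same instant. The function name `endpoint_audio` and its parameters are hypothetical and do not come from the patent; the sketch keeps the audio between the first and last flagged video frames.

```python
from typing import List, Optional, Tuple


def endpoint_audio(
    samples: List[int],
    lip_moving: List[bool],
    sample_rate: int,
    frame_rate: int,
) -> Optional[Tuple[int, int, List[int]]]:
    """Trim audio to the span covered by video frames flagged as lip movement.

    Assumption: lip_moving[i] is True when video frame i shows lip movement,
    and audio sample 0 is aligned with video frame 0.
    """
    moving = [i for i, flag in enumerate(lip_moving) if flag]
    if not moving:
        return None  # no lip movement detected, nothing to endpoint

    first, last = moving[0], moving[-1]
    # Map frame indices to sample offsets: frame i covers
    # [i * sample_rate / frame_rate, (i + 1) * sample_rate / frame_rate).
    start = first * sample_rate // frame_rate
    end = min((last + 1) * sample_rate // frame_rate, len(samples))
    return start, end, samples[start:end]
```

In a full pipeline the trimmed samples would then be passed to a speech recognizer; that step is omitted here.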

4.

Patent application
ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION (Granted)

Publication number: US20170278513A1

Publication date: 2017-09-28

Application number: US15392122

Application date: 2016-12-28

Applicant: Google Inc.

Inventor: Bo Li, Ron J. Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson

IPC: G10L15/16, G10L21/0224

CPC classification number: G10L15/16, G10L15/20, G10L15/26, G10L21/0224, G10L2021/02166

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

5.

Patent application
PROCESSING MULTI-CHANNEL AUDIO WAVEFORMS (Granted)

Publication number: US20160322055A1

Publication date: 2016-11-03

Application number: US15205321

Application date: 2016-07-08

Applicant: Google Inc.

Inventor: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A.U. Bacchiani

IPC: G10L19/008, G10L15/06, G10L19/26, G10L25/30

CPC classification number: G10L15/16, G06N3/0445, G06N3/0454, G10L15/02, G10L15/063, G10L2021/02166, H04R3/005

Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
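
The convolve-then-combine step in this abstract can be sketched as follows. This is only an illustration under stated assumptions: each filter is taken to hold one set of taps per channel, the per-channel outputs are combined by summation, and fixed numbers stand in for the learned filter parameters. The names `convolve` and `filter_and_sum` and the `filters[p][c]` layout are hypothetical, and the jointly trained deep neural network is not modeled.

```python
from typing import List


def convolve(signal: List[float], kernel: List[float]) -> List[float]:
    """Full 1-D time-domain convolution of a waveform with filter taps."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def filter_and_sum(
    channels: List[List[float]],
    filters: List[List[List[float]]],
) -> List[List[float]]:
    """Convolve every filter with every channel, then combine per filter.

    Assumption: filters[p][c] holds the taps that filter p applies to
    channel c, and all taps within one filter have the same length.
    """
    combined = []
    for per_channel_taps in filters:
        outputs = [convolve(ch, taps) for ch, taps in zip(channels, per_channel_taps)]
        # Sum the convolution outputs across channels, sample by sample.
        combined.append([sum(vals) for vals in zip(*outputs)])
    return combined
```

Each entry of the returned list is one combined output per filter; in the described method these combined outputs would be the input to the acoustic-model network.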
