ASYNCHRONOUS OPTIMIZATION FOR SEQUENCE TRAINING OF NEURAL NETWORKS
    1.
    Patent application
    Status: in force

    Publication No.: US20150127337A1

    Publication date: 2015-05-07

    Application No.: US14258139

    Filing date: 2014-04-22

    Applicant: Google Inc.

    CPC classification number: G10L15/063 G06N3/0454 G10L15/16 G10L15/183

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
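
The flow in this abstract (each training model fetches the current parameters, takes its own batch, and determines optimized parameters independently of the other) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `ParameterStore`, `worker`, and `train_async` are hypothetical, and a one-parameter least-squares objective stands in for the sequence-training criterion, which the abstract does not specify.

```python
import threading


class ParameterStore:
    """Hypothetical shared store standing in for a parameter server."""

    def __init__(self, w):
        self._w = w
        self._lock = threading.Lock()

    def fetch(self):
        with self._lock:
            return self._w

    def apply_gradient(self, grad, lr):
        with self._lock:
            self._w -= lr * grad


def worker(store, batch, steps, lr):
    """One training replica: fetch parameters, compute a gradient, push it."""
    for _ in range(steps):
        w = store.fetch()  # may already be stale when the update is pushed
        # Gradient of the mean of 0.5 * (w * x - y)^2 over the batch (toy loss).
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        store.apply_gradient(grad, lr)


def train_async(steps=200, lr=0.1):
    true_w = 3.0  # assumed target so the toy data is noise-free
    batch_1 = [(x, true_w * x) for x in (0.5, 1.0, 1.5)]
    batch_2 = [(x, true_w * x) for x in (-1.0, 2.0, 0.25)]
    store = ParameterStore(0.0)
    threads = [
        threading.Thread(target=worker, args=(store, batch, steps, lr))
        for batch in (batch_1, batch_2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.fetch()


if __name__ == "__main__":
    print(train_async())
```

Neither replica waits for the other, so an update may be computed from parameters that have since changed; the sketch tolerates this because the step size is small relative to the curvature of the toy loss.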

    CONTEXT-DEPENDENT STATE TYING USING A NEURAL NETWORK
    2.
    Patent application
    Status: in force

    Publication No.: US20150127327A1

    Publication date: 2015-05-07

    Application No.: US14282655

    Filing date: 2014-05-20

    Applicant: Google Inc.

    Abstract: The technology described herein can be embodied in a method that includes receiving an audio signal encoding a portion of an utterance, and providing, to a first neural network, data corresponding to the audio signal. The method also includes generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network. The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.

    ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION

    Publication No.: US20170278513A1

    Publication date: 2017-09-28

    Application No.: US15392122

    Filing date: 2016-12-28

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
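
A rough sketch of the two-channel idea: each filter's parameters are derived from both channels, and the filtered channels are merged into one. The abstract does not say how the filters are produced or combined, so two assumptions are made here and should be read as such: a cross-correlation delay estimate is swapped in for the neural network that generates the filter parameters, and the combination is a plain filter-and-sum. The names `best_lag`, `estimate_filters`, `fir`, and `filter_and_sum` are hypothetical.

```python
def best_lag(ref, other, max_lag):
    """Return the lag d for which other[t + d] best matches ref[t]."""
    def score(d):
        return sum(
            ref[t] * other[t + d]
            for t in range(len(ref))
            if 0 <= t + d < len(other)
        )
    return max(range(-max_lag, max_lag + 1), key=score)


def estimate_filters(ch1, ch2, max_lag=3):
    """Stand-in for the learned predictor: both filters depend on both channels."""
    d = best_lag(ch1, ch2, max_lag)
    delay1, delay2 = max(d, 0), max(-d, 0)
    # Each filter is a delayed impulse with gain 0.5, so the sum is an average.
    h1 = [0.0] * delay1 + [0.5]
    h2 = [0.0] * delay2 + [0.5]
    return h1, h2


def fir(signal, taps):
    """Causal FIR filtering: y[t] = sum_j taps[j] * signal[t - j]."""
    return [
        sum(taps[j] * signal[t - j] for j in range(len(taps)) if t - j >= 0)
        for t in range(len(signal))
    ]


def filter_and_sum(ch1, ch2):
    """Filter each channel with its own taps and add into a single channel."""
    h1, h2 = estimate_filters(ch1, ch2)
    return [a + b for a, b in zip(fir(ch1, h1), fir(ch2, h2))]


if __name__ == "__main__":
    first = [0, 1, 2, 1, 0, 0, 0, 0]
    second = [0, 0, 0, 1, 2, 1, 0, 0]  # same pulse, arriving two samples later
    print(filter_and_sum(first, second))
```

In the described method the combined channel would then be passed to a neural network acoustic model; that stage is omitted from the sketch.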

    PROCESSING MULTI-CHANNEL AUDIO WAVEFORMS
    5.
    Patent application
    Status: in force

    Publication No.: US20160322055A1

    Publication date: 2016-07-08 filing, published 2016-11-03

    Application No.: US15205321

    Filing date: 2016-07-08

    Applicant: Google Inc.

    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
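
The convolve-then-combine step in this abstract can be sketched as follows. Two points are assumptions rather than statements from the abstract: each filter is represented as one kernel per channel, and "combining" is taken to mean summing across channels. The kernels are hand-picked for illustration, whereas the abstract describes parameters learned jointly with the deep neural network. `convolve_valid` and `multichannel_filterbank` are hypothetical names.

```python
def convolve_valid(signal, kernel):
    """Time-domain convolution, keeping only fully overlapping positions."""
    n, k = len(signal), len(kernel)
    return [
        sum(kernel[j] * signal[t + k - 1 - j] for j in range(k))
        for t in range(n - k + 1)
    ]


def multichannel_filterbank(channels, filters):
    """Apply every filter to every channel and sum each filter's outputs.

    channels: list of C sample lists of equal length.
    filters:  list of P filters, each a list of C equal-length kernels.
    Returns P output sequences, one per filter.
    """
    outputs = []
    for per_channel_kernels in filters:
        convs = [
            convolve_valid(channel, kernel)
            for channel, kernel in zip(channels, per_channel_kernels)
        ]
        # Combine the per-channel convolution outputs for this filter.
        outputs.append([sum(values) for values in zip(*convs)])
    return outputs


if __name__ == "__main__":
    channels = [[1, 2, 3, 4], [0, 1, 0, 1]]
    filters = [
        [[1, 0], [0, 1]],
        [[1, 1], [1, -1]],
    ]
    print(multichannel_filterbank(channels, filters))
    # prints [[2, 4, 4], [4, 4, 8]]
```

The P combined sequences are what would be fed to the jointly trained network; the network itself is outside the scope of this sketch.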
