Low-rank hidden input layer for speech recognition neural network

    Publication number: US09646634B2

    Publication date: 2017-05-09

    Application number: US14616881

    Filing date: 2015-02-09

    Applicant: Google Inc.

    CPC classification number: G10L25/30 G06N3/0454 G06N3/0481 G10L15/063

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods for training a deep neural network that includes a low rank hidden input layer and an adjoining hidden layer, the low rank hidden input layer including a first matrix A and a second matrix B with dimensions i×m and m×o, respectively, to identify a keyword includes receiving a feature vector including i values that represent features of an audio signal encoding an utterance, determining, using the low rank hidden input layer, an output vector including o values using the feature vector, determining, using the adjoining hidden layer, another vector using the output vector, determining a confidence score that indicates whether the utterance includes the keyword using the other vector, and adjusting weights for the low rank hidden input layer using the confidence score.
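
The factored input layer in this abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes the layer is purely linear (no bias and no nonlinearity between A and B), and the sizes `i`, `m`, `o` and the helper names `matvec`, `low_rank_layer`, and `param_count` are invented for illustration. It shows only the shapes the abstract names: a feature vector of i values passes through A (i×m) and then B (m×o) to give o values, which needs i·m + m·o weights instead of i·o.

```python
import random


def matvec(x, M):
    """Multiply row vector x (length r) by matrix M (r rows, c columns)."""
    return [sum(x[r] * M[r][c] for r in range(len(M))) for c in range(len(M[0]))]


def low_rank_layer(x, A, B):
    """Apply the factored layer (x A) B, with A of shape i x m and B of shape m x o."""
    return matvec(matvec(x, A), B)


def param_count(i, m, o):
    """Return (weights in the factored layer, weights in a full i x o matrix)."""
    return i * m + m * o, i * o


# Illustrative sizes only; they are not taken from the patent.
i, m, o = 6, 2, 4
rng = random.Random(0)
A = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(i)]
B = [[rng.uniform(-1, 1) for _ in range(o)] for _ in range(m)]
x = [rng.uniform(-1, 1) for _ in range(i)]

y = low_rank_layer(x, A, B)           # output vector with o values
factored, full = param_count(i, m, o)  # 20 weights versus 24
```

The saving grows as m shrinks relative to i and o; with the toy sizes above it is small, which is why the numbers should be read as a shape check rather than a benchmark.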

    USER SPECIFIED KEYWORD SPOTTING USING LONG SHORT TERM MEMORY NEURAL NETWORK FEATURE EXTRACTOR
    13.
    Patent application
    Status: granted

    Publication number: US20170076717A1

    Publication date: 2017-03-16

    Application number: US15345982

    Filing date: 2016-11-08

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.
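
One way to picture the fixed-length template is sketched below. The abstract says only that at most k enrollment LSTM output vectors are combined; how they are chosen and combined is not stated there. This sketch assumes the last k vectors are kept, concatenated in order, and left-padded with zeros when the signal yields fewer than k vectors. The function name `fixed_length_template` is hypothetical.

```python
def fixed_length_template(lstm_outputs, k):
    """Build a vector of exactly k * dim values from a variable-length sequence.

    Assumption: keep the last k output vectors, concatenate them in order,
    and left-pad with zero vectors when fewer than k are available.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not lstm_outputs:
        raise ValueError("need at least one LSTM output vector")
    dim = len(lstm_outputs[0])
    kept = lstm_outputs[-k:]
    padding = [[0.0] * dim for _ in range(k - len(kept))]
    return [value for vec in padding + kept for value in vec]


# Two enrollment signals of different lengths map to the same template size.
long_signal = [[1, 2], [3, 4], [5, 6]]
short_signal = [[1, 2]]
template_a = fixed_length_template(long_signal, k=2)
template_b = fixed_length_template(short_signal, k=2)
```

Because both templates have the same length, they can be compared with any fixed-size distance measure, whatever the length of the original audio.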

    CONVOLUTIONAL, LONG SHORT-TERM MEMORY, FULLY CONNECTED DEEP NEURAL NETWORKS
    14.
    Patent application
    Status: under examination (published)

    Publication number: US20160099010A1

    Publication date: 2016-04-07

    Application number: US14847133

    Filing date: 2015-09-08

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.

    Voice Activity Detection
    15.
    Patent application

    Publication number: US20170092297A1

    Publication date: 2017-03-30

    Application number: US14986985

    Filing date: 2016-01-04

    Applicant: Google Inc.

    CPC classification number: G10L25/78 G10L25/30

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting voice activity. In one aspect, a method includes actions of receiving, by a neural network included in an automated voice activity detection system, a raw audio waveform; processing, by the neural network, the raw audio waveform to determine whether the audio waveform includes speech; and providing, by the neural network, a classification of the raw audio waveform indicating whether the raw audio waveform includes speech.

    COMPRESSED RECURRENT NEURAL NETWORK MODELS
    16.
    Patent application
    Status: under examination (published)

    Publication number: US20170076196A1

    Publication date: 2017-03-16

    Application number: US15172457

    Filing date: 2016-06-03

    Applicant: Google Inc.

    CPC classification number: G06N3/084 G06N3/0445

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long-short term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
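
To make the "structured matrix" option concrete, the sketch below uses a Toeplitz matrix as the gate parameter matrix. The abstract does not name a particular structure, so Toeplitz is an assumption chosen because it is a familiar structured family: an n×n Toeplitz matrix is fixed by 2n - 1 values rather than n². The function name `toeplitz_matvec` and the toy values are hypothetical, and the gate's nonlinearity and the other LSTM terms are left out.

```python
def toeplitz_matvec(diagonals, x):
    """Multiply an n x n Toeplitz matrix by x without storing the full matrix.

    diagonals holds 2n - 1 values; matrix entry T[r][c] is diagonals[r - c + n - 1],
    so every entry on the same diagonal shares one parameter.
    """
    n = len(x)
    if len(diagonals) != 2 * n - 1:
        raise ValueError("expected 2n - 1 diagonal values")
    return [sum(diagonals[r - c + n - 1] * x[c] for c in range(n)) for r in range(n)]


# Toy gate input of size 3: 5 stored parameters instead of 9.
gate_params = [1, 2, 3, 4, 5]
gate_input = [1, 1, 1]
intermediate_gate_output = toeplitz_matvec(gate_params, gate_input)
```

The same multiplication step would apply to each gate at each time step; only the stored parameters differ between gates.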

    PROCESSING AUDIO WAVEFORMS
    17.
    Patent application
    Status: under examination (published)

    Publication number: US20160284347A1

    Publication date: 2016-09-29

    Application number: US15080927

    Filing date: 2016-03-25

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.

    ENHANCED MULTI-CHANNEL ACOUSTIC MODELS
    18.
    Patent application

    Publication number: US20180068675A1

    Publication date: 2018-03-08

    Application number: US15350293

    Filing date: 2016-11-14

    Applicant: Google Inc.

    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

    LOW-RANK HIDDEN INPUT LAYER FOR SPEECH RECOGNITION NEURAL NETWORK
    19.
    Patent application
    Status: granted

    Publication number: US20160092766A1

    Publication date: 2016-03-31

    Application number: US14616881

    Filing date: 2015-02-09

    Applicant: Google Inc.

    CPC classification number: G10L25/30 G06N3/0454 G06N3/0481 G10L15/063

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods for training a deep neural network that includes a low rank hidden input layer and an adjoining hidden layer, the low rank hidden input layer including a first matrix A and a second matrix B with dimensions i×m and m×o, respectively, to identify a keyword includes receiving a feature vector including i values that represent features of an audio signal encoding an utterance, determining, using the low rank hidden input layer, an output vector including o values using the feature vector, determining, using the adjoining hidden layer, another vector using the output vector, determining a confidence score that indicates whether the utterance includes the keyword using the other vector, and adjusting weights for the low rank hidden input layer using the confidence score.
