ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION

    Publication No.: US20170278513A1

    Publication date: 2017-09-28

    Application No.: US15392122

    Filing date: 2016-12-28

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
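
The flow in this abstract (derive one filter per channel from both channels, then produce a single combined channel) can be sketched as a filter-and-sum operation. This is only an illustration under stated assumptions: the abstract does not say how the filter parameters are generated or how the channels are combined, so `toy_filter_predictor` is a hypothetical stand-in for the filter-generation step, and summing the filtered channels is an assumed combination rule.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def fir_filter(x: Sequence[float], taps: Sequence[float]) -> Signal:
    """Causal FIR filtering: y[t] = sum_k taps[k] * x[t - k]."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if t - k >= 0:
                acc += h * x[t - k]
        out.append(acc)
    return out


def toy_filter_predictor(ch1: Sequence[float], ch2: Sequence[float],
                         num_taps: int = 3) -> Tuple[Signal, Signal]:
    """Hypothetical stand-in for the filter-generation step.

    Both filters depend on both channels: each channel gets a single
    zero-delay tap equal to its share of the total signal energy.
    """
    e1 = sum(v * v for v in ch1)
    e2 = sum(v * v for v in ch2)
    total = (e1 + e2) or 1.0  # avoid division by zero on silent input
    taps1 = [e1 / total] + [0.0] * (num_taps - 1)
    taps2 = [e2 / total] + [0.0] * (num_taps - 1)
    return taps1, taps2


def adaptive_filter_and_sum(
    ch1: Sequence[float],
    ch2: Sequence[float],
    predictor: Callable[..., Tuple[Signal, Signal]] = toy_filter_predictor,
) -> Signal:
    """Filter each channel with its own predicted filter, then sum (assumed)."""
    taps1, taps2 = predictor(ch1, ch2)
    y1 = fir_filter(ch1, taps1)
    y2 = fir_filter(ch2, taps2)
    return [a + b for a, b in zip(y1, y2)]


combined = adaptive_filter_and_sum([1.0, 0.0], [1.0, 0.0])
```

The single combined channel would then be passed to the acoustic model; that stage is omitted here.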

    MULTICHANNEL RAW-WAVEFORM NEURAL NETWORKS
    2.
    Patent application

    Publication No.: US20170092265A1

    Publication date: 2017-03-30

    Application No.: US14987146

    Filing date: 2016-01-04

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
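
The two-stage structure described here (a spatial filtering layer over the raw channels, followed by a spectral filtering layer over its output) might look roughly like the following. It is a minimal sketch with assumed details: the filter layout `spatial_filters[p][c]`, the summation across channels, and the "valid" correlation are illustrative choices that the abstract does not specify, and the additional layers that predict sub-word units are left out.

```python
from typing import List, Sequence


def conv_valid(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """'Valid' 1-D correlation, as in a convolutional layer:
    out[t] = sum_k h[k] * x[t + k]."""
    n = len(x) - len(h) + 1
    return [sum(h[k] * x[t + k] for k in range(len(h))) for t in range(n)]


def spatial_layer(channels: Sequence[Sequence[float]],
                  spatial_filters: Sequence[Sequence[Sequence[float]]]
                  ) -> List[List[float]]:
    """spatial_filters[p][c] holds the taps of spatial filter p for channel c.

    Each filter is applied to every channel and the results are summed
    across channels (an assumption), giving one output signal per filter.
    """
    outputs = []
    for per_channel_taps in spatial_filters:
        filtered = [conv_valid(x, h) for x, h in zip(channels, per_channel_taps)]
        outputs.append([sum(vals) for vals in zip(*filtered)])
    return outputs


def spectral_layer(spatial_out: Sequence[Sequence[float]],
                   spectral_filters: Sequence[Sequence[float]]
                   ) -> List[List[List[float]]]:
    """Apply every spectral filter to every spatial output.

    Result is indexed as [spatial filter][spectral filter][time].
    """
    return [[conv_valid(y, g) for g in spectral_filters] for y in spatial_out]


channels = [[1, 2, 3, 4], [0, 1, 0, 1]]   # two raw signals, same time span
spatial = spatial_layer(channels, [[[1, 0], [0, 1]]])
spectral = spectral_layer(spatial, [[1, -1]])
```

Keeping the spatial filters short and the spectral filters separate is what makes the two stages distinct; in a trained network both sets of taps would be learned rather than hand-written as they are here.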

    User specified keyword spotting using long short term memory neural network feature extractor
    3.
    Granted patent
    Status: In force

    Publication No.: US09508340B2

    Publication date: 2016-11-29

    Application No.: US14579603

    Filing date: 2014-12-22

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.
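
The template step (combining at most k LSTM output vectors into a fixed-length representation) can be illustrated as below. This is a sketch under assumptions the abstract does not confirm: it keeps the last k output vectors, left-pads with zeros when fewer than k exist, concatenates them, and compares templates by cosine similarity. The LSTM itself is not modelled; `lstm_outputs` is assumed to be its per-frame output.

```python
import math
from typing import List, Sequence


def fixed_length_template(lstm_outputs: Sequence[Sequence[float]],
                          k: int) -> List[float]:
    """Combine at most k output vectors into one vector of length k * dim.

    Assumed rule: keep the last k vectors, zero-pad on the left if the
    utterance produced fewer than k, then concatenate.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not lstm_outputs:
        raise ValueError("need at least one LSTM output vector")
    dim = len(lstm_outputs[0])
    kept = [list(v) for v in lstm_outputs[-k:]]
    padding = [[0.0] * dim for _ in range(k - len(kept))]
    template: List[float] = []
    for vec in padding + kept:
        template.extend(vec)
    return template


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


long_utt = fixed_length_template([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], k=2)
short_utt = fixed_length_template([[1.0, 2.0]], k=4)
```

Because every template has length k * dim regardless of utterance duration, enrollment and test utterances of different lengths can be compared directly.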

    PROCESSING MULTI-CHANNEL AUDIO WAVEFORMS
    4.
    Patent application
    Status: In force

    Publication No.: US20160322055A1

    Publication date: 2016-11-03

    Application No.: US15205321

    Filing date: 2016-07-08

    Applicant: Google Inc.

    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
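
The convolve-then-combine step can be sketched in a few lines. The layout is an assumption: each filter is taken to hold one tap vector per channel (`filters[f][c]`), and "combining" is taken to mean summing across channels. With suitable taps, such a filter can time-align a delayed channel before the sum, which is shown in the example at the end.

```python
from typing import List, Sequence


def convolve_full(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution in the time domain."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            out[i + j] += xv * hv
    return out


def combine_per_filter(channels: Sequence[Sequence[float]],
                       filters: Sequence[Sequence[Sequence[float]]]
                       ) -> List[List[float]]:
    """For each filter, convolve its per-channel taps with each channel
    and sum the convolution outputs across channels (assumed rule).

    Channels are assumed to have equal length, as are the tap vectors
    within one filter, so the summed outputs line up sample by sample.
    """
    combined = []
    for per_channel_taps in filters:
        outs = [convolve_full(x, h) for x, h in zip(channels, per_channel_taps)]
        combined.append([sum(vals) for vals in zip(*outs)])
    return combined


# Channel 2 is channel 1 delayed by one sample.
channels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
filters = [
    [[0.0, 1.0], [1.0, 0.0]],  # delays channel 1 by one sample: aligned sum
    [[1.0, 0.0], [1.0, 0.0]],  # no delay: the two impulses stay apart
]
outputs = combine_per_filter(channels, filters)
```

In the described method the taps are learned jointly with the acoustic model; the hand-picked values above only show why per-channel delays matter.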

    USER SPECIFIED KEYWORD SPOTTING USING LONG SHORT TERM MEMORY NEURAL NETWORK FEATURE EXTRACTOR
    5.
    Patent application
    Status: In force

    Publication No.: US20160180838A1

    Publication date: 2016-06-23

    Application No.: US14579603

    Filing date: 2014-12-22

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.

    Adaptive audio enhancement for multichannel speech recognition

    Publication No.: US09886949B2

    Publication date: 2018-02-06

    Application No.: US15392122

    Filing date: 2016-12-28

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

    SPEAKER RECOGNITION USING NEURAL NETWORKS
    9.
    Patent application
    Status: Under examination (published)

    Publication No.: US20160293167A1

    Publication date: 2016-10-06

    Application No.: US15179717

    Filing date: 2016-06-10

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker verification. In one aspect, a method includes accessing a neural network having an input layer that provides inputs to a first hidden layer whose nodes are respectively connected to only a proper subset of the inputs from the input layer. Speech data that corresponds to a particular utterance may be provided as input to the input layer of the neural network. A representation of activations that occur in response to the speech data at a particular layer of the neural network that was configured as a hidden layer during training of the neural network may be generated. A determination of whether the particular utterance was likely spoken by a particular speaker may be made based at least on the generated representation. An indication of whether the particular utterance was likely spoken by the particular speaker may be provided.
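
Two ideas from this abstract lend themselves to a short sketch: a first hidden layer whose nodes each see only a proper subset of the inputs, and a speaker decision based on hidden-layer activations. Everything beyond those two ideas is an assumption made for illustration: the sliding-window connectivity, the ReLU nonlinearity, averaging activations over frames, and the cosine-similarity threshold are not stated in the abstract.

```python
import math
from typing import List, Sequence


def locally_connected_layer(inputs: Sequence[float],
                            weights: Sequence[Sequence[float]],
                            window: int, stride: int) -> List[float]:
    """Node i is connected only to inputs[i*stride : i*stride + window]."""
    if window >= len(inputs):
        raise ValueError("each node must see a proper subset of the inputs")
    activations = []
    for i, w in enumerate(weights):
        start = i * stride
        patch = inputs[start:start + window]
        if len(patch) < window:
            raise ValueError("window runs past the end of the input")
        z = sum(wi * xi for wi, xi in zip(w, patch))
        activations.append(max(0.0, z))  # ReLU (assumed nonlinearity)
    return activations


def utterance_representation(frames: Sequence[Sequence[float]],
                             weights: Sequence[Sequence[float]],
                             window: int, stride: int) -> List[float]:
    """Average the hidden activations over all frames (assumed pooling)."""
    per_frame = [locally_connected_layer(f, weights, window, stride)
                 for f in frames]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def likely_same_speaker(candidate: Sequence[float],
                        enrolled: Sequence[float],
                        threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: similarity at or above a threshold."""
    return cosine_similarity(candidate, enrolled) >= threshold


weights = [[1.0, 1.0], [1.0, -1.0]]
frames = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]]
rep = utterance_representation(frames, weights, window=2, stride=2)
```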

    CONVOLUTIONAL NEURAL NETWORKS
    10.
    Patent application
    Status: Under examination (published)

    Publication No.: US20160283841A1

    Publication date: 2016-09-29

    Application No.: US14805704

    Filing date: 2015-07-22

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for keyword spotting. One of the methods includes training, by a keyword detection system, a convolutional neural network for keyword detection by providing a two-dimensional set of input values to the convolutional neural network, the input values including a first dimension in time and a second dimension in frequency, and performing convolutional multiplication on the two-dimensional set of input values for a filter using a frequency stride greater than one to generate a feature map.
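
The effect of a frequency stride greater than one is easiest to see on a small example. The sketch below applies a single filter to a time-by-frequency input and steps the filter by `freq_stride` bins along the frequency axis; the function name, the "valid" boundary handling, and the toy values are assumptions made for illustration.

```python
from typing import List, Sequence


def conv2d_time_freq(inputs: Sequence[Sequence[float]],
                     kernel: Sequence[Sequence[float]],
                     freq_stride: int,
                     time_stride: int = 1) -> List[List[float]]:
    """Slide one filter over a [time][frequency] input.

    The filter moves `time_stride` steps in time and `freq_stride` bins
    in frequency; no padding is applied.
    """
    n_time, n_freq = len(inputs), len(inputs[0])
    k_time, k_freq = len(kernel), len(kernel[0])
    feature_map = []
    for t in range(0, n_time - k_time + 1, time_stride):
        row = []
        for f in range(0, n_freq - k_freq + 1, freq_stride):
            acc = 0.0
            for dt in range(k_time):
                for df in range(k_freq):
                    acc += kernel[dt][df] * inputs[t + dt][f + df]
            row.append(acc)
        feature_map.append(row)
    return feature_map


spectrogram = [[1, 2, 3, 4, 5],
               [6, 7, 8, 9, 10]]          # 2 time frames x 5 frequency bins
kernel = [[1, 1]]                         # 1 frame x 2 bins
strided = conv2d_time_freq(spectrogram, kernel, freq_stride=2)
dense = conv2d_time_freq(spectrogram, kernel, freq_stride=1)
```

With `freq_stride=2` the feature map has two columns per frame instead of four, so the layers that follow see roughly half as many values along the frequency axis.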
