ENHANCED MULTI-CHANNEL ACOUSTIC MODELS
    Patent application

    Publication No.: US20180068675A1

    Publication date: 2018-03-08

    Application No.: US15350293

    Filing date: 2016-11-14

    Applicant: Google Inc.

    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
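    The abstract does not say how the frequency-domain processing is carried out. A minimal sketch, assuming the spectral filtering layer multiplies the DFT of a spatial-filtered frame by each filter's frequency response and summarizes each product by its log energy. The function names, the naive DFT, and the log-energy pooling are illustrative assumptions, not the claimed implementation; `circular_convolve` is included only to show that the frequency-domain product matches circular convolution in the time domain (the convolution theorem).

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_convolve(x, h):
    """Time-domain circular convolution: y[t] = sum_m x[m] * h[(t - m) mod N]."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


def spectral_filter_frequency_domain(frame, filters):
    """Hypothetical spectral filtering layer operating on frequency-domain data.

    Each filter is applied by multiplying its frequency response with the
    frame's spectrum bin by bin; the product is summarized by its log energy.
    By Parseval's theorem, sum(|Y|^2) / N equals the energy of the equivalent
    time-domain circular convolution.
    """
    X = dft(frame)
    features = []
    for h in filters:
        if len(h) != len(frame):
            raise ValueError("zero-pad each filter to the frame length")
        Y = [xk * hk for xk, hk in zip(X, dft(h))]
        energy = sum(abs(y) ** 2 for y in Y) / len(Y)
        features.append(math.log(energy + 1e-6))
    return features
```

    Working in the frequency domain turns each filter into one complex multiplication per bin; a practical system would use an FFT instead of this O(N^2) DFT.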

    PROCESSING AUDIO WAVEFORMS
    Patent application
    Status: Pending (published)

    Publication No.: US20160284347A1

    Publication date: 2016-09-29

    Application No.: US15080927

    Filing date: 2016-03-25

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output based on the output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
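    As a rough illustration of the front end described above, the sketch below builds a log-magnitude time-frequency representation from framed audio and slides a small kernel along the frequency axis, followed by max pooling. Frame length, hop, kernel, and pool size are arbitrary assumptions, the function names are hypothetical, and the memory layer and hidden layers named in the abstract are omitted.

```python
import cmath
import math


def log_magnitude_spectrogram(signal, frame_len, hop):
    """Time-frequency representation: one row per frame, one column per
    non-negative DFT bin (frame_len // 2 + 1 bins), log-compressed."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        row = []
        for k in range(frame_len // 2 + 1):
            coef = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                       for t in range(frame_len))
            row.append(math.log(abs(coef) + 1e-6))
        rows.append(row)
    return rows


def frequency_convolution(row, kernel, pool):
    """Slide `kernel` along the frequency axis of one frame (valid positions,
    cross-correlation as in most neural-network conv layers), then max-pool
    non-overlapping groups of `pool` outputs."""
    k = len(kernel)
    conv = [sum(row[f + i] * kernel[i] for i in range(k))
            for f in range(len(row) - k + 1)]
    return [max(conv[p:p + pool]) for p in range(0, len(conv) - pool + 1, pool)]


# Usage: a cosine at DFT bin 2 of an 8-sample frame.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(16)]
spec = log_magnitude_spectrogram(signal, frame_len=8, hop=4)
pooled = [frequency_convolution(row, kernel=[1.0, 1.0], pool=2) for row in spec]
```

    Convolving along frequency (rather than time) lets one kernel respond to the same spectral pattern wherever it falls in the band, which is the usual motivation for a frequency convolution layer.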

    ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION

    Publication No.: US20170278513A1

    Publication date: 2017-09-28

    Application No.: US15392122

    Filing date: 2016-12-28

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
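    In the abstract, a neural network generates the filter parameters from both channels. The sketch below substitutes a plainly non-neural stand-in: a cross-correlation delay estimate that yields one pure-delay FIR filter per channel, followed by filter-and-sum into a single combined channel. `estimate_delay`, `generate_filters`, and the 0.5 gains are hypothetical and only illustrate the data flow (two channels in, one filter per channel computed from both, one combined channel out).

```python
def estimate_delay(ref, other, max_lag):
    """Lag (in samples) maximizing the cross-correlation of `other` against
    `ref`; a positive lag means `other` arrives later than `ref`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(ref[t] * other[t + lag]
                    for t in range(len(ref)) if 0 <= t + lag < len(other))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def generate_filters(ch1, ch2, num_taps, max_lag):
    """Stand-in for the filter-prediction network: both FIR filters are derived
    from BOTH channels. Each is a pure delay (gain 0.5) that aligns the channels."""
    if num_taps <= max_lag:
        raise ValueError("num_taps must exceed max_lag")
    lag = estimate_delay(ch1, ch2, max_lag)
    h1, h2 = [0.0] * num_taps, [0.0] * num_taps
    if lag >= 0:
        h1[lag], h2[0] = 0.5, 0.5    # ch2 is late: delay ch1 to match
    else:
        h1[0], h2[-lag] = 0.5, 0.5   # ch1 is late: delay ch2 to match
    return h1, h2


def fir(x, h):
    """Causal FIR filtering: y[t] = sum_k h[k] * x[t - k]."""
    return [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
            for t in range(len(x))]


def filter_and_sum(ch1, ch2, h1, h2):
    """Filter each channel with its own filter and sum into one channel."""
    return [a + b for a, b in zip(fir(ch1, h1), fir(ch2, h2))]


# Usage: ch2 carries the same impulse as ch1, two samples later.
ch1 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
ch2 = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
h1, h2 = generate_filters(ch1, ch2, num_taps=4, max_lag=3)
combined = filter_and_sum(ch1, ch2, h1, h2)
```

    Because the filters are recomputed from each input, the combination adapts to where the talker is; in the abstract that recomputation is done by a learned network rather than by this fixed rule.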

    MULTICHANNEL RAW-WAVEFORM NEURAL NETWORKS
    Patent application

    Publication No.: US20170092265A1

    Publication date: 2017-03-30

    Application No.: US14987146

    Filing date: 2016-01-04

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal covering the same period of time; generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data; generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output; and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
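    A minimal sketch of the two convolutional stages named in this abstract, assuming a factored design: a spatial layer in which each of several look directions holds one short filter per channel (filter, then sum across channels), followed by a spectral layer that applies one shared bank of filters to every spatial output. The rectify, max-pool, and log-compress step and all function names are assumptions; the abstract only states that additional layers consume the spectral filtered output.

```python
import math


def fir(x, h):
    """Causal FIR convolution: y[t] = sum_k h[k] * x[t - k]."""
    return [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
            for t in range(len(x))]


def spatial_layer(channels, spatial_filters):
    """Assumed factored spatial layer: each look direction holds one short
    filter per channel; filter every channel and sum across channels."""
    outputs = []
    for per_channel_taps in spatial_filters:
        filtered = [fir(x, h) for x, h in zip(channels, per_channel_taps)]
        outputs.append([sum(vals) for vals in zip(*filtered)])
    return outputs


def spectral_layer(spatial_outputs, spectral_filters):
    """Apply one shared bank of spectral filters to every spatial output,
    then rectify, max-pool over the frame and log-compress."""
    features = []
    for signal in spatial_outputs:
        for g in spectral_filters:
            y = fir(signal, g)
            features.append(math.log(1.0 + max(0.0, max(y))))
    return features


# Usage: the same impulse reaches ch2 one sample after ch1.
ch1 = [1.0, 0.0, 0.0, 0.0]
ch2 = [0.0, 1.0, 0.0, 0.0]
look_dirs = [[[0.0, 1.0], [1.0, 0.0]],   # delays ch1 by one sample: aligned
             [[1.0, 0.0], [1.0, 0.0]]]   # no delay: misaligned
spatial = spatial_layer([ch1, ch2], look_dirs)
feats = spectral_layer(spatial, [[1.0]])
```

    Sharing the spectral filters across look directions keeps the parameter count of the second stage independent of how many spatial filters are used, which is one common reason for factoring the layers this way.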

    PROCESSING MULTI-CHANNEL AUDIO WAVEFORMS
    Patent application
    Status: In force

    Publication No.: US20160322055A1

    Publication date: 2016-11-03

    Application No.: US15205321

    Filing date: 2016-07-08

    Applicant: Google Inc.

    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
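    The filter-and-combine step in this abstract can be sketched as follows, assuming each learned filter holds a separate tap vector for each channel and that "combining" means summing the per-channel convolution outputs. The log-energy pooling and the single ReLU layer standing in for the deep neural network are illustrative assumptions, the names are hypothetical, and the joint training of filters and network is not shown.

```python
import math


def fir(x, h):
    """Causal time-domain convolution: y[t] = sum_k h[k] * x[t - k]."""
    return [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
            for t in range(len(x))]


def multichannel_filter_features(channels, filters):
    """For each filter (assumed: one tap vector per channel), convolve every
    channel with its taps, combine the outputs by summing across channels,
    and pool the combined output into one log-energy value."""
    features = []
    for per_channel_taps in filters:
        outputs = [fir(x, h) for x, h in zip(channels, per_channel_taps)]
        combined = [sum(vals) for vals in zip(*outputs)]
        energy = sum(v * v for v in combined) / len(combined)
        features.append(math.log(energy + 1e-6))
    return features


def dense_relu(features, weights, biases):
    """One fully connected ReLU layer, a stand-in for the deep neural network."""
    return [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(weights, biases)]


# Usage: two channels, two hypothetical filters with 2 taps per channel.
ch1 = [1.0, 0.0, 0.0, 0.0]
ch2 = [0.0, 1.0, 0.0, 0.0]
filters = [[[0.0, 1.0], [1.0, 0.0]],    # aligns the channels: energy adds up
           [[1.0, 0.0], [-1.0, 0.0]]]   # subtracts ch2 from ch1 without alignment
feats = multichannel_filter_features([ch1, ch2], filters)
hidden = dense_relu(feats, weights=[[1.0, 0.0]], biases=[0.5])
```

    In a jointly trained system, the filter taps and the network weights would both be updated from the same recognition loss, so the filters learn whatever channel combination helps the acoustic model most.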

    LOW LATENCY VIDEO STORYBOARD DELIVERY WITH SELECTABLE RESOLUTION LEVELS
    Patent application
    Status: Pending (published)

    Publication No.: US20140082661A1

    Publication date: 2014-03-20

    Application No.: US13785913

    Filing date: 2013-03-05

    Applicant: Google Inc.

    Abstract: A video storyboard delivery system is disclosed. The system receives, from a playback client executed on a user device, a request for a video including one or more user device parameters. The system obtains a storyboard manifest including information defining a storyboard associated with the video, wherein the information includes a plurality of storyboard resolution levels. Using the one or more user device parameters, a selection is made of one of the plurality of storyboard resolution levels from the storyboard manifest. The storyboard at the selected resolution level is delivered to the playback client.
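    The abstract does not say which device parameters are used or how a level is chosen. A hypothetical selection rule, assuming the client reports its screen width, available bandwidth, and a latency budget: pick the widest storyboard level that fits the screen and can be delivered within the budget, otherwise fall back to the narrowest level. `StoryboardLevel`, `DeviceParams`, `select_level`, and all sizes are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryboardLevel:
    width: int         # thumbnail width in pixels
    height: int        # thumbnail height in pixels
    total_bytes: int   # size of the whole storyboard at this level


@dataclass(frozen=True)
class DeviceParams:
    screen_width: int      # hypothetical user-device parameter
    bandwidth_bps: float   # hypothetical user-device parameter (bits/s)
    max_delay_s: float     # hypothetical latency budget for the storyboard


def select_level(levels, device):
    """Widest level that fits the screen and arrives within the latency
    budget; otherwise the narrowest level in the manifest."""
    budget_bytes = device.bandwidth_bps * device.max_delay_s / 8
    fitting = [lv for lv in levels
               if lv.width <= device.screen_width and lv.total_bytes <= budget_bytes]
    if fitting:
        return max(fitting, key=lambda lv: lv.width)
    return min(levels, key=lambda lv: lv.width)


# Usage: a manifest with three resolution levels.
manifest_levels = [StoryboardLevel(48, 27, 20_000),
                   StoryboardLevel(80, 45, 60_000),
                   StoryboardLevel(160, 90, 240_000)]
phone = DeviceParams(screen_width=120, bandwidth_bps=1_000_000, max_delay_s=1.0)
chosen = select_level(manifest_levels, phone)
```

    With a 1 Mbit/s link and a one-second budget the phone can receive about 125,000 bytes, so the 160-pixel level is excluded both by width and by size, and the 80-pixel level is delivered.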

    Adaptive audio enhancement for multichannel speech recognition

    Publication No.: US09886949B2

    Publication date: 2018-02-06

    Application No.: US15392122

    Filing date: 2016-12-28

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
