Patent search ap:("Google Inc.") AND inv:"Andrew W. Senior" Page 1

1.

Granted invention patent
Multisensory speech detection (Status: In force)

Publication number: US10020009B1

Publication date: 2018-07-10

Application number: US15392448

Filing date: 2016-12-28

Applicant: Google Inc.

Inventor: Dave Burke , Michael J. LeBeau , Konrad Gianno , Trausti T. Kristjansson , John Nicholas Jitkoff , Andrew W. Senior

IPC: G10L15/26 , G10L25/78 , G10L25/21 , G10L15/22 , G06F3/16 , G06F3/0346 , H04M1/725

CPC classification number: G10L25/78 , G06F3/0346 , G06F3/167 , G10L15/10 , G10L15/22 , G10L15/265 , G10L17/00 , G10L25/21 , H04M1/72569 , H04M2250/12 , H04M2250/74 , H04R1/08 , H04W4/026

Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
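The flow in this abstract (orientation, then operating mode, then detection parameters, then detection) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the mode names, the pitch-angle thresholds, the energy-based detector, and the parameter values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DetectionParams:
    start_energy: float  # frame energy at or above which speech is taken to begin
    end_silence_ms: int  # trailing low-energy time after which detection ends


# Hypothetical operating modes and parameters (illustrative values only).
PARAMS: Dict[str, Optional[DetectionParams]] = {
    "near_ear": DetectionParams(start_energy=0.2, end_silence_ms=800),
    "in_front": DetectionParams(start_energy=0.4, end_silence_ms=500),
    "idle": None,  # no speech detection in this mode
}


def operating_mode(pitch_deg: float) -> str:
    """Map a device pitch angle to an operating mode (assumed thresholds)."""
    if pitch_deg >= 60:
        return "near_ear"
    if pitch_deg >= 20:
        return "in_front"
    return "idle"


def detect_speech(
    pitch_deg: float, frame_energies: Sequence[float], frame_ms: int = 100
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of detected speech, or None."""
    params = PARAMS[operating_mode(pitch_deg)]
    if params is None:
        return None
    start = None
    silence_ms = 0
    for i, energy in enumerate(frame_energies):
        if start is None:
            if energy >= params.start_energy:
                start = i  # detection begins
        elif energy < params.start_energy:
            silence_ms += frame_ms
            if silence_ms >= params.end_silence_ms:
                return (start, i)  # detection ends after enough silence
        else:
            silence_ms = 0  # speech resumed; reset the silence counter
    if start is None:
        return None
    return (start, len(frame_energies) - 1)
```

The point of the sketch is only that the same audio can be segmented differently depending on the mode chosen from the orientation reading.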

2.

Granted invention patent
Context-dependent modeling of phonemes (Status: In force)

Publication number: US09818409B2

Publication date: 2017-11-14

Application number: US14877673

Filing date: 2015-10-07

Applicant: Google Inc.

Inventor: Andrew W. Senior , Hasim Sak , Izhak Shafran

IPC: G10L15/00 , G10L17/00 , G10L15/14 , G10L13/00 , G10L17/14 , G10L15/02 , G10L15/16

CPC classification number: G10L17/14 , G06N3/0445 , G10L15/02 , G10L15/16 , G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.

3.

Invention patent application
PROCESSING AUDIO WAVEFORMS (Status: Under examination, published)

Publication number: US20160284347A1

Publication date: 2016-09-29

Application number: US15080927

Filing date: 2016-03-25

Applicant: Google Inc.

Inventor: Tara N. Sainath , Ron J. Weiss , Andrew W. Senior , Kevin William Wilson

IPC: G10L15/16 , G10L15/14 , G10L15/26

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G06N3/084 , G10L15/142 , G10L15/26

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprising a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.

4.

Invention patent application
Multisensory Speech Detection (Status: In force)

Publication number: US20150287423A1

Publication date: 2015-10-08

Application number: US14645802

Filing date: 2015-03-12

Applicant: Google Inc.

Inventor: Dave Burke , Michael J. LeBeau , Konrad Gianno , Trausti T. Kristjansson , John Nicholas Jitkoff , Andrew W. Senior

IPC: G10L25/78 , H04W4/02 , H04R1/08 , H04M1/725

CPC classification number: G10L25/78 , G06F3/0346 , G06F3/167 , G10L15/10 , G10L15/22 , G10L15/265 , G10L17/00 , G10L25/21 , H04M1/72569 , H04M2250/12 , H04M2250/74 , H04R1/08 , H04W4/026

Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.

5.

Invention patent application
ASYNCHRONOUS OPTIMIZATION FOR SEQUENCE TRAINING OF NEURAL NETWORKS (Status: In force)

Publication number: US20150127337A1

Publication date: 2015-05-07

Application number: US14258139

Filing date: 2014-04-22

Applicant: Google Inc.

Inventor: Georg Heigold , Erik McDermott , Vincent O. Vanhoucke , Andrew W. Senior , Michiel A.U. Bacchiani

IPC: G10L15/06

CPC classification number: G10L15/063 , G06N3/0454 , G10L15/16 , G10L15/183

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

6.

Granted invention patent
Caching speech recognition scores (Status: In force)

Publication number: US09858922B2

Publication date: 2018-01-02

Application number: US14311557

Filing date: 2014-06-23

Applicant: Google Inc.

Inventor: Eugene Weinstein , Sanjiv Kumar , Ignacio L. Moreno , Andrew W. Senior , Nikhil Prasad Bhat

IPC: G10L15/08 , G10L15/28

CPC classification number: G10L15/08 , G10L15/285

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for caching speech recognition scores. In some implementations, one or more values comprising data about an utterance are received. An index value is determined for the one or more values. An acoustic model score for the one or more received values is selected, from a cache of acoustic model scores that were computed before receiving the one or more values, based on the index value. A transcription for the utterance is determined using the selected acoustic model score.
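As a rough illustration of the lookup described in this abstract, the sketch below derives an index value from the received values and uses it to select a score from a cache filled in advance. The grid quantization used as the index function, the function names, and the stand-in scoring function are assumptions chosen for brevity; the abstract does not specify how the index is computed.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

Index = Tuple[int, ...]


def index_for(features: Sequence[float], step: float = 0.5) -> Index:
    """Quantize each value to a grid cell; the tuple of cells is the index.

    Assumed indexing scheme: nearby feature vectors share an index.
    """
    return tuple(round(x / step) for x in features)


def build_cache(
    samples: Iterable[Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
    step: float = 0.5,
) -> Dict[Index, float]:
    """Precompute scores before any utterance data is received."""
    cache: Dict[Index, float] = {}
    for sample in samples:
        # Keep the first score computed for each index.
        cache.setdefault(index_for(sample, step), score_fn(sample))
    return cache


def cached_score(
    features: Sequence[float],
    cache: Dict[Index, float],
    fallback: float,
    step: float = 0.5,
) -> float:
    """Select a precomputed score by index; use `fallback` on a cache miss."""
    return cache.get(index_for(features, step), fallback)
```

A possible usage, with a made-up scoring function standing in for an acoustic model:

```python
toy_score = lambda v: -sum(x * x for x in v)  # hypothetical stand-in
cache = build_cache([[1.0, 2.0]], toy_score)
score = cached_score([1.1, 1.9], cache, fallback=float("-inf"))
```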

7.

Granted invention patent
Generating representations of acoustic sequences (Status: In force)

Publication number: US09721562B2

Publication date: 2017-08-01

Application number: US14559113

Filing date: 2014-12-03

Applicant: Google Inc.

Inventor: Hasim Sak , Andrew W. Senior

IPC: G10L15/16 , G06N3/02 , G10L15/02 , G10L15/14

CPC classification number: G10L15/16 , G10L15/02 , G10L15/142 , G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
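The per-time-step loop in this abstract, where the output for the preceding step is combined with the current acoustic features to form a modified input, might be sketched as follows. The concatenation used to build the modified input, the zero vector used at the initial step, and `toy_model` are illustrative assumptions; a real system would use a trained acoustic modeling neural network.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def run_with_feedback(
    features: Sequence[Vector],
    model: Callable[[Vector], Vector],
    output_size: int,
) -> List[Vector]:
    """Run `model` over a sequence, feeding each output into the next step."""
    outputs: List[Vector] = []
    # Assumption: the initial step sees zeros in place of a previous output,
    # so the model input has the same length at every step.
    previous: Vector = [0.0] * output_size
    for frame in features:
        # Modified input = current acoustic features + preceding output.
        modified_input = list(frame) + previous
        previous = model(modified_input)
        outputs.append(previous)
    return outputs


def toy_model(x: Vector) -> Vector:
    """Hypothetical stand-in for a network: the mean of its input."""
    return [sum(x) / len(x)]
```

Calling `run_with_feedback([[1.0, 3.0], [2.0, 2.0]], toy_model, output_size=1)` yields one output per time step, with the second output depending on the first.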

8.

Granted invention patent
Processing multi-channel audio waveforms (Status: In force)

Publication number: US09697826B2

Publication date: 2017-07-04

Application number: US15205321

Filing date: 2016-07-08

Applicant: Google Inc.

Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson , Andrew W. Senior , Arun Narayanan , Yedid Hoshen , Michiel A. U. Bacchiani

IPC: G10L15/16 , G10L15/06 , G10L21/0216 , G10L15/02

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/063 , G10L2021/02166 , H04R3/005

Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.

9.

Invention patent application
PROCESSING ACOUSTIC SEQUENCES USING LONG SHORT-TERM MEMORY (LSTM) NEURAL NETWORKS THAT INCLUDE RECURRENT PROJECTION LAYERS (Status: In force)

Publication number: US20170186420A1

Publication date: 2017-06-29

Application number: US15454407

Filing date: 2017-03-09

Applicant: Google Inc.

Inventor: Hasim Sak , Andrew W. Senior

IPC: G10L15/16 , G10L15/14 , G10L15/02

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454 , G10L15/02 , G10L15/08 , G10L15/12 , G10L15/142 , G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.

10.

Granted invention patent
Multisensory speech detection (Status: In force)

Publication number: US09570094B2

Publication date: 2017-02-14

Application number: US14753904

Filing date: 2015-06-29

Applicant: Google Inc.

Inventor: Dave Burke , Michael J. LeBeau , Konrad Gianno , Trausti T. Kristjansson , John Nicholas Jitkoff , Andrew W. Senior

IPC: G10L15/04 , G10L25/78 , G10L15/10 , G06F3/0346 , H04M1/725 , H04R1/08 , H04W4/02 , G10L17/00 , G06F3/16

CPC classification number: G10L25/78 , G06F3/0346 , G06F3/167 , G10L15/10 , G10L15/22 , G10L15/265 , G10L17/00 , G10L25/21 , H04M1/72569 , H04M2250/12 , H04M2250/74 , H04R1/08 , H04W4/026

Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
