Patent search ap:("Google Inc.") AND inv:"Andrew W. Senior" Page 2

11.

Granted invention patent
Deep networks for unit selection speech synthesis (Valid)

Publication number: US09460704B2

Publication date: 2016-10-04

Application number: US14019967

Application date: 2013-09-06

Applicant: Google Inc.

Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso

IPC: G10L13/00, G10L13/06, G10L25/30

CPC classification number: G10L13/06, G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.

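A minimal sketch of the selection step in this abstract. It assumes acoustic features are fixed-length float vectors and that the distance is Euclidean; the abstract does not name a metric. The `predicted` vector stands in for the neural network's output, and `select_unit`, the unit ids and the feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length acoustic feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_unit(target: Sequence[float], samples: Dict[str, List[float]]) -> str:
    """Return the id of the stored sample closest to the predicted target features.

    `target` stands in for acoustic features a trained network predicted from
    linguistic features; `samples` maps hypothetical unit ids to the acoustic
    features of recorded units.
    """
    return min(samples, key=lambda uid: euclidean(target, samples[uid]))


if __name__ == "__main__":
    predicted = [1.0, 0.5, -0.2]  # hypothetical network output
    database = {
        "unit_a": [0.0, 0.0, 0.0],
        "unit_b": [1.1, 0.4, -0.1],
        "unit_c": [2.0, 2.0, 2.0],
    }
    print(select_unit(predicted, database))  # unit_b
```

Unit-selection synthesizers usually also score a join cost between consecutive units. This sketch covers only the target-distance term that the abstract describes.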

12.

Granted invention patent
Cluster specific speech model (Valid)

Publication number: US09401143B2

Publication date: 2016-07-26

Application number: US14663610

Application date: 2015-03-20

Applicant: Google Inc.

Inventors: Andrew W. Senior, Ignacio Lopez Moreno

IPC: G10L15/06, G10L21/00, G10L15/183

CPC classification number: G10L15/063, G10L15/183, G10L2015/0631

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving data representing acoustic characteristics of a user's voice; selecting a cluster for the data from among a plurality of clusters, where each cluster includes a plurality of vectors, and where each cluster is associated with a speech model trained by a neural network using at least one or more vectors of the plurality of vectors in the respective cluster; and in response to receiving one or more utterances of the user, providing the speech model associated with the cluster for transcribing the one or more utterances.

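A minimal sketch of the cluster lookup in this abstract. The abstract does not say how a cluster is chosen, so this sketch assumes nearest centroid under Euclidean distance. Cluster names, vectors and model ids are hypothetical, and the per-cluster models are represented only by string ids.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a cluster's vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def select_cluster(user_vector: Vector, clusters: Dict[str, List[Vector]]) -> str:
    """Pick the cluster whose centroid is nearest (Euclidean) to the user's vector."""
    return min(clusters, key=lambda name: math.dist(user_vector, centroid(clusters[name])))


def model_for_user(user_vector: Vector, clusters: Dict[str, List[Vector]],
                   models: Dict[str, str]) -> str:
    """Return the speech model id associated with the selected cluster."""
    return models[select_cluster(user_vector, clusters)]


if __name__ == "__main__":
    clusters = {"c1": [[0.0, 0.0], [0.2, 0.0]], "c2": [[5.0, 5.0], [5.2, 4.8]]}
    models = {"c1": "model_c1", "c2": "model_c2"}  # hypothetical model ids
    print(model_for_user([4.9, 5.1], clusters, models))  # model_c2
```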

13.

Invention patent application
GENERATING REPRESENTATIONS OF ACOUSTIC SEQUENCES USING PROJECTION LAYERS (Valid)

Publication number: US20150161991A1

Publication date: 2015-06-11

Application number: US14557725

Application date: 2014-12-02

Applicant: Google Inc.

Inventors: Hasim Sak, Andrew W. Senior

IPC: G10L15/08

CPC classification number: G10L15/16, G06N3/0445, G06N3/0454, G10L15/02, G10L15/08, G10L15/12, G10L15/142, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.

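A minimal pure-Python sketch of the pipeline in this abstract. Each time step passes through stacked LSTM layers. Each layer feeds back a linear projection r_t of its cell output m_t, rather than m_t itself, as its recurrent state. A softmax output layer then scores the top layer's projected output. The layer sizes, the random weights and the omission of biases and peephole connections are all assumptions for brevity, and the output labels are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

GATES = ("i", "f", "o", "g")  # input, forget, output gates and cell candidate


def matvec(w: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMPLayer:
    """LSTM layer whose recurrent state r_t is a projection of the cell output m_t."""

    def __init__(self, n_in: int, n_cell: int, n_proj: int, rng: random.Random):
        self.wx = {k: rand_matrix(n_cell, n_in, rng) for k in GATES}    # input weights
        self.wr = {k: rand_matrix(n_cell, n_proj, rng) for k in GATES}  # act on r_{t-1}
        self.w_proj = rand_matrix(n_proj, n_cell, rng)                  # m_t -> r_t
        self.n_cell, self.n_proj = n_cell, n_proj

    def run(self, xs: List[Vector]) -> List[Vector]:
        c = [0.0] * self.n_cell
        r = [0.0] * self.n_proj
        outputs = []
        for x in xs:
            pre = {k: [a + b for a, b in zip(matvec(self.wx[k], x), matvec(self.wr[k], r))]
                   for k in GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
            m = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
            r = matvec(self.w_proj, m)  # recurrent projected output for this time step
            outputs.append(r)
        return outputs


def score_sequence(xs: List[Vector], layers: List[LSTMPLayer], w_out: Matrix) -> List[Vector]:
    """Run the stacked layers, then score each step's top-layer projected output."""
    h = xs
    for layer in layers:
        h = layer.run(h)
    return [softmax(matvec(w_out, r)) for r in h]


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [LSTMPLayer(4, 8, 3, rng), LSTMPLayer(3, 8, 3, rng)]
    w_out = rand_matrix(5, 3, rng)  # 5 hypothetical output labels
    frames = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
    scores = score_sequence(frames, layers, w_out)
    print(len(scores), len(scores[0]))  # 6 5
```

The projection is the design point. Each gate's recurrent matrix is n_cell x n_proj instead of n_cell x n_cell, which shrinks the recurrent parameter count when n_proj is small.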

14.

Granted invention patent
Speech recognition with acoustic models (Valid)

Publication number: US09818410B2

Publication date: 2017-11-14

Application number: US14983315

Application date: 2015-12-29

Applicant: Google Inc.

Inventors: Hasim Sak, Andrew W. Senior

IPC: G10L15/00, G10L17/00, G10L15/14, G10L13/00, G10L17/14, G10L15/02, G10L15/16

CPC classification number: G10L17/14, G06N3/0445, G10L15/02, G10L15/16, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
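A minimal sketch of the stacking and subsampling steps in this abstract, applied before frames reach the RNN/CTC acoustic model. The stack size and stride below are illustrative values, not values from the patent. Padding the tail by repeating the last frame is also an assumption, since the abstract does not specify edge handling.

```python
from typing import List

Frame = List[float]


def stack_frames(frames: List[Frame], num_stack: int) -> List[Frame]:
    """Concatenate each frame with the following num_stack - 1 frames.

    Near the end of the sequence, the last frame is repeated as padding
    (an assumption).
    """
    stacked = []
    for t in range(len(frames)):
        window: Frame = []
        for k in range(num_stack):
            window.extend(frames[min(t + k, len(frames) - 1)])
        stacked.append(window)
    return stacked


def subsample(frames: List[Frame], stride: int) -> List[Frame]:
    """Keep every `stride`-th modified frame."""
    return frames[::stride]


if __name__ == "__main__":
    frames = [[float(t)] for t in range(6)]  # six 1-dimensional frames
    print(subsample(stack_frames(frames, 3), 3))  # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
```

Stacking keeps the acoustic context that subsampling would otherwise discard. The network then runs over fewer, wider input frames.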

15.

Invention patent application
SPEECH RECOGNITION WITH ACOUSTIC MODELS (Valid)

Publication number: US20160372119A1

Publication date: 2016-12-22

Application number: US14983315

Application date: 2015-12-29

Applicant: Google Inc.

Inventors: Hasim Sak, Andrew W. Senior

IPC: G10L17/18, G10L17/02, G10L17/04

CPC classification number: G10L17/14, G06N3/0445, G10L15/02, G10L15/16, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.


16.

Invention patent application
PROCESSING MULTI-CHANNEL AUDIO WAVEFORMS (Valid)

Publication number: US20160322055A1

Publication date: 2016-11-03

Application number: US15205321

Application date: 2016-07-08

Applicant: Google Inc.

Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A.U. Bacchiani

IPC: G10L19/008, G10L15/06, G10L19/26, G10L25/30

CPC classification number: G10L15/16, G06N3/0445, G06N3/0454, G10L15/02, G10L15/063, G10L2021/02166, H04R3/005

Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.

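A minimal sketch of the filtering and combining steps in this abstract. Following the abstract's wording, each filter is convolved in the time domain with every channel, and the per-channel outputs for that filter are combined. Summation as the combining operation, "valid" (unpadded) convolution, and the filter taps below are assumptions; in the patent the taps are learned jointly with the acoustic model. The downstream deep neural network is not shown.

```python
from typing import List


def conv1d_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """Time-domain convolution without padding (kernel is flipped)."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [sum(signal[t + j] * flipped[j] for j in range(k))
            for t in range(len(signal) - k + 1)]


def filter_and_combine(channels: List[List[float]],
                       filters: List[List[float]]) -> List[List[float]]:
    """For each filter, convolve it with every channel and sum the per-channel outputs."""
    combined = []
    for kernel in filters:
        per_channel = [conv1d_valid(ch, kernel) for ch in channels]
        combined.append([sum(vals) for vals in zip(*per_channel)])
    return combined


if __name__ == "__main__":
    channels = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 0.0]]  # two hypothetical mics
    filters = [[1.0, 0.0, -1.0], [1.0]]                      # hypothetical taps
    print(filter_and_combine(channels, filters))  # [[3.0, 2.0], [1.0, 2.0, 4.0, 4.0]]
```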

17.

Invention patent application
MULTILINGUAL PROSODY GENERATION (Valid)

Publication number: US20160071512A1

Publication date: 2016-03-10

Application number: US14942300

Application date: 2015-11-16

Applicant: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun

IPC: G10L13/10, G10L13/07, G10L25/30, G10L13/08

CPC classification number: G10L13/10, G06F17/289, G10L13/07, G10L13/08, G10L13/086, G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
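A minimal sketch of how language identity might be fed to the network alongside the linguistic features. The abstract says only that both are provided as input. Encoding the language as a one-hot vector appended to the features is an assumption. The language list, the single linear layer standing in for the trained network, and the output names (for example log F0 and duration) are hypothetical.

```python
from typing import List

LANGUAGES = ["en-US", "es-ES", "fr-FR"]  # hypothetical language inventory


def build_input(linguistic: List[float], language: str) -> List[float]:
    """Append a one-hot language code to the linguistic feature vector."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language}")
    one_hot = [1.0 if lang == language else 0.0 for lang in LANGUAGES]
    return linguistic + one_hot


def predict_prosody(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One linear layer standing in for the trained network (e.g. [log_f0, duration])."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


if __name__ == "__main__":
    weights = [[0.0, 0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 0.0, 0.0, 0.0]]  # hypothetical
    x = build_input([0.5, 1.0], "es-ES")
    print(x)                                         # [0.5, 1.0, 0.0, 1.0, 0.0]
    print(predict_prosody(x, weights, [0.0, 0.0]))   # [2.0, 1.5]
```

With this encoding, a single shared network can shift its prosody output per language while reusing the same linguistic features.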

18.

Invention patent application
CACHING SPEECH RECOGNITION SCORES (Valid)

Publication number: US20150371631A1

Publication date: 2015-12-24

Application number: US14311557

Application date: 2014-06-23

Applicant: Google Inc.

Inventors: Eugene Weinstein, Sanjiv Kumar, Ignacio L. Moreno, Andrew W. Senior, Nikhil Prasad Bhat

IPC: G10L15/14, G10L19/038

CPC classification number: G10L15/08, G10L15/285

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for caching speech recognition scores. In some implementations, one or more values comprising data about an utterance are received. An index value is determined for the one or more values. An acoustic model score for the one or more received values is selected, from a cache of acoustic model scores that were computed before receiving the one or more values, based on the index value. A transcription for the utterance is determined using the selected acoustic model score.

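A minimal sketch of the cache in this abstract. Scores are computed ahead of time and keyed by an index value, so an incoming feature vector only needs an index computation and a dictionary lookup. Deriving the index by rounding each feature to a fixed grid is an assumption, since the abstract does not say how the index value is determined. `toy_acoustic_model`, its labels and the grid step are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Key = Tuple[int, ...]
Scores = Dict[str, float]


def index_value(features: Sequence[float], step: float = 0.5) -> Key:
    """Quantize each feature to a grid cell; the tuple of cell ids is the cache key."""
    return tuple(int(round(f / step)) for f in features)


def toy_acoustic_model(features: Sequence[float]) -> Scores:
    """Hypothetical stand-in for an expensive acoustic model: label -> score."""
    return {"sil": -sum(abs(f) for f in features), "ah": -abs(features[0] - 1.0)}


class ScoreCache:
    def __init__(self) -> None:
        self._scores: Dict[Key, Scores] = {}

    def precompute(self, representatives: List[Sequence[float]],
                   model: Callable[[Sequence[float]], Scores]) -> None:
        """Fill the cache before any utterance arrives by scoring representative vectors."""
        for feats in representatives:
            self._scores[index_value(feats)] = model(feats)

    def lookup(self, features: Sequence[float]) -> Optional[Scores]:
        """Return cached scores for the cell containing `features`, or None on a miss."""
        return self._scores.get(index_value(features))


if __name__ == "__main__":
    cache = ScoreCache()
    cache.precompute([[0.0, 0.0], [1.0, 1.0]], toy_acoustic_model)
    print(cache.lookup([1.1, 0.9]) == toy_acoustic_model([1.0, 1.0]))  # True
    print(cache.lookup([3.0, 3.0]))                                    # None
```

On a miss, a real decoder would presumably fall back to computing the score directly. This sketch simply returns `None`.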

19.

Granted invention patent
Multilingual prosody generation (Valid)

Publication number: US09195656B2

Publication date: 2015-11-24

Application number: US14143627

Application date: 2013-12-30

Applicant: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun

IPC: G10L13/08, G06F17/28, G10L13/10

CPC classification number: G10L13/10, G06F17/289, G10L13/07, G10L13/08, G10L13/086, G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.


20.

Invention patent application
Multisensory Speech Detection (Valid)

Publication number: US20150302870A1

Publication date: 2015-10-22

Application number: US14753904

Application date: 2015-06-29

Applicant: Google Inc.

Inventors: Dave Burke, Michael J. LeBeau, Konrad Gianno, Trausti T. Kristjansson, John Nicholas Jitkoff, Andrew W. Senior

IPC: G10L25/78, H04M1/725, G10L17/00, H04W4/02

CPC classification number: G10L25/78, G06F3/0346, G06F3/167, G10L15/10, G10L15/22, G10L15/265, G10L17/00, G10L25/21, H04M1/72569, H04M2250/12, H04M2250/74, H04R1/08, H04W4/026

Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.

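A minimal sketch of the flow in this abstract: device orientation selects an operating mode, the mode selects speech detection parameters, and those parameters decide where speech begins and ends. The mode names, the pitch thresholds, and the per-mode energy threshold and end-silence values are all hypothetical. The simple energy-based endpointer is an assumed stand-in for whatever detector the patent uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Mode(Enum):
    # Hypothetical operating modes; the abstract does not enumerate them.
    FLAT_ON_TABLE = "flat_on_table"
    WALKIE_TALKIE = "walkie_talkie"
    PHONE_AT_EAR = "phone_at_ear"


@dataclass(frozen=True)
class DetectionParams:
    start_threshold_db: float  # frame energy at or above this counts as speech
    end_silence_ms: int        # this much trailing non-speech ends the utterance


# Hypothetical per-mode parameters.
PARAMS = {
    Mode.FLAT_ON_TABLE: DetectionParams(start_threshold_db=-30.0, end_silence_ms=800),
    Mode.WALKIE_TALKIE: DetectionParams(start_threshold_db=-35.0, end_silence_ms=400),
    Mode.PHONE_AT_EAR: DetectionParams(start_threshold_db=-40.0, end_silence_ms=600),
}


def mode_from_orientation(pitch_deg: float) -> Mode:
    """Map device pitch (0 = lying flat, 90 = upright) to an operating mode."""
    if pitch_deg < 20.0:
        return Mode.FLAT_ON_TABLE
    if pitch_deg < 60.0:
        return Mode.WALKIE_TALKIE
    return Mode.PHONE_AT_EAR


def detect_speech(energies_db: List[float], frame_ms: int,
                  params: DetectionParams) -> Optional[Tuple[int, int]]:
    """Return (first, last) speech frame indexes of the first utterance, or None."""
    start: Optional[int] = None
    last_speech = -1
    silence_ms = 0
    for t, energy in enumerate(energies_db):
        if energy >= params.start_threshold_db:
            if start is None:
                start = t
            last_speech = t
            silence_ms = 0
        elif start is not None:
            silence_ms += frame_ms
            if silence_ms >= params.end_silence_ms:
                break
    return None if start is None else (start, last_speech)


if __name__ == "__main__":
    energies = [-60.0, -60.0, -20.0, -25.0, -60.0, -60.0, -60.0, -20.0]
    mode = mode_from_orientation(45.0)
    print(mode, detect_speech(energies, 200, PARAMS[mode]))  # Mode.WALKIE_TALKIE (2, 3)
```

With the longer end-silence setting of `FLAT_ON_TABLE`, the same input yields `(2, 7)`. This shows how the mode changes where an utterance is judged to end.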
