Patent search ap:("Google Inc.") AND inv:"Hasim Sak" Page 2

11.

Invention patent application
MULTI-ACCENT SPEECH RECOGNITION (status: under examination, published)

Publication number: US20180053500A1

Publication date: 2018-02-22

Application number: US15243838

Application date: 2016-08-22

Applicant: Google Inc.

Inventor: Hasim Sak, Kanury Kanishka Rao

IPC: G10L15/16, G10L15/02, G10L15/06

CPC classification number: G10L15/16, G10L15/02, G10L15/063, G10L15/187, G10L25/30, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for training a hierarchical recurrent neural network (HRNN) having a plurality of parameters on a plurality of training acoustic sequences to generate phoneme representations of received acoustic sequences. One method includes, for each of the received training acoustic sequences: processing the received acoustic sequence in accordance with current values of the parameters of the HRNN to generate a predicted grapheme representation of the received acoustic sequence; processing an intermediate output generated by an intermediate layer of the HRNN during the processing of the received acoustic sequence to generate one or more predicted phoneme representations of the received acoustic sequence; and adjusting the current values of the parameters of the HRNN based on (i) the predicted grapheme representation and (ii) the one or more predicted phoneme representations.

12.

Invention patent application
GENERATING REPRESENTATIONS OF ACOUSTIC SEQUENCES (status: under examination, published)

Publication number: US20170330558A1

Publication date: 2017-11-16

Application number: US15664153

Application date: 2017-07-31

Applicant: Google Inc.

Inventor: Hasim Sak, Andrew W. Senior

IPC: G10L15/16, G10L15/02, G10L15/14

CPC classification number: G10L15/16, G10L15/02, G10L15/142, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
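
The feedback loop described in this abstract can be sketched roughly as follows. This is an illustration only, not the patented implementation: the abstract does not say how the modified input is formed, so the concatenation below is an assumption, and `run_with_output_feedback` and `toy_model` are hypothetical names.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def run_with_output_feedback(
    features: Sequence[Vector],
    model: Callable[[Vector], Vector],
) -> List[Vector]:
    """Run `model` over a sequence, feeding each output into the next step's input."""
    if not features:
        return []
    # Initial time step: the model sees only the acoustic feature representation.
    outputs = [model(list(features[0]))]
    for frame in features[1:]:
        # Modified input (assumed here to be a concatenation): the output for the
        # preceding time step followed by the features for the current time step.
        modified_input = outputs[-1] + list(frame)
        outputs.append(model(modified_input))
    return outputs


def toy_model(x: Vector) -> Vector:
    """Hypothetical stand-in for the acoustic modeling network."""
    return [sum(x), max(x)]
```

Calling `run_with_output_feedback([[1.0, 2.0], [3.0, 4.0]], toy_model)` returns one output per time step. A real network would need a fixed input width, for example by padding the first step.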

13.

Granted invention patent
Generating acoustic models (status: in force)

Publication number: US09786270B2

Publication date: 2017-10-10

Application number: US15205263

Application date: 2016-07-08

Applicant: Google Inc.

Inventor: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao

IPC: G10L15/06, G10L15/16, G10L15/187

CPC classification number: G10L15/063, G10L15/16, G10L15/187

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
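
Training a second network on the first network's output distributions can be illustrated with a soft-target loss. The sketch below assumes a per-frame cross-entropy objective, which the abstract does not specify. The function names are hypothetical, and the teacher distributions are taken as given, not produced by a CTC-trained model.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(
    teacher_probs: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
) -> float:
    """Mean per-frame cross-entropy of the student against the teacher's distributions."""
    total = 0.0
    for p_frame, logit_frame in zip(teacher_probs, student_logits):
        q_frame = softmax(logit_frame)
        # Terms with zero teacher probability contribute nothing to the sum.
        total -= sum(p * math.log(q) for p, q in zip(p_frame, q_frame) if p > 0.0)
    return total / len(teacher_probs)
```

Minimizing this quantity with respect to the second network's parameters pulls its per-frame output distribution toward the first network's distribution for the same frame.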

14.

Granted invention patent
Processing acoustic sequences using long short-term memory (LSTM) neural networks that include recurrent projection layers (status: in force)

Publication number: US09620108B2

Publication date: 2017-04-11

Application number: US14557725

Application date: 2014-12-02

Applicant: Google Inc.

Inventor: Hasim Sak, Andrew W. Senior

IPC: G10L15/16, G06N3/02, G10L15/08, G10L15/12, G10L15/02

CPC classification number: G10L15/16, G06N3/0445, G06N3/0454, G10L15/02, G10L15/08, G10L15/12, G10L15/142, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
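
For readers unfamiliar with recurrent projection layers, one time step of such an LSTM layer might look like the sketch below. It is a simplified illustration and not the claimed method: it assumes a plain LSTM cell without peephole connections, and `lstmp_step`, `gates`, and `w_proj` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstmp_step(
    x: Vector,
    r_prev: Vector,
    c_prev: Vector,
    gates: Dict[str, Tuple[Matrix, Vector]],
    w_proj: Matrix,
) -> Tuple[Vector, Vector]:
    """One time step of an LSTM layer followed by a recurrent projection.

    `gates` maps "i", "f", "o", "g" to (weights, bias) applied to [x; r_prev].
    Returns the projected output r and the new cell state c.
    """
    z = x + r_prev  # the recurrence uses the projected output, not the full cell output
    pre = {
        name: [a + b for a, b in zip(matvec(w, z), bias)]
        for name, (w, bias) in gates.items()
    }
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell update
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    r = matvec(w_proj, h)  # lower-dimensional recurrent projected output
    return r, c
```

The projected output `r` is what gets fed back at the next time step and passed up to the next layer or the output layer. Because `r` is smaller than the cell output `h`, the recurrent weight matrices are smaller too.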

15.

Invention patent application
GENERATING ACOUSTIC MODELS (status: in force)

Publication number: US20170011738A1

Publication date: 2017-01-12

Application number: US15205263

Application date: 2016-07-08

Applicant: Google Inc.

Inventor: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao

IPC: G10L15/06, G10L21/06, G10L15/34, G10L15/16, G10L15/26

CPC classification number: G10L15/063, G10L15/16, G10L15/187

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.

16.

Invention patent application
LEARNING PRONUNCIATIONS FROM ACOUSTIC SEQUENCES (status: under examination, published)

Publication number: US20160351188A1

Publication date: 2016-12-01

Application number: US14811939

Application date: 2015-07-29

Applicant: Google Inc.

Inventor: Kanury Kanishka Rao, Francoise Beaufays, Hasim Sak, Ouais Alsharif

IPC: G10L15/187, G10L15/05, G10L15/16, G06F17/27

CPC classification number: G10L15/187, G06N3/0445, G06N3/084, G10L15/063, G10L15/16, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the time steps processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output for the time step using a phoneme output layer to generate a phoneme representation for the acoustic feature representation for the time step; and processing the recurrent output for the time step using a grapheme output layer to generate a grapheme representation for the acoustic feature representation for the time step; and extracting, from the phoneme and grapheme representations for the acoustic feature representations at each time step, a respective pronunciation for each of one or more words.
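
The two output layers in this abstract both read the same recurrent output at each time step. A minimal sketch of that arrangement, assuming each output layer is a linear map followed by a softmax (the abstract does not say), with hypothetical function names and hand-picked weights:

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Head = Tuple[Matrix, Vector]  # (weights, bias) of one output layer


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map: weights @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def phoneme_and_grapheme_scores(
    recurrent_output: Vector,
    phoneme_head: Head,
    grapheme_head: Head,
) -> Tuple[Vector, Vector]:
    """Apply two separate output layers to the same recurrent output."""
    phoneme_scores = softmax(linear(*phoneme_head, recurrent_output))
    grapheme_scores = softmax(linear(*grapheme_head, recurrent_output))
    return phoneme_scores, grapheme_scores
```

Because both heads share the recurrent layers, the phoneme and grapheme scores for a given time step stay aligned. That alignment is what allows a pronunciation to be read off for each word.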

17.

Granted invention patent
Sub-lexical language models with word level pronunciation lexicons (status: in force)

Publication number: US09292489B1

Publication date: 2016-03-22

Application number: US13855893

Application date: 2013-04-03

Applicant: Google Inc.

Inventor: Hasim Sak, Murat Saraclar

IPC: G10L15/08, G10L15/22, G10L15/183, G06F17/27, G10L13/08, G10L13/10, G10L15/00, G10L15/26

CPC classification number: G06F17/2785, G06F17/2775, G10L13/08, G10L13/10, G10L15/00, G10L15/183, G10L15/197, G10L15/26

Abstract: An automatic speech recognition (ASR) system and method are provided for using sub-lexical language models together with word level pronunciation lexicons. These approaches operate by introducing a transduction between sequences of sub-lexical units and sequences of words.

18.

Invention patent application
GENERATING REPRESENTATIONS OF ACOUSTIC SEQUENCES (status: in force)

Publication number: US20150170640A1

Publication date: 2015-06-18

Application number: US14559113

Application date: 2014-12-03

Applicant: Google Inc.

Inventor: Hasim Sak, Andrew W. Senior

IPC: G10L15/16, G10L15/187

CPC classification number: G10L15/16, G10L15/02, G10L15/142, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.

19.

Invention patent application
LATENCY CONSTRAINTS FOR ACOUSTIC MODELING (status: under examination, published)

Publication number: US20170103752A1

Publication date: 2017-04-13

Application number: US14879225

Application date: 2015-10-09

Applicant: Google Inc.

Inventor: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao

IPC: G10L15/16, G06N3/04, G06N3/08

CPC classification number: G10L15/16, G06N3/0445, G06N3/0454

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for acoustic modeling of audio data. One method includes receiving audio data representing a portion of an utterance, providing the audio data to a trained recurrent neural network that has been trained to indicate the occurrence of a phone at any of multiple time frames within a maximum delay of receiving audio data corresponding to the phone, receiving, within the predetermined maximum delay of providing the audio data to the trained recurrent neural network, output of the trained neural network indicating a phone corresponding to the provided audio data, using output of the trained neural network to determine a transcription for the utterance, and providing the transcription for the utterance.

20.

Invention patent application
CONTEXT-DEPENDENT MODELING OF PHONEMES (status: in force)

Publication number: US20160372118A1

Publication date: 2016-12-22

Application number: US14877673

Application date: 2015-10-07

Applicant: Google Inc.

Inventor: Andrew W. Senior, Hasim Sak, Izhak Shafran

IPC: G10L17/18, G10L17/04, G10L17/14

CPC classification number: G10L17/14, G06N3/0445, G10L15/02, G10L15/16, G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
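
The final step, turning per-time-step scores into a context-dependent phoneme representation, is not detailed in the abstract. One simple possibility, shown purely as an assumption, is to take the highest-scoring label at each time step and merge consecutive repeats. The triphone-style labels and function names below are hypothetical.

```python
from typing import Dict, List, Sequence


def best_label_per_step(scores: Sequence[Dict[str, float]]) -> List[str]:
    """Pick the highest-scoring context-dependent phoneme at every time step."""
    return [max(frame, key=frame.get) for frame in scores]


def collapse_repeats(labels: Sequence[str]) -> List[str]:
    """Merge runs of identical consecutive labels into a single label."""
    collapsed: List[str] = []
    for label in labels:
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return collapsed


# Hypothetical softmax scores for three time steps over two context-dependent phonemes.
example_scores = [
    {"sil-k+ae": 0.7, "k-ae+t": 0.3},
    {"sil-k+ae": 0.6, "k-ae+t": 0.4},
    {"sil-k+ae": 0.1, "k-ae+t": 0.9},
]
example_sequence = collapse_repeats(best_label_per_step(example_scores))
```

Here `example_sequence` ends up as the two labels in order of first appearance. A production decoder would typically combine these scores with a lexicon and language model instead of decoding greedily.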
