CONVOLUTIONAL, LONG SHORT-TERM MEMORY, FULLY CONNECTED DEEP NEURAL NETWORKS
    21.
    Patent application (status: under examination, published)

    Publication number: US20160099010A1

    Publication date: 2016-04-07

    Application number: US14847133

    Filing date: 2015-09-08

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.
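    The layer ordering named in this abstract (CNN, then LSTM, then fully connected) can be sketched as a forward pass. This is a minimal pure-Python illustration, not the patented model: the layer sizes, the random weights, the omission of bias terms, and every function name (`conv1d_freq`, `lstm_step`, `cldnn_forward`) are assumptions made for the example. It stops at per-frame class posteriors and does not include the decoding step that would produce a transcription.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def conv1d_freq(frame, kernel):
    """Valid 1-D convolution over the frequency axis, followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(frame[i + j] * kernel[j] for j in range(k)))
            for i in range(len(frame) - k + 1)]

def lstm_step(x, h, c, W):
    """One LSTM step; W maps the concatenation [x; h] to four gate pre-activations."""
    n = len(h)
    z = matvec(W, x + h)
    i = [sigmoid(v) for v in z[:n]]           # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]      # forget gate
    o = [sigmoid(v) for v in z[2 * n:3 * n]]  # output gate
    g = [math.tanh(v) for v in z[3 * n:]]     # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def cldnn_forward(frames, kernel, W_lstm, W_fc, hidden):
    """Pass each frame through CNN -> LSTM -> fully connected layer + softmax."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for frame in frames:
        feat = conv1d_freq(frame, kernel)
        h, c = lstm_step(feat, h, c, W_lstm)
        outputs.append(softmax(matvec(W_fc, h)))
    return outputs

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy dimensions (assumed): 8 frequency bins, 6 frames, 5 output classes.
rng = random.Random(0)
N_FREQ, KERNEL, HIDDEN, N_CLASSES = 8, 3, 4, 5
conv_out = N_FREQ - KERNEL + 1
kernel = [rng.uniform(-0.5, 0.5) for _ in range(KERNEL)]
W_lstm = random_matrix(4 * HIDDEN, conv_out + HIDDEN, rng)
W_fc = random_matrix(N_CLASSES, HIDDEN, rng)
frames = [[rng.random() for _ in range(N_FREQ)] for _ in range(6)]
posteriors = cldnn_forward(frames, kernel, W_lstm, W_fc, HIDDEN)
```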


    Neural Networks for Speaker Verification
    22.
    Patent application

    Publication number: US20190043508A1

    Publication date: 2019-02-07

    Application number: US15666806

    Filing date: 2017-08-02

    Applicant: Google Inc.

    Abstract: Systems, methods, devices, and other techniques for training and using a speaker verification neural network. A computing device may receive data that characterizes a first utterance. The computing device provides the data that characterizes the utterance to a speaker verification neural network. Subsequently, the computing device obtains, from the speaker verification neural network, a speaker representation that indicates speaking characteristics of a speaker of the first utterance. The computing device determines whether the first utterance is classified as an utterance of a registered user of the computing device. In response to determining that the first utterance is classified as an utterance of the registered user of the computing device, the device may perform an action for the registered user of the computing device.
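    The verification decision described in this abstract can be illustrated with a small sketch. The abstract does not say how the speaker representation is compared with the registered user's data, so the cosine-similarity score, the 0.8 threshold, and the names `cosine_similarity` and `is_registered_user` are all assumptions chosen for the example; the vectors stand in for outputs of a speaker verification network.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_registered_user(utterance_repr, enrolled_repr, threshold=0.8):
    """Accept the utterance if its representation is close to the enrolled one.

    The threshold is a hypothetical value; a real system would tune it.
    """
    return cosine_similarity(utterance_repr, enrolled_repr) >= threshold

# Hypothetical speaker representations.
enrolled = [0.9, 0.1, 0.4]
same_speaker = [0.8, 0.2, 0.5]
other_speaker = [-0.3, 0.9, -0.2]

accepted = is_registered_user(same_speaker, enrolled)
rejected = is_registered_user(other_speaker, enrolled)
```

    With these toy vectors the first utterance is accepted and the second is rejected, which is the point at which a device would either perform the action for the registered user or decline.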

    Language modeling of complete language sequences

    Publication number: US09786269B2

    Publication date: 2017-10-10

    Application number: US13875406

    Filing date: 2013-05-02

    Applicant: Google Inc.

    CPC classification number: G10L15/063 G10L15/197

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling of complete language sequences. Training data indicating language sequences is accessed, and counts for a number of times each language sequence occurs in the training data are determined. A proper subset of the language sequences is selected, and a first component of a language model is trained. The first component includes first probability data for assigning scores to the selected language sequences. A second component of the language model is trained based on the training data, where the second component includes second probability data for assigning scores to language sequences that are not included in the selected language sequences. Adjustment data that normalizes the second probability data with respect to the first probability data is generated, and the first component, the second component, and the adjustment data are stored.
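    One way to picture the two-component model in this abstract is sketched below. It is an illustration under stated assumptions, not the patented procedure: the first component stores relative frequencies of whole sequences seen at least `min_count` times, the second component is a unigram model with an end-of-sequence token, and the adjustment is a single scale factor that gives the second component exactly the probability mass the first leaves over. The training queries and all names are hypothetical.

```python
from collections import Counter

def train(sequences, min_count=2):
    counts = Counter(sequences)
    total = sum(counts.values())

    # First component: probabilities for a selected proper subset of whole sequences.
    first = {s: c / total for s, c in counts.items() if c >= min_count}

    # Second component: unigram model with an end-of-sequence token (assumed choice).
    tokens = Counter()
    for s in sequences:
        tokens.update(s.split() + ["</s>"])
    n_tokens = sum(tokens.values())
    unigram = {w: c / n_tokens for w, c in tokens.items()}

    def second(seq):
        p = unigram["</s>"]
        for w in seq.split():
            p *= unigram.get(w, 0.0)  # unseen words get zero; no smoothing here
        return p

    # Adjustment: rescale the second component so that the sequences it is
    # responsible for share the mass not used by the first component.
    leftover = 1.0 - sum(first.values())
    overlap = sum(second(s) for s in first)
    adjustment = leftover / (1.0 - overlap)
    return first, second, adjustment

def score(seq, first, second, adjustment):
    """Use the first component for selected sequences, the scaled second otherwise."""
    if seq in first:
        return first[seq]
    return adjustment * second(seq)

queries = ["weather today"] * 3 + ["news"] * 2 + ["weather news", "today news"]
first, second, adjustment = train(queries)
p_selected = score("weather today", first, second, adjustment)
p_other = score("weather news", first, second, adjustment)
```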

    Generating representations of acoustic sequences

    Publication number: US09721562B2

    Publication date: 2017-08-01

    Application number: US14559113

    Filing date: 2014-12-03

    Applicant: Google Inc.

    CPC classification number: G10L15/16 G10L15/02 G10L15/142 G10L2015/025

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
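    The feedback loop in this abstract, where each step's input is built from the previous step's output and the current acoustic features, can be sketched as follows. The abstract does not state how the two are combined, so concatenation is an assumption, as is padding the initial step with zeros so every input has the same size. `toy_model` is a hypothetical stand-in (one linear layer plus softmax) for the acoustic modeling neural network, and taking the arg-max class per step is only a placeholder for building the phoneme representation.

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def toy_model(x, weights):
    """Hypothetical stand-in for the acoustic model: linear layer + softmax."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def run_with_feedback(features, weights, n_out):
    """Feed the previous output back in alongside each step's features."""
    outputs = []
    prev = [0.0] * n_out  # no previous output exists at the initial step
    for feat in features:
        modified_input = feat + prev  # concatenation (assumed combination rule)
        prev = toy_model(modified_input, weights)
        outputs.append(prev)
    return outputs

# Two identical feature frames, two output classes.
features = [[1.0, 0.0], [1.0, 0.0]]
weights = [[0.5, -0.2, 1.0, 0.0],
           [-0.3, 0.4, 0.0, 1.0]]
outputs = run_with_feedback(features, weights, n_out=2)

# Placeholder phoneme representation: index of the best class at each step.
phoneme_ids = [max(range(len(o)), key=o.__getitem__) for o in outputs]
```

    Because the second step also sees the first step's output, the two outputs differ even though the two feature frames are identical.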

    Mixture of n-gram language models
    26.
    Granted patent (status: active)

    Publication number: US09208779B2

    Publication date: 2015-12-08

    Application number: US14019685

    Filing date: 2013-09-06

    Applicant: Google Inc.

    CPC classification number: G10L15/197 G10L15/063 G10L2015/0631

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for creating a static language model from a mixture of n-gram language models. One of the methods includes receiving a set of development sentences W, receiving a set of language models GM, determining a set of n-gram language model weights λM based on the development sentences W and the set of language models GM, determining a set of sentence cluster weights γC, each of the sentence cluster weights corresponding to a cluster in a set of sentence clusters, each cluster in the set of sentence clusters associated with at least one sentence from the set of development sentences W, and generating a language model from the set of language models GM, the set of n-gram language model weights λM, the set of sentence clusters, and the set of sentence cluster weights γC.
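    Determining mixture weights from development data can be illustrated with the standard expectation-maximization update for linear interpolation. This is a simplified sketch, not the method claimed here: it uses unigram component models for brevity, fits only one weight per model (the λ weights), and leaves out the sentence clusters and their γ weights entirely. The component models, the development words, and the function name `em_mixture_weights` are hypothetical.

```python
def em_mixture_weights(models, dev_words, iterations=50, floor=1e-9):
    """Fit interpolation weights that maximize the likelihood of dev_words.

    models: list of dicts mapping word -> probability (unigram stand-ins).
    floor:  small probability used for words a model has never seen.
    """
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        responsibility_sums = [0.0] * k
        for word in dev_words:
            # E-step: how much each model explains this word.
            joint = [weights[m] * models[m].get(word, floor) for m in range(k)]
            z = sum(joint)
            for m in range(k):
                responsibility_sums[m] += joint[m] / z
        # M-step: new weight is the average responsibility.
        weights = [r / len(dev_words) for r in responsibility_sums]
    return weights

def mixture_probability(word, models, weights, floor=1e-9):
    """Linearly interpolated probability of a word under the mixture."""
    return sum(w * m.get(word, floor) for w, m in zip(weights, models))

news_model = {"election": 0.5, "weather": 0.3, "score": 0.2}
sports_model = {"election": 0.1, "weather": 0.2, "score": 0.7}
dev_words = ["score", "score", "score", "weather", "score"]

weights = em_mixture_weights([news_model, sports_model], dev_words)
p_score = mixture_probability("score", [news_model, sports_model], weights)
```

    Since the development words look like sports text, the fitted weight for `sports_model` ends up larger than the weight for `news_model`.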


    GENERATING REPRESENTATIONS OF ACOUSTIC SEQUENCES USING PROJECTION LAYERS
    27.
    Patent application (status: active)

    Publication number: US20150161991A1

    Publication date: 2015-06-11

    Application number: US14557725

    Filing date: 2014-12-02

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
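    A single LSTM layer with a recurrent projection, followed by an output layer, can be sketched as below. This is an illustrative reading of the abstract, not the patented network: it shows one layer rather than a stack, omits biases, uses random weights and toy sizes, and the names `lstmp_step` and `rand_matrix` are invented for the example. The key point it shows is that the cell output is linearly projected to a smaller vector, and that projected vector is both fed back at the next step and passed to the output layer.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstmp_step(x, r, c, W_gates, W_proj):
    """One step of an LSTM layer with a recurrent projection.

    x: input features, r: previous projected output, c: previous cell state.
    """
    n = len(c)
    z = matvec(W_gates, x + r)                # gates see input and projected state
    i = [sigmoid(v) for v in z[:n]]           # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]      # forget gate
    o = [sigmoid(v) for v in z[2 * n:3 * n]]  # output gate
    g = [math.tanh(v) for v in z[3 * n:]]     # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    r_new = matvec(W_proj, h)                 # linear projection to fewer units
    return r_new, c_new

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (assumed): 5 inputs, 6 cells, 3 projected units, 4 output scores.
rng = random.Random(0)
N_IN, N_CELL, N_PROJ, N_OUT = 5, 6, 3, 4
W_gates = rand_matrix(4 * N_CELL, N_IN + N_PROJ, rng)
W_proj = rand_matrix(N_PROJ, N_CELL, rng)
W_out = rand_matrix(N_OUT, N_PROJ, rng)

r, c = [0.0] * N_PROJ, [0.0] * N_CELL
scores_per_step = []
for _ in range(3):
    x = [rng.random() for _ in range(N_IN)]
    r, c = lstmp_step(x, r, c, W_gates, W_proj)
    scores_per_step.append(matvec(W_out, r))  # output layer on the projected output
```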


    MIXTURE OF N-GRAM LANGUAGE MODELS
    28.
    Patent application (status: active)

    Publication number: US20150073788A1

    Publication date: 2015-03-12

    Application number: US14019685

    Filing date: 2013-09-06

    Applicant: Google Inc.

    CPC classification number: G10L15/197 G10L15/063 G10L2015/0631

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for creating a static language model from a mixture of n-gram language models. One of the methods includes receiving a set of development sentences W, receiving a set of language models GM, determining a set of n-gram language model weights λM based on the development sentences W and the set of language models GM, determining a set of sentence cluster weights γC, each of the sentence cluster weights corresponding to a cluster in a set of sentence clusters, each cluster in the set of sentence clusters associated with at least one sentence from the set of development sentences W, and generating a language model from the set of language models GM, the set of n-gram language model weights λM, the set of sentence clusters, and the set of sentence cluster weights γC.


    SPEECH TRANSCRIPTION INCLUDING WRITTEN TEXT
    29.
    Patent application (status: active)

    Publication number: US20140149119A1

    Publication date: 2014-05-29

    Application number: US13829482

    Filing date: 2013-03-14

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transcribing utterances into written text are disclosed. The methods, systems, and apparatus include actions of obtaining a lexicon model that maps phones to spoken text and obtaining a language model that assigns probabilities to written text. The actions further include generating a transducer that maps the written text to the spoken text, the transducer mapping multiple items of the written text to an item of the spoken text. Additionally, the actions include constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model.
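    The chain described in this abstract (phones to spoken words, spoken words to written forms through the inverted transducer, then scoring by a written-text language model) can be illustrated with plain dictionaries. Production systems typically do this with weighted finite-state transducers; the dictionary lookups below are a simplified stand-in, and the tiny lexicon, the written-to-spoken table, the unigram scores, and the function names are all hypothetical.

```python
from itertools import product

# Lexicon model: phone sequence -> spoken word.
LEXICON = {
    ("t", "eh", "n"): "ten",
    ("p", "er", "s", "eh", "n", "t"): "percent",
}

# Transducer: written item -> spoken item. Several written items share one spoken item.
WRITTEN_TO_SPOKEN = {"10": "ten", "ten": "ten", "%": "percent", "percent": "percent"}

# Language model over written text (unigram scores, assumed for the example).
WRITTEN_LM = {"10": 0.4, "ten": 0.1, "%": 0.3, "percent": 0.2}

def invert(mapping):
    """Invert written -> spoken into spoken -> list of written candidates."""
    inverse = {}
    for written, spoken in mapping.items():
        inverse.setdefault(spoken, []).append(written)
    return inverse

def lm_score(tokens, lm):
    p = 1.0
    for tok in tokens:
        p *= lm.get(tok, 0.0)
    return p

def decode(phone_groups, lexicon, spoken_to_written, lm):
    """Map phones to spoken words, expand to written candidates, pick the best."""
    spoken = [lexicon[tuple(group)] for group in phone_groups]
    candidates = product(*(spoken_to_written[word] for word in spoken))
    return max(candidates, key=lambda seq: lm_score(seq, lm))

spoken_to_written = invert(WRITTEN_TO_SPOKEN)
phones = [["t", "eh", "n"], ["p", "er", "s", "eh", "n", "t"]]
best_written = decode(phones, LEXICON, spoken_to_written, WRITTEN_LM)
```

    Here the spoken words "ten percent" have four written candidates, and the language model prefers the written form `("10", "%")`.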

