Transfer learning for deep neural network based hotword detection

    Publication No.: US09715660B2

    Publication date: 2017-07-25

    Application No.: US14230225

    Filing date: 2014-03-31

    Applicant: Google Inc.

    CPC classification number: G06N7/005 G06N3/0454 G10L15/16 G10L2015/088

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising providing the deep neural network with a second training set and adjusting the values for a first subset of the plurality of weights, wherein the second training set includes data representing the key features of the one or more keywords or key phrases.
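
    The two-stage training described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the tiny scalar network, the `train` and `forward` helpers, the toy data sets, and the choice of the output layer as the "first subset" of weights are all assumptions made for illustration. Stage one adjusts every weight on a general set; stage two adjusts only the chosen subset on a keyword set.

```python
import math

def forward(params, x):
    """One hidden tanh unit followed by a sigmoid output (toy network)."""
    h = math.tanh(params["w1"] * x + params["b1"])
    z = params["w2"] * h + params["b2"]
    return h, 1.0 / (1.0 + math.exp(-z))

def train(params, data, trainable, lr=0.1, epochs=50):
    """SGD on cross-entropy loss; only the names in `trainable` are updated."""
    for _ in range(epochs):
        for x, y in data:
            h, p = forward(params, x)
            dz = p - y                                # dL/dz for sigmoid + cross-entropy
            dh = dz * params["w2"] * (1.0 - h * h)    # backprop through tanh
            grads = {"w2": dz * h, "b2": dz, "w1": dh * x, "b1": dh}
            for name in trainable:
                params[name] -= lr * grads[name]
    return params

# Hypothetical toy data: (feature, label) pairs.
general_set = [(-1.0, 0), (1.0, 1), (-2.0, 0), (2.0, 1)]   # first training set
keyword_set = [(0.5, 1), (-0.5, 0), (1.5, 1)]              # second training set

params = {"w1": 0.5, "b1": 0.0, "w2": 0.5, "b2": 0.0}

# Stage 1: adjust all weights on the first training set.
train(params, general_set, trainable=["w1", "b1", "w2", "b2"])
frozen = (params["w1"], params["b1"])

# Stage 2: adjust only a subset (assumed here: the output layer) on the second set.
train(params, keyword_set, trainable=["w2", "b2"])

keyword_probability = forward(params, 1.5)[1]
```

    Freezing the hidden-layer weights in stage two is one plausible reading of "a first subset of the plurality of weights"; the abstract does not say which weights belong to that subset.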

    User specified keyword spotting using long short term memory neural network feature extractor
    3.
    Type: Granted patent
    Status: In force

    Publication No.: US09508340B2

    Publication date: 2016-11-29

    Application No.: US14579603

    Filing date: 2014-12-22

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.
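
    The template step in the abstract, combining at most k LSTM output vectors into a fixed-length representation, can be sketched as follows. The abstract does not say how the vectors are combined or compared, so this sketch assumes concatenation of the last k vectors with zero padding at the front, and cosine similarity for comparison. The function names and sample vectors are hypothetical, and the LSTM itself is omitted: its per-frame outputs are taken as given.

```python
import math

def fixed_length_template(lstm_outputs, k):
    """Concatenate the last (at most) k output vectors, zero-padded to k * dim values."""
    if k < 1 or not lstm_outputs:
        raise ValueError("need k >= 1 and at least one output vector")
    dim = len(lstm_outputs[0])
    chosen = lstm_outputs[-k:]                       # at most k vectors
    padding = [0.0] * (dim * (k - len(chosen)))      # pad short signals at the front
    return padding + [value for vec in chosen for value in vec]

def cosine_similarity(a, b):
    """Assumed comparison between two templates of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Two enrollment signals of different lengths (2 and 4 frames, 2-dimensional outputs).
short_outputs = [[1.0, 0.0], [0.0, 1.0]]
long_outputs = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

t_short = fixed_length_template(short_outputs, k=3)
t_long = fixed_length_template(long_outputs, k=3)
similarity = cosine_similarity(t_short, t_long)
```

    Both templates have k * dim = 6 values regardless of how many frames the signal had, which is what lets a variable-length utterance be compared against a stored template.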

    USER SPECIFIED KEYWORD SPOTTING USING LONG SHORT TERM MEMORY NEURAL NETWORK FEATURE EXTRACTOR
    4.
    Type: Patent application
    Status: In force

    Publication No.: US20160180838A1

    Publication date: 2016-06-23

    Application No.: US14579603

    Filing date: 2014-12-22

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.

    TRANSFER LEARNING FOR DEEP NEURAL NETWORK BASED HOTWORD DETECTION
    5.
    Type: Patent application
    Status: In force

    Publication No.: US20150127594A1

    Publication date: 2015-05-07

    Application No.: US14230225

    Filing date: 2014-03-31

    Applicant: GOOGLE INC.

    CPC classification number: G06N7/005 G06N3/0454 G10L15/16 G10L2015/088

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising providing the deep neural network with a second training set and adjusting the values for a first subset of the plurality of weights, wherein the second training set includes data representing the key features of the one or more keywords or key phrases.

    KEY PHRASE DETECTION
    6.
    Type: Patent application
    Status: In force

    Publication No.: US20150095027A1

    Publication date: 2015-04-02

    Application No.: US14041131

    Filing date: 2013-09-30

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for key phrase detection. One of the methods includes receiving a plurality of audio frame vectors that each model an audio waveform during a different period of time, generating an output feature vector for each of the audio frame vectors, wherein each output feature vector includes a set of scores that characterize an acoustic match between the corresponding audio frame vector and a set of expected event vectors, each of the expected event vectors corresponding to one of the scores and defining acoustic properties of at least a portion of a keyword, and providing each of the output feature vectors to a posterior handling module.
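
    The per-frame scoring and the hand-off to a posterior handling module can be sketched as below. The abstract specifies neither the scoring function nor what the posterior handling module does, so everything here is an assumption: scores are a softmax over dot products with the expected event vectors, and the module smooths each score over a short window and takes the geometric mean of the per-event maxima. All names and sample vectors are hypothetical.

```python
import math

def frame_scores(frame, expected_events):
    """Score one audio frame vector against each expected event vector (softmax of dot products)."""
    logits = [sum(f * e for f, e in zip(frame, event)) for event in expected_events]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]      # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def posterior_handling(score_vectors, window=2):
    """Smooth each event score over `window` frames, then combine the per-event maxima."""
    n_events = len(score_vectors[0])
    best = [0.0] * n_events
    for t in range(len(score_vectors)):
        recent = score_vectors[max(0, t - window + 1): t + 1]
        for i in range(n_events):
            smoothed = sum(vec[i] for vec in recent) / len(recent)
            best[i] = max(best[i], smoothed)
    return math.prod(best) ** (1.0 / n_events)       # geometric mean as a confidence

# Two expected events, e.g. two portions of one keyword.
expected_events = [[1.0, 0.0], [0.0, 1.0]]

keyword_frames = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]   # both portions occur
other_frames = [[4.0, 0.0]] * 4                                      # only the first portion occurs

keyword_scores = [frame_scores(f, expected_events) for f in keyword_frames]
other_scores = [frame_scores(f, expected_events) for f in other_frames]

keyword_conf = posterior_handling(keyword_scores)
other_conf = posterior_handling(other_scores)
```

    With the geometric mean, the confidence stays low unless every expected event scores highly at some point in the audio, so audio containing only part of the keyword is scored below audio containing all of it.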

    USER SPECIFIED KEYWORD SPOTTING USING LONG SHORT TERM MEMORY NEURAL NETWORK FEATURE EXTRACTOR
    7.
    Type: Patent application
    Status: In force

    Publication No.: US20170076717A1

    Publication date: 2017-03-16

    Application No.: US15345982

    Filing date: 2016-11-08

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.

    Key phrase detection
    8.
    Type: Granted patent
    Status: In force

    Publication No.: US09202462B2

    Publication date: 2015-12-01

    Application No.: US14041131

    Filing date: 2013-09-30

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for key phrase detection. One of the methods includes receiving a plurality of audio frame vectors that each model an audio waveform during a different period of time, generating an output feature vector for each of the audio frame vectors, wherein each output feature vector includes a set of scores that characterize an acoustic match between the corresponding audio frame vector and a set of expected event vectors, each of the expected event vectors corresponding to one of the scores and defining acoustic properties of at least a portion of a keyword, and providing each of the output feature vectors to a posterior handling module.
