UTTERANCE CLASSIFIER
    11.
    Invention application

    Publication number: US20190035390A1

    Publication date: 2019-01-31

    Application number: US15659016

    Filing date: 2017-07-25

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classification using neural networks. One method includes receiving audio data corresponding to an utterance; obtaining a transcription of the utterance; generating a representation of the audio data; generating a representation of the transcription of the utterance; providing (i) the representation of the audio data and (ii) the representation of the transcription of the utterance to a classifier that, based on a given representation of the audio data and a given representation of the transcription of the utterance, is trained to output an indication of whether the utterance associated with the given representation is likely directed to an automated assistant or is likely not directed to an automated assistant; receiving, from the classifier, an indication of whether the utterance corresponding to the received audio data is likely directed to the automated assistant or is likely not directed to the automated assistant; and selectively instructing the automated assistant based at least on that indication.
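The flow in this abstract (two representations in, one "directed / not directed" indication out, then a selective instruction) can be illustrated with a minimal sketch. The abstract does not specify the classifier architecture; the single logistic layer over the concatenated representations, the function names, the threshold, and the returned action strings below are all assumptions made for illustration.

```python
import math


def classify_utterance(audio_repr, text_repr, weights, bias, threshold=0.5):
    """Score concatenated audio and transcription representations.

    Hypothetical stand-in for the trained classifier: one logistic unit.
    Returns (score, directed), where directed is True when the utterance
    is judged likely directed to the automated assistant.
    """
    features = list(audio_repr) + list(text_repr)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated representation")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    score = 1.0 / (1.0 + math.exp(-logit))
    return score, score >= threshold


def maybe_instruct_assistant(audio_repr, text_repr, weights, bias):
    """Selectively instruct the assistant based on the classifier output."""
    _, directed = classify_utterance(audio_repr, text_repr, weights, bias)
    # The action names are placeholders, not terms from the patent.
    return "forward_to_assistant" if directed else "ignore"
```

In a real system the weights would come from training on labeled utterances; here they are simply passed in by the caller.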

    Low-rank hidden input layer for speech recognition neural network

    Publication number: US09646634B2

    Publication date: 2017-05-09

    Application number: US14616881

    Filing date: 2015-02-09

    Applicant: Google Inc.

    CPC classification number: G10L25/30 G06N3/0454 G06N3/0481 G10L15/063

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods for training a deep neural network that includes a low rank hidden input layer and an adjoining hidden layer, the low rank hidden input layer including a first matrix A and a second matrix B with dimensions i×m and m×o, respectively, to identify a keyword includes receiving a feature vector including i values that represent features of an audio signal encoding an utterance, determining, using the low rank hidden input layer, an output vector including o values using the feature vector, determining, using the adjoining hidden layer, another vector using the output vector, determining a confidence score that indicates whether the utterance includes the keyword using the other vector, and adjusting weights for the low rank hidden input layer using the confidence score.
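To make the dimensions in this abstract concrete, the sketch below applies a low-rank layer as two successive products, first with A (i×m) and then with B (m×o), and compares the weight count with a single full i×o matrix. Only the linear part is shown; the helper names are hypothetical, and any bias or nonlinearity the actual network may use is left out because the abstract does not describe it.

```python
def vec_mat(x, matrix):
    """Multiply row vector x (length r) by matrix (r rows, c columns)."""
    if len(x) != len(matrix):
        raise ValueError("vector length must equal the number of matrix rows")
    cols = len(matrix[0])
    return [sum(x[r] * matrix[r][c] for r in range(len(x))) for c in range(cols)]


def low_rank_layer(x, A, B):
    """Map i input values to o output values through an m-wide bottleneck."""
    return vec_mat(vec_mat(x, A), B)


def parameter_counts(i, m, o):
    """Weights needed by the factored layer versus a full i x o layer."""
    return {"low_rank": i * m + m * o, "full": i * o}
```

For example, with the illustrative sizes i = 400, m = 32, o = 512, the factored layer stores 29,184 weights where a full matrix would store 204,800; the saving grows as m shrinks relative to i and o.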

    USER SPECIFIED KEYWORD SPOTTING USING LONG SHORT TERM MEMORY NEURAL NETWORK FEATURE EXTRACTOR
    13.
    Invention application
    Legal status: In force

    Publication number: US20170076717A1

    Publication date: 2017-03-16

    Application number: US15345982

    Filing date: 2016-11-08

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.
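The template step in this abstract, combining at most k LSTM output vectors into a fixed-length representation, can be sketched as follows. The LSTM itself is not implemented; its per-frame output vectors are taken as given. Keeping the last k vectors, concatenating them, zero-padding short signals, and comparing templates by cosine similarity are all assumptions for illustration, since the abstract does not say how the vectors are combined or compared.

```python
import math


def fixed_length_template(lstm_outputs, k):
    """Concatenate the last k LSTM output vectors into one fixed-length list.

    Signals with fewer than k vectors are zero-padded (an assumption),
    so the result always has k * dim values.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not lstm_outputs:
        raise ValueError("need at least one LSTM output vector")
    dim = len(lstm_outputs[0])
    selected = lstm_outputs[-k:]
    flat = [value for vec in selected for value in vec]
    flat.extend([0.0] * (dim * (k - len(selected))))
    return flat


def cosine_similarity(a, b):
    """Compare two templates; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Because every template has the same length regardless of how long the enrollment audio was, a later utterance can be scored against it with a single vector comparison.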

    Key phrase detection
    14.
    Granted invention patent
    Legal status: In force

    Publication number: US09202462B2

    Publication date: 2015-12-01

    Application number: US14041131

    Filing date: 2013-09-30

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for key phrase detection. One of the methods includes receiving a plurality of audio frame vectors that each model an audio waveform during a different period of time, generating an output feature vector for each of the audio frame vectors, wherein each output feature vector includes a set of scores that characterize an acoustic match between the corresponding audio frame vector and a set of expected event vectors, each of the expected event vectors corresponding to one of the scores and defining acoustic properties of at least a portion of a keyword, and providing each of the output feature vectors to a posterior handling module.
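As a rough illustration of the per-frame scoring this abstract describes, the sketch below turns each audio frame vector into a set of scores, one per expected event vector, and then smooths those scores over time before a detection decision. Using a dot product followed by a softmax as the "acoustic match", and a moving average as the posterior handling step, are assumptions; the abstract names neither.

```python
import math


def frame_scores(frame, event_vectors):
    """Return one score per expected event vector for a single frame.

    Dot products are normalized with a softmax so the scores sum to 1.
    """
    logits = [sum(f * e for f, e in zip(frame, ev)) for ev in event_vectors]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def smooth_scores(score_vectors, window):
    """Average each score over the current and up to window-1 prior frames."""
    smoothed = []
    for t in range(len(score_vectors)):
        chunk = score_vectors[max(0, t - window + 1): t + 1]
        smoothed.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return smoothed
```

Smoothing of this kind is one common way to keep a single noisy frame from triggering a detection; the actual posterior handling module may work differently.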

    User specified keyword spotting using long short term memory neural network feature extractor
    15.
    Granted invention patent
    Legal status: In force

    Publication number: US09508340B2

    Publication date: 2016-11-29

    Application number: US14579603

    Filing date: 2014-12-22

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.

    USER SPECIFIED KEYWORD SPOTTING USING LONG SHORT TERM MEMORY NEURAL NETWORK FEATURE EXTRACTOR
    16.
    Invention application
    Legal status: In force

    Publication number: US20160180838A1

    Publication date: 2016-06-23

    Application number: US14579603

    Filing date: 2014-12-22

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.

    KEYWORD DETECTION BASED ON ACOUSTIC ALIGNMENT
    17.
    Invention application
    Legal status: Under examination (published)

    Publication number: US20150279351A1

    Publication date: 2015-10-01

    Application number: US13861020

    Filing date: 2013-04-11

    Applicant: Google Inc.

    CPC classification number: G10L15/08 G10L15/02 G10L2015/088

    Abstract: Embodiments pertain to automatic speech recognition in mobile devices to establish the presence of a keyword. An audio waveform is received at a mobile device. Front-end feature extraction is performed on the audio waveform, followed by acoustic modeling, high level feature extraction, and output classification to detect the keyword. Acoustic modeling may use a neural network or Gaussian mixture modeling, and high level feature extraction may be done by aligning the results of the acoustic modeling with expected event vectors that correspond to a keyword.

