-
Publication number: US09779724B2
Publication date: 2017-10-03
Application number: US14532208
Filing date: 2014-11-04
Applicant: Google Inc.
Inventor: Alexander H. Gruenstein, Dave Harwath, Ian C. McGraw
CPC classification number: G10L15/08, G10L2015/221
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting alternates in speech recognition. In some implementations, data is received that indicates multiple speech recognition hypotheses for an utterance. Based on the multiple speech recognition hypotheses, multiple alternates for a particular portion of a transcription of the utterance are identified. For each of the identified alternates, one or more feature scores are determined, the feature scores are input to a trained classifier, and an output is received from the classifier. A subset of the identified alternates is selected, based on the classifier outputs, to provide for display. Data indicating the selected subset of the alternates is provided for display.
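The selection step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: `score_fn` is a hypothetical stand-in for the trained classifier, and `threshold` and `max_alternates` are assumed parameters that do not appear in the abstract.

```python
from typing import Callable, Dict, List, Sequence


def select_alternates(
    alternates: Dict[str, Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
    threshold: float = 0.5,       # assumed cutoff on the classifier output
    max_alternates: int = 3,      # assumed display limit
) -> List[str]:
    """Return the alternates whose classifier output meets the threshold, best first."""
    # Score every alternate from its feature scores.
    scored = [(score_fn(features), text) for text, features in alternates.items()]
    # Keep only alternates the classifier rates highly enough.
    kept = [(score, text) for score, text in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:max_alternates]]


def mean_score(features: Sequence[float]) -> float:
    """Toy classifier (assumption): the mean of the feature scores."""
    return sum(features) / len(features)


candidates = {
    "wreck a nice beach": [0.2, 0.4],
    "recognize speech": [0.9, 0.8],
    "recognise speech": [0.7, 0.6],
}
selected = select_alternates(candidates, mean_score)
```

A real classifier would be trained on labeled data; the mean is used here only so the example runs without external dependencies.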
-
Publication number: US20160351199A1
Publication date: 2016-12-01
Application number: US15233090
Filing date: 2016-08-10
Applicant: Google Inc.
CPC classification number: G10L17/10, G10L15/12, G10L15/16, G10L15/20, G10L15/22, G10L17/00, G10L17/02, G10L17/18, G10L17/24, G10L25/30, G10L2015/088, G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multi-stage hotword detection are disclosed. In one aspect, a method includes the actions of receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance. The actions further include determining a likelihood that the initial portion of the utterance includes a hotword. The actions further include determining that the likelihood that the initial portion of the utterance includes the hotword satisfies a threshold. The actions further include, in response to determining that the likelihood satisfies the threshold, transmitting a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance.
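The control flow in this abstract (a second stage scores the initial audio and, if the likelihood satisfies a threshold, asks the first stage to stop sending further audio) might be sketched as follows. All class names, method names, and the toy likelihood function are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Callable, Sequence


class FirstStageDetector:
    """Hypothetical first stage that streams audio until asked to stop."""

    def __init__(self) -> None:
        self.streaming = True

    def stop_streaming(self) -> None:
        # Handles the second stage's request to cease providing audio.
        self.streaming = False


class SecondStageDetector:
    """Hypothetical second stage that confirms the hotword on the initial audio."""

    def __init__(
        self,
        likelihood_fn: Callable[[Sequence[float]], float],
        threshold: float,
        first_stage: FirstStageDetector,
    ) -> None:
        self.likelihood_fn = likelihood_fn
        self.threshold = threshold
        self.first_stage = first_stage

    def on_initial_audio(self, audio: Sequence[float]) -> bool:
        """Score the initial portion; on success, tell the first stage to stop."""
        likelihood = self.likelihood_fn(audio)
        if likelihood >= self.threshold:
            self.first_stage.stop_streaming()
            return True
        return False


def toy_likelihood(audio: Sequence[float]) -> float:
    """Toy model (assumption): fraction of samples with large amplitude."""
    return sum(1 for sample in audio if abs(sample) > 0.5) / len(audio)


first = FirstStageDetector()
second = SecondStageDetector(toy_likelihood, threshold=0.6, first_stage=first)
detected = second.on_initial_audio([0.9, -0.8, 0.7, 0.1])
```

In a real system the likelihood would come from an acoustic model rather than an amplitude count.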
-
Publication number: US20150340032A1
Publication date: 2015-11-26
Application number: US14285801
Filing date: 2014-05-23
Applicant: Google Inc.
Inventor: Alexander H. Gruenstein
IPC: G10L15/16
CPC classification number: G06N3/08, G06N3/0454, G06N3/0472, G10L15/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes generating a plurality of feature vectors that each model a different portion of an audio waveform, generating a first posterior probability vector for a first feature vector using a first neural network, determining whether one of the scores in the first posterior probability vector satisfies a first threshold value, generating a second posterior probability vector for each subsequent feature vector using a second neural network, wherein the second neural network is trained to identify the same key words and key phrases and includes more inner layer nodes than the first neural network, and determining whether one of the scores in the second posterior probability vector satisfies a second threshold value.
-
Publication number: US20150095027A1
Publication date: 2015-04-02
Application number: US14041131
Filing date: 2013-09-30
Applicant: Google Inc.
CPC classification number: G10L15/063, G06N3/0454, G06N7/005, G10L15/02, G10L15/08, G10L15/10, G10L15/14, G10L15/16, G10L17/18, G10L2015/0631, G10L2015/088
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for key phrase detection. One of the methods includes receiving a plurality of audio frame vectors that each model an audio waveform during a different period of time, generating an output feature vector for each of the audio frame vectors, wherein each output feature vector includes a set of scores that characterize an acoustic match between the corresponding audio frame vector and a set of expected event vectors, each of the expected event vectors corresponding to one of the scores and defining acoustic properties of at least a portion of a keyword, and providing each of the output feature vectors to a posterior handling module.
-
Publication number: US09418656B2
Publication date: 2016-08-16
Application number: US14657588
Filing date: 2015-03-13
Applicant: Google Inc.
CPC classification number: G10L17/10, G10L15/12, G10L15/16, G10L15/20, G10L15/22, G10L17/00, G10L17/02, G10L17/18, G10L17/24, G10L25/30, G10L2015/088, G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multi-stage hotword detection are disclosed. In one aspect, a method includes the actions of receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance. The actions further include determining a likelihood that the initial portion of the utterance includes a hotword. The actions further include determining that the likelihood that the initial portion of the utterance includes the hotword satisfies a threshold. The actions further include, in response to determining that the likelihood satisfies the threshold, transmitting a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance.
-
Publication number: US20160104483A1
Publication date: 2016-04-14
Application number: US14659861
Filing date: 2015-03-17
Applicant: Google Inc.
Inventor: Jakob Nicolaus Foerster, Alexander H. Gruenstein
CPC classification number: G10L15/22, G10L15/02, G10L15/08, G10L15/265, G10L15/30, G10L25/03, G10L25/78, G10L2015/088, G10L2015/223, G10L2025/783
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a computing device, audio data that corresponds to an utterance. The actions further include determining a likelihood that the utterance includes a hotword. The actions further include determining a loudness score for the audio data. The actions further include determining, based on the loudness score, an amount of delay time. The actions further include, after the amount of delay time has elapsed, transmitting a signal that indicates that the computing device will initiate speech recognition processing on the audio data.
-
Publication number: US20140229185A1
Publication date: 2014-08-14
Application number: US14252913
Filing date: 2014-04-15
Applicant: Google Inc.
Inventor: William J. Byrne, Alexander H. Gruenstein, Douglas H. Beeferman
IPC: G10L15/06
CPC classification number: G10L15/22, G06F3/04842, G06F3/167, G06F17/2795, G10L15/02, G10L15/063, G10L15/1822, G10L15/26, G10L2015/0631, G10L2015/0635, G10L2015/0638, G10L2015/221
Abstract: Predicting and learning users' intended actions on an electronic device based on free-form speech input. Users' actions can be monitored to develop a list of carrier phrases having one or more actions that correspond to the carrier phrases. A user can speak a command into a device to initiate an action. The spoken command can be parsed and compared to a list of carrier phrases. If the spoken command matches one of the known carrier phrases, the corresponding action(s) can be presented to the user for selection. If the spoken command does not match one of the known carrier phrases, search results (e.g., Internet search results) corresponding to the spoken command can be presented to the user. The actions of the user in response to the presented action(s) and/or the search results can be monitored to update the list of carrier phrases.
-
Publication number: US20180330728A1
Publication date: 2018-11-15
Application number: US15593278
Filing date: 2017-05-11
Applicant: Google Inc.
Inventor: Alexander H. Gruenstein, Aleksandar Kracun, Matthew Sharifi
CPC classification number: G10L15/22, G06F17/30026, G10L15/08, G10L15/26, G10L17/005, G10L2015/0636, G10L2015/088, G10L2015/223, H04L63/1425, H04L63/1458
Abstract: A computing system receives requests from client devices to process voice queries that have been detected in local environments of the client devices. The system identifies that a value that is based on a number of requests to process voice queries received by the system during a specified time interval satisfies one or more criteria. In response, the system triggers analysis of at least some of the requests received during the specified time interval to determine a set of requests that each identify a common voice query. The system can generate an electronic fingerprint that indicates a distinctive model of the common voice query. The fingerprint can then be used to detect an illegitimate voice query identified in a request from a client device at a later time.
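The sequence in this abstract (a request-volume check, grouping requests that share a common query, then fingerprinting that query for later detection) could look roughly like the sketch below. It makes strong simplifying assumptions: queries are represented as text rather than audio, the fingerprint is a SHA-256 hash of the normalized text, and `volume_threshold` and `min_share` are hypothetical criteria. None of these details come from the patent.

```python
import hashlib
from collections import Counter
from typing import Iterable, Set


def fingerprint(query: str) -> str:
    """Assumed fingerprint: hash of the lowercased, whitespace-normalized text."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_suspect_fingerprints(
    queries: Iterable[str],
    volume_threshold: int,      # assumed criterion on request count per interval
    min_share: float = 0.5,     # assumed share of traffic for a "common" query
) -> Set[str]:
    """Fingerprint queries that dominate an interval with unusually high volume."""
    window = list(queries)
    # Only analyze the interval if the request count satisfies the criterion.
    if len(window) < volume_threshold:
        return set()
    counts = Counter(fingerprint(q) for q in window)
    return {fp for fp, n in counts.items() if n / len(window) >= min_share}


# One interval of traffic: the same query repeated many times plus normal requests.
window = ["OK open my front door"] * 8 + ["what's the weather", "set a timer"]
blocklist = find_suspect_fingerprints(window, volume_threshold=10)

# A later request is checked against the stored fingerprints.
is_blocked = fingerprint("ok  open my FRONT door") in blocklist
```

A production system would fingerprint the audio itself so that replayed recordings match regardless of transcription; the text hash only stands in for that step.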
-
Publication number: US20180061419A1
Publication date: 2018-03-01
Application number: US15278269
Filing date: 2016-09-28
Applicant: Google Inc.
CPC classification number: G10L15/30, G10L15/16, G10L15/22, G10L25/78, G10L2015/088, G10L2015/223, H04L67/10
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance. The actions further include determining that the utterance likely includes a particular, predefined hotword. The actions further include transmitting (i) data indicating that the computing device likely received the particular, predefined hotword, (ii) data identifying the computing device, and (iii) data identifying a group of nearby computing devices that includes the computing device. The actions further include receiving an instruction to commence speech recognition processing on the audio data. The actions further include, in response to receiving the instruction to commence speech recognition processing on the audio data, processing at least a portion of the audio data using an automated speech recognizer on the computing device.
-
Publication number: US09484022B2
Publication date: 2016-11-01
Application number: US14285801
Filing date: 2014-05-23
Applicant: Google Inc.
Inventor: Alexander H. Gruenstein
IPC: G10L15/16
CPC classification number: G06N3/08, G06N3/0454, G06N3/0472, G10L15/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes generating a plurality of feature vectors that each model a different portion of an audio waveform, generating a first posterior probability vector for a first feature vector using a first neural network, determining whether one of the scores in the first posterior probability vector satisfies a first threshold value, generating a second posterior probability vector for each subsequent feature vector using a second neural network, wherein the second neural network is trained to identify the same key words and key phrases and includes more inner layer nodes than the first neural network, and determining whether one of the scores in the second posterior probability vector satisfies a second threshold value.
-