-
Publication number: US09058805B2
Publication date: 2015-06-16
Application number: US13892590
Filing date: 2013-05-13
Applicant: Google Inc.
Inventor: Petar Aleksic , Pedro J. Mengibar , Fadi Biadsy
CPC classification number: G10L15/26 , G10L15/01 , G10L15/197 , G10L15/30 , G10L15/32 , H04M2250/74
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
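The classification step described in this abstract might be sketched as follows. This is a minimal illustration, not the patented method: the command terms, the function name `classify_utterance`, and the rule of checking the first word of the limited transcription are all assumptions made for the example.

```python
# Hypothetical voice command grammar; the abstract does not list any terms.
COMMAND_TERMS = {"call", "text", "open", "navigate"}


def classify_utterance(limited_transcription, expanded_transcription):
    """Classify an utterance from two transcriptions of the same audio.

    limited_transcription: output of a recognizer trained on the small
        command vocabulary (out-of-vocabulary words may come back as "<unk>").
    expanded_transcription: output of a recognizer trained on the full
        vocabulary.
    Returns (label, command, text).
    """
    limited_words = limited_transcription.lower().split()
    if limited_words and limited_words[0] in COMMAND_TERMS:
        # Assumed rule: take the command from the limited recognizer and
        # the argument from the expanded recognizer's remaining words.
        argument = " ".join(expanded_transcription.split(maxsplit=1)[1:])
        return ("voice_command", limited_words[0], argument)
    # Otherwise treat the whole expanded transcription as general speech.
    return ("general", None, expanded_transcription)
```

In this sketch, `classify_utterance("call <unk> <unk>", "call pedro moreno")` is labeled a voice command with the argument taken from the expanded transcription.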
-
Publication number: US20140025378A1
Publication date: 2014-01-23
Application number: US14035499
Filing date: 2013-09-24
Applicant: Google Inc.
Inventor: Petar Aleksic , Xin Lei
IPC: G10L17/00
CPC classification number: G10L17/00 , G10L15/065 , G10L15/07
Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.
-
Publication number: US20180366112A1
Publication date: 2018-12-20
Application number: US15681801
Filing date: 2017-08-21
Applicant: Google Inc.
Inventor: Petar Aleksic , Michael D. Riley , Pedro J. Moreno Mengibar , Leonid Velikovich
IPC: G10L15/18 , G10L15/22 , G10L15/197 , G10L15/14
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for tagging during speech recognition. A word lattice that indicates probabilities for sequences of words in an utterance is obtained. A conditional probability transducer that indicates a frequency that sequences of both the words and semantic tags for the words appear is obtained. The word lattice and the conditional probability transducer are composed to construct a word lattice that indicates probabilities for sequences of both the words in the utterance and the semantic tags for the words. The word lattice that indicates probabilities for sequences of both the words in the utterance and the semantic tags for the words is used to generate a transcription that includes the words in the utterance and the semantic tags for the words.
-
Publication number: US20180233150A1
Publication date: 2018-08-16
Application number: US15432358
Filing date: 2017-02-14
Applicant: Google Inc.
Inventor: Alexander H. Gruenstein , Petar Aleksic , Johan Schalkwyk , Pedro J. Moreno Mengibar
CPC classification number: G10L15/30 , G10L15/183 , G10L15/265 , G10L15/32 , G10L2015/088 , G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
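The two-threshold flow in this abstract could be sketched roughly as below. The threshold values, the function names, and the use of plain floating-point scores in place of real detector outputs are assumptions for illustration only.

```python
# Assumed values: the abstract only requires that the server threshold
# be more restrictive than the client threshold.
CLIENT_THRESHOLD = 0.5
SERVER_THRESHOLD = 0.8


def client_should_forward(client_score):
    """First, lenient check performed on the device."""
    return client_score >= CLIENT_THRESHOLD


def server_confirms(server_score):
    """Second, stricter check performed by the server system."""
    return server_score >= SERVER_THRESHOLD


def handle_audio(client_score, server_score, tagged_text):
    """Return tagged text only if both stages accept the key phrase.

    client_score and server_score stand in for the outputs of the
    on-device and server-side detectors, which are not modeled here.
    """
    if not client_should_forward(client_score):
        return None  # audio is never sent to the server
    if not server_confirms(server_score):
        return None  # server rejects; no tagged text comes back
    return tagged_text
```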
-
Publication number: US20170358297A1
Publication date: 2017-12-14
Application number: US15396045
Filing date: 2016-12-30
Applicant: Google Inc.
Inventor: Justin Max Scheiner , Petar Aleksic
IPC: G10L15/197 , G06F17/30 , G10L15/22 , G10L15/30
CPC classification number: G10L15/197 , G06F17/2775 , G06F17/30684 , G06F17/30743 , G10L15/1815 , G10L15/22 , G10L15/30 , G10L2015/223
Abstract: This document generally describes systems and methods for dynamically adapting speech recognition for individual voice queries of a user using class-based language models. The method may include receiving a voice query from a user that includes audio data corresponding to an utterance of the user, and context data associated with the user. One or more class models are then generated that collectively identify a first set of terms determined based on the context data, and a respective class to which the respective term is assigned for each respective term in the first set of terms. A language model that includes a residual unigram may then be accessed and processed for each respective class to insert a respective class symbol at each instance of the residual unigram that occurs within the language model. A transcription of the utterance of the user is then generated using the modified language model.
-
Publication number: US09691380B2
Publication date: 2017-06-27
Application number: US14739287
Filing date: 2015-06-15
Applicant: Google Inc.
Inventor: Pedro J. Moreno Mengibar , Petar Aleksic
IPC: G06F17/28 , G10L15/197 , G10L15/22 , G10L15/01
CPC classification number: G10L15/197 , G10L15/01 , G10L2015/228
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing dynamic, stroke-based alignment of touch displays. In one aspect, a method includes obtaining a candidate transcription that an automated speech recognizer generates for an utterance, determining a particular context associated with the utterance, determining that a particular n-gram that is included in the candidate transcription is included among a set of undesirable n-grams that is associated with the context, adjusting a speech recognition confidence score associated with the transcription based on determining that the particular n-gram that is included in the candidate transcription is included among the set of undesirable n-grams that is associated with the context, and determining whether to provide the candidate transcription for output based at least on the adjusted speech recognition confidence score.
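The confidence adjustment described in this abstract might look something like the sketch below. The context name, the undesirable n-gram, the multiplicative penalty, and the output threshold are all hypothetical; the abstract does not say how the score is adjusted.

```python
# Hypothetical per-context sets of undesirable n-grams.
UNDESIRABLE_NGRAMS = {
    "navigation": {"call off"},
}


def word_ngrams(words, n):
    """All contiguous n-word sequences in a list of words."""
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def adjust_confidence(transcription, context, confidence, penalty=0.5):
    """Scale the confidence down if the transcription contains an
    n-gram that is undesirable in the given context."""
    words = transcription.lower().split()
    undesirable = UNDESIRABLE_NGRAMS.get(context, set())
    candidates = set()
    for n in range(1, len(words) + 1):
        candidates |= word_ngrams(words, n)
    if candidates & undesirable:
        return confidence * penalty  # assumed multiplicative penalty
    return confidence


def should_output(transcription, context, confidence, threshold=0.6):
    """Decide whether to provide the candidate transcription for output."""
    return adjust_confidence(transcription, context, confidence) >= threshold
```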
-
Publication number: US20140337032A1
Publication date: 2014-11-13
Application number: US13892590
Filing date: 2013-05-13
Applicant: Google Inc.
Inventor: Petar Aleksic , Pedro J. Moreno Mengibar , Fadi Biadsy
IPC: G10L15/01
CPC classification number: G10L15/26 , G10L15/01 , G10L15/197 , G10L15/30 , G10L15/32 , H04M2250/74
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
-
Publication number: US08849664B1
Publication date: 2014-09-30
Application number: US13943320
Filing date: 2013-07-16
Applicant: Google Inc.
Inventor: Xin Lei , Petar Aleksic
Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
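The stability-gated update in this abstract could be sketched as follows. The `SpeakerProfile` class, its running-mean statistic, and the threshold value are assumptions chosen to keep the example small; a real speaker adaptation profile would hold richer parameters.

```python
from dataclasses import dataclass


@dataclass
class SpeakerProfile:
    """Toy adaptation profile: a running mean of scalar features."""
    mean: float = 0.0
    count: int = 0

    def update(self, features):
        # Incremental mean update over the segment's features.
        for x in features:
            self.count += 1
            self.mean += (x - self.mean) / self.count


def maybe_update(profile, segment_features, stability, threshold=0.9):
    """Update the profile only when the segment's stability measure
    satisfies the (assumed) threshold. Returns True if updated."""
    if stability >= threshold:
        profile.update(segment_features)
        return True
    return False
```

Later portions of the session would then be transcribed with the updated profile, while unstable segments leave it unchanged.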
-
Publication number: US08805684B1
Publication date: 2014-08-12
Application number: US13653804
Filing date: 2012-10-17
Applicant: Google Inc.
Inventor: Petar Aleksic , Xin Lei
CPC classification number: G10L15/07
Abstract: Automatic speech recognition (ASR) may be performed on received utterances. The ASR may be performed by an ASR module of a computing device (e.g., a client device). The ASR may include: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, and updating the feature-space speaker adaptation parameters based on the feature vectors. The transcriptions may be based, at least in part, on an acoustic model and the updated feature vectors. Updated speaker adaptation parameters may be received from another computing device and incorporated into the ASR module.
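One common form of feature-space speaker adaptation is an affine transform applied to each feature vector, x' = A x + b. The sketch below assumes that form purely for illustration; the abstract does not state which transform is used, and the name `adapt_features` is hypothetical.

```python
def adapt_features(x, A, b):
    """Apply the affine transform x' = A x + b to one feature vector.

    x: list of floats (the feature vector)
    A: list of rows, each a list of floats (square matrix)
    b: list of floats (bias vector)
    A and b play the role of the speaker adaptation parameters.
    """
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]
```

With an identity matrix and zero bias the features pass through unchanged, which corresponds to having no speaker adaptation yet.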
-
Publication number: US08700393B2
Publication date: 2014-04-15
Application number: US14035499
Filing date: 2013-09-24
Applicant: Google Inc.
Inventor: Petar Aleksic , Xin Lei
IPC: G10L15/00
CPC classification number: G10L17/00 , G10L15/065 , G10L15/07
Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.