-
Publication No.: US09293136B2
Publication Date: 2016-03-22
Application No.: US14726943
Filing Date: 2015-06-01
Applicant: Google Inc.
Inventor: Petar Aleksic , Pedro J. Moreno Mengibar , Fadi Biadsy
CPC classification number: G10L15/26 , G10L15/01 , G10L15/197 , G10L15/30 , G10L15/32 , H04M2250/74
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
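Illustrative sketch (not from the filing): a minimal Python rendering of the classification step. It rests on assumptions the abstract does not state. The voice command grammar is modeled as a hypothetical set of regular expressions (`VOICE_COMMAND_GRAMMAR`). The limited transcription decides whether a command trigger is present, and the expanded transcription supplies the free-form argument. The labels `voice_command` and `open_query` are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical voice command grammar: command name -> pattern whose trigger
# words are assumed to be in the limited recognizer's vocabulary.
VOICE_COMMAND_GRAMMAR = {
    "call": re.compile(r"^call (?P<arg>.+)$"),
    "set_alarm": re.compile(r"^set (?:an )?alarm for (?P<arg>.+)$"),
    "navigate": re.compile(r"^navigate to (?P<arg>.+)$"),
}


@dataclass
class Classification:
    label: str              # "voice_command" or "open_query" (illustrative labels)
    command: Optional[str]  # matched command name, if any
    text: str               # command argument, or the full query text


def classify_utterance(limited: str, expanded: str) -> Classification:
    """Classify from the limited and expanded transcriptions (assumed policy)."""
    limited, expanded = limited.strip().lower(), expanded.strip().lower()
    for command, pattern in VOICE_COMMAND_GRAMMAR.items():
        limited_match = pattern.match(limited)
        if limited_match:
            # The expanded recognizer is more likely to get names and places
            # right, so take the argument from it when it also parses.
            expanded_match = pattern.match(expanded)
            arg = (expanded_match or limited_match).group("arg")
            return Classification("voice_command", command, arg)
    # No grammar match: treat the utterance as a free-form query.
    return Classification("open_query", None, expanded)
```

For example, `classify_utterance("call <unk>", "call zbigniew")` yields a `call` command with argument `zbigniew`. The limited recognizer caught the trigger word, and the expanded recognizer recovered the out-of-vocabulary name.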
-
Publication No.: US20150262581A1
Publication Date: 2015-09-17
Application No.: US14726943
Filing Date: 2015-06-01
Applicant: Google Inc.
Inventor: Petar Aleksic , Pedro J. Moreno Mengibar , Fadi Biadsy
IPC: G10L15/26
CPC classification number: G10L15/26 , G10L15/01 , G10L15/197 , G10L15/30 , G10L15/32 , H04M2250/74
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
-
Publication No.: US20160267904A1
Publication Date: 2016-09-15
Application No.: US14681652
Filing Date: 2015-04-08
Applicant: Google Inc.
Inventor: Fadi Biadsy , Diamantino Antonio Caseiro
CPC classification number: G10L15/08 , G10L15/183 , G10L15/26 , G10L15/30
Abstract: Systems and methods for addressing missing features in models are provided. In some implementations, a model configured to indicate likelihoods of different outcomes is accessed. The model includes a respective score for each of a plurality of features, and each feature corresponds to an outcome in an associated context. It is determined that the model does not include a score for a feature corresponding to a potential outcome in a particular context. A score is determined for the potential outcome in the particular context based on the scores for one or more features in the model that correspond to different outcomes in the particular context. The model and the score are used to determine a likelihood of occurrence of the potential outcome.
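Illustrative sketch (not from the filing): a minimal Python take on the missing-feature idea. It assumes features keyed by `(context, outcome)` pairs and a log-linear (softmax) model for likelihoods. The fill-in rule is one possible choice that the abstract does not specify: an unseen outcome receives the minimum score among the outcomes observed in the same context.

```python
import math
from typing import Dict, Iterable, Tuple

Feature = Tuple[str, str]  # (context, outcome)


def feature_score(model: Dict[Feature, float], context: str, outcome: str,
                  default: float = 0.0) -> float:
    """Return the model's score, filling in a missing feature from its siblings."""
    if (context, outcome) in model:
        return model[(context, outcome)]
    siblings = [score for (ctx, _), score in model.items() if ctx == context]
    if not siblings:
        return default  # nothing is known about this context at all
    # Assumed rule: an unseen outcome scores no higher than the weakest seen one.
    return min(siblings)


def likelihood(model: Dict[Feature, float], context: str, outcome: str,
               candidates: Iterable[str]) -> float:
    """Softmax probability of `outcome` among `candidates` in `context`."""
    outcomes = list(dict.fromkeys([*candidates, outcome]))  # dedupe, keep order
    scores = [feature_score(model, context, o) for o in outcomes]
    top = max(scores)
    z = sum(math.exp(s - top) for s in scores)
    return math.exp(scores[outcomes.index(outcome)] - top) / z
```

With `model = {("the", "cat"): 2.0, ("the", "dog"): 1.0}`, the unseen feature `("the", "cow")` is scored 1.0. As a result, "cow" gets the same probability as "dog" instead of being treated as impossible.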
-
Publication No.: US09058805B2
Publication Date: 2015-06-16
Application No.: US13892590
Filing Date: 2013-05-13
Applicant: Google Inc.
Inventor: Petar Aleksic , Pedro J. Mengibar , Fadi Biadsy
CPC classification number: G10L15/26 , G10L15/01 , G10L15/197 , G10L15/30 , G10L15/32 , H04M2250/74
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
-
Publication No.: US09026431B1
Publication Date: 2015-05-05
Application No.: US13953956
Filing Date: 2013-07-30
Applicant: Google Inc.
Inventor: Pedro J. Moreno Mengibar , Diego Melendo Casado , Fadi Biadsy
CPC classification number: G06F17/2785 , G10L15/1815 , G10L15/1822 , G10L15/26 , G10L15/30 , G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for semantic parsing with multiple parsers. One of the methods includes obtaining one or more transcribed prompt n-grams from a speech to text recognizer, providing the transcribed prompt n-grams to a first semantic parser that executes on the user device and accesses a first knowledge base for results responsive to the spoken prompt, providing the transcribed prompt n-grams to a second semantic parser that accesses a second knowledge base for results responsive to the spoken prompt, the first knowledge base including first data not included in the second knowledge base, receiving a result responsive to the spoken prompt from the first semantic parser or the second semantic parser, wherein the result is selected from the knowledge base associated with the semantic parser that provided the result to the user device, and performing an operation based on the result.
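Illustrative sketch (not from the filing): a minimal Python version of the two-parser flow. It assumes each parser is a dictionary lookup over prompt n-grams, and that the on-device parser's answer is preferred when both parsers respond. The knowledge base contents, the names `prompt_ngrams`, `make_parser`, and `resolve`, and the preference order are all hypothetical. A real system would presumably query the second parser over the network, likely in parallel with the first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ParseResult:
    parser: str  # name of the parser (and knowledge base) that answered
    value: str


Parser = Callable[[List[str]], Optional[ParseResult]]


def prompt_ngrams(transcription: str, max_n: int = 3) -> List[str]:
    """All n-grams up to max_n, longest first so more specific matches win."""
    words = transcription.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(min(max_n, len(words)), 0, -1)
            for i in range(len(words) - n + 1)]


def make_parser(name: str, knowledge_base: Dict[str, str]) -> Parser:
    def parse(ngrams: List[str]) -> Optional[ParseResult]:
        for ngram in ngrams:
            if ngram in knowledge_base:
                return ParseResult(name, knowledge_base[ngram])
        return None
    return parse


def resolve(ngrams: List[str], parsers: List[Parser]) -> Optional[ParseResult]:
    """Send the n-grams to every parser; keep the first non-empty result."""
    results = [parse(ngrams) for parse in parsers]
    return next((r for r in results if r is not None), None)


# Hypothetical knowledge bases: private on-device data vs. general server data.
device = make_parser("device", {"john smith": "contact:555-0100"})
server = make_parser("server", {"eiffel tower": "landmark:Paris"})
```

Here "call john smith" is answered from the device knowledge base. In contrast, "how tall is the eiffel tower" falls through to the server knowledge base, because only the server holds that entry.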
-
Publication No.: US20150006169A1
Publication Date: 2015-01-01
Application No.: US13930185
Filing Date: 2013-06-28
Applicant: Google Inc.
Inventor: Fadi Biadsy , Pedro J. Moreno Mengibar
IPC: G10L15/26
CPC classification number: G10L15/22 , G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating expressions associated with voice commands. The methods, systems, and apparatus include actions of obtaining segments of one or more expressions associated with a voice command. Further actions include combining the segments into a candidate expression and scoring the candidate expression using a text corpus. Additional actions include selecting the candidate expression as an expression associated with the voice command based on the scoring of the candidate expression.
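Illustrative sketch (not from the filing): a minimal Python expression-generation loop. It assumes segments arrive as lists of alternatives and that candidates are their Cartesian product. The text-corpus score is taken to be the fraction of a candidate's bigrams that occur in the corpus. The scoring function and the `threshold` parameter are assumptions, not details from the application.

```python
import itertools
from collections import Counter
from typing import List, Sequence, Tuple


def corpus_bigrams(corpus: Sequence[str]) -> Counter:
    """Count word bigrams across all corpus sentences."""
    counts: Counter = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def score_candidate(candidate: str, bigrams: Counter) -> float:
    """Fraction of the candidate's bigrams that appear in the corpus."""
    words = candidate.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if bigrams[pair] > 0) / len(pairs)


def generate_expressions(segments: Sequence[Sequence[str]], corpus: Sequence[str],
                         threshold: float = 1.0) -> List[Tuple[str, float]]:
    """Combine segment alternatives and keep candidates scoring >= threshold."""
    bigrams = corpus_bigrams(corpus)
    selected = []
    for parts in itertools.product(*segments):
        candidate = " ".join(parts)
        score = score_candidate(candidate, bigrams)
        if score >= threshold:
            selected.append((candidate, score))
    # Best-scoring first; sorted() is stable, so ties keep generation order.
    return sorted(selected, key=lambda item: -item[1])
```

Consider segments `[["set", "make"], ["an alarm", "a alarm"], ["for 7 am"]]` and a two-sentence corpus containing "set an alarm" and "make an alarm". The two "an alarm" variants survive. The ungrammatical "a alarm" variants score 0.6 and are dropped.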
-
Publication No.: US20150269934A1
Publication Date: 2015-09-24
Application No.: US14667518
Filing Date: 2015-03-24
Applicant: Google Inc.
Inventor: Fadi Biadsy , Brian E. Roark
CPC classification number: G10L15/197
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to enhanced maximum entropy models. In some implementations, data indicating a candidate transcription for an utterance and a particular context for the utterance are received. A maximum entropy language model is obtained. Feature values are determined for n-gram features and backoff features of the maximum entropy language model. The feature values are input to the maximum entropy language model, and an output is received from the maximum entropy language model. A transcription for the utterance is selected from among a plurality of candidate transcriptions based on the output from the maximum entropy language model. The selected transcription is provided to a client device.
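Illustrative sketch (not from the filing): a minimal Python maximum entropy language model with n-gram and backoff features. It adopts one particular reading of "backoff feature": when the order-n n-gram for a word has no weight in the model, a shared `("backoff", n)` feature fires in its place. The weights are supplied by hand rather than trained. The utterance context mentioned in the abstract is omitted for brevity.

```python
import math
from typing import Dict, List, Sequence, Tuple

Weights = Dict[Tuple, float]


def active_features(history: Tuple[str, ...], word: str, weights: Weights,
                    order: int) -> List[Tuple]:
    """N-gram features for orders 1..order, with a backoff feature for missing ones."""
    feats: List[Tuple] = []
    for n in range(1, order + 1):
        if n - 1 > len(history):
            break  # not enough history for this order
        context = history[len(history) - (n - 1):]  # last n-1 words; () for unigrams
        ngram = ("ngram", context, word)
        # Assumed backoff scheme: an order-n backoff feature replaces an
        # n-gram feature that the model has no weight for.
        feats.append(ngram if ngram in weights else ("backoff", n))
    return feats


def log_prob(history: Tuple[str, ...], word: str, vocab: Sequence[str],
             weights: Weights, order: int = 3) -> float:
    """log p(word | history), normalized over the whole vocabulary."""
    def score(w: str) -> float:
        return sum(weights.get(f, 0.0) for f in active_features(history, w, weights, order))
    scores = [score(w) for w in vocab]
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return score(word) - log_z


def sentence_log_prob(words: Sequence[str], vocab: Sequence[str],
                      weights: Weights, order: int = 3) -> float:
    history: Tuple[str, ...] = ("<s>",)
    total = 0.0
    for word in list(words) + ["</s>"]:
        total += log_prob(history, word, vocab, weights, order)
        history = history + (word,)
    return total


def select_transcription(candidates: Sequence[str], vocab: Sequence[str],
                         weights: Weights, order: int = 3) -> str:
    """Pick the candidate transcription the model scores highest."""
    return max(candidates, key=lambda c: sentence_log_prob(c.split(), vocab, weights, order))
```

In this design, unseen n-grams share a learned per-order penalty rather than getting a zero feature weight. The model can therefore learn how much to distrust each backoff level.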
-
Publication No.: US20150228279A1
Publication Date: 2015-08-13
Application No.: US14179257
Filing Date: 2014-02-12
Applicant: Google Inc.
Inventor: Fadi Biadsy , Pedro J. Moreno Mengibar
CPC classification number: G10L15/26 , G10L15/08 , G10L15/197 , G10L2015/226 , G10L2015/228
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using non-linguistic context. In some implementations, context data indicating non-linguistic context for the utterance is received. Based on the context data, feature scores for one or more non-linguistic features are generated. The feature scores for the non-linguistic features are provided to a language model trained to process scores for non-linguistic features. The output from the language model is received, and a transcription for the utterance is determined using the output of the language model.
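Illustrative sketch (not from the filing): a minimal Python rescoring step driven by non-linguistic context. It assumes context signals are categorical (for example, the foreground application) and encoded as one-hot feature scores. These scores are combined with first-pass candidate scores through hypothetical context-word weights. The abstract does not say how the language model consumes the feature scores; this additive log-linear bonus is only one possible design.

```python
from typing import Dict, List, Tuple

ContextWeights = Dict[Tuple[str, str], float]  # (context feature, word) -> weight


def context_feature_scores(context: Dict[str, str]) -> Dict[str, float]:
    """One-hot feature scores for categorical non-linguistic signals (assumed encoding)."""
    return {f"{key}={value}": 1.0 for key, value in context.items()}


def rescore(candidates: List[Tuple[str, float]], context: Dict[str, str],
            weights: ContextWeights) -> str:
    """Pick the candidate maximizing first-pass score plus context-word bonuses."""
    features = context_feature_scores(context)

    def total(item: Tuple[str, float]) -> float:
        text, base_score = item
        bonus = sum(value * weights.get((feature, word), 0.0)
                    for feature, value in features.items()
                    for word in text.lower().split())
        return base_score + bonus

    return max(candidates, key=total)[0]
```

For example, with the context `{"app": "music_player"}` and a positive weight on `("app=music_player", "beatles")`, "play the beatles" overtakes the acoustically preferred "play the beetles". With no context, the first-pass ranking stands.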
-
Publication No.: US20140337032A1
Publication Date: 2014-11-13
Application No.: US13892590
Filing Date: 2013-05-13
Applicant: Google Inc.
Inventor: Petar Aleksic , Pedro J. Moreno Mengibar , Fadi Biadsy
IPC: G10L15/01
CPC classification number: G10L15/26 , G10L15/01 , G10L15/197 , G10L15/30 , G10L15/32 , H04M2250/74
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
-
Publication No.: US08868409B1
Publication Date: 2014-10-21
Application No.: US14157020
Filing Date: 2014-01-16
Applicant: Google Inc.
Inventor: Pedro J. Moreno Mengibar , Fadi Biadsy , Diego Melendo Casado
CPC classification number: G10L15/26 , G06F17/2785 , G10L15/30
Abstract: In some implementations, audio data for an utterance is provided over a network. At a client device and over the network, information is received that indicates candidate transcriptions for the utterance and semantic information for the candidate transcriptions. A semantic parser is used at the client device to evaluate each of at least a plurality of the candidate transcriptions. One of the candidate transcriptions is selected based on at least the received semantic information and the output of the semantic parser for the plurality of candidate transcriptions that are evaluated.
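Illustrative sketch (not from the filing): a minimal Python version of the client-side selection step. It assumes the server returns each candidate transcription with a confidence and a coarse semantic label. It also assumes the on-device semantic parser only recognizes "call <contact>" against a local contact list. The `Candidate` fields, `local_parse`, and the additive `bonus` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Candidate:
    text: str
    server_score: float   # hypothetical recognizer confidence returned by the server
    semantic_type: str    # hypothetical server-side semantic label, e.g. "call"


def local_parse(text: str, contacts: Set[str]) -> Optional[str]:
    """Toy on-device parser: recognizes 'call <name>' for names in local contacts."""
    words = text.lower().split()
    if len(words) >= 2 and words[0] == "call" and " ".join(words[1:]) in contacts:
        return "call"
    return None


def select_candidate(candidates: List[Candidate], contacts: Set[str],
                     bonus: float = 1.0) -> Candidate:
    """Boost candidates whose local parse agrees with the server's semantic label."""
    def combined(candidate: Candidate) -> float:
        agrees = local_parse(candidate.text, contacts) == candidate.semantic_type
        return candidate.server_score + (bonus if agrees else 0.0)
    return max(candidates, key=combined)
```

Suppose the server slightly prefers "call jon" but the device's contact list contains only "joan". In that case the local parser's agreement lifts "call joan" to the top. This illustrates why private on-device data can correct a server-side ranking.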