Multiple recognizer speech recognition

    Publication No.: US09293136B2

    Publication Date: 2016-03-22

    Application No.: US14726943

    Filing Date: 2015-06-01

    Applicant: Google Inc.

    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
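The two-recognizer flow described in the abstract can be sketched as follows. This is an illustrative stand-in, not the patented implementation: the recognizers are stubs over token lists, and the grammar terms and classifier logic are assumptions for demonstration.

```python
# Hypothetical sketch of classifying an utterance with a limited and an
# expanded recognizer. All names and the toy grammar are illustrative.

VOICE_COMMAND_GRAMMAR = {"call", "navigate", "play", "stop"}

def limited_recognizer(tokens):
    # Stand-in for a recognizer whose language model is trained only over
    # the command-grammar vocabulary (a subset of the expanded grammar).
    return [t for t in tokens if t in VOICE_COMMAND_GRAMMAR]

def expanded_recognizer(tokens):
    # Stand-in for a recognizer trained over the full expanded vocabulary.
    return list(tokens)

def classify_utterance(tokens):
    first = limited_recognizer(tokens)    # first transcription
    second = expanded_recognizer(tokens)  # second transcription
    # Classify based on a portion of the transcriptions: treat it as a
    # voice command if the limited transcription starts with a grammar term.
    if first and first[0] in VOICE_COMMAND_GRAMMAR:
        return "command", first
    return "dictation", second
```

A toy call such as `classify_utterance(["play", "some", "music"])` would classify the utterance as a command because the limited recognizer recovers the grammar term.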

    22. Multiple Recognizer Speech Recognition
    Invention Application (In Force)

    Publication No.: US20150262581A1

    Publication Date: 2015-09-17

    Application No.: US14726943

    Filing Date: 2015-06-01

    Applicant: Google Inc.

    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.

    23. Localized speech recognition with offload
    Granted Patent (In Force)

    Publication No.: US08880398B1

    Publication Date: 2014-11-04

    Application No.: US13746039

    Filing Date: 2013-01-21

    Applicant: Google Inc.

    CPC classification number: G10L21/00 G10L15/07 G10L15/30 G10L2015/223

    Abstract: A local computing device may receive an utterance from a user device. In response to receiving the utterance, the local computing device may obtain a text string transcription of the utterance, and determine a response mode for the utterance. If the response mode is a text-based mode, the local computing device may provide the text string transcription to a target device. If the response mode is a non-text-based mode, the local computing device may convert the text string transcription into one or more commands from a command set supported by the target device, and provide the one or more commands to the target device.
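The response-mode dispatch in this abstract amounts to a two-way branch on the mode. A minimal sketch, assuming a toy command set and string transcriptions (the device interfaces and command identifiers are hypothetical):

```python
# Illustrative sketch of the text-mode / command-mode dispatch.
# COMMAND_SET and the command names are assumptions, not from the patent.

COMMAND_SET = {"lights on": "CMD_LIGHTS_ON", "lights off": "CMD_LIGHTS_OFF"}

def handle_utterance(transcription, response_mode):
    """Return what the local device would provide to the target device."""
    if response_mode == "text":
        # Text-based mode: forward the text string transcription unchanged.
        return transcription
    # Non-text-based mode: convert the transcription into commands drawn
    # from the command set supported by the target device.
    if transcription in COMMAND_SET:
        return [COMMAND_SET[transcription]]
    return []  # no matching command for this utterance
```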

    24. Multi-Stage Speaker Adaptation
    Invention Application

    Publication No.: US20140163985A1

    Publication Date: 2014-06-12

    Application No.: US14181908

    Filing Date: 2014-02-17

    Applicant: Google Inc.

    CPC classification number: G10L17/00 G10L15/065 G10L15/07

    Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.
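The staged structure (select a coarse transform from unit 1, apply it to unit 2, then select a finer speaker-dependent transform from unit 2 and apply it to unit 3) can be shown schematically. The selection heuristics and scalings below are toy stand-ins, not the patented techniques:

```python
# Toy sketch of multi-stage adaptation. Feature vectors are plain lists
# of floats; both "techniques" are illustrative scaling transforms.

def select_gender_technique(vectors):
    # Stage 1 (illustrative): pick a gender-specific transform from gross
    # statistics of the first unit's feature vectors.
    total = sum(sum(v) for v in vectors)
    count = max(1, sum(len(v) for v in vectors))
    mean = total / count
    return (lambda v: [0.9 * x for x in v]) if mean >= 0.0 else \
           (lambda v: [1.1 * x for x in v])

def select_speaker_technique(vectors):
    # Stage 2 (illustrative): a finer speaker-dependent transform chosen
    # from the already gender-adapted second unit.
    peak = max(max(v) for v in vectors)
    return lambda v: [x / peak for x in v]

def adapt(unit1, unit2, unit3):
    gender_tf = select_gender_technique(unit1)            # from unit 1
    unit2_adapted = [gender_tf(v) for v in unit2]         # applied to unit 2
    speaker_tf = select_speaker_technique(unit2_adapted)  # from unit 2
    unit3_adapted = [speaker_tf(v) for v in unit3]        # applied to unit 3
    return unit2_adapted, unit3_adapted
```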

    25. MIXED MODEL SPEECH RECOGNITION
    Invention Application (Pending, Published)

    Publication No.: US20130346078A1

    Publication Date: 2013-12-26

    Application No.: US13838379

    Filing Date: 2013-03-15

    Applicant: Google Inc.

    Abstract: In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.
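The selection step at the end of the abstract is a simple gate: if the generic transcription contains a term from the predefined set, output the user-specific transcription instead. A hedged sketch, with stub transcriptions and an assumed trigger set:

```python
# Minimal sketch of the output-selection logic. TRIGGER_TERMS and both
# transcriptions stand in for real recognizer output.

TRIGGER_TERMS = {"call", "text", "email"}  # illustrative predefined set

def choose_output(first_transcription, second_transcription):
    """first: from the user-specific language model;
    second: from the language model independent of user-specific data."""
    # If the generic transcription contains a trigger term, prefer the
    # user-specific transcription (which may better resolve names, contacts, etc.).
    if TRIGGER_TERMS & set(second_transcription.split()):
        return first_transcription
    return second_transcription
```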

    26. Localized speech recognition with offload
    Granted Patent (In Force)

    Publication No.: US08554559B1

    Publication Date: 2013-10-08

    Application No.: US13746115

    Filing Date: 2013-01-21

    Applicant: Google Inc.

    CPC classification number: G10L21/00 G10L15/07 G10L15/30 G10L2015/223

    Abstract: A local computing device may receive an utterance from a user device. In response to receiving the utterance, the local computing device may obtain a text string transcription of the utterance, and determine a response mode for the utterance. If the response mode is a text-based mode, the local computing device may provide the text string transcription to a target device. If the response mode is a non-text-based mode, the local computing device may convert the text string transcription into one or more commands from a command set supported by the target device, and provide the one or more commands to the target device.

    27. Realtime acoustic adaptation using stability measures
    Granted Patent (In Force)

    Publication No.: US08515750B1

    Publication Date: 2013-08-20

    Application No.: US13622576

    Filing Date: 2012-09-19

    Applicant: Google Inc.

    CPC classification number: G10L17/14 G10L15/07 G10L15/26

    Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
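The core loop here is stability-gated: only segments whose stability measure satisfies a threshold trigger a profile update. A toy sketch, where the threshold value and profile class are assumptions:

```python
# Sketch of the stability-gated update loop. The threshold value and the
# AdaptationProfile class are illustrative, not from the patent.

STABILITY_THRESHOLD = 0.8  # assumed value

class AdaptationProfile:
    """Stand-in for a speaker adaptation profile."""
    def __init__(self):
        self.segments = []
    def update(self, segment):
        # A real system would re-estimate adaptation statistics here.
        self.segments.append(segment)

def process_session(segments, profile):
    """segments: (text, stability) pairs streamed from the recognizer."""
    for text, stability in segments:
        if stability >= STABILITY_THRESHOLD:
            # Stable enough: trigger an update of the speaker profile, so
            # later portions of the session use the updated profile.
            profile.update(text)
    return profile
```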

    Voice to text conversion based on third-party agent content

    Publication No.: US10600418B2

    Publication Date: 2020-03-24

    Application No.: US15372188

    Filing Date: 2016-12-07

    Applicant: Google Inc.

    Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.
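Contextual biasing of this kind can be pictured as rescoring recognizer hypotheses with phrases the third-party agent expects next. The parameter names, boost value, and scoring scheme below are assumptions for illustration only:

```python
# Illustrative sketch of context-sensitive biasing. The "expected_phrases"
# key and the 0.5 boost are hypothetical, not from the patent.

def biased_decode(hypotheses, contextual_params):
    """hypotheses: (text, score) pairs from the voice-to-text engine;
    contextual_params: parameters the 3P agent provided alongside its
    responsive content, indicating likely features of the next voice input."""
    boost = set(contextual_params.get("expected_phrases", []))
    def score(hyp):
        text, base = hyp
        # Boost hypotheses containing a phrase the 3P agent anticipates.
        return base + (0.5 if any(p in text for p in boost) else 0.0)
    return max(hypotheses, key=score)[0]
```

With no contextual parameters the engine falls back to its unbiased best hypothesis; with them, an otherwise lower-scoring hypothesis matching the agent's expectation can win.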

    VOICE TO TEXT CONVERSION BASED ON THIRD-PARTY AGENT CONTENT

    Publication No.: US20190122657A1

    Publication Date: 2019-04-25

    Application No.: US15372188

    Filing Date: 2016-12-07

    Applicant: Google Inc.

    Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.

    30. Online incremental adaptation of deep neural networks using auxiliary Gaussian mixture models in speech recognition
    Granted Patent (In Force)

    Publication No.: US09466292B1

    Publication Date: 2016-10-11

    Application No.: US13886620

    Filing Date: 2013-05-03

    Applicant: Google Inc.

    CPC classification number: G10L15/16 G10L15/07 G10L15/14

    Abstract: Methods and systems for online incremental adaptation of neural networks using Gaussian mixture models in speech recognition are described. In an example, a computing device may be configured to receive an audio signal and a subsequent audio signal, both signals having speech content. The computing device may be configured to apply a speaker-specific feature transform to the audio signal to obtain a transformed audio signal. The speaker-specific feature transform may be configured to include speaker-specific speech characteristics of a speaker-profile relating to the speech content. Further, the computing device may be configured to process the transformed audio signal using a neural network trained to estimate a respective speech content of the audio signal. Based on outputs of the neural network, the computing device may be configured to modify the speaker-specific feature transform, and apply the modified speaker-specific feature transform to a subsequent audio signal.
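The incremental loop (transform, decode, refine the transform, apply it to the next signal) can be shown with toy stand-ins. The transform representation, the decode stub, and the refinement rule are all illustrative assumptions, nothing like the actual DNN/GMM machinery:

```python
# Schematic sketch of the online incremental loop in the abstract.
# Frames are lists of floats; the transform is a toy (scale, offset) pair.

def apply_transform(transform, frames):
    # Speaker-specific feature transform (toy affine version).
    scale, offset = transform
    return [[scale * x + offset for x in frame] for frame in frames]

def decode(frames):
    # Stand-in for the neural network estimating the speech content.
    return sum(sum(frame) for frame in frames)

def update_transform(transform, outputs):
    scale, offset = transform
    # Toy refinement based on the network outputs; a real system would
    # re-estimate the transform from auxiliary GMM statistics.
    return (scale * 0.99, offset) if outputs > 0 else (scale * 1.01, offset)

def process(frames, next_frames, transform=(1.0, 0.0)):
    transformed = apply_transform(transform, frames)
    outputs = decode(transformed)
    transform = update_transform(transform, outputs)  # modify the transform
    return apply_transform(transform, next_frames)    # apply to the next signal
```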
