Multi-stage speaker adaptation
    1.
    Type: Granted invention patent (status: in force)

    Publication number: US08996366B2

    Publication date: 2015-03-31

    Application number: US14181908

    Application date: 2014-02-17

    Applicant: Google Inc.

    CPC classification number: G10L17/00 G10L15/065 G10L15/07

    Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.
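
The staged flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the nearest-mean selection rule, the mean-subtraction adaptation, and every name (`MultiStageAdapter`, `select_profile`, `adapt`, the profile dictionaries) are assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_vector(frames: Sequence[Vector]) -> Vector:
    """Per-dimension mean of one utterance's feature vectors."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def select_profile(frames: Sequence[Vector], profiles: Dict[str, Vector]) -> str:
    """Assumed rule: pick the profile whose mean is nearest the utterance mean."""
    m = mean_vector(frames)
    return min(
        profiles,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(m, profiles[name])),
    )


def adapt(frames: Sequence[Vector], profile_mean: Vector) -> List[Vector]:
    """Assumed toy adaptation: subtract the selected profile's mean from each frame."""
    return [[x - p for x, p in zip(frame, profile_mean)] for frame in frames]


class MultiStageAdapter:
    """Applies the technique chosen from the previous utterance, then picks the next."""

    def __init__(self, gender_profiles: Dict[str, Vector],
                 speaker_profiles: Dict[str, Vector]) -> None:
        self.gender_profiles = gender_profiles
        self.speaker_profiles = speaker_profiles
        self.active: Optional[str] = None          # technique currently in effect
        self._active_mean: Optional[Vector] = None
        self._seen = 0                             # utterances processed so far

    def process(self, frames: Sequence[Vector]) -> List[Vector]:
        # Modify this utterance with the technique selected earlier (none for the first).
        if self._active_mean is None:
            out = [list(f) for f in frames]
        else:
            out = adapt(frames, self._active_mean)
        # The first utterance selects a gender-specific technique;
        # later utterances select a speaker-dependent one.
        pool = self.gender_profiles if self._seen == 0 else self.speaker_profiles
        self.active = select_profile(frames, pool)
        self._active_mean = pool[self.active]
        self._seen += 1
        return out
```

In this sketch the first utterance is passed through unmodified and only drives the gender-specific choice; the second is modified by that choice and drives the speaker-dependent choice, which is then applied to the third.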

    Multi-stage speaker adaptation
    2.
    Type: Granted invention patent

    Publication number: US08571859B1

    Publication date: 2013-10-29

    Application number: US13653792

    Application date: 2012-10-17

    Applicant: Google Inc.

    CPC classification number: G10L17/00 G10L15/065 G10L15/07

    Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.

    Online incremental adaptation of deep neural networks using auxiliary Gaussian mixture models in speech recognition
    3.
    Type: Granted invention patent (status: in force)

    Publication number: US09466292B1

    Publication date: 2016-10-11

    Application number: US13886620

    Application date: 2013-05-03

    Applicant: Google Inc.

    CPC classification number: G10L15/16 G10L15/07 G10L15/14

    Abstract: Methods and systems for online incremental adaptation of neural networks using Gaussian mixture models in speech recognition are described. In an example, a computing device may be configured to receive an audio signal and a subsequent audio signal, both signals having speech content. The computing device may be configured to apply a speaker-specific feature transform to the audio signal to obtain a transformed audio signal. The speaker-specific feature transform may be configured to include speaker-specific speech characteristics of a speaker-profile relating to the speech content. Further, the computing device may be configured to process the transformed audio signal using a neural network trained to estimate a respective speech content of the audio signal. Based on outputs of the neural network, the computing device may be configured to modify the speaker-specific feature transform, and apply the modified speaker-specific feature transform to a subsequent audio signal.

    Multi-Stage Speaker Adaptation
    4.
    Type: Invention patent application (status: in force)

    Publication number: US20140025378A1

    Publication date: 2014-01-23

    Application number: US14035499

    Application date: 2013-09-24

    Applicant: Google Inc.

    CPC classification number: G10L17/00 G10L15/065 G10L15/07

    Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.

    Speech recognition parameter adjustment
    5.
    Type: Granted invention patent (status: in force)

    Publication number: US08600746B1

    Publication date: 2013-12-03

    Application number: US13649747

    Application date: 2012-10-11

    Applicant: Google Inc.

    CPC classification number: G10L15/22 G10L15/30 G10L2015/226

    Abstract: Audio data that encodes an utterance of a user is received. It is determined that the user has been classified as a novice user of a speech recognizer. A speech recognizer setting is selected that is used by the speech recognizer in generating a transcription of the utterance. The selected speech recognizer setting is different than a default speech recognizer setting that is used by the speech recognizer in generating transcriptions of utterances of users that are not classified as novice users. The selected speech recognizer setting results in increased speech recognition accuracy in comparison with the default setting. A transcription of the utterance is obtained that is generated by the speech recognizer using the selected setting.

    Speaker verification using neural networks
    6.
    Type: Granted invention patent (status: in force)

    Publication number: US09401148B2

    Publication date: 2016-07-26

    Application number: US14228469

    Application date: 2014-03-28

    Applicant: Google Inc.

    CPC classification number: G10L17/18

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; and based on comparing the evaluation vector and the reference vector, determining whether the particular utterance was likely spoken by the particular speaker.
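
The comparison step in this abstract can be illustrated with a small sketch. The abstract does not say how the vectors are compared, so the choices here are assumptions for illustration only: a single ReLU hidden layer, averaging hidden outputs over frames, cosine similarity, and a threshold of 0.8. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def hidden_output(frame: Vector, weights: Sequence[Vector], biases: Vector) -> Vector:
    """Output of one ReLU hidden layer for a single input frame."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, frame)) + b)
        for row, b in zip(weights, biases)
    ]


def evaluation_vector(frames: Sequence[Vector], weights: Sequence[Vector],
                      biases: Vector) -> Vector:
    """Assumed pooling: average the hidden-layer outputs over all frames."""
    outputs = [hidden_output(f, weights, biases) for f in frames]
    return [sum(col) / len(outputs) for col in zip(*outputs)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def likely_same_speaker(eval_vec: Vector, ref_vec: Vector,
                        threshold: float = 0.8) -> bool:
    """Accept when the similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(eval_vec, ref_vec) >= threshold
```

Here `ref_vec` stands in for the reference vector derived from a past utterance of the claimed speaker; in practice the threshold would be tuned on held-out data.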

    Localized speech recognition with offload
    7.
    Type: Granted invention patent (status: in force)

    Publication number: US08880398B1

    Publication date: 2014-11-04

    Application number: US13746039

    Application date: 2013-01-21

    Applicant: Google Inc.

    CPC classification number: G10L21/00 G10L15/07 G10L15/30 G10L2015/223

    Abstract: A local computing device may receive an utterance from a user device. In response to receiving the utterance, the local computing device may obtain a text string transcription of the utterance, and determine a response mode for the utterance. If the response mode is a text-based mode, the local computing device may provide the text string transcription to a target device. If the response mode is a non-text-based mode, the local computing device may convert the text string transcription into one or more commands from a command set supported by the target device, and provide the one or more commands to the target device.
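
The two response modes in this abstract amount to a simple dispatch, sketched below. The transcription is assumed to have been obtained already, and the `TargetDevice` class, the mode string `"text"`, and the substring-based phrase matching are all hypothetical stand-ins rather than anything specified by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TargetDevice:
    """Hypothetical target: maps spoken phrases to the commands it supports."""
    command_set: Dict[str, str]
    received: List[str] = field(default_factory=list)

    def send(self, payload: str) -> None:
        self.received.append(payload)


def handle_utterance(transcription: str, response_mode: str,
                     target: TargetDevice) -> None:
    """Forward text as-is in text mode; otherwise convert it to supported commands."""
    if response_mode == "text":
        target.send(transcription)
        return
    # Non-text mode: emit every supported command whose phrase occurs in the text.
    lowered = transcription.lower()
    for phrase, command in target.command_set.items():
        if phrase in lowered:
            target.send(command)
```

A text-entry field on the target would use the text mode, while a media player might accept only entries from its own command set.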

    Multi-Stage Speaker Adaptation
    8.
    Type: Invention patent application

    Publication number: US20140163985A1

    Publication date: 2014-06-12

    Application number: US14181908

    Application date: 2014-02-17

    Applicant: Google Inc.

    CPC classification number: G10L17/00 G10L15/065 G10L15/07

    Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.

    Localized speech recognition with offload
    9.
    Type: Granted invention patent (status: in force)

    Publication number: US08554559B1

    Publication date: 2013-10-08

    Application number: US13746115

    Application date: 2013-01-21

    Applicant: Google Inc.

    CPC classification number: G10L21/00 G10L15/07 G10L15/30 G10L2015/223

    Abstract: A local computing device may receive an utterance from a user device. In response to receiving the utterance, the local computing device may obtain a text string transcription of the utterance, and determine a response mode for the utterance. If the response mode is a text-based mode, the local computing device may provide the text string transcription to a target device. If the response mode is a non-text-based mode, the local computing device may convert the text string transcription into one or more commands from a command set supported by the target device, and provide the one or more commands to the target device.

    Realtime acoustic adaptation using stability measures
    10.
    Type: Granted invention patent (status: in force)

    Publication number: US08515750B1

    Publication date: 2013-08-20

    Application number: US13622576

    Application date: 2012-09-19

    Applicant: Google Inc.

    CPC classification number: G10L17/14 G10L15/07 G10L15/26

    Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
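
The threshold-gated update in this abstract might look like the sketch below. The abstract only says the stability measure "satisfies a threshold", so treating that as "greater than or equal to" is an assumption, as are the exponential-moving-average update, the default values, and the `Segment` and `maybe_update_profile` names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A transcribed segment with its stability score and matching speech features."""
    text: str
    stability: float
    features: List[float]


def maybe_update_profile(profile: List[float], segment: Segment,
                         threshold: float = 0.9,
                         rate: float = 0.1) -> Tuple[List[float], bool]:
    """Update the speaker adaptation profile only if the segment is stable enough.

    Returns the (possibly unchanged) profile and whether an update was triggered.
    """
    if segment.stability < threshold:
        return profile, False
    # Assumed update rule: move the profile a small step toward the segment's features.
    updated = [(1.0 - rate) * p + rate * f for p, f in zip(profile, segment.features)]
    return updated, True
```

Later portions of the same session would then be transcribed with the returned profile, so unstable (likely to change) hypotheses never influence the adaptation.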
