SPEAKER VERIFICATION USING NEURAL NETWORKS
    11.
    Patent application
    SPEAKER VERIFICATION USING NEURAL NETWORKS (in force)

    Publication number: US20150127336A1

    Publication date: 2015-05-07

    Application number: US14228469

    Filing date: 2014-03-28

    Applicant: Google Inc.

    CPC classification number: G10L17/18

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; and based on comparing the evaluation vector and the reference vector, determining whether the particular utterance was likely spoken by the particular speaker.
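    The comparison step in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two vectors are compared, so the cosine similarity, the 0.8 threshold, the single ReLU hidden layer, and all function names below are assumptions made for illustration only.

```python
import math


def hidden_layer_output(features, weights, biases):
    """Return ReLU activations of one hidden layer for a feature vector.

    Assumption: a single fully connected layer stands in for the network.
    """
    return [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weights, biases)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_same_speaker(evaluation_vector, reference_vector, threshold=0.8):
    """Accept the utterance if the vectors are similar enough.

    Assumption: the threshold value is hypothetical, not from the patent.
    """
    return cosine_similarity(evaluation_vector, reference_vector) >= threshold
```

    Here the evaluation vector would come from `hidden_layer_output` on the new utterance, and the reference vector from the same layer on a past utterance of the enrolled speaker.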


    VIDEO ANALYSIS BASED LANGUAGE MODEL ADAPTATION
    12.
    Patent application
    VIDEO ANALYSIS BASED LANGUAGE MODEL ADAPTATION (under examination, published)

    Publication number: US20140379346A1

    Publication date: 2014-12-25

    Application number: US13923545

    Filing date: 2013-06-21

    Applicant: Google Inc.

    CPC classification number: G10L15/25 G06K9/00335 G06K9/726 G10L15/183 G10L15/24

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data obtained by a microphone of a wearable computing device, wherein the audio data encodes a user utterance, receiving image data obtained by a camera of the wearable computing device, identifying one or more image features based on the image data, identifying one or more concepts based on the one or more image features, selecting one or more terms associated with a language model used by a speech recognizer to generate transcriptions, adjusting one or more probabilities associated with the language model that correspond to one or more of the selected terms based on the relevance of one or more of the selected terms to the one or more concepts, and obtaining a transcription of the user utterance using the speech recognizer.
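    The probability adjustment described in this abstract can be sketched as follows. The abstract does not specify the language model or the adjustment rule, so the unigram model, the `1 + boost * relevance` scaling, and the names `adapt_unigram_probs` and `boost` are assumptions chosen to keep the example small.

```python
def adapt_unigram_probs(probs, relevance, boost=2.0):
    """Raise the probability of terms relevant to the detected image concepts.

    probs:     term -> probability under a (hypothetical) unigram language model
    relevance: term -> relevance in [0, 1] to the concepts found in the image
    Each probability is scaled by (1 + boost * relevance) and the result is
    renormalized so the probabilities still sum to 1.
    """
    scaled = {
        term: p * (1.0 + boost * relevance.get(term, 0.0))
        for term, p in probs.items()
    }
    total = sum(scaled.values())
    return {term: p / total for term, p in scaled.items()}
```

    For example, if the camera image suggests the concept "water", a term such as "river" would receive a high relevance and gain probability at the expense of unrelated terms before the utterance is transcribed.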


    DATA DRIVEN PRONUNCIATION LEARNING WITH CROWD SOURCING
    14.
    Patent application
    DATA DRIVEN PRONUNCIATION LEARNING WITH CROWD SOURCING (in force)

    Publication number: US20150006178A1

    Publication date: 2015-01-01

    Application number: US13930495

    Filing date: 2013-06-28

    Applicant: Google Inc.

    CPC classification number: G10L15/18 G09B17/006 G10L13/08 G10L15/06

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining pronunciations for particular terms. The methods, systems, and apparatus include actions of obtaining audio samples of speech corresponding to a particular term and obtaining candidate pronunciations for the particular term. Further actions include generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample. Additional actions include aggregating the scores for each candidate pronunciation and adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
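    The aggregation and selection steps in this abstract can be sketched as below. The abstract does not say how scores are aggregated or how many candidates are kept, so averaging, the `top_n` cutoff, and the function name are assumptions; the similarity scores themselves are taken as given inputs.

```python
from collections import defaultdict


def select_pronunciations(scores, top_n=1):
    """Aggregate per-sample scores and pick the best candidate pronunciations.

    scores: iterable of (candidate_pronunciation, sample_id, score) tuples,
            one per (candidate, audio sample) pair.
    Returns (selected, aggregated), where `aggregated` maps each candidate to
    its mean score (assumed aggregation) and `selected` lists the top_n
    candidates to add to the pronunciation lexicon.
    """
    per_candidate = defaultdict(list)
    for candidate, _sample_id, score in scores:
        per_candidate[candidate].append(score)
    aggregated = {c: sum(v) / len(v) for c, v in per_candidate.items()}
    ranked = sorted(aggregated, key=lambda c: aggregated[c], reverse=True)
    return ranked[:top_n], aggregated
```

    The selected pronunciations would then be stored in the lexicon under the term they were scored for.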


    Realtime acoustic adaptation using stability measures
    15.
    Granted patent
    Realtime acoustic adaptation using stability measures (in force)

    Publication number: US08849664B1

    Publication date: 2014-09-30

    Application number: US13943320

    Filing date: 2013-07-16

    Applicant: Google Inc.

    CPC classification number: G10L17/14 G10L15/07 G10L15/26

    Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. The actions also include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
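    The control flow in this abstract (transcribe, check stability, update the profile, transcribe the next portion with the updated profile) can be sketched as below. The profile is modeled as a plain dictionary with a version counter, "satisfies a threshold" is read as "greater than or equal to", and the 0.8 default is hypothetical; none of these details come from the patent.

```python
def run_session(segments, profile, threshold=0.8):
    """Process transcribed segments in order, updating the profile on stable ones.

    segments: list of (text, stability_measure) pairs in session order.
    profile:  dict with "version" (int) and "adapted_on" (list of segment texts);
              a stand-in for a real speaker adaptation profile.
    Returns a list of (text, profile_version_used) pairs, showing which profile
    version each segment was transcribed with.
    """
    used = []
    for text, stability in segments:
        used.append((text, profile["version"]))
        if stability >= threshold:
            # Stable segment: trigger a profile update that later segments will use.
            profile["adapted_on"].append(text)
            profile["version"] += 1
    return used
```

    Unstable segments leave the profile untouched, so a hypothesis that is still likely to change is never used as adaptation data.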


    Distributed speaker adaptation
    16.
    Granted patent
    Distributed speaker adaptation (in force)

    Publication number: US08805684B1

    Publication date: 2014-08-12

    Application number: US13653804

    Filing date: 2012-10-17

    Applicant: Google Inc.

    CPC classification number: G10L15/07

    Abstract: Automatic speech recognition (ASR) may be performed on received utterances. The ASR may be performed by an ASR module of a computing device (e.g., a client device). The ASR may include: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, and updating the feature-space speaker adaptation parameters based on the feature vectors. The transcriptions may be based, at least in part, on an acoustic model and the updated feature vectors. Updated speaker adaptation parameters may be received from another computing device and incorporated into the ASR module.
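    The feature-update step in this abstract can be sketched as below. The abstract does not state the form of the feature-space speaker adaptation parameters, so the affine transform `A x + b`, the class `AsrModule`, and its method names are assumptions; transcription and the local re-estimation of parameters are left out.

```python
def adapt_features(x, A, b):
    """Apply an affine feature-space transform A x + b to one feature vector."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


class AsrModule:
    """Hypothetical client-side ASR module holding adaptation parameters."""

    def __init__(self, dim):
        # Start from the identity transform, i.e. no speaker adaptation.
        self.A = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def process(self, feature_vectors):
        """Return speaker-adapted copies of the given feature vectors."""
        return [adapt_features(x, self.A, self.b) for x in feature_vectors]

    def incorporate(self, A, b):
        """Replace local parameters with updated ones received from another device."""
        self.A, self.b = A, b
```

    A client could call `process` on each utterance and call `incorporate` whenever a server sends refreshed parameters, which is the distributed part of the scheme.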


    Multi-stage speaker adaptation
    17.
    Granted patent
    Multi-stage speaker adaptation (in force)

    Publication number: US08700393B2

    Publication date: 2014-04-15

    Application number: US14035499

    Filing date: 2013-09-24

    Applicant: Google Inc.

    CPC classification number: G10L17/00 G10L15/065 G10L15/07

    Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.
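    The staging in this abstract can be sketched as a small pipeline. Only the order of operations follows the abstract; the selection rules, the offset-based "techniques", the labels `group_a` and `group_b`, and all function names are toy assumptions standing in for real classifiers and transforms.

```python
# Hypothetical gender-specific techniques, modeled as constant feature offsets.
GROUP_OFFSETS = {"group_a": -1.0, "group_b": 1.0}


def select_group_technique(features):
    """Toy rule: choose a technique from the average first coefficient."""
    avg = sum(v[0] for v in features) / len(features)
    return "group_a" if avg > 0 else "group_b"


def select_speaker_offset(features):
    """Toy speaker-dependent technique: the offset that zero-centers the features."""
    count = sum(len(v) for v in features)
    return -sum(sum(v) for v in features) / count


def apply_offset(features, offset):
    """Return a modified copy of the feature vectors with the offset added."""
    return [[x + offset for x in v] for v in features]


def multi_stage_adaptation(unit1, unit2, unit3):
    """Stage 1: pick a group technique from unit 1 and apply it to unit 2.
    Stage 2: pick a speaker technique from unit 2 and apply it to unit 3."""
    group = select_group_technique(unit1)
    modified2 = apply_offset(unit2, GROUP_OFFSETS[group])
    speaker_offset = select_speaker_offset(unit2)
    modified3 = apply_offset(unit3, speaker_offset)
    return group, modified2, modified3
```

    Each stage uses the previous unit of speech to choose how the next unit is modified, so adaptation becomes more specific as more speech arrives.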

