ACOUSTIC MODEL TRAINING CORPUS SELECTION

    Publication No.: US20160267903A1

    Publication date: 2016-09-15

    Application No.: US15164263

    Filing date: 2016-05-25

    Applicant: Google Inc.

    Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer. An updated production speech recognizer component is provided to the production speech recognizer for use in transcribing subsequently received speech data items.
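
The flow in this abstract (re-transcribe logged utterances with a stronger offline recognizer, keep a subset, train on it) can be sketched as below. This is a minimal illustration, not the patented method: the abstract does not say how the subset is selected, so the confidence threshold, the `SpeechItem` and `Transcribed` types, and the toy recognizer are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SpeechItem:
    audio_id: str  # stand-in for the logged audio of one utterance


@dataclass
class Transcribed:
    item: SpeechItem
    text: str
    confidence: float  # assumed score in [0, 1]; not specified in the abstract


def select_training_corpus(
    items: List[SpeechItem],
    offline_recognize: Callable[[SpeechItem], Tuple[str, float]],
    min_confidence: float = 0.9,
) -> List[Transcribed]:
    """Re-transcribe logged utterances offline and keep a subset for training."""
    selected = []
    for item in items:
        text, confidence = offline_recognize(item)
        # Hypothetical selection rule: keep only confident offline transcripts.
        if confidence >= min_confidence:
            selected.append(Transcribed(item, text, confidence))
    return selected


def fake_offline_recognizer(item: SpeechItem) -> Tuple[str, float]:
    """Toy stand-in for the slower, more accurate offline recognizer."""
    table = {
        "a": ("turn on the lights", 0.97),
        "b": ("uh", 0.41),
        "c": ("call mom", 0.92),
    }
    return table[item.audio_id]


corpus = select_training_corpus(
    [SpeechItem("a"), SpeechItem("b"), SpeechItem("c")],
    fake_offline_recognizer,
)
print([t.text for t in corpus])
```

The selected (audio, transcript) pairs would then serve as training data for the updated production components; that training step is outside the scope of this sketch.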

    ACOUSTIC MODEL TRAINING CORPUS SELECTION
    3.
    Patent application
    Status: In force

    Publication No.: US20160093294A1

    Publication date: 2016-03-31

    Application No.: US14693268

    Filing date: 2015-04-22

    Applicant: Google Inc.

    Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer. An updated production speech recognizer component is provided to the production speech recognizer for use in transcribing subsequently received speech data items.

    SPEAKER IDENTIFICATION
    4.
    Patent application
    Status: In force

    Publication No.: US20150127342A1

    Publication date: 2015-05-07

    Application No.: US14523198

    Filing date: 2014-10-24

    Applicant: Google Inc.

    CPC classification number: G10L17/02 G10L17/005 G10L17/08 G10L17/18 G10L25/51

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker identification. In some implementations, an utterance vector that is derived from an utterance is obtained. Hash values are determined for the utterance vector according to multiple different hash functions. A set of speaker vectors from a plurality of hash tables is determined using the hash values, where each speaker vector was derived from one or more utterances of a respective speaker. The speaker vectors in the set are compared with the utterance vector. A speaker vector is selected based on comparing the speaker vectors in the set with the utterance vector.
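
The lookup described in this abstract resembles locality-sensitive hashing: several hash functions each index the speaker vectors into their own table, the utterance vector is hashed the same way, and only the colliding speakers are compared in full. The sketch below assumes random-hyperplane hashes and cosine similarity; neither choice is stated in the abstract, and `SpeakerIndex` and its parameters are hypothetical names.

```python
import math
import random
from collections import defaultdict


def make_hash_function(dim, n_bits, rng):
    """Build one random-hyperplane hash: a tuple of sign bits."""
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def hash_vector(vec):
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes
        )

    return hash_vector


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class SpeakerIndex:
    def __init__(self, dim, n_tables=8, n_bits=4, seed=0):
        rng = random.Random(seed)
        # One hash function and one hash table per index.
        self.hashes = [make_hash_function(dim, n_bits, rng) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    def add(self, speaker_id, vec):
        self.vectors[speaker_id] = vec
        for hash_vector, table in zip(self.hashes, self.tables):
            table[hash_vector(vec)].append(speaker_id)

    def identify(self, utterance_vec):
        # Gather candidates from the matching bucket of every table.
        candidates = set()
        for hash_vector, table in zip(self.hashes, self.tables):
            candidates.update(table.get(hash_vector(utterance_vec), []))
        if not candidates:
            return None
        # Compare only the candidates against the utterance vector.
        return max(candidates, key=lambda s: cosine(self.vectors[s], utterance_vec))


index = SpeakerIndex(dim=3)
index.add("alice", [1.0, 0.2, -0.5])
index.add("bob", [-0.3, 1.0, 0.8])
print(index.identify([2.0, 0.4, -1.0]))
```

Using several tables raises the chance that the true speaker collides with the utterance in at least one of them, while each bucket stays small enough that the full comparison touches only a fraction of the enrolled speakers.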

    CONTEXTUAL HOTWORDS
    5.
    Patent application
    Status: Under examination (published)

    Publication No.: US20180182390A1

    Publication date: 2018-06-28

    Application No.: US15391358

    Filing date: 2016-12-27

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextual hotwords are disclosed. In one aspect, a method, during a boot process of a computing device, includes the actions of determining, by a computing device, a context associated with the computing device. The actions further include, based on the context associated with the computing device, determining a hotword. The actions further include, after determining the hotword, receiving audio data that corresponds to an utterance. The actions further include determining that the audio data includes the hotword. The actions further include, in response to determining that the audio data includes the hotword, performing an operation associated with the hotword.
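
The sequence of actions in this abstract (determine a context, derive the active hotword from it, check incoming audio for that hotword, run the associated operation) can be sketched as follows. The context names, hotwords, and operations are invented for illustration, and a real detector would work on audio; here a decoded text string stands in for the audio data.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping: context -> {hotword: operation}.
HOTWORDS_BY_CONTEXT: Dict[str, Dict[str, Callable[[], str]]] = {
    "music_playing": {
        "next": lambda: "skip_track",
        "pause": lambda: "pause_playback",
    },
    "alarm_ringing": {
        "snooze": lambda: "snooze_alarm",
        "stop": lambda: "dismiss_alarm",
    },
}


def active_hotwords(context: str) -> Dict[str, Callable[[], str]]:
    """Return the hotwords that are live in the given context."""
    return HOTWORDS_BY_CONTEXT.get(context, {})


def handle_utterance(context: str, decoded_text: str) -> Optional[str]:
    """Run the operation for the first active hotword found in the utterance."""
    words = decoded_text.lower().split()
    for hotword, operation in active_hotwords(context).items():
        if hotword in words:
            return operation()
    return None  # no active hotword was spoken in this context


print(handle_utterance("music_playing", "next"))
print(handle_utterance("alarm_ringing", "next"))
```

The same word triggers an operation in one context and is ignored in another, which is the point of tying the hotword set to the device context.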

    SPEAKER VERIFICATION
    6.
    Patent application

    Publication No.: US20180018973A1

    Publication date: 2018-01-18

    Application No.: US15211317

    Filing date: 2016-07-15

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, to facilitate language-independent speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network produced in response to receiving the set of input data, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.
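
The decision step in this abstract can be sketched as: combine audio-derived features with a language identifier, pass them through a network to get a speaker representation, and compare that against a second (enrolled) representation. Everything specific below is an assumption for illustration: the one-hot language encoding, the cosine-similarity threshold, the `LANGUAGES` list, and the toy network that stands in for the trained model.

```python
import math
from typing import Callable, List, Sequence

LANGUAGES = ["en-US", "es-ES", "hi-IN"]  # hypothetical set of language identifiers


def build_input(features: Sequence[float], language: str) -> List[float]:
    """Append a one-hot language identifier to the audio-derived features."""
    one_hot = [1.0 if language == lang else 0.0 for lang in LANGUAGES]
    return list(features) + one_hot


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify(
    features: Sequence[float],
    language: str,
    enrolled: Sequence[float],
    network: Callable[[List[float]], List[float]],
    threshold: float = 0.8,
) -> bool:
    """Accept the utterance if its representation is close to the enrolled one."""
    representation = network(build_input(features, language))
    return cosine_similarity(representation, enrolled) >= threshold


def toy_network(x: List[float]) -> List[float]:
    """Placeholder for the trained neural network: keeps the first two inputs."""
    return x[:2]


enrolled_representation = [0.9, 0.1]
print(verify([1.0, 0.0], "en-US", enrolled_representation, toy_network))
print(verify([0.0, 1.0], "en-US", enrolled_representation, toy_network))
```

A positive result would then gate access to the device, as the abstract describes.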

    IDENTIFYING THE LANGUAGE OF A SPOKEN UTTERANCE
    8.
    Patent application
    Status: Under examination (published)

    Publication No.: US20160035344A1

    Publication date: 2016-02-04

    Application No.: US14817302

    Filing date: 2015-08-04

    Applicant: Google Inc.

    CPC classification number: G10L15/005 G06N3/0445 G06N3/084 G10L15/16

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving a plurality of audio frames that collectively represent at least a portion of a spoken utterance; processing the plurality of audio frames using a long short-term memory (LSTM) neural network to generate a respective language score for each of a plurality of languages, wherein the respective language score for each of the plurality of languages represents a likelihood that the spoken utterance was spoken in the language; and classifying the spoken utterance as being spoken in one of the plurality of languages using the language scores.
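
The final classification step can be illustrated without the LSTM itself. The sketch below takes per-frame output scores as given (a stand-in for what the network would emit), converts them to log-probabilities, averages them over the frames, and picks the highest-scoring language. Averaging log-probabilities is one plausible way to combine frame outputs and is an assumption here; the abstract does not specify how the language scores are formed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame's outputs."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_language(
    frame_logits: Sequence[Sequence[float]],
    languages: Sequence[str],
) -> Tuple[str, Dict[str, float]]:
    """Average per-frame log-probabilities and return the best language."""
    totals = [0.0] * len(languages)
    for logits in frame_logits:
        for i, prob in enumerate(softmax(logits)):
            totals[i] += math.log(prob)
    scores = {
        lang: total / len(frame_logits) for lang, total in zip(languages, totals)
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical per-frame outputs for three languages over three frames.
frames = [
    [2.0, 0.1, -1.0],
    [1.5, 0.3, -0.5],
    [0.2, 0.4, 0.0],
]
language, language_scores = classify_language(frames, ["en", "fr", "de"])
print(language)
```

Note that the third frame on its own favors "fr"; pooling over all frames keeps a single ambiguous frame from deciding the result.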

    Speaker identification using hash-based indexing
    9.
    Granted patent
    Status: In force

    Publication No.: US09514753B2

    Publication date: 2016-12-06

    Application No.: US14523198

    Filing date: 2014-10-24

    Applicant: Google Inc.

    CPC classification number: G10L17/02 G10L17/005 G10L17/08 G10L17/18 G10L25/51

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker identification. In some implementations, an utterance vector that is derived from an utterance is obtained. Hash values are determined for the utterance vector according to multiple different hash functions. A set of speaker vectors from a plurality of hash tables is determined using the hash values, where each speaker vector was derived from one or more utterances of a respective speaker. The speaker vectors in the set are compared with the utterance vector. A speaker vector is selected based on comparing the speaker vectors in the set with the utterance vector.

    Acoustic model training corpus selection

    Publication No.: US09472187B2

    Publication date: 2016-10-18

    Application No.: US15164263

    Filing date: 2016-05-25

    Applicant: Google Inc.

    Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer. An updated production speech recognizer component is provided to the production speech recognizer for use in transcribing subsequently received speech data items.
