-
Publication number: US20170186424A1
Publication date: 2017-06-29
Application number: US15460342
Filing date: 2017-03-16
Applicant: Google Inc.
IPC: G10L15/20 , G10L21/034 , G10L25/84
CPC classification number: G10L15/20 , G06F3/165 , G06F3/167 , G10L15/222 , G10L15/265 , G10L17/00 , G10L17/06 , G10L21/034 , G10L25/84 , H03G3/3005
Abstract: The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
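The control flow in this abstract (detect a user utterance on top of the speaker's own output, then lower the output level) can be sketched roughly as follows. This is only an illustration: the `SpeakerDevice` class, the energy-based `looks_like_user_speech` placeholder, and all thresholds are hypothetical and stand in for the trained model, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class SpeakerDevice:
    """Hypothetical stand-in for a speaker with a settable output level."""
    level: float = 1.0  # 1.0 means full volume


def looks_like_user_speech(residual_energy: float, threshold: float = 0.1) -> bool:
    # Placeholder for the trained model mentioned in the abstract (assumption):
    # energy left after accounting for the speaker's own output is treated
    # as a possible user utterance when it exceeds a fixed threshold.
    return residual_energy > threshold


def process_frame(mic_energy: float, predicted_playback_energy: float,
                  speaker: SpeakerDevice, ducked_level: float = 0.25) -> bool:
    """Lower the speaker level when the extra audio looks like user speech."""
    residual = max(mic_energy - predicted_playback_energy, 0.0)
    if looks_like_user_speech(residual):
        # Initiate the reduction; never raise a level that is already lower.
        speaker.level = min(speaker.level, ducked_level)
        return True
    return False


speaker = SpeakerDevice()
process_frame(0.50, 0.45, speaker)  # mostly playback, level unchanged
process_frame(0.90, 0.45, speaker)  # extra energy present, level reduced
```

A real system would replace the energy comparison with the trained model's decision; the sketch only shows where that decision feeds into the volume reduction.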
-
Publication number: US20160267903A1
Publication date: 2016-09-15
Application number: US15164263
Filing date: 2016-05-25
Applicant: Google Inc.
Inventor: Olga Kapralova , John Paul Alex , Eugene Weinstein , Pedro J. Moreno Mengibar , Olivier Siohan , Ignacio Lopez Moreno
CPC classification number: G10L15/063 , G10L15/01 , G10L15/16 , G10L15/187 , G10L15/30 , G10L15/32 , G10L2015/0633
Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer. An updated production speech recognizer component is provided to the production speech recognizer for use in transcribing subsequently received speech data items.
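The step of training on "a selected subset of the transcriptions" could look something like the sketch below. The abstract does not say how the subset is chosen, so the confidence-based filter, the `OfflineTranscription` type, and the `min_confidence` value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OfflineTranscription:
    """Hypothetical record produced by the offline recognizer."""
    audio_id: str
    text: str
    confidence: float  # assumed to be reported by the offline recognizer


def select_training_subset(items: List[OfflineTranscription],
                           min_confidence: float = 0.9) -> List[OfflineTranscription]:
    # Assumed selection rule: keep only transcriptions the offline
    # recognizer is confident about, so they can serve as training labels.
    return [item for item in items if item.confidence >= min_confidence]


batch = [
    OfflineTranscription("utt-1", "turn on the lights", 0.97),
    OfflineTranscription("utt-2", "play some uh music", 0.62),
]
training_set = select_training_subset(batch)
```

The selected items would then be used to train the updated production recognizer components; that training step is outside the scope of this sketch.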
-
Publication number: US20160093294A1
Publication date: 2016-03-31
Application number: US14693268
Filing date: 2015-04-22
Applicant: Google Inc.
Inventor: Olga Kapralova , John Paul Alex , Eugene Weinstein , Pedro J. Moreno Mengibar , Olivier Siohan , Ignacio Lopez Moreno
IPC: G10L15/06 , G10L15/187 , G10L15/26 , G10L25/30
CPC classification number: G10L15/063 , G10L15/01 , G10L15/16 , G10L15/187 , G10L15/30 , G10L15/32 , G10L2015/0633
Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer. An updated production speech recognizer component is provided to the production speech recognizer for use in transcribing subsequently received speech data items.
-
Publication number: US20150127342A1
Publication date: 2015-05-07
Application number: US14523198
Filing date: 2014-10-24
Applicant: Google Inc.
Inventor: Matthew Sharifi , Ignacio Lopez Moreno , Ludwig Schmidt
CPC classification number: G10L17/02 , G10L17/005 , G10L17/08 , G10L17/18 , G10L25/51
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker identification. In some implementations, an utterance vector that is derived from an utterance is obtained. Hash values are determined for the utterance vector according to multiple different hash functions. A set of speaker vectors from a plurality of hash tables is determined using the hash values, where each speaker vector was derived from one or more utterances of a respective speaker. The speaker vectors in the set are compared with the utterance vector. A speaker vector is selected based on comparing the speaker vectors in the set with the utterance vector.
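The hash-table lookup described in this abstract resembles locality-sensitive hashing. A minimal sketch follows, assuming random-hyperplane hash functions and cosine similarity for the final comparison; neither choice is stated in the abstract, and the `SpeakerIndex` class and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class SpeakerIndex:
    """Several hash tables, each keyed by a different random-hyperplane hash."""

    def __init__(self, dim, num_tables=4, bits_per_table=8, seed=0):
        rng = random.Random(seed)
        # One set of random hyperplanes per table gives a distinct hash function.
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
                        for _ in range(bits_per_table)]
                       for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _keys(self, vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return [tuple(sum(w * x for w, x in zip(plane, vec)) > 0 for plane in table_planes)
                for table_planes in self.planes]

    def add(self, speaker_id, speaker_vec):
        self.vectors[speaker_id] = list(speaker_vec)
        for table, key in zip(self.tables, self._keys(speaker_vec)):
            table[key].append(speaker_id)

    def identify(self, utterance_vec):
        # Gather candidates sharing a bucket with the utterance in any table,
        # then compare only those candidates with the utterance vector.
        candidates = set()
        for table, key in zip(self.tables, self._keys(utterance_vec)):
            candidates.update(table.get(key, []))
        if not candidates:
            return None
        return max(candidates, key=lambda s: cosine(self.vectors[s], utterance_vec))


index = SpeakerIndex(dim=3)
index.add("alice", [1.0, 0.2, -0.5])
index.add("bob", [-1.0, -0.2, 0.5])
match = index.identify([1.0, 0.2, -0.5])
```

Using several tables raises the chance that a true match shares at least one bucket with the query, while the final comparison runs over a small candidate set instead of every enrolled speaker.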
-
Publication number: US20180182390A1
Publication date: 2018-06-28
Application number: US15391358
Filing date: 2016-12-27
Applicant: Google Inc.
Inventor: Christopher Thaddeus Hughes , Ignacio Lopez Moreno , Aleksandar Kracun
CPC classification number: G10L15/22 , G10L15/02 , G10L15/08 , G10L15/20 , G10L2015/088 , G10L2015/223 , G10L2015/226 , G10L2015/228
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextual hotwords are disclosed. In one aspect, a method, during a boot process of a computing device, includes the actions of determining, by a computing device, a context associated with the computing device. The actions further include, based on the context associated with the computing device, determining a hotword. The actions further include, after determining the hotword, receiving audio data that corresponds to an utterance. The actions further include determining that the audio data includes the hotword. The actions further include, in response to determining that the audio data includes the hotword, performing an operation associated with the hotword.
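The context-to-hotword flow in this abstract can be illustrated with a small lookup. In this sketch the context names, the hotword lists, and the use of a text transcript in place of audio data are all assumptions; an actual system would run a hotword detector on the audio itself.

```python
# Hypothetical mapping from a device context to the hotwords active in it.
CONTEXT_HOTWORDS = {
    "music_playing": {"next", "pause", "volume up"},
    "alarm_ringing": {"stop", "snooze"},
}


def hotwords_for(context: str) -> set:
    """Determine the active hotwords from the device context."""
    return CONTEXT_HOTWORDS.get(context, set())


def detect_hotword(transcript: str, context: str):
    """Return the hotword found in the utterance, or None.

    The transcript stands in for audio data (assumption); matching is on
    whole words so that "stop" does not fire on "unstoppable".
    """
    padded = f" {transcript.lower()} "
    # Check longer hotwords first so multi-word phrases win over shorter ones.
    for hotword in sorted(hotwords_for(context), key=len, reverse=True):
        if f" {hotword} " in padded:
            return hotword
    return None


found = detect_hotword("please pause the song", "music_playing")
```

Once a hotword is found, the device would perform the operation associated with it, for example pausing playback.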
-
Publication number: US20180018973A1
Publication date: 2018-01-18
Application number: US15211317
Filing date: 2016-07-15
Applicant: Google Inc.
Inventor: Ignacio Lopez Moreno , Li Wan , Quan Wang
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, to facilitate language-independent speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network produced in response to receiving the set of input data, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.
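Two steps in this abstract lend themselves to a short sketch: combining audio-derived features with a language identifier, and comparing the resulting speaker representation with a second one. The one-hot language encoding, the cosine comparison, the `LANGUAGES` list, and the 0.7 threshold are assumptions; the neural network itself is not modelled here.

```python
import math

# Hypothetical set of supported language identifiers.
LANGUAGES = ["en-US", "es-ES", "zh-CN"]


def build_input(features, language_id):
    """Append a one-hot language identifier to the audio-derived features."""
    one_hot = [1.0 if lang == language_id else 0.0 for lang in LANGUAGES]
    return list(features) + one_hot


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def is_same_speaker(speaker_repr, enrolled_repr, threshold=0.7):
    """Accept the utterance when the two representations are close enough."""
    return cosine_similarity(speaker_repr, enrolled_repr) >= threshold


network_input = build_input([0.5, -0.1], "es-ES")
# In a full system, speaker_repr would be the network's output for network_input.
grant_access = is_same_speaker([1.0, 0.0], [0.9, 0.1])
```

Access to the device would be granted only when `is_same_speaker` returns true for the representation produced from the new utterance.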
-
Publication number: US20160225373A1
Publication date: 2016-08-04
Application number: US15093309
Filing date: 2016-04-07
Applicant: Google Inc.
CPC classification number: G10L15/20 , G06F3/165 , G06F3/167 , G10L15/222 , G10L15/265 , G10L17/00 , G10L17/06 , G10L21/034 , G10L25/84 , H03G3/3005
Abstract: The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
-
Publication number: US20160035344A1
Publication date: 2016-02-04
Application number: US14817302
Filing date: 2015-08-04
Applicant: Google Inc.
Inventor: Javier Gonzalez-Dominguez , Hasim Sak , Ignacio Lopez Moreno
IPC: G10L15/00
CPC classification number: G10L15/005 , G06N3/0445 , G06N3/084 , G10L15/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving a plurality of audio frames that collectively represent at least a portion of a spoken utterance; processing the plurality of audio frames using a long short-term memory (LSTM) neural network to generate a respective language score for each of a plurality of languages, wherein the respective language score for each of the plurality of languages represents a likelihood that the spoken utterance was spoken in the language; and classifying the spoken utterance as being spoken in one of the plurality of languages using the language scores.
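The scoring and classification step in this abstract can be sketched as below. The sketch assumes the LSTM has already produced one vector of raw scores (logits) per audio frame, and that per-frame softmax probabilities are averaged into a single score per language; the averaging rule and the function names are assumptions, not details from the abstract.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_language(frame_logits, languages):
    """Average per-frame probabilities and pick the highest-scoring language.

    frame_logits: one list of raw scores per audio frame, assumed to come
    from an LSTM that is not modelled in this sketch.
    """
    per_frame = [softmax(frame) for frame in frame_logits]
    scores = [sum(column) / len(per_frame) for column in zip(*per_frame)]
    best = max(range(len(languages)), key=scores.__getitem__)
    return languages[best], dict(zip(languages, scores))


label, language_scores = classify_language(
    [[2.0, 0.1, 0.0], [1.5, 0.3, 0.2]],
    ["en", "es", "fr"],
)
```

Each value in `language_scores` plays the role of the per-language likelihood mentioned in the abstract, and `label` is the resulting classification.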
-
Publication number: US09514753B2
Publication date: 2016-12-06
Application number: US14523198
Filing date: 2014-10-24
Applicant: Google Inc.
Inventor: Matthew Sharifi , Ignacio Lopez Moreno , Ludwig Schmidt
CPC classification number: G10L17/02 , G10L17/005 , G10L17/08 , G10L17/18 , G10L25/51
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker identification. In some implementations, an utterance vector that is derived from an utterance is obtained. Hash values are determined for the utterance vector according to multiple different hash functions. A set of speaker vectors from a plurality of hash tables is determined using the hash values, where each speaker vector was derived from one or more utterances of a respective speaker. The speaker vectors in the set are compared with the utterance vector. A speaker vector is selected based on comparing the speaker vectors in the set with the utterance vector.
-
Publication number: US09472187B2
Publication date: 2016-10-18
Application number: US15164263
Filing date: 2016-05-25
Applicant: Google Inc.
Inventor: Olga Kapralova , John Paul Alex , Eugene Weinstein , Pedro J. Moreno Mengibar , Olivier Siohan , Ignacio Lopez Moreno
CPC classification number: G10L15/063 , G10L15/01 , G10L15/16 , G10L15/187 , G10L15/30 , G10L15/32 , G10L2015/0633
Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer. An updated production speech recognizer component is provided to the production speech recognizer for use in transcribing subsequently received speech data items.