-
Publication No.: US09460712B1
Publication date: 2016-10-04
Application No.: US14454198
Filing date: 2014-08-07
Applicant: Google Inc.
Inventor: Brian Strope, William J. Byrne, Francoise Beaufays
CPC classification number: G10L15/22, G06F17/30241, G06F17/30864, G06F17/30867, G06F17/3087, G06Q30/02, G10L15/18, G10L15/197, G10L15/26, G10L15/30, G10L2015/223, G10L2015/228
Abstract: A method of operating a voice-enabled business directory search system includes receiving category-business pairs, each category-business pair including a business category and a specific business, and establishing a data structure having nodes based on the category-business pairs. Each node of the data structure is associated with one or more business categories and a speech recognition language model for recognizing specific businesses associated with the one or more business categories.
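The node structure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: it assumes a flat map with one node per category and uses a toy unigram word-count model in place of a real speech recognition language model. The names `build_nodes` and `score` are hypothetical.

```python
from collections import Counter, defaultdict


def build_nodes(pairs):
    """Group category-business pairs into one node per category.

    Each node holds a toy unigram "language model": word counts over the
    business names filed under that category (an assumption for illustration).
    """
    nodes = defaultdict(Counter)
    for category, business in pairs:
        nodes[category].update(business.lower().split())
    return nodes


def score(nodes, category, phrase):
    """Relative-frequency score of a phrase under one node's unigram model."""
    counts = nodes[category]
    total = sum(counts.values())
    if total == 0:
        return 0.0
    prob = 1.0
    for word in phrase.lower().split():
        prob *= counts[word] / total
    return prob


pairs = [
    ("pizza", "Joe's Pizza"),
    ("pizza", "Pizza Palace"),
    ("hardware", "Ace Hardware"),
]
nodes = build_nodes(pairs)
print(score(nodes, "pizza", "Pizza Palace"))
print(score(nodes, "hardware", "Pizza Palace"))
```

A phrase scores higher under the node whose category matches it, which is the reason for attaching a separate model to each node.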
-
Publication No.: US09275635B1
Publication date: 2016-03-01
Application No.: US13672945
Filing date: 2012-11-09
Applicant: Google Inc.
Inventor: Francoise Beaufays, Brian Strope, Yun-hsuan Sung
IPC: G10L15/00
CPC classification number: G10L15/32, G10L15/183
Abstract: Speech recognition systems may perform the following operations: receiving audio at a computing device; identifying a language associated with the audio; recognizing the audio using recognition models for different versions of the language to produce recognition candidates for the audio, where the recognition candidates are associated with corresponding information; comparing the information of the recognition candidates to identify agreement between at least two of the recognition models; selecting a recognition candidate based on information of the recognition candidate and agreement between the at least two of the recognition models; and outputting data corresponding to the selected recognition candidate as a recognized version of the audio.
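One way to picture the selection step in this abstract is the sketch below. It assumes each recognition model (one per language version) returns a transcript and a score, and that agreement between models is rewarded with a fixed additive bonus. The bonus rule, the function name `select_by_agreement`, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def select_by_agreement(candidates, agreement_bonus=0.1):
    """Pick a transcript from (model_name, transcript, score) candidates.

    A transcript produced by several models gets its best score plus a
    bonus for each additional model that agrees with it.
    """
    scores_by_text = defaultdict(list)
    for _model, text, score in candidates:
        scores_by_text[text].append(score)

    def combined(text):
        scores = scores_by_text[text]
        return max(scores) + agreement_bonus * (len(scores) - 1)

    return max(scores_by_text, key=combined)


candidates = [
    ("en-GB", "colour chart", 0.70),
    ("en-US", "color chart", 0.72),
    ("en-AU", "colour chart", 0.68),
]
best = select_by_agreement(candidates)
print(best)
```

With the bonus set to zero the highest single score wins; with a positive bonus, two agreeing models can outvote one slightly more confident model.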
-
Publication No.: US09110880B1
Publication date: 2015-08-18
Application No.: US13832160
Filing date: 2013-03-15
Applicant: Google, Inc.
Inventor: Brian Strope, Francoise Beaufays
IPC: G06F17/27
CPC classification number: G10L15/183
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for pruning a language model are disclosed. The methods, systems, and apparatus include actions of selecting a candidate portion of the language model to evaluate for pruning, obtaining an entropy score representing information loss that would result from pruning the candidate portion of the language model, obtaining an acoustic score representing acoustic confusability of one or more words modeled by the candidate portion of the language model, and evaluating whether to prune the candidate portion of the language model using the entropy score and the acoustic score.
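The pruning decision in this abstract combines two scores. The sketch below shows one possible combination, under a loudly stated assumption: the abstract does not say how the scores interact, so this example scales the entropy loss up for acoustically confusable words, on the reasoning that the language model matters more when the acoustics alone cannot separate them. The function name, thresholds, and n-gram values are hypothetical.

```python
def should_prune(entropy_loss, acoustic_confusability,
                 entropy_threshold=1e-4, confusability_weight=1.0):
    """Return True when a candidate n-gram looks safe to prune.

    Assumed rule: weight the information loss by acoustic confusability,
    then compare against a fixed threshold.
    """
    adjusted = entropy_loss * (1.0 + confusability_weight * acoustic_confusability)
    return adjusted < entropy_threshold


# n-gram -> (entropy loss if pruned, acoustic confusability in [0, 1])
ngram_scores = {
    ("wreck", "a", "nice"): (8e-5, 0.9),
    ("the", "quick", "brown"): (8e-5, 0.1),
    ("of", "the", "year"): (5e-3, 0.0),
}
pruned = [ngram for ngram, (loss, confusability) in ngram_scores.items()
          if should_prune(loss, confusability)]
print(pruned)
```

The first two n-grams have the same entropy loss, yet only the less confusable one is pruned, which is the effect the acoustic score is meant to add over entropy-only pruning.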
-
Publication No.: US10747427B2
Publication date: 2020-08-18
Application No.: US15422175
Filing date: 2017-02-01
Applicant: Google Inc.
Inventor: Ouais Alsharif, Peter Ciccotto, Francoise Beaufays, Dragan Zivkovic
IPC: G06F3/0488, G06F3/023, G06F40/263, G06F40/274
Abstract: A keyboard is described that determines, using a first decoder and based on a selection of keys of a graphical keyboard, text. Responsive to determining that a characteristic of the text satisfies a threshold, a model of the keyboard identifies the target language of the text, and determines whether the target language is different than a language associated with the first decoder. If the target language of the text is not different than the language associated with the first decoder, the keyboard outputs, for display, an indication of first candidate words determined by the first decoder from the text. If the target language of the text is different, the keyboard enables a second decoder, where a language associated with the second decoder matches the target language of the text, and outputs, for display, an indication of second candidate words determined by the second decoder from the text.
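The decoder-switching logic in this abstract can be sketched as below. Several pieces are assumptions: text length stands in for the unspecified "characteristic of the text", a stopword-overlap count stands in for the language identification model, and decoders are represented by plain strings. None of the names come from the patent.

```python
# Hypothetical stopword lists used as a stand-in language identifier.
STOPWORDS = {
    "en": {"the", "and", "is", "you"},
    "fr": {"le", "la", "est", "vous", "et"},
}


def identify_language(text):
    """Guess the language with the most stopword hits (ties favor the first)."""
    words = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))


def choose_decoder(text, active_language, decoders, min_chars=8):
    """Return the decoder to use for the typed text.

    Language identification only runs once the text is long enough; a
    different, supported target language enables that language's decoder.
    """
    if len(text) < min_chars:
        return decoders[active_language]
    target = identify_language(text)
    if target == active_language or target not in decoders:
        return decoders[active_language]
    return decoders[target]


decoders = {"en": "english-decoder", "fr": "french-decoder"}
print(choose_decoder("la", "en", decoders))
print(choose_decoder("vous et la", "en", decoders))
```

Gating on a threshold first avoids switching decoders on the strength of one or two ambiguous characters.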
-
Publication No.: US09837070B2
Publication date: 2017-12-05
Application No.: US14186400
Filing date: 2014-02-21
Applicant: Google Inc.
Inventor: Fuchun Peng, Kanury Kanishka Rao, Francoise Beaufays
CPC classification number: G10L15/063, G10L15/26
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying pronunciations. In one aspect, a method includes obtaining a first transcription for an utterance. A second transcription for the utterance is obtained. The second transcription is different from the first transcription. One or more feature scores are determined based on the first transcription and the second transcription. The one or more feature scores are input to a trained classifier. An output of the classifier is received. The output indicates which of the first transcription and the second transcription is more likely to be a correct transcription of the utterance.
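The flow in this abstract (two transcriptions, feature scores, a trained classifier, a verdict) can be illustrated with the sketch below. It assumes the features are differences in acoustic and language-model log scores and uses a logistic function with fixed, made-up weights as a stand-in for the trained classifier. The feature names, weights, and sample scores are hypothetical.

```python
import math

# Hypothetical weights standing in for a trained classifier.
WEIGHTS = {"acoustic_diff": 2.0, "lm_diff": 1.0}
BIAS = 0.0


def feature_scores(first, second):
    """Compute feature scores from the two transcriptions' log scores."""
    return {
        "acoustic_diff": first["acoustic"] - second["acoustic"],
        "lm_diff": first["lm"] - second["lm"],
    }


def prob_first_correct(features):
    """Logistic output: probability that the first transcription is correct."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


first = {"text": "k ae t", "acoustic": -4.0, "lm": -2.0}
second = {"text": "k aa t", "acoustic": -5.5, "lm": -1.8}
p = prob_first_correct(feature_scores(first, second))
winner = first if p >= 0.5 else second
print(winner["text"], round(p, 3))
```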
-
Publication No.: US09678664B2
Publication date: 2017-06-13
Application No.: US14683861
Filing date: 2015-04-10
Applicant: Google Inc.
Inventor: Shumin Zhai, Thomas Breuel, Ouais Alsharif, Yu Ouyang, Francoise Beaufays, Johan Schalkwyk
IPC: G06F3/02, G06F3/0489, G06F17/27, G06F3/0488, G06F3/023, G06N3/04
CPC classification number: G06F3/04886, G06F3/0219, G06F3/0233, G06F3/0237, G06F3/0482, G06F3/04883, G06F3/04895, G06F17/273, G06F17/276, G06F17/2765, G06N3/0445, G06N3/08
Abstract: In some examples, a computing device includes at least one processor; and at least one module, operable by the at least one processor to: output, for display at an output device, a graphical keyboard; receive an indication of a gesture detected at a location of a presence-sensitive input device, wherein the location of the presence-sensitive input device corresponds to a location of the output device that outputs the graphical keyboard; determine, based on at least one spatial feature of the gesture that is processed by the computing device using a neural network, at least one character string, wherein the at least one spatial feature indicates at least one physical property of the gesture; and output, for display at the output device, based at least in part on the processing of the at least one spatial feature of the gesture using the neural network, the at least one character string.
-
Publication No.: US20160351188A1
Publication date: 2016-12-01
Application No.: US14811939
Filing date: 2015-07-29
Applicant: Google Inc.
Inventor: Kanury Kanishka Rao, Francoise Beaufays, Hasim Sak, Ouais Alsharif
IPC: G10L15/187, G10L15/05, G10L15/16, G06F17/27
CPC classification number: G10L15/187, G06N3/0445, G06N3/084, G10L15/063, G10L15/16, G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the time steps processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output for the time step using a phoneme output layer to generate a phoneme representation for the acoustic feature representation for the time step; and processing the recurrent output for the time step using a grapheme output layer to generate a grapheme representation for the acoustic feature representation for the time step; and extracting, from the phoneme and grapheme representations for the acoustic feature representations at each time step, a respective pronunciation for each of one or more words.
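The network shape in this abstract, a shared recurrent layer feeding separate phoneme and grapheme output layers at each time step, can be sketched in plain Python as below. This is only a forward pass over a single simple (Elman-style) recurrent layer with random, untrained weights. The layer sizes, weight ranges, and function names are assumptions, and the final pronunciation-extraction step is not shown.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def run(sequence, w_in, w_rec, w_phone, w_graph):
    """Return one (phoneme_probs, grapheme_probs) pair per time step."""
    hidden = [0.0] * len(w_rec)
    outputs = []
    for features in sequence:
        pre = [a + b for a, b in zip(matvec(w_in, features), matvec(w_rec, hidden))]
        hidden = [math.tanh(x) for x in pre]  # shared recurrent output
        outputs.append((softmax(matvec(w_phone, hidden)),
                        softmax(matvec(w_graph, hidden))))
    return outputs


rng = random.Random(0)
FEATS, HIDDEN, PHONEMES, GRAPHEMES = 3, 4, 5, 6  # illustrative sizes
w_in = random_matrix(HIDDEN, FEATS, rng)
w_rec = random_matrix(HIDDEN, HIDDEN, rng)
w_phone = random_matrix(PHONEMES, HIDDEN, rng)
w_graph = random_matrix(GRAPHEMES, HIDDEN, rng)

sequence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
outputs = run(sequence, w_in, w_rec, w_phone, w_graph)
print(len(outputs), len(outputs[0][0]), len(outputs[0][1]))
```

Both output layers read the same recurrent output, so each time step yields a phoneme distribution and a grapheme distribution that are aligned in time.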
-
Publication No.: US20160275951A1
Publication date: 2016-09-22
Application No.: US15171374
Filing date: 2016-06-02
Applicant: Google Inc.
Inventor: Brian Patrick Strope, Francoise Beaufays, Olivier Siohan
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meet a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
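The early-abort behavior in this abstract can be sketched with a deterministic simulation. Instead of running real recognizers in parallel, the example assumes each SRS reports a latency, a transcript, and a confidence, and processes results in order of completion. The function name, the tuple layout, and the sample values are hypothetical.

```python
def first_confident_result(srs_outputs, threshold=0.8):
    """Simulate parallel SRS tasks finishing in order of latency.

    srs_outputs: list of (name, latency_seconds, transcript, confidence).
    Returns (final_transcript, names_of_aborted_systems). If no result
    meets the threshold, the most confident completed result is returned.
    """
    done = set()
    best = None
    for name, _latency, text, confidence in sorted(srs_outputs, key=lambda r: r[1]):
        done.add(name)
        if best is None or confidence > best[1]:
            best = (text, confidence)
        if confidence >= threshold:
            break  # confident enough: stop waiting for the slower systems
    aborted = [name for name, *_rest in srs_outputs if name not in done]
    return (best[0] if best else None), aborted


srs_outputs = [
    ("fast-small", 0.1, "call mom", 0.65),
    ("medium", 0.3, "call mom", 0.90),
    ("slow-large", 1.2, "call tom", 0.95),
]
result, aborted = first_confident_result(srs_outputs)
print(result, aborted)
```

The slowest system is never waited for once a faster one clears the threshold, trading a possibly better answer for lower latency.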
-
Publication No.: US09129591B2
Publication date: 2015-09-08
Application No.: US13726954
Filing date: 2012-12-26
Applicant: Google Inc.
Inventor: Yun-hsuan Sung, Francoise Beaufays, Brian Strope, Hui Lin, Jui-Ting Huang
IPC: G10L15/28, G10L15/00, G10L15/32, G10L15/183
CPC classification number: G10L15/005, G10L15/183, G10L15/32
Abstract: Speech recognition systems may perform the following operations: receiving audio; recognizing the audio using language models for different languages to produce recognition candidates for the audio, where the recognition candidates are associated with corresponding recognition scores; identifying a candidate language for the audio; selecting a recognition candidate based on the recognition scores and the candidate language; and outputting data corresponding to the selected recognition candidate as a recognized version of the audio.
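The selection step in this abstract uses both recognition scores and a separately identified candidate language. A minimal sketch follows, assuming the candidate language simply adds a fixed boost to the scores of matching candidates. The boost rule, the name `select_across_languages`, and the sample data are assumptions.

```python
def select_across_languages(candidates, candidate_language, language_boost=0.15):
    """Pick the best (language, transcript, score) candidate.

    Candidates whose language matches the identified candidate language
    receive an additive boost before comparison.
    """
    def adjusted(candidate):
        language, _text, score = candidate
        return score + (language_boost if language == candidate_language else 0.0)

    return max(candidates, key=adjusted)


candidates = [
    ("en", "pan", 0.60),
    ("es", "pan", 0.55),
    ("fr", "pain", 0.58),
]
chosen = select_across_languages(candidates, candidate_language="es")
print(chosen)
```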
-
Publication No.: US20150170642A1
Publication date: 2015-06-18
Application No.: US14109316
Filing date: 2013-12-17
Applicant: Google Inc.
Inventor: Fuchun Peng, Francoise Beaufays, Pedro J. Moreno Mengibar, Brian Patrick Strope
IPC: G10L15/187, G10L15/26
CPC classification number: G10L15/187, G10L15/005, G10L2015/025, G10L2015/227
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, including selecting terms; obtaining an expected phonetic transcription of an idealized native speaker of a natural language speaking the terms; receiving audio data corresponding to a particular user speaking the terms in the natural language; obtaining, based on the audio data, an actual phonetic transcription of the particular user speaking the terms in the natural language; aligning the expected phonetic transcription of the idealized native speaker of the natural language with the actual phonetic transcription of the particular user; identifying, based on the aligning, a portion of the expected phonetic transcription that is different than a corresponding portion of the actual phonetic transcription; and based on identifying the portion of the expected phonetic transcription, designating the expected phonetic transcription as a substitute pronunciation for the corresponding portion of the actual phonetic transcription.
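The align-and-compare step in this abstract can be illustrated with Python's standard `difflib.SequenceMatcher`, used here as a stand-in aligner; the patent does not specify an alignment algorithm. Phone symbols and the function name `substitute_pronunciations` are hypothetical.

```python
from difflib import SequenceMatcher


def substitute_pronunciations(expected, actual):
    """Align two phone sequences and list the differing portions.

    Returns (actual_portion, expected_portion) pairs: for each mismatch,
    the expected phones are designated as the substitute for what the
    particular user actually said.
    """
    matcher = SequenceMatcher(a=actual, b=expected, autojunk=False)
    substitutions = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag == "replace":
            substitutions.append((tuple(actual[a0:a1]), tuple(expected[b0:b1])))
    return substitutions


# "think this" as an idealized native speaker vs. a particular user
expected = ["th", "ih", "ng", "k", "dh", "ih", "s"]
actual = ["s", "ih", "ng", "k", "z", "ih", "s"]
substitutions = substitute_pronunciations(expected, actual)
print(substitutions)
```

Only `replace` spans are reported; insertions and deletions are ignored in this sketch, although a fuller treatment would likely need them.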