System and method for personalization of acoustic models for automatic speech recognition
    1.
    Granted invention patent (in force)

    Publication number: US09026444B2

    Publication date: 2015-05-05

    Application number: US12561005

    Filing date: 2009-09-16

    IPC classification: G10L15/22 G10L15/07 G10L15/06

    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
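
    The selection scheme in this abstract can be sketched in Python. This is a minimal illustration, not the patented implementation: the `Model` callable type, the `free_slots` resource budget, the toy models, and the use of a confidence score as a stand-in for recognition accuracy are all assumptions introduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical model interface: takes an utterance, returns (transcript, confidence).
Model = Callable[[str], Tuple[str, float]]
NamedModel = Tuple[str, Model]


def select_models(si_model: NamedModel,
                  sd_models: List[NamedModel],
                  free_slots: int) -> List[NamedModel]:
    """Keep the speaker-independent (SI) model plus as many speaker-dependent
    (SD) models as the assumed resource budget allows."""
    quantity = max(0, free_slots - 1)  # one slot is reserved for the SI model
    return [si_model] + sd_models[:quantity]


def pick_dominant(models: List[NamedModel], utterance: str) -> Tuple[str, str]:
    """Run every selected model concurrently and return the name and
    transcript of the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, utterance) for name, fn in models}
        results = {name: fut.result() for name, fut in futures.items()}
    # Assumption: confidence approximates recognition accuracy.
    dominant = max(results, key=lambda name: results[name][1])
    return dominant, results[dominant][0]


def make_model(transcript: str, confidence: float) -> Model:
    """Toy model that always returns the same result (illustration only)."""
    return lambda utterance: (transcript, confidence)


si = ("speaker_independent", make_model("call bob", 0.61))
sd = [("sd_alice", make_model("call bob", 0.83)),
      ("sd_carol", make_model("cold bob", 0.40)),
      ("sd_dave", make_model("call rob", 0.95))]

selected = select_models(si, sd, free_slots=3)   # SI model + two SD models
dominant, text = pick_dominant(selected, "<audio>")
print(dominant, text)
```

    With a budget of three slots, `sd_dave` is never run, so the dominant model is chosen only among the models that fit the budget.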

    System and method for handling repeat queries due to wrong ASR output by modifying an acoustic, a language and a semantic model
    3.
    Granted invention patent (in force)

    Publication number: US08990085B2

    Publication date: 2015-03-24

    Application number: US12570757

    Filing date: 2009-09-30

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models.
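
    A rough Python sketch of the adapt-before-repeat idea follows. Everything concrete in it is an assumption made for illustration: the boolean interaction history, the flat hypothesis-score dictionary standing in for a language model, and the `boost` parameter. The abstract does not specify how the tendency is computed or how the models are modified.

```python
from typing import Dict, List


def repeat_tendency(history: List[bool]) -> float:
    """Fraction of past misrecognitions after which the user repeated the
    query (assumed representation of 'previous user interactions')."""
    return sum(history) / len(history) if history else 0.0


def adapt_for_repeat(base_scores: Dict[str, float],
                     lattice_alternatives: List[str],
                     rejected: str,
                     tendency: float,
                     boost: float = 2.0) -> Dict[str, float]:
    """Return an adapted copy of the scores; the base scores stay untouched,
    mirroring the idea of retaining the unmodified model."""
    adapted = dict(base_scores)
    for hypothesis in lattice_alternatives:
        if hypothesis != rejected:
            # Favour lattice entries that were not the rejected output.
            adapted[hypothesis] = adapted.get(hypothesis, 0.0) + boost * tendency
    # Penalise the hypothesis the user already rejected.
    adapted[rejected] = adapted.get(rejected, 0.0) - boost * tendency
    return adapted


history = [True, True, True, False]           # repeated after 3 of 4 errors
tendency = repeat_tendency(history)           # 0.75
base = {"pizza hut": 1.0, "pizza hot": 1.5}   # "pizza hot" was the wrong output
adapted = adapt_for_repeat(base,
                           ["pizza hot", "pizza hut", "visa hut"],
                           rejected="pizza hot",
                           tendency=tendency)
print(max(adapted, key=adapted.get))
```

    Scaling the adjustment by the tendency means a user who rarely repeats queries leaves the scores almost unchanged.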

    System and method for speech personalization by need
    5.
    Granted invention patent (in force)

    Publication number: US09002713B2

    Publication date: 2015-04-07

    Application number: US12480864

    Filing date: 2009-06-09

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions. The method can further store a speaker personalization profile having information for the modified set of allocated resources and recognize speech associated with the speaker based on the speaker personalization profile.
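
    The record-then-adjust loop described here might look like the following Python sketch. The `Allocation` fields mirror the four resources named in the abstract, but the units, the thresholds `low` and `high`, and the doubling and halving policy are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Allocation:
    """Resources allocated to one speaker (units are assumed)."""
    bandwidth_kbps: int
    cpu_ms: int
    memory_mb: int
    storage_mb: int


def adjust(allocation: Allocation,
           confidence: float,
           repeat_requests: int,
           low: float = 0.5,
           high: float = 0.9) -> Allocation:
    """Return a new allocation commensurate with the recorded metrics."""
    if confidence < low or repeat_requests > 0:
        # Recognition is struggling: spend more processor time and memory.
        return replace(allocation,
                       cpu_ms=allocation.cpu_ms * 2,
                       memory_mb=allocation.memory_mb * 2)
    if confidence > high:
        # Recognition is easy for this speaker: release processor time.
        return replace(allocation, cpu_ms=max(1, allocation.cpu_ms // 2))
    return allocation


start = Allocation(bandwidth_kbps=64, cpu_ms=100, memory_mb=256, storage_mb=512)
struggling = adjust(start, confidence=0.4, repeat_requests=1)
easy = adjust(start, confidence=0.95, repeat_requests=0)
print(struggling)
print(easy)
```

    Storing the returned `Allocation` per speaker would play the role of the speaker personalization profile mentioned in the abstract.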

    System and method for increasing recognition rates of in-vocabulary words by improving pronunciation modeling
    6.
    Granted invention patent (in force)

    Publication number: US08892441B2

    Publication date: 2014-11-18

    Application number: US13311512

    Filing date: 2011-12-05

    IPC classification: G10L15/187 G10L15/06

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes overgenerating potential pronunciations based on symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.
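
    The overgenerate-then-filter flow can be illustrated with a small Python sketch. The rule table `RULES`, the greedy longest-match segmentation, and the `observed` set standing in for pronunciations identified in a recognition context are all hypothetical; the retraining step is omitted.

```python
from itertools import product
from typing import Dict, List

# Hypothetical conversion rules: short letter sequence -> phoneme alternatives.
# An empty string means the letters may be silent.
RULES: Dict[str, List[str]] = {
    "ph": ["f"],
    "o": ["ow", "aa"],
    "n": ["n"],
    "e": ["", "iy"],
}


def overgenerate(word: str, rules: Dict[str, List[str]], max_len: int = 2) -> List[str]:
    """Segment the word greedily (longest rule first) and return every
    combination of the per-segment phoneme alternatives."""
    segments = []
    i = 0
    while i < len(word):
        for size in range(max_len, 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                segments.append(rules[chunk])
                i += len(chunk)
                break
        else:
            raise KeyError(f"no conversion rule covers {word[i]!r}")
    return [" ".join(p for p in combo if p) for combo in product(*segments)]


candidates = overgenerate("phone", RULES)

# Assumed outcome of recognition: only these pronunciations were observed.
observed = {"f ow n"}
lexicon = {"phone": [p for p in candidates if p in observed]}
print(candidates)
print(lexicon)
```

    Here the candidate list plays the role of the "list of phoneme lists" representation; a weighted network would attach scores to each alternative instead.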

    System and method for pronunciation modeling
    7.
    Granted invention patent (in force)

    Publication number: US08073693B2

    Publication date: 2011-12-06

    Application number: US12328407

    Filing date: 2008-12-04

    IPC classification: G10L15/02

    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
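
    The family substitution step can be sketched in Python as follows. The `FAMILIES` table and its dialectal alternatives for the phoneme `ay` are invented for illustration, and the acoustic side (including vocal tract length normalization) is not modeled.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical family: alternatives labeled as referring to the same phoneme.
# One alternative ("aa ih") is itself a string of phonemes.
FAMILIES: Dict[str, Tuple[str, ...]] = {
    "ay": ("ay", "aa", "aa ih"),
}


def build_pronunciation_model(phonemes: List[str],
                              families: Dict[str, Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Substitute each phoneme with its family; phonemes without a family
    become a one-member family."""
    return [families.get(p, (p,)) for p in phonemes]


def expand(model: List[Tuple[str, ...]]) -> List[str]:
    """List every pronunciation the model accepts."""
    return [" ".join(choice) for choice in product(*model)]


model = build_pronunciation_model(["f", "ay", "v"], FAMILIES)   # "five"
variants = expand(model)
print(variants)
```

    Because each family is attached to the generic phoneme, every variant still maps back to the same word entry.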

    SYSTEM AND METHOD FOR DISCRIMINATIVE PRONUNCIATION MODELING FOR VOICE SEARCH
    8.
    Invention patent application (in force)

    Publication number: US20100125457A1

    Publication date: 2010-05-20

    Application number: US12274025

    Filing date: 2008-11-19

    IPC classification: G10L15/04

    CPC classification: G10L15/063 G10L2015/025

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by (1) identifying word and phone alignments and corresponding likelihood scores, and (2) discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information (MMI), maximum likelihood (MLE) training, minimum classification error (MCE) training, or other functions known to those of skill in the art. Speech utterances can be names. The speech utterances can be received as part of a multimodal search or input. The step of discriminatively adapting pronunciation weights can further include stochastically modeling pronunciations.
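
    The normalization constraint (weights for one unit of speech sum to 1) is easy to show in Python. The update below is a toy multiplicative boost followed by renormalization; it is a stand-in chosen for brevity and is not MMI, MLE, or MCE training. The example pronunciations and the `step` parameter are assumptions.

```python
from typing import Dict


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale the pronunciation weights of one unit of speech to sum to 1."""
    total = sum(weights.values())
    return {pron: w / total for pron, w in weights.items()}


def toy_update(weights: Dict[str, float], correct: str, step: float = 0.5) -> Dict[str, float]:
    """Boost the pronunciation that aligned with a correctly recognized
    utterance, then renormalize (toy rule, not a real objective function)."""
    boosted = {pron: w * (1.0 + step) if pron == correct else w
               for pron, w in weights.items()}
    return normalize(boosted)


# Raw counts for two pronunciations of the word "data".
weights = normalize({"d ey t ax": 3.0, "d ae t ax": 1.0})
updated = toy_update(weights, correct="d ae t ax")
print(weights)
print(updated)
```

    After each update the weights for the word still sum to 1, so they remain comparable across words with different numbers of pronunciations.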

    System and method for pronunciation modeling
    9.
    Granted invention patent (in force)

    Publication number: US08862470B2

    Publication date: 2014-10-14

    Application number: US13302380

    Filing date: 2011-11-22

    IPC classification: G10L15/187 G10L15/183

    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.

    System and method for handling missing speech data
    10.
    Granted invention patent (in force)

    Publication number: US08751229B2

    Publication date: 2014-06-10

    Application number: US12275920

    Filing date: 2008-11-21

    IPC classification: G10L15/00

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for handling missing speech data. The computer-implemented method includes receiving speech with a missing segment, generating a plurality of hypotheses for the missing segment, identifying a best hypothesis for the missing segment, and recognizing the received speech by inserting the identified best hypothesis for the missing segment. In another method embodiment, the final step is replaced with synthesizing the received speech by inserting the identified best hypothesis for the missing segment. In one aspect, the method further includes identifying a duration for the missing segment and generating the plurality of hypotheses of the identified duration for the missing segment. The step of identifying the best hypothesis for the missing segment can be based on speech context, a pronouncing lexicon, and/or a language model. Each hypothesis can have an identical acoustic score.
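
    A minimal Python sketch of filling a missing segment is shown below. It assumes the missing segment is a single word marked with `None`, that every hypothesis has the same acoustic score (so only the language model decides), and that the language model is the small hypothetical bigram table `BIGRAM`. Duration matching is not modeled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical bigram probabilities P(next | previous).
BIGRAM: Dict[Tuple[str, str], float] = {
    ("new", "york"): 0.5, ("york", "tomorrow"): 0.2,
    ("new", "jersey"): 0.3, ("jersey", "tomorrow"): 0.1,
    ("new", "orleans"): 0.2,
}


def fill_missing(words: List[Optional[str]],
                 vocabulary: List[str],
                 bigram: Dict[Tuple[str, str], float]) -> List[str]:
    """Replace the single None entry with the vocabulary word that best fits
    its left and right context under the bigram table."""
    idx = words.index(None)

    def score(candidate: str) -> float:
        left = bigram.get((words[idx - 1], candidate), 0.0) if idx > 0 else 1.0
        right = (bigram.get((candidate, words[idx + 1]), 0.0)
                 if idx + 1 < len(words) else 1.0)
        return left * right

    best = max(vocabulary, key=score)
    return words[:idx] + [best] + words[idx + 1:]


heard = ["flight", "to", "new", None, "tomorrow"]
hypotheses = ["york", "jersey", "orleans"]
completed = fill_missing(heard, hypotheses, BIGRAM)
print(" ".join(completed))
```

    The completed word sequence could then be passed either to recognition output or to a synthesizer, matching the two embodiments in the abstract.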