System and method for personalization of acoustic models for automatic speech recognition
    21.
    Granted invention patent (in force)

    Publication number: US09026444B2

    Publication date: 2015-05-05

    Application number: US12561005

    Filing date: 2009-09-16

    IPC classification: G10L15/22 G10L15/07 G10L15/06

    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
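
The selection logic in this abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the dictionary-based model objects, the `recognize` callable, the `max_parallel` resource budget, and the use of a confidence score as a stand-in for recognition accuracy are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def pick_models(si_model, sd_models, max_parallel):
    """Keep the speaker-independent model plus as many speaker-dependent
    models as the (hypothetical) budget of parallel decoders allows."""
    n_sd = max(0, min(len(sd_models), max_parallel - 1))
    return [si_model] + sd_models[:n_sd]

def dominant_model(models, utterance):
    """Run every selected model on the utterance in parallel and return
    the name and transcript of the best-scoring model."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        results = list(pool.map(lambda m: m["recognize"](utterance), models))
    best = max(range(len(models)), key=lambda i: results[i][1])
    return models[best]["name"], results[best][0]

def make_model(name, confidence):
    # Toy recognizer: returns (transcript, confidence). Purely illustrative.
    return {"name": name, "recognize": lambda utt: (utt.upper(), confidence)}

si = make_model("speaker-independent", 0.70)
sd = [make_model("sd-alice", 0.90), make_model("sd-bob", 0.60),
      make_model("sd-carol", 0.95)]

selected = pick_models(si, sd, max_parallel=3)   # budget admits two SD models
name, text = dominant_model(selected, "call home")
```

With a budget of three decoders, only the first two speaker-dependent models are run, so the third model is never considered even though it would have scored highest.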

    System and method for handling repeat queries due to wrong ASR output by modifying an acoustic, a language and a semantic model
    22.
    Granted invention patent (in force)

    Publication number: US08990085B2

    Publication date: 2015-03-24

    Application number: US12570757

    Filing date: 2009-09-30

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models.
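
One way to picture the repeat-query handling is the Python sketch below. The history format, the threshold, the multiplicative boost, and the flat phrase-score table standing in for a speech recognition model are all hypothetical; the abstract does not say how the tendency is computed or how the model is adapted.

```python
def repeat_tendency(history):
    """history: list of (was_misrecognized, user_repeated) booleans.
    Returns the fraction of misrecognized queries that the user repeated."""
    repeats = [repeated for misrecognized, repeated in history if misrecognized]
    return sum(repeats) / len(repeats) if repeats else 0.0

def adapt_for_repeat(base_scores, lattice_alternatives, tendency,
                     threshold=0.5, boost=2.0):
    """Return an adapted copy of the score table that favours alternatives
    from the recognition lattice; base_scores itself is left unmodified."""
    adapted = dict(base_scores)
    if tendency >= threshold:
        for phrase in lattice_alternatives:
            adapted[phrase] = adapted.get(phrase, 1.0) * boost
    return adapted

history = [(True, True), (True, True), (True, False), (False, False)]
tendency = repeat_tendency(history)

base = {"pizza hut": 1.0, "pete's a hut": 1.5}
adapted = adapt_for_repeat(base, ["pizza hut"], tendency)
```

Returning a copy mirrors the abstract's point that unmodified models are retained alongside the adapted ones.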

    Speech recognition based on pronunciation modeling
    23.
    Granted invention patent (in force)

    Publication number: US08214213B1

    Publication date: 2012-07-03

    Application number: US11380502

    Filing date: 2006-04-27

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    CPC classification: G10L15/187 G10L15/063

    Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations, and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
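
The idea of moving pronunciation probabilities from the dictionary into the language model can be sketched as below. This is an assumed, simplified reading: a unigram table stands in for the language model, and the `word#k` identifier scheme and the example probabilities are invented for illustration.

```python
def expand_language_model(unigram, pron_probs):
    """Give every (word, pronunciation) pair its own identifier and fold
    P(pronunciation | word) into the language-model probability."""
    expanded = {}
    for word, p_word in unigram.items():
        for k, (phones, p_pron) in enumerate(pron_probs[word]):
            expanded[f"{word}#{k}"] = (phones, p_word * p_pron)
    return expanded

unigram = {"data": 0.5, "the": 0.5}
pron_probs = {
    "data": [("d ey t ah", 0.75), ("d ae t ah", 0.25)],
    "the": [("dh ah", 1.0)],
}

expanded = expand_language_model(unigram, pron_probs)
total = sum(p for _, p in expanded.values())
```

Because each word's pronunciation probabilities sum to 1, the expanded table remains a valid distribution over the new identifiers.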

    MULTI-STATE BARGE-IN MODELS FOR SPOKEN DIALOG SYSTEMS
    24.
    Invention patent application (in force)

    Publication number: US20120101820A1

    Publication date: 2012-04-26

    Application number: US13279443

    Filing date: 2011-10-24

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    IPC classification: G10L15/14 G10L15/06

    Abstract: A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
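
The control flow around the barge-in decision can be sketched as follows. The sketch does not implement the HMMs: it assumes that per-frame log-likelihoods from the speech and non-speech components are already available, and the `Prompt` class, the accumulated log-likelihood ratio, and the `margin` threshold are hypothetical choices.

```python
class Prompt:
    """Stand-in for a prompt that is currently being played to the user."""
    def __init__(self):
        self.playing = True

    def stop(self):
        self.playing = False

def handle_audio(prompt, frame_scores, margin=5.0):
    """frame_scores: iterable of (speech_loglik, nonspeech_loglik) per frame.
    Accumulate the evidence and stop the prompt once speech clearly wins."""
    accumulated = 0.0
    for speech_ll, nonspeech_ll in frame_scores:
        accumulated += speech_ll - nonspeech_ll
        if accumulated > margin:
            prompt.stop()          # barge-in detected
            return True
    return False

noise_prompt = Prompt()
noise_detected = handle_audio(noise_prompt, [(-10.0, -8.0)] * 3)

speech_prompt = Prompt()
speech_detected = handle_audio(speech_prompt, [(-5.0, -9.0)] * 3)
```

In the second call the accumulated ratio crosses the margin on the second frame, so the prompt stops before the remaining audio is examined.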

    Discriminative training of multi-state barge-in models for speech processing
    25.
    Granted invention patent (in force)

    Publication number: US08000971B2

    Publication date: 2011-08-16

    Application number: US11930656

    Filing date: 2007-10-31

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    CPC classification: G10L15/144 G10L15/063

    Abstract: Disclosed are systems and methods for training a barge-in-model for speech processing in a spoken dialogue system comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by only allowing speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in-model for speech processing.
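
The difference between the hypothesis lattice and the reference lattice in steps (3) and (4) comes down to which HMMs each segment may use. The Python sketch below illustrates only that constraint, not the discriminative training itself; the HMM names and the representation of a lattice as one list of allowed models per segment are assumptions.

```python
# Hypothetical model inventories; the abstract does not name the HMMs.
SPEECH_HMMS = ["vowel", "nasal", "fricative", "stop", "glide"]
NONSPEECH_HMMS = ["silence", "noise"]

def hypothesis_lattice(segments):
    """Unconstrained: any speech or non-speech HMM may appear in any segment."""
    return [SPEECH_HMMS + NONSPEECH_HMMS for _ in segments]

def reference_lattice(segments):
    """Constrained: speech segments admit only speech HMMs,
    non-speech segments admit only non-speech HMMs."""
    return [SPEECH_HMMS if label == "speech" else NONSPEECH_HMMS
            for label in segments]

segments = ["nonspeech", "speech", "nonspeech"]
hyp = hypothesis_lattice(segments)
ref = reference_lattice(segments)
```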

    SYSTEM AND METHOD FOR HANDLING REPEAT QUERIES DUE TO WRONG ASR OUTPUT
    26.
    Invention patent application (in force)

    Publication number: US20110077942A1

    Publication date: 2011-03-31

    Application number: US12570757

    Filing date: 2009-09-30

    IPC classification: G10L15/06

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and/or a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models.

    CORRELATED CALL ANALYSIS
    27.
    Invention patent application (lapsed)

    Publication number: US20100161315A1

    Publication date: 2010-06-24

    Application number: US12343981

    Filing date: 2008-12-24

    IPC classification: G06F17/27 G10L15/26

    Abstract: A method of correlating received communication data with operational communication characteristics is provided. The method includes receiving audible input from a source in a communication over a communications network, recording the received audible input, and transcribing the recorded audible input into a transcript. The method further includes outputting the transcript, specifying features of the transcript to be analyzed, specifying and recording operational communication characteristics particular to the communication, analyzing the transcript for the specified features to identify patterns associated with the audible input, computing statistical correlations of the identified patterns with the operational communication characteristics, and outputting results of the computed statistical correlations on a user interface.
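
The correlation step can be illustrated with a small Python sketch. The choice of a phrase count as the transcript feature, call duration as the operational characteristic, Pearson's coefficient as the statistic, and the sample data are all assumptions; the abstract does not fix any of them.

```python
import math

def feature_counts(transcripts, phrase):
    """Count occurrences of a specified phrase in each call transcript."""
    return [t.lower().count(phrase) for t in transcripts]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

transcripts = [
    "I want to cancel my plan",
    "thanks, all good",
    "cancel it, please cancel",
    "what is my balance",
]
durations_sec = [300, 120, 480, 150]   # hypothetical operational characteristic

counts = feature_counts(transcripts, "cancel")
r = pearson(counts, durations_sec)
```

In this made-up data, calls that mention "cancel" more often also last longer, so `r` comes out close to 1.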

    SYSTEM AND METHOD FOR DISCRIMINATIVE PRONUNCIATION MODELING FOR VOICE SEARCH
    28.
    Invention patent application (in force)

    Publication number: US20100125457A1

    Publication date: 2010-05-20

    Application number: US12274025

    Filing date: 2008-11-19

    IPC classification: G10L15/04

    CPC classification: G10L15/063 G10L2015/025

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for speech recognition. The method includes receiving speech utterances; assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit-of-speech level to sum to 1; for each received speech utterance, optimizing the pronunciation weight by (1) identifying word and phone alignments and corresponding likelihood scores and (2) discriminatively adapting the pronunciation weight to minimize classification errors; and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information (MMI), maximum likelihood (MLE) training, minimum classification error (MCE) training, or other functions known to those of skill in the art. Speech utterances can be names. The speech utterances can be received as part of a multimodal search or input. The step of discriminatively adapting pronunciation weights can further include stochastically modeling pronunciations.
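
The normalization constraint on pronunciation weights can be sketched in Python as below. The update rule is a deliberately simple additive step followed by renormalization; it is a stand-in for illustration and is not the MMI or MCE optimization named in the abstract. The pronunciations and the step size are likewise made up.

```python
def normalize(weights):
    """Scale the pronunciation weights of one unit of speech to sum to 1."""
    total = sum(weights.values())
    return {pron: w / total for pron, w in weights.items()}

def adapt(weights, correct_pron, competing_pron, step=0.1):
    """Shift weight toward the pronunciation found in the correct alignment
    and away from the one in the competing (erroneous) alignment."""
    updated = dict(weights)
    updated[correct_pron] += step
    updated[competing_pron] = max(updated[competing_pron] - step, 1e-6)
    return normalize(updated)

tomato = normalize({"t ah m ey t ow": 1.0, "t ah m aa t ow": 1.0})
adapted = adapt(tomato, "t ah m ey t ow", "t ah m aa t ow")
```

Renormalizing after every update keeps the weights for each unit of speech a valid distribution, which is the property the abstract requires.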

    LOW LATENCY REAL-TIME VOCAL TRACT LENGTH NORMALIZATION
    29.
    Invention patent application (in force)

    Publication number: US20090259465A1

    Publication date: 2009-10-15

    Application number: US12490634

    Filing date: 2009-06-24

    IPC classification: G10L15/02

    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
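
The iterative search for the best warping factor can be sketched for a single speaker-specific segment as follows. The linear frequency warp, the squared-error match against a single reference spectrum, and the synthetic triangular spectra are assumptions chosen to keep the example self-contained; a real system would score warped features against an acoustic model.

```python
def warp(spectrum, alpha):
    """Linearly rescale the frequency axis: output bin i reads the input
    at position i * alpha, using linear interpolation between bins."""
    n = len(spectrum)
    out = []
    for i in range(n):
        pos = min(i * alpha, n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return out

def distance(a, b):
    """Squared error between two spectra (smaller means a closer match)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_warping_factor(spectrum, model, factors):
    """Try each factor in turn and keep the one that matches the model best."""
    best_alpha = factors[0]
    best_dist = distance(warp(spectrum, best_alpha), model)
    for alpha in factors[1:]:
        d = distance(warp(spectrum, alpha), model)
        if d < best_dist:          # closer match: save as the new best factor
            best_alpha, best_dist = alpha, d
    return best_alpha

# Synthetic data: the model has a spectral peak at bin 10, the speaker at bin 12.
model = [max(0.0, 1 - abs(i - 10) / 4) for i in range(32)]
speaker = [max(0.0, 1 - abs(i - 12) / 4) for i in range(32)]

best = best_warping_factor(speaker, model, [0.8, 0.9, 1.0, 1.1, 1.2])
```

To cover a whole training set, the same search would be repeated once per speaker-specific segment, storing one best factor for each.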

    SYSTEM AND METHOD OF USING ACOUSTIC MODELS FOR AUTOMATIC SPEECH RECOGNITION WHICH DISTINGUISH PRE- AND POST-VOCALIC CONSONANTS
    30.
    Invention patent application (in force)

    Publication number: US20090112594A1

    Publication date: 2009-04-30

    Application number: US11930675

    Filing date: 2007-10-31

    IPC classification: G10L15/00

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
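
The inventory expansion can be sketched in Python on a lexicon whose pronunciations are already split into syllables. The vowel set, the `_pre` and `_post` suffixes, and the sample lexicon are hypothetical; the abstract derives syllable boundaries from the speech signal, which this sketch takes as given.

```python
# Small illustrative vowel set; a real phoneme inventory would be larger.
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "ow", "uw", "ey", "ay"}

def relabel_syllable(phones):
    """Tag each consonant as pre-vocalic (before the syllable's first vowel)
    or post-vocalic (after it); vowels are left unchanged."""
    first_vowel = next((i for i, p in enumerate(phones) if p in VOWELS),
                       len(phones))
    labeled = []
    for i, phone in enumerate(phones):
        if phone in VOWELS:
            labeled.append(phone)
        elif i < first_vowel:
            labeled.append(phone + "_pre")
        else:
            labeled.append(phone + "_post")
    return labeled

def reformulate_lexicon(lexicon):
    """Rewrite every syllabified pronunciation with position-labeled consonants."""
    return {word: [relabel_syllable(syl) for syl in syllables]
            for word, syllables in lexicon.items()}

lexicon = {
    "tact": [["t", "ae", "k", "t"]],
    "pity": [["p", "ih"], ["t", "iy"]],
}
new_lexicon = reformulate_lexicon(lexicon)
```

Here the two /t/ sounds in "tact" end up as distinct units, `t_pre` and `t_post`, which is the distinction the title refers to.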
