System and method for pronunciation modeling
    21.
    Granted patent
    System and method for pronunciation modeling (status: in force)

    Publication No.: US08073693B2

    Publication date: 2011-12-06

    Application No.: US12328407

    Filing date: 2008-12-04

    IPC classification: G10L15/02

    Abstract: Disclosed are systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent the same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
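
The substitution step in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `FAMILIES` table, the phoneme symbols, and the function name `expand_pronunciations` are all hypothetical, chosen only to show how replacing each phoneme by its family of alternatives (some of which are strings of phonemes) yields a set of pronunciation variants.

```python
from itertools import product

# Hypothetical families: each phoneme maps to its interchangeable alternatives.
# An alternative may itself be a string of phonemes, written here as a tuple.
FAMILIES = {
    "t": [("t",), ("d",), ("dx",)],
    "er": [("er",), ("ah", "r")],
}


def expand_pronunciations(phonemes):
    """Return every variant obtained by substituting each phoneme's family."""
    # A phoneme without a family is treated as a family of one (itself).
    options = [FAMILIES.get(p, [(p,)]) for p in phonemes]
    # Take one alternative per position and flatten it into a phoneme sequence.
    return [tuple(ph for alt in combo for ph in alt) for combo in product(*options)]


variants = expand_pronunciations(["w", "ao", "t", "er"])
```

With these assumed families, one generic pronunciation expands into six variants, all of which are labeled as the same underlying word.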

    Multi-state barge-in models for spoken dialog systems
    22.
    Granted patent
    Multi-state barge-in models for spoken dialog systems (status: in force)

    Publication No.: US08046221B2

    Publication date: 2011-10-25

    Application No.: US11930619

    Filing date: 2007-10-31

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    IPC classification: G10L15/00

    Abstract: Disclosed are systems, methods and computer readable media for applying a multi-state barge-in acoustic model in a spoken dialog system comprising the steps of (1) presenting a prompt to a user from the spoken dialog system, (2) receiving an audio speech input from the user during the presentation of the prompt, (3) accumulating the audio speech input from the user, (4) applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, (5) applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, (6) determining whether the audio speech input is a barge-in-speech input from the user, and (7) if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
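
The control flow of steps (2) through (7) can be sketched roughly as follows. This is only an illustration under strong assumptions: the HMMs are replaced by hypothetical scoring callables (`gaussian_scorer`) that return a log-likelihood-like number for the accumulated frames, and the decision rule (best speech score versus best non-speech score, with a `margin`) is an assumption, not something the abstract specifies.

```python
def gaussian_scorer(mean):
    """Hypothetical stand-in for an HMM: higher score means a better fit."""
    return lambda frames: -sum((x - mean) ** 2 for x in frames)


def is_barge_in(frames, speech_models, nonspeech_models, margin=0.0):
    """Compare the best speech score with the best non-speech score."""
    best_speech = max(model(frames) for model in speech_models)
    best_nonspeech = max(model(frames) for model in nonspeech_models)
    return best_speech - best_nonspeech > margin


def run_prompt(audio_chunks, speech_models, nonspeech_models):
    """Return the chunk index at which the prompt is terminated, or None."""
    accumulated = []
    for index, chunk in enumerate(audio_chunks):
        accumulated.extend(chunk)  # step (3): accumulate the input
        if is_barge_in(accumulated, speech_models, nonspeech_models):
            return index  # step (7): stop presenting the prompt
    return None


# Five toy "speech" scorers and two toy "non-speech" scorers (assumed values).
speech = [gaussian_scorer(m) for m in (0.6, 0.8, 1.0, 1.2, 1.4)]
nonspeech = [gaussian_scorer(m) for m in (0.0, 0.1)]

stopped_at = run_prompt([[0.0, 0.05], [1.0, 0.9], [1.1, 1.0]], speech, nonspeech)
never_stopped = run_prompt([[0.0, 0.05], [0.02, 0.0]], speech, nonspeech)
```

In this toy run the prompt is cut off once the louder second chunk arrives, while a silence-only input never triggers termination.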

    DISCRIMINATIVE TRAINING OF MULTI-STATE BARGE-IN MODELS FOR SPEECH PROCESSING
    23.
    Patent application
    DISCRIMINATIVE TRAINING OF MULTI-STATE BARGE-IN MODELS FOR SPEECH PROCESSING (status: in force)

    Publication No.: US20090112595A1

    Publication date: 2009-04-30

    Application No.: US11930656

    Filing date: 2007-10-31

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    IPC classification: G10L15/14

    CPC classification: G10L15/144, G10L15/063

    Abstract: Disclosed are systems and methods for training a barge-in-model for speech processing in a spoken dialogue system comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by only allowing speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in-model for speech processing.

    Systems and methods of providing modified media content
    24.
    Granted patent
    Systems and methods of providing modified media content (status: in force)

    Publication No.: US09414010B2

    Publication date: 2016-08-09

    Application No.: US13471851

    Filing date: 2012-05-15

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    Abstract: A method includes receiving a command to provide media content configured to be sent to a display device for display at a particular scan rate. The media content includes audio data and video data. The method includes identifying high priority segments of the media content based on the audio data. The high priority segments are to be displayed by the display device at a presentation rate such that the high priority segments displayed at the presentation rate correspond to the media content displayed at the particular scan rate. The method also includes sending the high priority segments to the display device to provide video content and audio content of the requested media content for display.

    System and method for personalization of acoustic models for automatic speech recognition
    25.
    Granted patent
    System and method for personalization of acoustic models for automatic speech recognition (status: in force)

    Publication No.: US09026444B2

    Publication date: 2015-05-05

    Application No.: US12561005

    Filing date: 2009-09-16

    IPC classification: G10L15/22, G10L15/07, G10L15/06

    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model and selecting a quantity of speaker dependent models, the quantity being based on available computing resources; the selected models include the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
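
The selection logic described above might look roughly like the following sketch. All names (`select_models`, `dominant_model`, the `recognize` callback, the model labels and scores) are hypothetical. Two simplifications are assumed: the models are scored one after another rather than in parallel, and a single confidence score per model stands in for "recognition accuracy".

```python
def select_models(speaker_independent, speaker_dependent, max_parallel):
    """Pick the SI model plus as many SD models as the resource budget allows."""
    budget = max(0, max_parallel - 1)  # one slot is reserved for the SI model
    return [speaker_independent] + list(speaker_dependent[:budget])


def dominant_model(selected, utterance, recognize):
    """Score the utterance with every selected model and keep the best one."""
    scored = [(recognize(model, utterance), model) for model in selected]
    return max(scored, key=lambda pair: pair[0])[1]


# Toy confidence scores standing in for real recognizers (assumed values).
toy_scores = {"si": 0.71, "sd_alice": 0.90, "sd_bob": 0.42}

selected = select_models("si", ["sd_alice", "sd_bob", "sd_carol"], max_parallel=3)
best = dominant_model(selected, "call home", lambda model, utt: toy_scores[model])
```

Here the budget of three parallel recognizers admits the speaker independent model and two speaker dependent models, and the highest-scoring one becomes the dominant model.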

    System and method for handling repeat queries due to wrong ASR output by modifying an acoustic, a language and a semantic model
    26.
    Granted patent
    System and method for handling repeat queries due to wrong ASR output by modifying an acoustic, a language and a semantic model (status: in force)

    Publication No.: US08990085B2

    Publication date: 2015-03-24

    Application No.: US12570757

    Filing date: 2009-09-30

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models.

    Speech recognition based on pronunciation modeling
    27.
    Granted patent
    Speech recognition based on pronunciation modeling (status: in force)

    Publication No.: US08214213B1

    Publication date: 2012-07-03

    Application No.: US11380502

    Filing date: 2006-04-27

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    CPC classification: G10L15/187, G10L15/063

    Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations, and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
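
One way to picture "unique word identifiers for words given their pronunciations" is to give every (word, pronunciation) pair its own language-model token that carries a pronunciation probability. The sketch below is an assumption-laden illustration, not the patented method: the counts in `PRON_COUNTS`, the `word#n` token naming, and the relative-frequency estimate of P(pronunciation | word) are all hypothetical.

```python
import math

# Hypothetical counts of how often each (word, pronunciation) pair was observed.
PRON_COUNTS = {
    ("read", "r iy d"): 30,
    ("read", "r eh d"): 10,
    ("the", "dh ah"): 90,
    ("the", "dh iy"): 10,
}


def build_lm_tokens(counts):
    """Map a unique token per (word, pronunciation) to (pron, log P(pron | word))."""
    totals = {}
    for (word, _), count in counts.items():
        totals[word] = totals.get(word, 0) + count
    tokens, next_index = {}, {}
    for (word, pron), count in counts.items():
        next_index[word] = next_index.get(word, 0) + 1
        token = f"{word}#{next_index[word]}"  # e.g. "read#1", "read#2"
        tokens[token] = (pron, math.log(count / totals[word]))
    return tokens


tokens = build_lm_tokens(PRON_COUNTS)
```

Because the probability now lives on the token rather than in a separate dictionary, a language model built over these tokens can weigh pronunciation variants together with word context.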

    MULTI-STATE BARGE-IN MODELS FOR SPOKEN DIALOG SYSTEMS
    28.
    Patent application
    MULTI-STATE BARGE-IN MODELS FOR SPOKEN DIALOG SYSTEMS (status: in force)

    Publication No.: US20120101820A1

    Publication date: 2012-04-26

    Application No.: US13279443

    Filing date: 2011-10-24

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    IPC classification: G10L15/14, G10L15/06

    Abstract: A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from a user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.

    Discriminative training of multi-state barge-in models for speech processing
    29.
    Granted patent
    Discriminative training of multi-state barge-in models for speech processing (status: in force)

    Publication No.: US08000971B2

    Publication date: 2011-08-16

    Application No.: US11930656

    Filing date: 2007-10-31

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    CPC classification: G10L15/144, G10L15/063

    Abstract: Disclosed are systems and methods for training a barge-in-model for speech processing in a spoken dialogue system comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by only allowing speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in-model for speech processing.

    SYSTEM AND METHOD FOR DISCRIMINATIVE PRONUNCIATION MODELING FOR VOICE SEARCH
    30.
    Patent application
    SYSTEM AND METHOD FOR DISCRIMINATIVE PRONUNCIATION MODELING FOR VOICE SEARCH (status: in force)

    Publication No.: US20100125457A1

    Publication date: 2010-05-20

    Application No.: US12274025

    Filing date: 2008-11-19

    IPC classification: G10L15/04

    CPC classification: G10L15/063, G10L2015/025

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for speech recognition. The method includes receiving speech utterances; assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit-of-speech level to sum to 1; for each received speech utterance, optimizing the pronunciation weight by (1) identifying word and phone alignments and corresponding likelihood scores and (2) discriminatively adapting the pronunciation weight to minimize classification errors; and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information (MMI), maximum likelihood (MLE) training, minimum classification error (MCE) training, or other functions known to those of skill in the art. Speech utterances can be names. The speech utterances can be received as part of a multimodal search or input. The step of discriminatively adapting pronunciation weights can further include stochastically modeling pronunciations.
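
The normalization constraint in this abstract (pronunciation weights for a unit of speech summing to 1) can be shown with a small sketch. The update rule in `adapt` is a deliberately simple toy, a multiplicative boost followed by renormalization, and is not MMI, MLE, or MCE training; the function names, the `step` parameter, and the example pronunciations are hypothetical.

```python
def normalize(weights):
    """Scale the weights of one unit of speech so that they sum to 1."""
    total = sum(weights.values())
    return {pron: weight / total for pron, weight in weights.items()}


def adapt(weights, correct_pron, step=0.5):
    """Toy update: boost the pronunciation that matched, then renormalize."""
    boosted = {
        pron: weight * (1.0 + step) if pron == correct_pron else weight
        for pron, weight in weights.items()
    }
    return normalize(boosted)


# Two assumed pronunciations of one word, starting from equal weights.
weights = normalize({"t ah m ey t ow": 1.0, "t ah m aa t ow": 1.0})
weights = adapt(weights, "t ah m ey t ow")
```

After one update the matched pronunciation carries more weight, while the weights for the word still sum to 1, which is the property the abstract requires at the unit-of-speech level.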
