SYSTEM AND METHOD FOR PRONUNCIATION MODELING
    51.
    Patent application
    Legal status: In force

    Publication number: US20120065975A1

    Publication date: 2012-03-15

    Application number: US13302380

    Filing date: 2011-11-22

    CPC classification number: G10L15/187 G10L15/183 G10L2015/025

    Abstract: Disclosed are systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent the same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
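
As a rough illustration of the substitution step described in this abstract, the sketch below replaces each phoneme of a pronunciation with its family of interchangeable alternatives and enumerates the surface forms the resulting model would accept. The phoneme symbols, the `FAMILIES` table, and the function name are assumptions made for this example; they are not taken from the patent.

```python
from itertools import product

# Hypothetical families: each generic phoneme maps to the alternatives that are
# labeled as referring to the same phoneme. An alternative is a tuple because
# it may be a string of several phonemes.
FAMILIES = {
    "t": [("t",), ("d",)],
    "er": [("er",), ("ah",), ("ah", "r")],
}


def expand_pronunciation(phonemes, families):
    """Return every variant obtained by substituting each phoneme with its family."""
    # Phonemes without a family are treated as a family of one.
    options = [families.get(p, [(p,)]) for p in phonemes]
    variants = []
    for choice in product(*options):
        # Flatten the chosen alternatives into a single phoneme sequence.
        variants.append(tuple(ph for alt in choice for ph in alt))
    return variants


variants = expand_pronunciation(["w", "ao", "t", "er"], FAMILIES)
```

Enumerating all variants is only practical for short words; a real system would more likely keep the families as a compact network rather than a flat list.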

    System and method for increasing recognition rates of in-vocabulary words by improving pronunciation modeling
    52.
    Granted patent
    Legal status: In force

    Publication number: US08095365B2

    Publication date: 2012-01-10

    Application number: US12328436

    Filing date: 2008-12-04

    CPC classification number: G06F17/277 G10L15/063 G10L15/187

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.
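
The overgeneration step in this abstract can be pictured with a small sketch: a table of conversion rules for short letter sequences is applied to every possible segmentation of a word, producing a list of phoneme lists. The `RULES` table, the phoneme symbols, and the function name are hypothetical, and the later steps (selecting pronunciations in a recognition context, retraining the rules) are not shown.

```python
# Hypothetical conversion rules: short letter sequence -> candidate phoneme lists.
RULES = {
    "ph": [["f"]],
    "p": [["p"]],
    "h": [["hh"]],
    "o": [["ow"], ["aa"]],
    "t": [["t"]],
}


def overgenerate(word, rules, max_len=2):
    """Enumerate every pronunciation reachable by segmenting `word` into rule keys."""
    if not word:
        return [[]]  # one way to pronounce the empty string: no phonemes
    results = []
    for n in range(1, min(max_len, len(word)) + 1):
        chunk = word[:n]
        for phones in rules.get(chunk, []):
            for rest in overgenerate(word[n:], rules, max_len):
                results.append(phones + rest)
    return results


candidates = overgenerate("photo", RULES)
```

The result deliberately contains implausible variants (for example one starting with "p hh"); in the approach the abstract describes, those would be filtered out later by checking candidates against labeled speech.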

    System and method of using acoustic models for automatic speech recognition which distinguish pre- and post-vocalic consonants
    53.
    Granted patent
    Legal status: In force

    Publication number: US08015008B2

    Publication date: 2011-09-06

    Application number: US11930675

    Filing date: 2007-10-31

    CPC classification number: G10L25/78 G10L15/02

    Abstract: Disclosed are systems, methods, and computer-readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal; defining at least one syllable boundary position in the received speech signal; based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory; reformulating a lexicon to reflect the expanded consonant phoneme inventory; and training a language model for an ASR system based on the reformulated lexicon.
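
A minimal sketch of the lexicon reformulation described in this abstract, assuming syllable boundaries are already known and each syllable has a single vowel nucleus. The vowel set, the `_pre`/`_post` suffixes, and the sample lexicon are illustrative assumptions, not details from the patent.

```python
# Illustrative subset of vowel symbols; a real phoneme inventory would be larger.
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "ow", "uw"}


def label_syllable(syllable):
    """Tag consonants before the nucleus as pre-vocalic and after it as post-vocalic."""
    nucleus = next((i for i, p in enumerate(syllable) if p in VOWELS), None)
    if nucleus is None:
        return list(syllable)  # no vowel found: leave the syllable unchanged
    labeled = []
    for i, p in enumerate(syllable):
        if p in VOWELS:
            labeled.append(p)
        elif i < nucleus:
            labeled.append(p + "_pre")
        else:
            labeled.append(p + "_post")
    return labeled


def reformulate(lexicon):
    """Rewrite every pronunciation using the expanded consonant inventory."""
    return {word: [label_syllable(s) for s in sylls] for word, sylls in lexicon.items()}


# Each pronunciation is a list of syllables; each syllable is a list of phonemes.
LEXICON = {
    "tent": [["t", "eh", "n", "t"]],
    "pity": [["p", "ih"], ["t", "iy"]],
}
reformulated = reformulate(LEXICON)
```

Here the two /t/ sounds in "tent" end up with different labels, so separate acoustic models can be trained for the pre-vocalic and post-vocalic variants.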

    SYSTEM AND METHOD FOR IMPROVED AUTOMATIC SPEECH RECOGNITION PERFORMANCE
    54.
    Patent application
    Legal status: In force

    Publication number: US20110137648A1

    Publication date: 2011-06-09

    Application number: US12631131

    Filing date: 2009-12-04

    CPC classification number: G10L15/00 G10L15/285 G10L15/32

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker in parallel with the main speech recognizer and the supplemental speech recognizer and combines results from the main and supplemental speech recognizer. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. A scheduling algorithm can tailor a particular combination of speech recognition resources and release the supplemental speech recognizer based on increased demand.

    SYSTEM AND METHOD FOR ADAPTING AUTOMATIC SPEECH RECOGNITION PRONUNCIATION BY ACOUSTIC MODEL RESTRUCTURING
    56.
    Patent application
    Legal status: In force

    Publication number: US20100312560A1

    Publication date: 2010-12-09

    Application number: US12480848

    Filing date: 2009-06-09

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
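
The weighted-sum idea in this abstract can be sketched with toy one-dimensional Gaussians standing in for the native acoustic models. The dictionary phoneme keeps its label, but its likelihood becomes a mixture over the phonemes the new speaker was actually observed to produce. The model parameters, the counts, and all function names are assumptions made for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; a stand-in for a real acoustic model score."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical native acoustic models: (mean, variance) per phoneme.
NATIVE_MODELS = {"iy": (2.0, 1.0), "ih": (0.0, 1.0)}


def weights_from_counts(counts):
    """Normalize how often each plausible phoneme was observed for one dictionary phoneme."""
    total = sum(counts.values())
    return {ph: c / total for ph, c in counts.items()}


def custom_likelihood(x, weights, models):
    """Likelihood of feature x under the weighted sum of native phoneme models."""
    return sum(w * gaussian_pdf(x, *models[ph]) for ph, w in weights.items())


# Assume the lattice shows the speaker realizing dictionary /iy/ as [iy] three
# times and as [ih] once.
iy_weights = weights_from_counts({"iy": 3, "ih": 1})
score_custom = custom_likelihood(0.0, iy_weights, NATIVE_MODELS)
score_native = gaussian_pdf(0.0, *NATIVE_MODELS["iy"])
```

In this toy setup a feature near the native [ih] region scores higher under the custom /iy/ model than under the native /iy/ model alone, which is the intended effect: the pronouncing dictionary stays fixed while the acoustic space behind each entry shifts toward the speaker.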

    Low latency real-time vocal tract length normalization
    57.
    Granted patent
    Legal status: In force

    Publication number: US07567903B1

    Publication date: 2009-07-28

    Application number: US11034535

    Filing date: 2005-01-12

    CPC classification number: G10L15/063 G10L15/10 G10L15/12 G10L17/04 G10L17/08

    Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.
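
One way to picture the incremental estimation in this abstract is a running grid search: after each recognized input, the score of every candidate warp factor is added to a running total, so each new estimate depends on the current best hypothesis and on the earlier ones. The scoring function, the grid, and the function names below are assumptions for illustration; the patent's actual estimation procedure is not reproduced here.

```python
def running_warp_estimates(utterances, score, grid):
    """Return the best warp factor after each utterance, using accumulated scores.

    `score(utterance, warp)` is a caller-supplied function that returns a
    log-likelihood-style value for the utterance's best hypothesis under `warp`.
    """
    totals = {warp: 0.0 for warp in grid}
    estimates = []
    for utterance in utterances:
        for warp in grid:
            totals[warp] += score(utterance, warp)
        # The estimate reflects this utterance plus all previous ones.
        estimates.append(max(grid, key=lambda w: totals[w]))
    return estimates


def toy_score(utterance, warp):
    """Toy stand-in: each utterance is represented by the warp factor it prefers."""
    return -((warp - utterance) ** 2)


estimates = running_warp_estimates([0.9, 1.1, 1.1], toy_score, [0.9, 1.0, 1.1])
```

Because scores accumulate, a single atypical utterance moves the estimate less and less as more speech arrives, which suits a low-latency setting where the factor must be usable from the first input.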

    Systems and Methods of providing modified media content
    58.
    Patent application
    Legal status: In force

    Publication number: US20080235741A1

    Publication date: 2008-09-25

    Application number: US11725591

    Filing date: 2007-03-19

    Abstract: A method and system of providing media content is disclosed. In a particular embodiment, the method includes receiving media content from a content source at a set-top box device. The media content includes video data having a first playback rate and audio data having the first playback rate. The method further includes transforming the audio data via a non-linear transformation to produce modified audio data having a second playback rate, modifying the video data to produce modified video data having the second playback rate, and synchronizing the modified audio data and the modified video data to produce modified media content having the second playback rate. A network-based media content storage device and associated logic to provide adjusted rate audio content are also disclosed.

    Systems and methods of providing modified media content
    59.
    Patent application
    Legal status: In force

    Publication number: US20080226256A1

    Publication date: 2008-09-18

    Application number: US11716995

    Filing date: 2007-03-12

    Abstract: A method of providing modified media content is disclosed that includes providing media content to a destination device via a network, where the media content comprises video data and audio data having a first viewing rate. The method further includes receiving data indicating a selection of a second viewing rate via the network and modifying the media content to produce modified media content having approximately the second viewing rate. The modified media content includes modified video data and modified audio data synchronized at approximately the second viewing rate.
