System and method for performing speech synthesis with a cache of phoneme sequences
    1.
    Granted patent
    Legal status: In force

    Publication No.: US07983919B2

    Publication date: 2011-07-19

    Application No.: US11836423

    Filing date: 2007-08-09

    Applicant: Alistair Conkie

    Inventor: Alistair Conkie

    IPC classification: G10L13/04

    CPC classification: G10L13/08 G10L13/04

    Abstract: Disclosed are systems, methods, and computer readable media for performing speech synthesis. The method embodiment comprises: applying a first part of a speech synthesizer to a text corpus to obtain a plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences; for each of the obtained phoneme sequences, identifying the joins that would be calculated to synthesize that phoneme sequence; and adding the identified joins to a cache for use in speech synthesis.
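
    The caching idea in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy lexicon standing in for the synthesizer front end, the unit inventory, and the `join_cost` function are all hypothetical, since the abstract does not specify any of them.

```python
from itertools import product

# Hypothetical toy lexicon standing in for the synthesizer's front end
# (the "first part", which only produces phoneme sequences).
LEXICON = {"cat": ["k", "ae", "t"], "cab": ["k", "ae", "b"]}

# Hypothetical unit inventory: phoneme -> candidate unit ids in a voice database.
UNITS = {"k": ["k1", "k2"], "ae": ["ae1"], "t": ["t1"], "b": ["b1", "b2"]}


def to_phonemes(text):
    """Front end only: map text to phoneme sequences; no audio is produced."""
    return [LEXICON[w] for w in text.lower().split() if w in LEXICON]


def joins_for(sequence):
    """Yield every (left_unit, right_unit) pair that might have to be joined."""
    for left, right in zip(sequence, sequence[1:]):
        yield from product(UNITS[left], UNITS[right])


def join_cost(left_unit, right_unit):
    """Placeholder cost (assumption); a real system would compare acoustic features."""
    return abs(len(left_unit) - len(right_unit)) + 1.0


def build_join_cache(corpus):
    """Pre-compute the join cost for every join the corpus could require."""
    cache = {}
    for line in corpus:
        for sequence in to_phonemes(line):
            for pair in joins_for(sequence):
                if pair not in cache:  # compute each join only once
                    cache[pair] = join_cost(*pair)
    return cache


cache = build_join_cache(["cat cab", "cab"])
```

    At synthesis time, a lookup in `cache` would replace the join computation whenever the pair has already been seen; pairs missing from the cache would still be computed on demand.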

    AUTOMATIC SEGMENTATION IN SPEECH SYNTHESIS
    2.
    Patent application
    Legal status: In force

    Publication No.: US20070271100A1

    Publication date: 2007-11-22

    Application No.: US11832262

    Filing date: 2007-08-01

    IPC classification: G10L15/14

    CPC classification: G10L13/06

    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) is initialized using bootstrap data. The HMMs are then re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process is performed iteratively, using the spectral-boundary-corrected phone labels as input instead of the bootstrap data, in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
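
    The abstract does not say how spectral boundary correction is carried out. One plausible reading, shown here purely as an assumption, is to move each HMM-assigned boundary to the point of largest spectral change inside a small search window. The frame data, the window size, and the function names are hypothetical.

```python
def spectral_change(frames, i):
    """Euclidean distance between frame i-1 and frame i."""
    return sum((a - b) ** 2 for a, b in zip(frames[i - 1], frames[i])) ** 0.5


def correct_boundary(frames, boundary, window=2):
    """Move a boundary (frame index) to the largest spectral change within +/- window.

    A boundary at index i means the phone changes between frame i-1 and frame i.
    """
    lo = max(1, boundary - window)
    hi = min(len(frames) - 1, boundary + window)
    return max(range(lo, hi + 1), key=lambda i: spectral_change(frames, i))


# Toy 2-dimensional "spectra": the large change lies between frames 3 and 4.
frames = [[0.0, 0.0], [0.1, 0.0], [0.1, 0.1], [0.2, 0.1], [1.0, 0.9], [1.0, 1.0]]
hmm_boundaries = [3]  # boundary as placed by the HMM alignment (not shown here)
corrected = [correct_boundary(frames, b) for b in hmm_boundaries]
```

    In the iterative variant described above, the corrected labels would be fed back as the initialization for another round of HMM re-estimation.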

    Method and apparatus for combining text to speech and recorded prompts
    3.
    Granted patent
    Legal status: In force

    Publication No.: US08600753B1

    Publication date: 2013-12-03

    Application No.: US11321638

    Filing date: 2005-12-30

    Applicant: Alistair Conkie

    Inventor: Alistair Conkie

    IPC classification: G10L13/00 G10L13/08

    CPC classification: G10L13/00 G10L13/08 G10L13/10

    Abstract: An arrangement provides for improved synthesis of speech arising from a message text. The arrangement stores prerecorded prompts and speech-related characteristics for those prompts. A message is parsed to determine whether any message portions have been recorded previously. If so, speech-related characteristics for those portions are retrieved. The arrangement generates speech-related characteristics for those portions not previously stored. The retrieved and generated characteristics are combined. The combination of characteristics is then used as the input to a speech synthesizer.
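
    A minimal sketch of the retrieve-or-generate flow described in this abstract is given below. The prompt store, the (word, duration, pitch) representation of "speech-related characteristics", and the `generate_characteristics` rule are all hypothetical choices made for illustration; the abstract does not define them.

```python
# Hypothetical store of prerecorded prompts and their speech-related
# characteristics, here (word, duration in ms, pitch in Hz).
RECORDED = {
    "your balance is": [("your", 180, 110), ("balance", 420, 120), ("is", 150, 105)],
}


def generate_characteristics(words):
    """Stand-in for predicting characteristics of text that was never recorded."""
    return [(w, 80 * len(w), 100) for w in words]


def characteristics_for(message):
    """Parse a message, reuse stored characteristics where possible, generate the rest."""
    words = message.lower().split()
    out, i = [], 0
    while i < len(words):
        match = None
        # Prefer the longest recorded prompt starting at position i.
        for j in range(len(words), i, -1):
            key = " ".join(words[i:j])
            if key in RECORDED:
                match = (j, RECORDED[key])
                break
        if match:
            j, chars = match
            out.extend(chars)  # retrieved characteristics
            i = j
        else:
            out.extend(generate_characteristics(words[i:i + 1]))  # generated
            i += 1
    return out  # combined sequence, intended as synthesizer input


combined = characteristics_for("Your balance is forty dollars")
```

    Here the first three words reuse the stored values, while "forty" and "dollars" fall back to the placeholder generator.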

    Method and system for enhancing a speech database
    4.

    Publication No.: US08510113B1

    Publication date: 2013-08-13

    Application No.: US11469134

    Filing date: 2006-08-31

    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
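
    The substitution step can be pictured with the following sketch. The database layout (a list of labeled segments), the set of labels treated as language-dependent, and the audio identifiers are hypothetical; the abstract does not describe how segments are stored or how replacements are chosen.

```python
from copy import deepcopy

# Hypothetical labeled primary database: each segment has a phoneme label
# and an audio payload (represented here by an identifier string).
primary = [
    {"label": "r", "audio": "en_r_01"},
    {"label": "o", "audio": "en_o_01"},
    {"label": "x", "audio": "en_x_01"},
]

# Hypothetical secondary database: label -> replacement audio.
secondary = {"r": "es_r_07", "x": "es_x_03"}

# Labels assumed to be pronounced differently across the two languages.
VARYING = {"r", "x"}


def enhance(primary_db, secondary_db, varying):
    """Return a copy of the primary database with varying segments substituted."""
    enhanced = deepcopy(primary_db)  # leave the original database untouched
    for segment in enhanced:
        label = segment["label"]
        if label in varying and label in secondary_db:
            segment["audio"] = secondary_db[label]
            segment["source"] = "secondary"
    return enhanced


enhanced = enhance(primary, secondary, VARYING)
```

    Working on a copy keeps the original primary database available, which is a design choice of this sketch rather than something the abstract requires.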

    Systems and methods of providing modified media content
    5.
    Granted patent
    Legal status: In force

    Publication No.: US08312492B2

    Publication date: 2012-11-13

    Application No.: US11725591

    Filing date: 2007-03-19

    IPC classification: H04N7/173 G06F15/00 G10L11/00

    Abstract: A method and system of providing media content are disclosed. In a particular embodiment, the method includes receiving media content from a content source at a set-top box device. The media content includes video data having a first playback rate and audio data having the first playback rate. The method further includes transforming the audio data via a non-linear transformation to produce modified audio data having a second playback rate, modifying the video data to produce modified video data having the second playback rate, and synchronizing the modified audio data and the modified video data to produce modified media content having the second playback rate. A network-based media content storage device and associated logic to provide adjusted rate audio content are also disclosed.

    System and method of word lattice augmentation using a pre/post vocalic consonant distinction
    6.
    Granted patent
    Legal status: In force

    Publication No.: US08024191B2

    Publication date: 2011-09-20

    Application No.: US11930999

    Filing date: 2007-10-31

    IPC classification: G10L15/04

    CPC classification: G10L25/78 G10L15/02

    Abstract: Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post-vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score, and the results of an automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.

    SYSTEM AND METHOD OF WORD LATTICE AUGMENTATION USING A PRE/POST VOCALIC CONSONANT DISTINCTION
    7.
    Patent application
    Legal status: In force

    Publication No.: US20090112591A1

    Publication date: 2009-04-30

    Application No.: US11930999

    Filing date: 2007-10-31

    IPC classification: G10L15/00

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems and methods for recognizing speech in a spoken dialogue system. The method includes (1) receiving input speech having at least one pre-vocalic consonant or at least one post-vocalic consonant, (2) generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, (3) distinguishing between the at least one pre-vocalic consonant and the at least one post-vocalic consonant in the input speech, (4) calculating a second score by measuring a similarity between the at least one pre-vocalic consonant or the at least one post-vocalic consonant in the input speech and the first score, (5) determining at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch by using the second score, and (6) refining the results of an automated speech recognition (ASR) system by using the at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch.

    Method and system for enhancing a speech database
    8.
    Granted patent
    Legal status: In force

    Publication No.: US08510112B1

    Publication date: 2013-08-13

    Application No.: US11469129

    Filing date: 2006-08-31

    CPC classification: G10L13/06 G10L2021/0135

    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.

    SYSTEM AND METHOD OF USING ACOUSTIC MODELS FOR AUTOMATIC SPEECH RECOGNITION WHICH DISTINGUISH PRE- AND POST-VOCALIC CONSONANTS
    9.
    Patent application
    Legal status: In force

    Publication No.: US20090112594A1

    Publication date: 2009-04-30

    Application No.: US11930675

    Filing date: 2007-10-31

    IPC classification: G10L15/00

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal; defining at least one syllable boundary position in the received speech signal; based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory; reformulating a lexicon to reflect the expanded consonant phoneme inventory; and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
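
    The inventory expansion and lexicon reformulation steps might look roughly like the sketch below. It assumes pronunciations are already split at syllable boundaries and that every syllable contains a vowel; the vowel set, the `_pre`/`_post` suffixes, and the function names are hypothetical.

```python
VOWELS = {"ae", "ih", "aa", "iy"}  # toy vowel set (assumption)


def label_syllable(syllable):
    """Tag consonants before the first vowel as _pre and the others as _post."""
    # Assumes the syllable contains at least one vowel.
    nucleus = next(i for i, p in enumerate(syllable) if p in VOWELS)
    labeled = []
    for i, p in enumerate(syllable):
        if p in VOWELS:
            labeled.append(p)
        else:
            labeled.append(p + ("_pre" if i < nucleus else "_post"))
    return labeled


def reformulate(lexicon):
    """Rewrite every pronunciation using the expanded consonant inventory."""
    return {
        word: [p for syl in syllables for p in label_syllable(syl)]
        for word, syllables in lexicon.items()
    }


# Pronunciations already split at syllable boundaries (hypothetical input).
lexicon = {"tat": [["t", "ae", "t"]], "pitted": [["p", "ih"], ["t", "ih", "d"]]}
new_lexicon = reformulate(lexicon)
```

    In this toy lexicon the two "t" sounds of "tat" become distinct symbols, `t_pre` and `t_post`, so models trained on the reformulated lexicon could treat them separately.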

    PHONETICALLY ENRICHED LABELING IN UNIT SELECTION SPEECH SYNTHESIS
    10.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20080077407A1

    Publication date: 2008-03-27

    Application No.: US11535146

    Filing date: 2006-09-26

    IPC classification: G10L13/00

    CPC classification: G10L13/06 G10L13/08

    Abstract: A system, method and computer-readable media are disclosed for improving speech synthesis. A text-to-speech (TTS) voice database for use in a TTS system is generated by a method comprising labeling a voice database phonemically and applying a pre-/post-vocalic distinction to the phonemic labels to generate a TTS voice database. When a system synthesizes speech using speech units from the TTS voice database, the database provides phonemes for selection using the pre-/post-vocalic distinctions, which improve unit selection and render the synthetic speech more natural.
