AUTOMATIC SEGMENTATION IN SPEECH SYNTHESIS
    1.
    Type: Patent application
    Status: In force

    Publication No.: US20070271100A1

    Publication date: 2007-11-22

    Application No.: US11832262

    Filing date: 2007-08-01

    IPC classification: G10L15/14

    CPC classification: G10L13/06

    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) is initialized using bootstrap data. The HMMs are then re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, the process is repeated iteratively, using the spectral-boundary-corrected phone labels as input instead of the bootstrap data, in order to further reduce mismatches between manual labels and the phone labels assigned by the HMM approach.
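
    The boundary-correction step can be illustrated with a minimal sketch. The abstract does not say how spectral boundary correction is computed, so the criterion used here (move each HMM boundary to the nearby frame with the largest Euclidean distance between adjacent spectral frames) is an assumption, as are the names `correct_boundary`, `correct_labels` and the `window` parameter.

```python
import math


def spectral_change(frames, t):
    """Euclidean distance between spectral frame t-1 and frame t."""
    return math.dist(frames[t - 1], frames[t])


def correct_boundary(frames, boundary, window=3):
    """Move a boundary to the frame within +/- window that has the
    largest spectral change (assumed criterion, not from the patent)."""
    lo = max(1, boundary - window)
    hi = min(len(frames) - 1, boundary + window)
    return max(range(lo, hi + 1), key=lambda t: spectral_change(frames, t))


def correct_labels(frames, labels, window=3):
    """labels: list of (phone, start_frame) pairs from HMM alignment.
    The first start is kept; every later start is corrected."""
    corrected = [labels[0]]
    for phone, start in labels[1:]:
        corrected.append((phone, correct_boundary(frames, start, window)))
    return corrected
```

    In the iterative variant the abstract describes, the corrected labels would replace the bootstrap data as input to the next round of HMM training. This sketch does not guard against corrected boundaries crossing one another.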

    System and method of word lattice augmentation using a pre/post vocalic consonant distinction
    2.
    Type: Granted patent
    Status: In force

    Publication No.: US08024191B2

    Publication date: 2011-09-20

    Application No.: US11930999

    Filing date: 2007-10-31

    IPC classification: G10L15/04

    CPC classification: G10L25/78 G10L15/02

    Abstract: Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post-vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score, and the results of an automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.

    SYSTEM AND METHOD OF WORD LATTICE AUGMENTATION USING A PRE/POST VOCALIC CONSONANT DISTINCTION
    3.
    Type: Patent application
    Status: In force

    Publication No.: US20090112591A1

    Publication date: 2009-04-30

    Application No.: US11930999

    Filing date: 2007-10-31

    IPC classification: G10L15/00

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems and methods for recognizing speech in a spoken dialogue system. The method includes (1) receiving an input speech having at least one pre-vocalic consonant or at least one post-vocalic consonant, (2) generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, (3) distinguishing between the at least one pre-vocalic consonant and the at least one post-vocalic consonant in the input speech, (4) calculating a second score by measuring a similarity between the at least one pre-vocalic consonant or the at least one post-vocalic consonant in the input speech and the first score, (5) determining at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch by using the second score, and (6) refining the results of an automated speech recognition (ASR) system by using the at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch.

    SYSTEM AND METHOD OF USING ACOUSTIC MODELS FOR AUTOMATIC SPEECH RECOGNITION WHICH DISTINGUISH PRE- AND POST-VOCALIC CONSONANTS
    4.
    Type: Patent application
    Status: In force

    Publication No.: US20090112594A1

    Publication date: 2009-04-30

    Application No.: US11930675

    Filing date: 2007-10-31

    IPC classification: G10L15/00

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal; defining at least one syllable boundary position in the received speech signal; based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory; reformulating a lexicon to reflect the expanded consonant phoneme inventory; and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
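
    The inventory expansion and lexicon reformulation can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the toy vowel set `VOWELS`, the `_pre`/`_post` suffixes, and the function names are all hypothetical, and each pronunciation is assumed to arrive already split into syllables.

```python
# Hypothetical toy vowel set; a real phoneme inventory would be larger.
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "uw"}


def label_syllable(syllable):
    """Tag consonants before the first vowel of a syllable as pre-vocalic
    and consonants after it as post-vocalic."""
    first_vowel = next((i for i, p in enumerate(syllable) if p in VOWELS), None)
    if first_vowel is None:
        return list(syllable)  # no vowel found; leave the syllable unchanged
    labeled = []
    for i, phoneme in enumerate(syllable):
        if phoneme in VOWELS:
            labeled.append(phoneme)
        elif i < first_vowel:
            labeled.append(phoneme + "_pre")
        else:
            labeled.append(phoneme + "_post")
    return labeled


def reformulate_lexicon(lexicon):
    """lexicon maps each word to a list of syllables (lists of phonemes).
    Returns a lexicon over the expanded consonant inventory."""
    return {
        word: [p for syllable in syllables for p in label_syllable(syllable)]
        for word, syllables in lexicon.items()
    }
```

    With this labeling, the /t/ in "tap" and the /t/ in "pat" become distinct units (`t_pre` and `t_post`), so separate models can be trained for each position.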

    PHONETICALLY ENRICHED LABELING IN UNIT SELECTION SPEECH SYNTHESIS
    5.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20080077407A1

    Publication date: 2008-03-27

    Application No.: US11535146

    Filing date: 2006-09-26

    IPC classification: G10L13/00

    CPC classification: G10L13/06 G10L13/08

    Abstract: A system, method and computer-readable media are disclosed for improving speech synthesis. A text-to-speech (TTS) voice database for use in a TTS system is generated by a method comprising labeling a voice database phonemically and applying a pre-/post-vocalic distinction to the phonemic labels to generate a TTS voice database. When a system synthesizes speech using speech units from the TTS voice database, the database provides phonemes for selection using the pre-/post-vocalic distinctions, which improve unit selection and render the synthetic speech more natural.

    System and method of using acoustic models for automatic speech recognition which distinguish pre- and post-vocalic consonants
    6.
    Type: Granted patent
    Status: In force

    Publication No.: US08015008B2

    Publication date: 2011-09-06

    Application No.: US11930675

    Filing date: 2007-10-31

    IPC classification: G10L15/04

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal; defining at least one syllable boundary position in the received speech signal; based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory; reformulating a lexicon to reflect the expanded consonant phoneme inventory; and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.

    System and method for performing speech synthesis with a cache of phoneme sequences
    7.
    Type: Granted patent
    Status: In force

    Publication No.: US07983919B2

    Publication date: 2011-07-19

    Application No.: US11836423

    Filing date: 2007-08-09

    Applicant: Alistair Conkie

    Inventor: Alistair Conkie

    IPC classification: G10L13/04

    CPC classification: G10L13/08 G10L13/04

    Abstract: Disclosed are systems, methods, and computer readable media for performing speech synthesis. The method embodiment comprises applying a first part of a speech synthesizer to a text corpus to obtain a plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences; for each of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize each of the plurality of respective phoneme sequences; and adding the identified joins to a cache for use in speech synthesis.
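
    A minimal sketch of the cache-building step, assuming the synthesizer front end has already turned the text corpus into phoneme sequences. The callables `units_for` (candidate units for a phoneme) and `join_cost` (cost of concatenating two units) are hypothetical stand-ins; the abstract does not specify how joins are identified or scored.

```python
def build_join_cache(phoneme_sequences, units_for, join_cost):
    """Precompute join costs for every pair of candidate units that could
    be concatenated when synthesizing the given phoneme sequences.

    phoneme_sequences: iterable of lists of phoneme labels
    units_for: callable, phoneme label -> list of candidate unit ids
    join_cost: callable, (left unit id, right unit id) -> cost
    """
    cache = {}
    for sequence in phoneme_sequences:
        # Each adjacent phoneme pair implies joins between their candidates.
        for left, right in zip(sequence, sequence[1:]):
            for u in units_for(left):
                for v in units_for(right):
                    if (u, v) not in cache:
                        cache[(u, v)] = join_cost(u, v)
    return cache
```

    At synthesis time a unit-selection search would look up `cache[(u, v)]` first and compute the join cost only on a miss.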

    Copying human interactions through learning and discovery
    8.
    Type: Granted patent
    Status: In force

    Publication No.: US08990126B1

    Publication date: 2015-03-24

    Application No.: US11462068

    Filing date: 2006-08-03

    IPC classification: G06F15/18 G06F17/21

    Abstract: A method, system and computer readable medium that generates a dialog model for use in automated dialog is disclosed. The method may include collecting a plurality of task-oriented dialog interactions between users and human agents for a given domain, identifying one or more tasks in each dialog interaction, identifying one or more subtasks in each identified task and associating relations between the subtasks, identifying a dialog act and a set of predicate-argument relations for each subtask, generating one or more clauses from the set of predicate-argument relations, storing the tasks, subtasks, dialog acts, predicate-argument relations, and clauses from each dialog interaction as a dialog interaction set, and generating a dialog management model using the stored dialog interaction sets.

    SYNTHESIS-BASED PRE-SELECTION OF SUITABLE UNITS FOR CONCATENATIVE SPEECH
    9.
    Type: Patent application
    Status: In force

    Publication No.: US20060100878A1

    Publication date: 2006-05-11

    Application No.: US11275432

    Filing date: 2005-12-30

    Applicant: Alistair Conkie

    Inventor: Alistair Conkie

    IPC classification: G10L13/08

    CPC classification: G10L13/07

    Abstract: A system and computer-readable medium are disclosed that synthesize speech from text using a triphone unit selection database. The instructions on the computer-readable medium control a computing device to perform the steps of receiving input text, selecting a plurality of N phoneme units from the triphone unit selection database as candidate phonemes for synthesized speech based on the input text, applying a cost process to select a set of phonemes from the candidate phonemes, and synthesizing speech using the selected set of phonemes.
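
    The cost process over pre-selected candidates can be sketched as a dynamic-programming search. The abstract does not name the algorithm, so the Viterbi-style search below is an assumption, and `targets`, `candidates`, `target_cost` and `join_cost` are hypothetical names; `candidates` stands in for the N units pre-selected per target from the triphone database.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Pick one unit per target so that the summed target and join
    costs are minimal.

    targets: list of hashable target keys (e.g. triphone labels)
    candidates: dict, target key -> list of pre-selected unit ids
    target_cost: callable, (target key, unit id) -> cost
    join_cost: callable, (previous unit id, unit id) -> cost
    """
    # best maps each candidate of the current target to
    # (cost of the cheapest path ending in it, that path).
    best = {u: (target_cost(targets[0], u), [u]) for u in candidates[targets[0]]}
    for target in targets[1:]:
        new_best = {}
        for u in candidates[target]:
            cost, path = min(
                ((c + join_cost(p, u), prev_path) for p, (c, prev_path) in best.items()),
                key=lambda item: item[0],
            )
            new_best[u] = (cost + target_cost(target, u), path + [u])
        best = new_best
    return min(best.values(), key=lambda item: item[0])[1]
```

    Restricting each target to N pre-selected candidates keeps the inner loop at N x N join evaluations per step instead of scanning the full database.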

    Method and apparatus for combining text to speech and recorded prompts
    10.
    Type: Granted patent
    Status: In force

    Publication No.: US08600753B1

    Publication date: 2013-12-03

    Application No.: US11321638

    Filing date: 2005-12-30

    Applicant: Alistair Conkie

    Inventor: Alistair Conkie

    IPC classification: G10L13/00 G10L13/08

    CPC classification: G10L13/00 G10L13/08 G10L13/10

    Abstract: An arrangement provides for improved synthesis of speech arising from a message text. The arrangement stores prerecorded prompts and speech-related characteristics for those prompts. A message is parsed to determine whether any message portions have been recorded previously. If so, the speech-related characteristics for those portions are retrieved. The arrangement generates speech-related characteristics for those portions not previously stored. The retrieved and generated characteristics are combined. The combination of characteristics is then used as the input to a speech synthesizer.
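
    The parse-retrieve-generate-combine flow can be sketched as follows. The greedy longest-match parse is an assumption (the abstract does not say how the message is parsed), and `plan_message`, `prompts` and `generate` are hypothetical names; the stored and generated "characteristics" are represented by opaque values.

```python
def plan_message(words, prompts, generate):
    """Split a message into recorded and generated portions.

    words: list of message words
    prompts: dict, recorded phrase -> stored speech characteristics
    generate: callable, word -> generated speech characteristics
    Returns a list of (source, characteristics) pairs in message order.
    """
    plan = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in prompts:
                plan.append(("recorded", prompts[phrase]))
                i = j
                break
        else:
            # No recorded prompt starts here; generate for a single word.
            plan.append(("generated", generate(words[i])))
            i += 1
    return plan
```

    The combined list would then serve as the input to the speech synthesizer, with recorded portions carrying their stored characteristics and the remainder carrying generated ones.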
