SYSTEM AND METHOD OF USING ACOUSTIC MODELS FOR AUTOMATIC SPEECH RECOGNITION WHICH DISTINGUISH PRE- AND POST-VOCALIC CONSONANTS
    1.
    Type: Patent application
    Status: Active

    Publication number: US20090112594A1

    Publication date: 2009-04-30

    Application number: US11930675

    Filing date: 2007-10-31

    IPC classification: G10L15/00

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal; defining at least one syllable boundary position in the received speech signal; based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory; reformulating a lexicon to reflect an expanded consonant phoneme inventory; and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
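
As a rough illustration of the labeling step this abstract describes, the Python sketch below tags each consonant by its position relative to the vowel of its syllable and rewrites a lexicon accordingly. It assumes pronunciations are already syllabified, with "." marking syllable boundaries. The vowel set, the `_pre`/`_post` suffixes, and all function names are hypothetical choices for illustration, not details taken from the patent.

```python
# Hypothetical, deliberately small vowel set; a real phoneme inventory would be larger.
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "ow", "uw", "ey", "ay"}


def label_syllable(phones):
    """Tag consonants in one syllable as pre-vocalic or post-vocalic."""
    labeled = []
    seen_vowel = False
    for phone in phones:
        if phone in VOWELS:
            seen_vowel = True
            labeled.append(phone)  # vowels keep their plain label
        else:
            labeled.append(phone + ("_post" if seen_vowel else "_pre"))
    return labeled


def reformulate_lexicon(lexicon):
    """Rewrite every pronunciation using the position-labeled consonants."""
    new_lexicon = {}
    for word, pronunciation in lexicon.items():
        syllables = [s.split() for s in pronunciation.split(".")]
        new_lexicon[word] = [p for syl in syllables for p in label_syllable(syl)]
    return new_lexicon


def expand_inventory(consonants):
    """Each consonant yields two labels, doubling the consonant inventory."""
    return sorted(c + suffix for c in consonants for suffix in ("_pre", "_post"))


lexicon = {"tot": "t aa t", "pity": "p ih . t iy"}
new_lexicon = reformulate_lexicon(lexicon)
```

In this sketch the two /t/ sounds in "tot" end up with different labels, so a model trained on the rewritten lexicon can treat them as distinct units.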

    PHONETICALLY ENRICHED LABELING IN UNIT SELECTION SPEECH SYNTHESIS
    2.
    Type: Patent application
    Status: Under examination (published)

    Publication number: US20080077407A1

    Publication date: 2008-03-27

    Application number: US11535146

    Filing date: 2006-09-26

    IPC classification: G10L13/00

    CPC classification: G10L13/06 G10L13/08

    Abstract: A system, method and computer-readable media are disclosed for improving speech synthesis. A text-to-speech (TTS) voice database for use in a TTS system is generated by a method comprising labeling a voice database phonemically and applying a pre-/post-vocalic distinction to the phonemic labels to generate a TTS voice database. When a system synthesizes speech using speech units from the TTS voice database, the database provides phonemes for selection using the pre-/post-vocalic distinctions, which improve unit selection to render the synthetic speech more natural.
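
One way to picture how enriched labels could narrow unit selection is the minimal Python sketch below. It indexes voice-database units by phoneme and position, so that a lookup for a post-vocalic /t/ never returns a pre-vocalic one. The unit records, field names, and function names are assumptions made for illustration. The abstract does not specify a data layout.

```python
from collections import defaultdict


def build_index(units):
    """Index unit ids by (phoneme, position) so selection can respect position."""
    index = defaultdict(list)
    for unit in units:
        index[(unit["phone"], unit["position"])].append(unit["id"])
    return index


def candidates(index, phone, position):
    """Return ids of units whose enriched label matches the target exactly."""
    return index.get((phone, position), [])


# Hypothetical unit records: "position" is "pre" or "post" relative to the vowel.
units = [
    {"id": "u1", "phone": "t", "position": "pre"},
    {"id": "u2", "phone": "t", "position": "post"},
    {"id": "u3", "phone": "t", "position": "pre"},
]
index = build_index(units)
post_t = candidates(index, "t", "post")
pre_t = candidates(index, "t", "pre")
```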

    Method and system for enhancing a speech database
    3.

    Publication number: US08510113B1

    Publication date: 2013-08-13

    Application number: US11469134

    Filing date: 2006-08-31

    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
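
The substitution step in this abstract can be sketched in a few lines of Python. The sketch assumes each database is a mapping from a segment label to an audio reference and that the labels needing replacement have already been identified. Those representations, the file names, and the function name are hypothetical.

```python
def enhance_database(primary, secondary, flagged_labels):
    """Return a copy of the primary database with flagged segments replaced.

    Only labels that also exist in the secondary database are substituted;
    the original primary database is left unmodified.
    """
    enhanced = dict(primary)
    for label in flagged_labels:
        if label in secondary:
            enhanced[label] = secondary[label]
    return enhanced


# Hypothetical data: labels map to audio file names.
primary = {"r": "primary_r.wav", "a": "primary_a.wav"}
secondary = {"r": "secondary_r.wav"}
enhanced = enhance_database(primary, secondary, {"r", "x"})
```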

    System and method of word lattice augmentation using a pre/post vocalic consonant distinction
    4.
    Type: Granted patent
    Status: Active

    Publication number: US08024191B2

    Publication date: 2011-09-20

    Application number: US11930999

    Filing date: 2007-10-31

    IPC classification: G10L15/04

    CPC classification: G10L25/78 G10L15/02

    Abstract: Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post-vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score, and the results of an automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.

    SYSTEM AND METHOD OF WORD LATTICE AUGMENTATION USING A PRE/POST VOCALIC CONSONANT DISTINCTION
    5.
    Type: Patent application
    Status: Active

    Publication number: US20090112591A1

    Publication date: 2009-04-30

    Application number: US11930999

    Filing date: 2007-10-31

    IPC classification: G10L15/00

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems and methods for recognizing speech in a spoken dialogue system. The method includes (1) receiving an input speech having at least one pre-vocalic consonant or at least one post-vocalic consonant; (2) generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result; (3) distinguishing between the at least one pre-vocalic consonant and the at least one post-vocalic consonant in the input speech; (4) calculating a second score by measuring a similarity between the at least one pre-vocalic consonant or the at least one post-vocalic consonant in the input speech and the first score; (5) determining at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch by using the second score; and (6) refining the results of an automated speech recognition (ASR) system by using the at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch.

    System and method of using acoustic models for automatic speech recognition which distinguish pre- and post-vocalic consonants
    6.
    Type: Granted patent
    Status: Active

    Publication number: US08015008B2

    Publication date: 2011-09-06

    Application number: US11930675

    Filing date: 2007-10-31

    IPC classification: G10L15/04

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal; defining at least one syllable boundary position in the received speech signal; based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory; reformulating a lexicon to reflect an expanded consonant phoneme inventory; and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.

    System and method for pronunciation modeling
    7.
    Type: Granted patent
    Status: Active

    Publication number: US08862470B2

    Publication date: 2014-10-14

    Application number: US13302380

    Filing date: 2011-11-22

    IPC classification: G10L15/187 G10L15/183

    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
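
To make the idea of substituting a family of alternatives for a phoneme concrete, the Python sketch below enumerates the pronunciations that result when each phoneme in a word is replaced by every member of its family. Alternatives are tuples so that one alternative can be a string of several phonemes, as the abstract allows. The phoneme symbols, the example families, and the function name are hypothetical and not taken from the patent.

```python
from itertools import product


def expand_pronunciations(phonemes, families):
    """Enumerate pronunciations obtained by swapping in each family member.

    `families` maps a phoneme to a list of alternatives; each alternative is
    a tuple of one or more phonemes. Phonemes without a family stay as-is.
    """
    options = [families.get(p, [(p,)]) for p in phonemes]
    variants = []
    for combo in product(*options):
        # Flatten the chosen alternatives into one phoneme sequence.
        variants.append(tuple(ph for alt in combo for ph in alt))
    return variants


# Hypothetical families, e.g. standing in for dialect-dependent realizations.
families = {
    "t": [("t",), ("d",)],
    "er": [("er",), ("ah", "r")],
}
variants = expand_pronunciations(["w", "ao", "t", "er"], families)
```

With two families of two alternatives each, the sketch yields four variants for the example word, all of which would be treated as the same underlying phoneme sequence.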

    SYSTEM AND METHOD FOR SYNTHETIC VOICE GENERATION AND MODIFICATION
    8.
    Type: Patent application
    Status: Active

    Publication number: US20120035933A1

    Publication date: 2012-02-09

    Application number: US12852164

    Filing date: 2010-08-06

    IPC classification: G10L13/00

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
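
The policy-driven selection in this abstract can be pictured with the short Python sketch below, in which a policy maps each phonetic category to the voice that should supply units for it. The nested-dictionary database, the category names, the unit identifiers, and the function name are assumptions made for illustration only.

```python
def select_units(combined_db, policy, targets):
    """Pick one unit per target phoneme from the voice the policy names.

    `targets` is a list of (phoneme, category) pairs; `policy` maps a
    category to a voice name; `combined_db` maps voice name -> phoneme -> unit.
    """
    selected = []
    for phone, category in targets:
        voice = policy[category]
        selected.append(combined_db[voice][phone])
    return selected


# Hypothetical combined database holding two voices.
combined_db = {
    "voice_a": {"s": "a_s", "aa": "a_aa"},
    "voice_b": {"s": "b_s", "aa": "b_aa"},
}
# Hypothetical policy: vowels come from voice A, consonants from voice B.
policy = {"vowel": "voice_a", "consonant": "voice_b"}
selected = select_units(combined_db, policy, [("s", "consonant"), ("aa", "vowel")])
```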

    SYSTEM AND METHOD FOR PRONUNCIATION MODELING
    9.
    Type: Patent application
    Status: Active

    Publication number: US20100145707A1

    Publication date: 2010-06-10

    Application number: US12328407

    Filing date: 2008-12-04

    IPC classification: G10L13/06

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.

    Method and system for recorded word concatenation
    10.
    Type: Granted patent
    Status: Active

    Publication number: US06601030B2

    Publication date: 2003-07-29

    Application number: US09198105

    Filing date: 1998-11-23

    Applicant: Ann K. Syrdal

    Inventor: Ann K. Syrdal

    IPC classification: G10L13/00

    CPC classification: G10L13/08

    Abstract: A method and system are provided for performing recorded word concatenation to create, for example, a natural-sounding sequence of words, numbers, phrases, sounds, etc. The method and system may include a tonal pattern identification unit that identifies tonal patterns, such as pitch accents, phrase accents and boundary tones, for utterances in a particular domain, such as telephone numbers, credit card numbers, the spelling of words, etc.; a script designer that designs a script for recording a string of words, numbers, sounds, etc., based on an appropriate rhythm and pitch range in order to obtain natural prosody for utterances in the particular domain and with minimum coarticulation between concatenative units; a script recorder that records a speaker's utterances of the domain strings; a recording editor that edits the recorded strings by marking the beginning and end of each word, number, etc. in the string and including or inserting pauses according to the tonal patterns; and a concatenation unit that concatenates the edited recording into a smooth and natural-sounding string of words, numbers, letters of the alphabet, etc., for audio output.
