Unnatural prosody detection in speech synthesis
    1.
    Patent application
    Unnatural prosody detection in speech synthesis (Active)

    Publication number: US20090083036A1

    Publication date: 2009-03-26

    Application number: US11903020

    Filing date: 2007-09-20

    CPC classification number: G10L13/10

    Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.
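    The search, evaluate and replace loop in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the lattice is a list of slots mapping hypothetical unit ids to target costs, `join_cost` is a hypothetical concatenation cost, and `prosody_score` is a hypothetical stand-in for the offline-trained prosody model. Units that fail the check are pruned and the Viterbi search is re-run; setting `threshold` deliberately high illustrates the bias toward false alarms over misses.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical lattice layout: lattice[t] maps candidate unit ids for slot t to target costs.
Lattice = List[Dict[str, float]]
Pruned = Set[Tuple[int, str]]


def viterbi(lattice: Lattice,
            join_cost: Callable[[str, str], float],
            pruned: Pruned) -> Optional[List[str]]:
    """Lowest-cost path through the lattice, ignoring pruned (slot, unit) pairs."""
    if not lattice:
        return []
    best: Dict[str, Tuple[float, List[str]]] = {}
    for t, candidates in enumerate(lattice):
        layer: Dict[str, Tuple[float, List[str]]] = {}
        for unit, target in candidates.items():
            if (t, unit) in pruned:
                continue
            if t == 0:
                layer[unit] = (target, [unit])
                continue
            # Best predecessor = accumulated cost + cost of joining it to this unit.
            cost, path = min(((c + join_cost(prev, unit), p) for prev, (c, p) in best.items()),
                             key=lambda cp: cp[0])
            layer[unit] = (cost + target, path + [unit])
        if not layer:
            return None  # every candidate in this slot has been pruned
        best = layer
    return min(best.values(), key=lambda cp: cp[0])[1]


def synthesize(lattice: Lattice,
               join_cost: Callable[[str, str], float],
               prosody_score: Callable[[int, str], float],
               threshold: float = 0.5,
               max_iters: int = 10) -> Optional[List[str]]:
    """Search, flag units scoring below `threshold`, prune them, and search again."""
    pruned: Pruned = set()
    path: Optional[List[str]] = None
    for _ in range(max_iters):
        path = viterbi(lattice, join_cost, pruned)
        if path is None:
            return None  # some slot has no unpruned candidate left
        # A higher threshold flags more units: more false alarms, fewer misses.
        flagged = [(t, u) for t, u in enumerate(path) if prosody_score(t, u) < threshold]
        if not flagged:
            return path
        pruned.update(flagged)
    return path  # budget exhausted; the last path may still contain flagged units
```

    For example, with a two-slot lattice whose best path is `["a1", "b1"]` and a `prosody_score` that rejects only `b1`, the second search returns `["a1", "b2"]`.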

    Unnatural prosody detection in speech synthesis
    2.
    Patent grant

    Publication number: US08583438B2

    Publication date: 2013-11-12

    Application number: US11903020

    Filing date: 2007-09-20

    CPC classification number: G10L13/10

    Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.

    Refining of segmental boundaries in speech waveforms using contextual-dependent models
    3.
    Patent application
    Refining of segmental boundaries in speech waveforms using contextual-dependent models (Lapsed)

    Publication number: US20050228664A1

    Publication date: 2005-10-13

    Application number: US10823129

    Filing date: 2004-04-13

    CPC classification number: G10L15/02 G10L2015/022

    Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each pair of adjacent phoneme speech units includes a segmental boundary. A refining model is trained for each cluster and used to refine boundaries of contextual phoneme speech units forming the clusters.
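    A minimal Python sketch of the cluster-then-refine idea, under loud assumptions: each boundary is described by a hypothetical context feature vector (for example, acoustic features of the phonemes on either side), plain k-means stands in for the similarity-based clustering, and the per-cluster "refining model" is reduced to the mean offset between automatically placed and hand-labelled boundary times. The abstract does not specify the actual features or model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with a deterministic start (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def train_refiners(contexts: List[Vector], initial: List[float],
                   reference: List[float], k: int) -> Tuple[List[List[float]], Dict[int, float]]:
    """Cluster boundary contexts; each cluster's hypothetical model is a mean time offset."""
    labels, centroids = kmeans(contexts, k)
    offsets: Dict[int, float] = {}
    for c in range(k):
        diffs = [r - i for i, r, lab in zip(initial, reference, labels) if lab == c]
        offsets[c] = sum(diffs) / len(diffs) if diffs else 0.0
    return centroids, offsets


def refine(context: Vector, boundary: float,
           centroids: List[List[float]], offsets: Dict[int, float]) -> float:
    """Shift an automatically placed boundary by its cluster's learned correction."""
    return boundary + offsets[nearest(context, centroids)]
```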

    Optimization of an objective measure for estimating mean opinion score of synthesized speech
    4.
    Patent application
    Optimization of an objective measure for estimating mean opinion score of synthesized speech (Lapsed)

    Publication number: US20050060155A1

    Publication date: 2005-03-17

    Application number: US10660388

    Filing date: 2003-09-11

    CPC classification number: G10L25/69 G10L13/00

    Abstract: A method is provided for optimizing an objective measure used to estimate mean opinion score or naturalness of synthesized speech from a speech synthesizer. The method includes using an objective measure that has components derived directly from textual information used to form synthesized utterances. The objective measure has a high correlation with mean opinion score such that a relationship can be formed between the objective measure and corresponding mean opinion score. The objective measure is altered to provide a different function of textual information derived from the utterances so as to improve the relationship between the scores of the objective measure and subjective ratings of the synthesized utterances.
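    The abstract leaves the form of the measure open. One way to make the idea concrete, assuming the measure is a weighted sum of per-utterance cost components computed from text (the components, weights and grid below are hypothetical), is to treat "altering the measure" as searching for weights that strengthen its correlation with listener MOS. Because this sketch treats the measure as a cost, a better measure correlates more negatively with MOS.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def objective(components: List[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Objective cost per utterance: a weighted sum of its text-derived components."""
    return [sum(w * c for w, c in zip(weights, comp)) for comp in components]


def tune_weights(components: List[Sequence[float]], mos: Sequence[float],
                 grid: Sequence[float] = (0.0, 0.5, 1.0)
                 ) -> Tuple[Optional[Tuple[float, ...]], float]:
    """Grid-search weights so the cost correlates as negatively as possible with MOS."""
    best_w: Optional[Tuple[float, ...]] = None
    best_r = math.inf
    for w in itertools.product(grid, repeat=len(components[0])):
        if not any(w):
            continue  # all-zero weights give a constant, meaningless measure
        r = pearson(objective(components, w), mos)
        if r < best_r:
            best_w, best_r = w, r
    return best_w, best_r
```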

    Defining atom units between phone and syllable for TTS systems
    5.
    Patent grant
    Defining atom units between phone and syllable for TTS systems (Active)

    Publication number: US07418389B2

    Publication date: 2008-08-26

    Application number: US11033075

    Filing date: 2005-01-11

    Applicant: Min Chu, Yong Zhao

    Inventor: Min Chu, Yong Zhao

    CPC classification number: G10L13/08

    Abstract: A method for identifying common multiphone units to add to a unit inventory for a text-to-speech generator is disclosed. The common multiphone units are units that are larger than a phone, but smaller than a syllable. The method slices each syllable into a plurality of slices. These slices are then sorted and the frequency of each slice is determined. Those slices whose frequencies exceed a threshold are added to the unit inventory. The remaining slices are decomposed according to a predetermined set of rules to determine if they contain slices that should be added to the unit inventory.
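    The slice, count, threshold and decompose steps can be sketched in Python. The abstract does not give the slicing or decomposition rules, so both are hypothetical here: syllables (tuples of phone symbols) are split at the first vowel of a toy vowel set, and an infrequent slice is decomposed by peeling off its last phone, with the pieces' counts pooled and re-tested against the same threshold.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Phones = Tuple[str, ...]
VOWELS = {"a", "e", "i", "o", "u"}  # hypothetical toy vowel set


def slice_syllable(syl: Phones) -> List[Phones]:
    """Hypothetical slicing rule: split the syllable at its first vowel (onset | rhyme)."""
    for i, phone in enumerate(syl):
        if phone in VOWELS:
            return [part for part in (syl[:i], syl[i:]) if part]
    return [syl]


def decompose(piece: Phones) -> List[Phones]:
    """Hypothetical decomposition rule: peel off the final phone."""
    return [piece[:-1], piece[-1:]] if len(piece) > 1 else [piece]


def find_atom_units(syllables: Iterable[Phones], threshold: int) -> Set[Phones]:
    counts = Counter(s for syl in syllables for s in slice_syllable(syl))
    inventory: Set[Phones] = set()
    remaining = Counter()
    # Walk slices from most to least frequent; single phones are skipped
    # because an atom unit must be larger than one phone.
    for piece, n in counts.most_common():
        if len(piece) < 2:
            continue
        if n >= threshold:
            inventory.add(piece)
        else:
            remaining[piece] = n
    # Decompose infrequent slices and pool the counts of the resulting pieces.
    pooled = Counter()
    for piece, n in remaining.items():
        for sub in decompose(piece):
            pooled[sub] += n
    inventory.update(p for p, n in pooled.items() if len(p) >= 2 and n >= threshold)
    return inventory
```

    With a threshold of 2, the toy syllables "sta", "sto", "star", "bark", "part" and "kard" yield the onset slice ("s", "t") directly and ("a", "r") only after "ark", "art" and "ard" are decomposed.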

    Providing personalized voice font for text-to-speech applications
    6.
    Patent grant
    Providing personalized voice font for text-to-speech applications (Lapsed)

    Publication number: US07693719B2

    Publication date: 2010-04-06

    Application number: US10977178

    Filing date: 2004-10-29

    CPC classification number: G10L13/033 G10L2021/0135

    Abstract: A method for synthesizing speech from text includes receiving one or more waveforms characteristic of a voice of a person selected by a user, generating a personalized voice font based on the one or more waveforms, and delivering the personalized voice font to the user's computer, whereby speech can be synthesized from text, the speech being in the voice of the selected person, the speech being synthesized using the personalized voice font. A system includes a text-to-speech (TTS) application operable to generate a voice font based on speech waveforms transmitted from a client computer remotely accessing the TTS application.

    Refining of segmental boundaries in speech waveforms using contextual-dependent models
    7.
    Patent grant
    Refining of segmental boundaries in speech waveforms using contextual-dependent models (Lapsed)

    Publication number: US07496512B2

    Publication date: 2009-02-24

    Application number: US10823129

    Filing date: 2004-04-13

    CPC classification number: G10L15/02 G10L2015/022

    Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each pair of adjacent phoneme speech units includes a segmental boundary. A refining model is trained for each cluster and used to refine boundaries of contextual phoneme speech units forming the clusters.

    Optimization of an objective measure for estimating mean opinion score of synthesized speech
    8.
    Patent grant
    Optimization of an objective measure for estimating mean opinion score of synthesized speech (Lapsed)

    Publication number: US07386451B2

    Publication date: 2008-06-10

    Application number: US10660388

    Filing date: 2003-09-11

    CPC classification number: G10L25/69 G10L13/00

    Abstract: A method is provided for optimizing an objective measure used to estimate mean opinion score or naturalness of synthesized speech from a speech synthesizer. The method includes using an objective measure that has components derived directly from textual information used to form synthesized utterances. The objective measure has a high correlation with mean opinion score such that a relationship can be formed between the objective measure and corresponding mean opinion score. The objective measure is altered to provide a different function of textual information derived from the utterances so as to improve the relationship between the scores of the objective measure and subjective ratings of the synthesized utterances.

    Speech unit selection using HMM acoustic models
    9.
    Patent application
    Speech unit selection using HMM acoustic models (Pending, published)

    Publication number: US20080059190A1

    Publication date: 2008-03-06

    Application number: US11508093

    Filing date: 2006-08-22

    CPC classification number: G10L13/06

    Abstract: A concatenating speech synthesizer concatenates selected speech units to obtain the desired synthesized speech. When desired speech units of phonetic and/or prosodic context are not available, the synthesizer selects replacement speech units based on measures representative of the difference between the HMM acoustic models of the desired speech unit and available speech units.
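    The abstract does not say how the difference between HMM acoustic models is measured. A minimal sketch, assuming each unit model is a left-to-right HMM with the same number of states and each state is a single diagonal-covariance Gaussian, sums the symmetric Kullback-Leibler divergence over aligned states and picks the available unit with the smallest total. The choice of measure and every name below are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# A state is (means, variances) of a diagonal Gaussian; a unit HMM is a list of states.
State = Tuple[Sequence[float], Sequence[float]]
UnitHMM = List[State]


def sym_kl(a: State, b: State) -> float:
    """Symmetric KL divergence KL(a||b) + KL(b||a) between diagonal Gaussians."""
    (mean_a, var_a), (mean_b, var_b) = a, b
    return 0.5 * sum(
        va / vb + vb / va + (ma - mb) ** 2 * (1.0 / va + 1.0 / vb) - 2.0
        for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b)
    )


def hmm_distance(a: UnitHMM, b: UnitHMM) -> float:
    """Sum of state-wise divergences; assumes both models have aligned states."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares HMMs with equal state counts")
    return sum(sym_kl(sa, sb) for sa, sb in zip(a, b))


def pick_replacement(desired: UnitHMM, available: Dict[str, UnitHMM]) -> str:
    """Name of the available unit whose model is closest to the desired unit's model."""
    return min(available, key=lambda name: hmm_distance(desired, available[name]))
```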

    Front-end architecture for a multi-lingual text-to-speech system
    10.
    Patent grant
    Front-end architecture for a multi-lingual text-to-speech system (Lapsed)

    Publication number: US07496498B2

    Publication date: 2009-02-24

    Application number: US10396944

    Filing date: 2003-03-24

    CPC classification number: G10L13/08

    Abstract: A text processing system for processing multi-lingual text for a speech synthesizer includes a first language dependent module for performing at least one of text and prosody analysis on a portion of input text comprising a first language. A second language dependent module performs at least one of text and prosody analysis on a second portion of input text comprising a second language. A third module is adapted to receive outputs from the first and second language dependent modules and to perform prosodic and phonetic context abstraction over the outputs based on multi-lingual text.
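    A toy sketch of the three-module split, assuming a Mandarin/English mix: text is segmented into runs by script (CJK ideographs versus Latin letters, a hypothetical rule), each run goes to a language-dependent analyzer (trivial stubs here), and a shared language-independent pass merges the outputs and marks a prosodic phrase boundary at each language switch as a stand-in for the abstraction step. Real analyzers would emit phones and prosodic features rather than bare tokens.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Token:
    text: str
    lang: str
    boundary_after: bool = False  # prosodic phrase boundary after this token


def split_by_language(text: str) -> List[Tuple[str, str]]:
    """Group consecutive words of the same script into (lang, run) pairs."""
    runs: List[Tuple[str, str]] = []
    for m in re.finditer(r"[\u4e00-\u9fff]+|[A-Za-z']+", text):
        word = m.group()
        lang = "zh" if "\u4e00" <= word[0] <= "\u9fff" else "en"
        if runs and runs[-1][0] == lang:
            runs[-1] = (lang, runs[-1][1] + " " + word)
        else:
            runs.append((lang, word))
    return runs


def analyze_en(run: str) -> List[Token]:
    """Stub English module: lower-cased word tokens."""
    return [Token(w.lower(), "en") for w in run.split()]


def analyze_zh(run: str) -> List[Token]:
    """Stub Mandarin module: one token per character."""
    return [Token(ch, "zh") for ch in run if not ch.isspace()]


ANALYZERS: Dict[str, Callable[[str], List[Token]]] = {"en": analyze_en, "zh": analyze_zh}


def front_end(text: str) -> List[Token]:
    tokens: List[Token] = []
    for lang, run in split_by_language(text):
        tokens.extend(ANALYZERS[lang](run))  # language-dependent modules
    # Language-independent pass: mark a phrase boundary at every language switch.
    for cur, nxt in zip(tokens, tokens[1:]):
        if cur.lang != nxt.lang:
            cur.boundary_after = True
    if tokens:
        tokens[-1].boundary_after = True
    return tokens
```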
