Rich context modeling for text-to-speech engines
    1.
    Granted patent
    Status: in force

    Publication No.: US08340965B2

    Publication date: 2012-12-25

    Application No.: US12629457

    Filing date: 2009-12-02

    IPC classification: G10L13/00

    CPC classification: G10L13/08

    Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.

    SMALL FOOTPRINT TEXT-TO-SPEECH ENGINE
    2.
    Patent application
    Status: under examination (published)

    Publication No.: US20110071835A1

    Publication date: 2011-03-24

    Application No.: US12564326

    Filing date: 2009-09-22

    IPC classification: G10L13/08 G10L15/14

    CPC classification: G10L13/047 G10L13/08

    Abstract: Embodiments of a small footprint text-to-speech engine are disclosed. In operation, the small footprint text-to-speech engine generates a set of feature parameters for an input text. The set of feature parameters includes static feature parameters and delta feature parameters. The small footprint text-to-speech engine then derives a saw-tooth stochastic trajectory that represents the speech characteristics of the input text based on the static feature parameters and the delta feature parameters. Finally, the small footprint text-to-speech engine produces a smoothed trajectory from the saw-tooth stochastic trajectory, and generates synthesized speech based on the smoothed trajectory.
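
    The two-stage idea in this abstract (a saw-tooth trajectory built from static and delta parameters, then a smoothed version of it) can be illustrated with a rough Python sketch. The abstract does not say how either step is computed, so everything here is an assumption made for illustration: the per-state linear ramp, the moving-average smoother, and the names `sawtooth_trajectory`, `smooth`, and `half_width` are hypothetical and are not taken from the patent.

```python
def sawtooth_trajectory(states):
    """Build a one-dimensional trajectory from per-state parameters.

    states: list of (static_mean, delta_mean, n_frames) tuples.
    Assumption: inside a state the value ramps linearly through the
    static mean with a slope equal to the delta mean, which leaves
    jumps (the saw-tooth shape) at state boundaries.
    """
    trajectory = []
    for static_mean, delta_mean, n_frames in states:
        center = (n_frames - 1) / 2.0
        for t in range(n_frames):
            trajectory.append(static_mean + delta_mean * (t - center))
    return trajectory


def smooth(trajectory, half_width=1):
    """Moving-average low-pass filter (an assumed smoother).

    Windows are truncated at the two ends of the trajectory.
    """
    smoothed = []
    for i in range(len(trajectory)):
        lo = max(0, i - half_width)
        hi = min(len(trajectory), i + half_width + 1)
        window = trajectory[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def max_jump(trajectory):
    """Largest absolute frame-to-frame difference."""
    return max(abs(b - a) for a, b in zip(trajectory, trajectory[1:]))


# Two hypothetical states of three frames each.
states = [(1.0, 0.5, 3), (3.0, -0.5, 3)]
raw = sawtooth_trajectory(states)
smoothed = smooth(raw)
print(raw)
print(max_jump(raw), max_jump(smoothed))
```

    In this toy input the discontinuity at the state boundary shrinks after smoothing, which is the qualitative effect the abstract describes; a real engine would operate on multi-dimensional spectral and pitch parameters.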

    RICH CONTEXT MODELING FOR TEXT-TO-SPEECH ENGINES
    3.
    Patent application
    Status: in force

    Publication No.: US20110054903A1

    Publication date: 2011-03-03

    Application No.: US12629457

    Filing date: 2009-12-02

    IPC classification: G10L13/08 G10L13/06 G10L13/00

    CPC classification: G10L13/08

    Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.

    NORMALIZATION BASED DISCRIMINATIVE TRAINING FOR CONTINUOUS SPEECH RECOGNITION
    4.
    Patent application
    Status: under examination (published)

    Publication No.: US20130185070A1

    Publication date: 2013-07-18

    Application No.: US13349529

    Filing date: 2012-01-12

    IPC classification: G10L15/06

    CPC classification: G10L15/063 G10L15/144

    Abstract: A speech recognition system trains a plurality of feature transforms and a plurality of acoustic models using an irrelevant variability normalization based discriminative training. The speech recognition system employs the trained feature transforms to absorb or ignore variability within an unknown speech that is irrelevant to phonetic classification. The speech recognition system may then recognize the unknown speech using the trained recognition models. The speech recognition system may further perform an unsupervised adaptation to adapt the feature transforms for the unknown speech and thus increase the accuracy of recognizing the unknown speech.

    Trajectory Tiling Approach for Text-to-Speech
    5.
    Patent application
    Status: under examination (published)

    Publication No.: US20120143611A1

    Publication date: 2012-06-07

    Application No.: US12962543

    Filing date: 2010-12-07

    IPC classification: G10L13/00

    Abstract: Hidden Markov Model (HMM) trajectory tiling (HTT)-based approaches may be used to synthesize speech from text. In operation, a set of Hidden Markov Models (HMMs) and a set of waveform units may be obtained from a speech corpus. The set of HMMs is further refined via minimum generation error (MGE) training to generate a refined set of HMMs. Subsequently, a speech parameter trajectory may be generated by applying the refined set of HMMs to an input text. A unit lattice of candidate waveform units may be selected from the set of waveform units based at least on the speech parameter trajectory. A normalized cross-correlation (NCC)-based search on the unit lattice may be performed to obtain a minimal concatenation cost sequence of candidate waveform units, which are concatenated into a concatenated waveform sequence that is synthesized into speech.
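
    The normalized cross-correlation step named in this abstract can be illustrated with a short Python sketch that scores how well the start of one waveform unit lines up with the end of the previous one. This covers only the pairwise join score, not the full lattice search, and it rests on assumptions: the abstract does not define the concatenation cost, so the use of the highest NCC value as the best join, the absence of mean removal, and the names `ncc`, `best_join_offset`, and `window` are all hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sample lists.

    Returns a value in [-1, 1]; 0.0 is returned if either input has
    zero energy. No mean removal is applied (assumes zero-mean audio).
    """
    energy = sum(x * x for x in a) * sum(y * y for y in b)
    if energy == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / math.sqrt(energy)


def best_join_offset(left_tail, right_head, window):
    """Find the offset into right_head that best matches left_tail.

    The last `window` samples of left_tail are compared against every
    `window`-long slice of right_head; the (offset, score) pair with
    the highest NCC is returned.
    """
    ref = left_tail[-window:]
    best = (0, -1.0)
    for offset in range(len(right_head) - window + 1):
        score = ncc(ref, right_head[offset:offset + window])
        if score > best[1]:
            best = (offset, score)
    return best


# Toy samples: right_head holds a scaled copy of the reference at offset 1.
left_tail = [0.0, 1.0, 0.0, -1.0]
right_head = [0.5, 0.0, 2.0, 0.0, -2.0, 0.0]
offset, score = best_join_offset(left_tail, right_head, window=4)
print(offset, score)
```

    Because NCC is insensitive to amplitude scaling, the doubled copy in `right_head` still scores as a perfect match. A lattice search in the spirit of the abstract could turn such scores into costs (for example `1 - score`, also an assumption) and pick the lowest-cost path with dynamic programming.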
