METHOD FOR FORMING THE EXCITATION SIGNAL FOR A GLOTTAL PULSE MODEL BASED PARAMETRIC SPEECH SYNTHESIS SYSTEM
    1.
    Patent application
    METHOD FOR FORMING THE EXCITATION SIGNAL FOR A GLOTTAL PULSE MODEL BASED PARAMETRIC SPEECH SYNTHESIS SYSTEM (status: in force)

    Publication No.: US20150348535A1

    Publication date: 2015-12-03

    Application No.: US14288745

    Filing date: 2014-05-28

    IPC classification: G10L13/027

    CPC classification: G10L25/90 G10L13/02

    Abstract: A method is presented for forming the excitation signal for a glottal pulse model based parametric speech synthesis system. In one embodiment, fundamental frequency values are used to form the excitation signal. The excitation is modeled using a voice source pulse selected from a database of a given speaker. The voice source signal is segmented into glottal segments, which are used in vector representation to identify the glottal pulse used to form the excitation signal. Using a novel distance metric and preserving the original signals extracted from the speaker's voice samples help capture low-frequency information of the excitation signal. In addition, segment edge artifacts are removed by applying a unique segment joining method, which improves the quality of the synthetic speech while creating a true representation of the speaker's voice quality.
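
The pipeline summarized in the abstract (represent glottal segments as vectors, pick one pulse, then lay it out according to the fundamental frequency) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the abstract does not disclose the distance metric, so plain Euclidean distance to the mean vector stands in for it; the fixed-length vectors are obtained by linear interpolation; and simple overlap-add replaces the patent's segment joining method. The names `resample`, `select_pulse`, and `form_excitation` are hypothetical.

```python
import math


def resample(pulse, n):
    """Linearly interpolate a pulse to exactly n samples (n >= 2)."""
    if len(pulse) == 1:
        return [float(pulse[0])] * n
    scale = (len(pulse) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(pulse) - 1)
        frac = pos - lo
        out.append(pulse[lo] * (1 - frac) + pulse[hi] * frac)
    return out


def euclidean(a, b):
    """Stand-in metric; the patent's own distance metric is not given in the abstract."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_pulse(segments, dim=16):
    """Return the index of the segment whose vector is closest to the mean vector."""
    vectors = [resample(s, dim) for s in segments]
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    dists = [euclidean(v, mean) for v in vectors]
    return dists.index(min(dists))


def form_excitation(pulse, f0_values, fs=16000, frame_shift=0.005):
    """Overlap-add the unmodified pulse at pitch-period spacing.

    f0_values holds one fundamental frequency (Hz) per frame; frames with
    f0 <= 0 are treated as unvoiced and left silent.
    """
    frame_len = round(fs * frame_shift)
    n = frame_len * len(f0_values)
    out = [0.0] * n
    t = 0
    while t < n:
        f0 = f0_values[t // frame_len]
        if f0 <= 0:
            t += frame_len  # advance one frame length through unvoiced audio
            continue
        for i, v in enumerate(pulse):
            if t + i < n:
                out[t + i] += v  # the original pulse samples are kept as-is
        t += max(1, round(fs / f0))  # next pulse starts one pitch period later
    return out
```

In this sketch the selected pulse is copied into the output without resampling, loosely mirroring the abstract's point about preserving the original signals; the resampled vectors are used only for the selection step.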


    Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system

    Publication No.: US10014007B2

    Publication date: 2018-07-03

    Application No.: US14288745

    Filing date: 2014-05-28

    IPC classification: G10L25/90 G10L13/02

    CPC classification: G10L25/90 G10L13/02

    Abstract: A method is presented for forming the excitation signal for a glottal pulse model based parametric speech synthesis system. In one embodiment, fundamental frequency values are used to form the excitation signal. The excitation is modeled using a voice source pulse selected from a database of a given speaker. The voice source signal is segmented into glottal segments, which are used in vector representation to identify the glottal pulse used to form the excitation signal. Using a novel distance metric and preserving the original signals extracted from the speaker's voice samples help capture low-frequency information of the excitation signal. In addition, segment edge artifacts are removed by applying a unique segment joining method, which improves the quality of the synthetic speech while creating a true representation of the speaker's voice quality.