摘要:
A method is presented for forming the excitation signal for a glottal pulse model based parametric speech synthesis system. In one embodiment, fundamental frequency values are used to form the excitation signal. The excitation is modeled using a voice source pulse selected from a database of a given speaker. The voice source signal is segmented into glottal segments, which are used in vector representation to identify the glottal pulse used for formation of the excitation signal. Use of a novel distance metric and preserving the original signals extracted from the speakers voice samples helps capture low frequency information of the excitation signal. In addition, segment edge artifacts are removed by applying a unique segment joining method to improve the quality of synthetic speech while creating a true representation of the voice quality of a speaker.
摘要:
A method is presented for forming the excitation signal for a glottal pulse model based parametric speech synthesis system. In one embodiment, fundamental frequency values are used to form the excitation signal. The excitation is modeled using a voice source pulse selected from a database of a given speaker. The voice source signal is segmented into glottal segments, which are used in vector representation to identify the glottal pulse used for formation of the excitation signal. Use of a novel distance metric and preserving the original signals extracted from the speakers voice samples helps capture low frequency information of the excitation signal. In addition, segment edge artifacts are removed by applying a unique segment joining method to improve the quality of synthetic speech while creating a true representation of the voice quality of a speaker.