摘要:
An improved method for shifting the pitches of a tone is disclosed. It comprises: (a) subjecting a digitized original waveform to a whitening process using an all-zero filter (AZF) to obtain a whitened waveform; (b) resampling the whitened waveform at a desired scaling ratio to obtain a scaled and whitened waveform; (c) subjecting the scaled and whitened waveform to a coloring process using an all-pole filter (APF) to obtain a synthesized waveform. In a preferred embodiment, the all-zero filter performs the transformation function of: ##EQU1## and the all-pole filter performs the transformation function of: ##EQU2## wherein the a.sub.i 's and b.sub.i 's are linear predictive coefficients. The whitened waveforms can be compressed and stored as wavetables, which can be subsequently retrieved and decompressed before resampling.
摘要:
A multi-lingual text-to-speech system and method processes a text to be synthesized via an acoustic-prosodic model selection module and an acoustic-prosodic model mergence module, and obtains a phonetic unit transformation table. In an online phase, the acoustic-prosodic model selection module, according to the text and a phonetic unit transcription corresponding to the text, uses at least a set controllable accent weighting parameter to select a transformation combination and find a second and a first acoustic-prosodic models. The acoustic-prosodic model mergence module merges the two acoustic-prosodic models into a merged acoustic-prosodic model, according to the at least a controllable accent weighting parameter, processes all transformations in the transformation combination and generates a merged acoustic-prosodic model sequence. A speech synthesizer and the merged acoustic-prosodic model sequence are further applied to synthesize the text into an L1-accent L2 speech.
摘要:
A speech synthesizer generating system and a method thereof are provided. A speech synthesizer generator in the speech synthesizer generating system automatically generates a speech synthesizer conforming to a speech output specification input by a user. In addition, a recording script is automatically generated by a recording script generator in the speech synthesizer generating system according to the speech output specification, and a customized or expanded speech material is recorded according to the recording script. After the speech material is uploaded to the speech synthesizer generating system, the speech synthesizer generator automatically generates a speech synthesizer conforming to the speech output specification. The speech synthesizer then synthesizes and outputs a speech output at a user end.
摘要:
A method for speech quality degradation estimation, a method for degradation measures calculation, and the apparatuses thereof are provided. The first method above estimates the speech quality of a speech signal that is modified by a pitch-synchronous prosody modification method, which comprises the following steps. First, extract at least one source pitchmark from the speech signal, and then maps the source pitchmark(s) to at least one target pitchmark(s). Finally, calculate at least one degradation measure based on the mapping between the source and the target pitchmarks. The degradation measures include several weighted pitch-related functions and duration-related functions, where the weighting functions can be calculated based on the speech signal or the pitchmark(s) mapping mentioned above.
摘要:
This proposal presents performance indices and search criteria for the text script generation in the design of corpus-based TTS systems. Based on our criteria a new search method is presented to solve the text selection problem more systematically and efficiently, unlike previous researches either concentrated on covering rate or on hit rate. By control a weighting factor, the covering rate of unit types can be increased to improve the robustness of the TTS system. Finally, the scalable and controllable design of the multi-stage search can produce various kinds of text scripts ideally suitable for the requirement of various kinds of corpus-based TTS systems.
摘要:
A method of speech segment selection for concatenative synthesis based on prosody-aligned distance measure is disclosed. This method is based on comparison of speech segments segmented from a speech corpus, wherein speech segments are fully prosody-aligned to each other before distortion measure. With prosody alignment embedded in selection process, distortion resulting from possible prosody modification in synthesis could be taken into account objectively in selection phase. In order to carry out the purpose of the present invention, automatic segmentation, pitch marking and PSOLA method work together for prosody alignment. Two distortion measures, MFCC and PSQM are used for comparing two prosody-aligned segments of speech because of human perceptual consideration.