Abstract:
The aim is to synthesize clear, highly natural speech in a Japanese-language speech synthesis system by improving not only phoneme information but also rhythm information. In Japanese, independent words and adjunct words differ markedly in their speech characteristics. The difference is especially clear in rhythmical elements such as the intensity, speed, and pitch of speech. Based on this observation, a new synthesis-by-rule method is provided that uses as its speech synthesis unit an adjunct word chain unit, i.e., a chain of one or more adjunct words, and that can synthesize highly natural speech. The remaining portion, i.e., the independent word portion, is synthesized from CV/VC (consonant-vowel / vowel-consonant) units.
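The unit selection described above can be illustrated with a minimal sketch. It assumes a hypothetical input format: each word is a list of romanized morae (e.g. `["sa", "ku", "ra"]`) plus a flag saying whether it is an adjunct word (in Japanese grammar, particles and auxiliary verbs). Consecutive adjunct words are merged into one adjunct word chain unit, and each independent word is split into CV units plus VC transition units. The names `Word`, `cv_vc_units`, and `synthesis_units` are illustrative only, and the mora handling is deliberately simplified (the last letter of a mora is taken as its vowel, so the moraic nasal and similar special morae are not handled).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    morae: List[str]   # hypothetical format: romanized morae, e.g. ["sa", "ku", "ra"]
    is_adjunct: bool   # True for adjunct words (particles, auxiliary verbs)


def cv_vc_units(morae: List[str]) -> List[str]:
    """Split an independent word into CV units and VC transition units.

    Simplification: every mora is a plain CV or V string whose last letter
    is its vowel; the VC unit pairs that vowel with the next mora's first letter.
    """
    units: List[str] = []
    for i, mora in enumerate(morae):
        units.append(mora)                         # CV unit for this mora
        if i + 1 < len(morae):
            units.append(mora[-1] + morae[i + 1][0])  # VC transition unit
    return units


def synthesis_units(words: List[Word]) -> List[str]:
    """Map a word sequence to synthesis units.

    Runs of adjunct words become a single adjunct word chain unit;
    independent words are decomposed into CV/VC units.
    """
    units: List[str] = []
    chain: List[str] = []  # morae of the current run of adjunct words
    for word in words:
        if word.is_adjunct:
            chain.extend(word.morae)       # extend the adjunct word chain
        else:
            if chain:
                units.append("".join(chain))  # emit the finished chain as one unit
                chain = []
            units.extend(cv_vc_units(word.morae))
    if chain:
        units.append("".join(chain))       # chain at the end of the utterance
    return units


# "sakura desu ka": independent word + two adjunct words forming one chain
print(synthesis_units([Word(["sa", "ku", "ra"], False),
                       Word(["de", "su"], True),
                       Word(["ka"], True)]))
# prints ['sa', 'ak', 'ku', 'ur', 'ra', 'desuka']
```

Treating the whole adjunct chain as one stored unit is what lets the recorded intensity, speed, and pitch pattern of that chain be reused intact, while independent words stay flexible through the smaller CV/VC inventory. Transitions across the boundary between an independent word and a following chain are not modeled in this sketch.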
Abstract:
A feature extractor 4 analyzes the features of a word input from a speech input device 1 to obtain a feature vector sequence corresponding to that word; alternatively, a labeler 8 applies a further transformation to obtain a label sequence. Fenonic hidden Markov models for speech transformation candidates are combined with N-gram probabilities (where N is an integer greater than or equal to 2) to produce word models. For each candidate word, the recognizer determines the probability that the speech model composed for that word would output the input label sequence or feature vector sequence, and outputs to a display 19 the candidate word whose speech model gives the highest probability.
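The final recognition step (score each candidate word's model on the input label sequence, then pick the most probable word) can be sketched with the standard forward algorithm for a discrete-output HMM. This is a simplification and an assumption: real fenonic models emit labels on transitions and include null transitions, whereas this sketch emits labels from states, and it does not show how the fenonic models and N-gram probabilities are composed into each word model; it simply takes already-composed word models as given. The class `DiscreteHMM`, the function `recognize`, and the toy two-word model set are hypothetical.

```python
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Discrete-output HMM whose states emit labels from a finite alphabet."""

    def __init__(self, initial: List[float],
                 transition: List[List[float]],
                 emission: List[List[float]]) -> None:
        self.initial = initial        # initial[i]    = P(start in state i)
        self.transition = transition  # transition[i][j] = P(next state j | state i)
        self.emission = emission      # emission[i][k]   = P(label k | state i)

    def likelihood(self, labels: Sequence[int]) -> float:
        """Forward algorithm: probability that this model outputs `labels`."""
        n = len(self.initial)
        alpha = [self.initial[i] * self.emission[i][labels[0]] for i in range(n)]
        for label in labels[1:]:
            alpha = [
                sum(alpha[i] * self.transition[i][j] for i in range(n))
                * self.emission[j][label]
                for j in range(n)
            ]
        return sum(alpha)


def recognize(labels: Sequence[int], word_models: Dict[str, DiscreteHMM]) -> str:
    """Return the candidate word whose model gives the highest probability."""
    return max(word_models, key=lambda w: word_models[w].likelihood(labels))


# Toy left-to-right models over a two-label alphabet {0, 1}:
# word "A" tends to emit 0s then 1s; word "B" tends to emit 1s then 0s.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "A": DiscreteHMM([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "B": DiscreteHMM([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}
print(recognize([0, 0, 1, 1], models))  # prints A
```

For realistic utterance lengths the forward probabilities underflow quickly, so a practical recognizer would work with log probabilities or per-frame scaling; the plain products here are only for readability.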