Abstract:
Provided is an emotional-speech synthesizing device including: a sentence recognition unit that recognizes a sentence that is input; a word emotion determination unit that calculates probability vector of an emotion that is pre-defined for each word that makes up the recognized sentence and estimates the emotion and a rhythm based on the probability vector; and an emotional-speech synthesizing unit. The emotional-speech synthesizing unit calculates in stages degrees of similarity in the emotion and the rhythm between the adjacent words based on context information on the recognized sentence, applies weight to a phoneme candidate corresponding to the each word based on the degrees of the similarity and the probability vector, selects the phoneme candidate that has a minimum target pitch, minimum duration time, a minimum distance value of a target pitch contour, and thus synthesizes an emotional speech that corresponds to the recognized sentence in optimal units.