摘要:
A method of setting an optimum-partitioned classified neural network and a method and apparatus for automatic labeling using an optimum-partitioned classified neural network are provided. The method of automatic labeling using an optimum-partitioned classified neural network comprises (a) searching for neural networks having minimum errors with respect to a number of L phoneme combinations from a number of K neural network combinations generated at an initial stage or updated, updating weights during learning of the K neural networks by K phoneme combination groups searched with the same neural networks, and composing an optimum-partitioned classified neural network combination using the K neural networks of which a total error sum has converged; and (b) tuning a phoneme boundary of a first label file by using the phoneme combination group classification result and the optimum-partitioned classified neural network combination, and generating a final label file reflecting the tuning result.
摘要:
Disclosed is a speech synthesis system and method using a smoothing filter. A speech synthesis system for controlling a discontinuous distortion occurred at the transition portion between concatenated phonemes which are speech units of a synthesized speech using a smoothing technique, comprising: a discontinuous distortion processing means adapted to predict a discontinuity occurred at the transition portion between concatenated samples of phonemes used for a speech synthesis through a predetermined learning process, and control a discontinuity occurred at the transition portion between the concatenated phonemes of the synthesized speech in such a fashion that it is smoothed adaptively to correspond to a degree of the predicted discontinuity. The smoothing filter smoothes the synthesized speech so that the discontinuity degree of synthesized speech follows the predicted discontinuity degree according to the filter coefficient (a) changed adaptively to correspond to a ratio of the predicted discontinuity degree to the real discontinuity degree. That is, since a discontinuity occurred at a transition portion between concatenated phonemes of the synthesized speech (IN) is adaptively smoothed to follow that occurred in the actually spoken sound, the synthesized speech (IN) can be approximated more closely to a real human voice.
摘要:
A method of setting an optimum-partitioned classified neural network and a method and apparatus for automatic labeling using an optimum-partitioned classified neural network are provided. The method of automatic labeling using an optimum-partitioned classified neural network comprises (a) searching for neural networks having minimum errors with respect to a number of L phoneme combinations from a number of K neural network combinations generated at an initial stage or updated, updating weights during learning of the K neural networks by K phoneme combination groups searched with the same neural networks, and composing an optimum-partitioned classified neural network combination using the K neural networks of which a total error sum has converged; and (b) tuning a phoneme boundary of a first label file by using the phoneme combination group classification result and the optimum-partitioned classified neural network combination, and generating a final label file reflecting the tuning result.
摘要:
A method of setting an optimum-partitioned classified neural network and a method and apparatus for automatic labeling using an optimum-partitioned classified neural network are provided. The method of automatic labeling using an optimum-partitioned classified neural network comprises (a) searching for neural networks having minimum errors with respect to a number of L phoneme combinations from a number of K neural network combinations generated at an initial stage or updated, updating weights during learning of the K neural networks by K phoneme combination groups searched with the same neural networks, and composing an optimum-partitioned classified neural network combination using the K neural networks of which a total error sum has converged; and (b) tuning a phoneme boundary of a first label file by using the phoneme combination group classification result and the optimum-partitioned classified neural network combination, and generating a final label file reflecting the tuning result.