Abstract:
A method of setting up an optimum-partitioned classified neural network, and a method and apparatus for automatic labeling using such a network, are provided. The automatic labeling method comprises (a) searching, among a combination of K neural networks generated at an initial stage or updated, for the neural network having the minimum error with respect to each of L phoneme combinations; updating the weights of the K neural networks by training each network on the phoneme combination group, out of K groups, that was matched to that same network; and composing an optimum-partitioned classified neural network combination from the K neural networks once their total error sum has converged; and (b) tuning a phoneme boundary of a first label file by using the phoneme combination group classification result and the optimum-partitioned classified neural network combination, and generating a final label file reflecting the tuning result.
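As a rough illustration of step (a), the sketch below alternates between assigning each phoneme combination group to the model with the lowest error and refitting each model on the groups assigned to it, stopping once the total error no longer decreases. It is only a sketch under stated assumptions: one-parameter least-squares models stand in for the neural networks, and the names `group_error` and `fit_partitioned_models`, the toy data, and the tolerance are hypothetical rather than taken from the abstract.

```python
def group_error(w, samples):
    """Sum of squared errors of the model y = w * x on one group's samples."""
    return sum((y - w * x) ** 2 for x, y in samples)


def fit_partitioned_models(groups, init_weights, tol=1e-9, max_iter=100):
    """Alternate assignment and refitting until the total error converges.

    groups       -- L lists of (x, y) samples, one list per phoneme combination
    init_weights -- K initial model parameters (stand-ins for K networks)
    """
    weights = list(init_weights)
    assignment = []
    total = prev_total = float("inf")
    for _ in range(max_iter):
        # Step 1: for each group, find the model with the minimum error.
        assignment = [
            min(range(len(weights)), key=lambda k: group_error(weights[k], g))
            for g in groups
        ]
        # Step 2: refit each model on the pooled samples of its groups
        # (closed-form least squares replaces neural network training here).
        for k in range(len(weights)):
            pooled = [s for g, a in zip(groups, assignment) if a == k for s in g]
            sxx = sum(x * x for x, _ in pooled)
            if sxx > 0:
                weights[k] = sum(x * y for x, y in pooled) / sxx
        # Step 3: stop once the total error sum no longer decreases.
        total = sum(group_error(weights[a], g) for g, a in zip(groups, assignment))
        if prev_total - total < tol:
            break
        prev_total = total
    return assignment, weights, total


# Toy data: two groups follow y = 2x, two follow y = -x.
groups = [
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, -1.0), (2.0, -2.0), (3.0, -3.0)],
    [(2.0, -2.0), (4.0, -4.0)],
]
assignment, weights, total = fit_partitioned_models(groups, [1.0, -0.5])
```

Because each step can only keep or lower the total error for this kind of model, the loop behaves much like k-means clustering; whether the same holds for the neural networks described in the abstract depends on how they are trained.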
Abstract:
A system and method for providing information using a spoken dialogue interface are provided. The system includes a speech recognizer for transforming voice signals into sentences; a sentence analyzer for analyzing the sentences by their structural elements; a dialogue manager for extracting information on speech acts or intentions from the structural elements, and generating information on the system's speech acts or intentions for a response to the extracted information; a sentence generator for generating sentences based on the information on the system's speech acts or intentions for the response; a speech synthesizer for synthesizing the generated sentences into speech; an information extractor for extracting information required for the response from the Internet in real time; and a user modeling means for analyzing and classifying users' tendencies. Information requested by a user can be retrieved in real time and provided through a voice interface, using varied and familiar dialogue based on the user's tendencies.
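The chain of components described above can be pictured as a pipeline of small functions. The sketch below is a text-only toy under heavy assumptions: speech recognition, speech synthesis, and user modeling are left out, the weather intent and every name (`DialogueAct`, `analyze_sentence`, `extract_user_act`, `plan_system_act`, `generate_sentence`, `respond`) are hypothetical, and the `lookup` callable merely stands in for the information extractor.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueAct:
    """A speech act or intention plus the details attached to it."""
    intent: str
    slots: dict = field(default_factory=dict)


def analyze_sentence(sentence):
    """Stand-in for the sentence analyzer: lowercase word tokens."""
    return sentence.lower().strip("?.! ").split()


def extract_user_act(tokens):
    """Stand-in for the dialogue manager's intent extraction."""
    if "weather" in tokens:
        has_city = "in" in tokens[:-1]
        city = tokens[tokens.index("in") + 1] if has_city else "unknown"
        return DialogueAct("ask_weather", {"city": city})
    return DialogueAct("unknown")


def plan_system_act(user_act, lookup):
    """Choose the system's speech act; `lookup` plays the information extractor."""
    if user_act.intent == "ask_weather":
        city = user_act.slots["city"]
        return DialogueAct("inform_weather", {"city": city, "report": lookup(city)})
    return DialogueAct("ask_repeat")


def generate_sentence(system_act):
    """Stand-in for the sentence generator."""
    if system_act.intent == "inform_weather":
        return "The weather in {city} is {report}.".format(**system_act.slots)
    return "Sorry, could you rephrase that?"


def respond(text, lookup):
    """Run one recognized sentence through the whole text-side pipeline."""
    tokens = analyze_sentence(text)
    return generate_sentence(plan_system_act(extract_user_act(tokens), lookup))


reply = respond("What is the weather in Seoul?", lambda city: "sunny")
fallback = respond("Hello there", lambda city: "sunny")
```

Keeping each stage behind a plain function boundary mirrors the separation of components in the abstract, so a real recognizer or synthesizer could be placed at either end without touching the middle stages.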
Abstract:
An audio apparatus and a method of converting an audio signal are provided. The method includes: receiving a first audio signal including a plurality of channels (S810); comparing audio signals of the plurality of channels to estimate a source position of the first audio signal (S830); localizing a source of the first audio signal toward a three-dimensional (3D) position having an elevation component based on the estimated source position (S840); converting the first audio signal into a second audio signal including the plurality of channels and at least one channel having, based on the localized source, a different elevation from the plurality of channels (S850); and outputting the second audio signal (S860).
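To make the compare, localize, and convert steps more concrete, here is a minimal sketch for a two-channel input. The abstract does not say how the comparison or the elevation rendering is done, so everything here is an assumption: the source position is estimated from the level difference between the channels, the elevated channel is a weighted mix scaled by the sine of a chosen elevation angle, and the function names `rms`, `estimate_pan`, and `convert_with_height` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pan(left, right):
    """Compare channel levels: -1.0 is fully left, +1.0 is fully right."""
    l, r = rms(left), rms(right)
    if l + r == 0.0:
        return 0.0  # silence: treat the source as centered
    return (r - l) / (r + l)


def convert_with_height(left, right, elevation_deg):
    """Keep the input channels and derive one extra, elevated channel."""
    pan = estimate_pan(left, right)
    # Weight each input channel by how strongly the source sits on its side.
    w_left, w_right = (1.0 - pan) / 2.0, (1.0 + pan) / 2.0
    # Assumed rendering rule: scale the height feed by sin(elevation).
    g_height = math.sin(math.radians(elevation_deg))
    height = [g_height * (w_left * l + w_right * r) for l, r in zip(left, right)]
    return {"left": list(left), "right": list(right), "height": height}


out = convert_with_height([1.0, -1.0], [0.0, 0.0], 30.0)
```

In this toy case the source is entirely in the left channel, so the height channel is a scaled copy of the left signal; a practical system would more likely work per frequency band and per time frame.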
Abstract:
A virtual screen sound source is spatially synchronized with a visual object displayed on a display. Loudspeaker sets, each including at least three of a plurality of loudspeakers installed at the periphery of the display, are selected; individual sound sources corresponding to the respective selected loudspeaker sets are generated; and a multi-sound source is generated by overlapping the individual sound sources and is output through the loudspeakers included in the selected sets.
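One possible way to picture the selection and overlap of loudspeaker sets is sketched below. The abstract does not state how the sets are chosen or how gains are computed, so this is an assumption throughout: every triplet of loudspeakers whose triangle contains the on-screen target is selected, barycentric weights serve as that triplet's gains, and the overlap is taken as the average over the selected triplets. The names `barycentric` and `multi_source_gains` are hypothetical.

```python
from itertools import combinations


def barycentric(p, a, b, c):
    """Barycentric weights of point p in triangle (a, b, c), or None if degenerate."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        return None  # the three loudspeakers are collinear
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return l1, l2, 1.0 - l1 - l2


def multi_source_gains(speakers, target, eps=1e-12):
    """Average the per-triplet gains of every triplet that encloses the target."""
    totals = [0.0] * len(speakers)
    used = 0
    for trio in combinations(range(len(speakers)), 3):
        weights = barycentric(target, *(speakers[i] for i in trio))
        if weights is None or min(weights) < -eps:
            continue  # target lies outside this loudspeaker set
        for i, w in zip(trio, weights):
            totals[i] += w  # overlap this set's individual source
        used += 1
    if used == 0:
        raise ValueError("target is not enclosed by any loudspeaker set")
    return [t / used for t in totals]


# Four loudspeakers at the corners of a unit-square display.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
center_gains = multi_source_gains(corners, (0.5, 0.5))
```

For an object at the center of the display, all four corner loudspeakers receive the same gain, which matches the intuition that the virtual source should sit at the middle of the screen.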
Abstract:
A speech synthesis system and method using a smoothing filter are disclosed. The speech synthesis system uses a smoothing technique to control the discontinuity distortion occurring at the transition portion between concatenated phonemes, which are the speech units of the synthesized speech. The system comprises a discontinuity distortion processing means adapted to predict, through a predetermined learning process, the discontinuity occurring at the transition portion between concatenated samples of phonemes used for speech synthesis, and to control the discontinuity occurring at the transition portion between the concatenated phonemes of the synthesized speech so that it is smoothed adaptively according to the degree of the predicted discontinuity. The smoothing filter smooths the synthesized speech so that its discontinuity degree follows the predicted discontinuity degree, according to a filter coefficient (a) that changes adaptively with the ratio of the predicted discontinuity degree to the real discontinuity degree. That is, because the discontinuity at a transition portion between concatenated phonemes of the synthesized speech (IN) is adaptively smoothed to follow the discontinuity that occurs in actually spoken sound, the synthesized speech (IN) more closely approximates a real human voice.
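The role of the adaptive coefficient can be illustrated with a small sketch. It assumes, beyond what the abstract states, that the discontinuity degree is the jump in a single feature value across the unit boundary, that the coefficient is the predicted-to-real ratio capped at 1, and that the correction is spread over a few samples on each side with a linear taper. The function name `smooth_boundary` and the `span` parameter are hypothetical.

```python
def smooth_boundary(left, right, predicted_jump, span=3):
    """Shrink the jump at a unit boundary toward the predicted jump.

    left, right    -- feature tracks of the two concatenated units
    predicted_jump -- discontinuity degree predicted by a learned model
    span           -- samples on each side that absorb the correction
    """
    left_out, right_out = list(left), list(right)
    real_jump = right[0] - left[-1]
    if real_jump == 0:
        return left_out, right_out, 1.0
    # Coefficient follows the ratio of predicted to real discontinuity.
    a = min(1.0, abs(predicted_jump) / abs(real_jump))
    excess = (1.0 - a) * real_jump  # part of the jump to remove
    n_left, n_right = min(span, len(left)), min(span, len(right))
    for i in range(n_left):
        taper = (n_left - i) / n_left  # 1.0 at the boundary sample
        left_out[-1 - i] += 0.5 * excess * taper
    for i in range(n_right):
        taper = (n_right - i) / n_right
        right_out[i] -= 0.5 * excess * taper
    return left_out, right_out, a


left_out, right_out, a = smooth_boundary(
    [0.0, 0.0, 0.0, 0.0], [4.0, 4.0, 4.0, 4.0], predicted_jump=1.0, span=2
)
```

In this toy case the real jump of 4.0 is reduced to the predicted jump of 1.0, rather than being smoothed away entirely, which is the behavior the abstract attributes to the adaptive filter: the boundary should keep the amount of discontinuity that natural speech would have.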