Abstract:
Disclosed herein are a multimodal unsupervised meta-learning method and apparatus. The method includes: training, by a multimodal unsupervised feature representation learning unit, an encoder configured to extract features of individual single-modal signals from a source multimodal dataset; generating, by a multimodal unsupervised task generation unit, a source task based on those features; deriving, by a multimodal unsupervised learning method derivation unit, a learning method from the source task using the encoder; and training, by a target task performance unit, a model based on the learning method and on features that the encoder extracts from a small number of target datasets, thereby performing a target task.
Abstract:
There is provided a method of determining a main speaker, performed by a first terminal participating in a distributed telepresence service. The method according to an embodiment of the invention includes: obtaining, from an audio input signal, first feature information for determining a main speaker; obtaining, from a second terminal participating in the distributed telepresence service, second feature information for determining a main speaker of the second terminal; and determining, based on the first feature information and the second feature information, a main speaker terminal that provides the video and audio of the main speaker, that is, the telepresence participant who is currently speaking.
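The decision step described above can be illustrated with a minimal sketch. The abstract does not say what the feature information contains or how the comparison is made, so everything here is an assumption: the feature is taken to be frame energy in dB, the terminal with the highest value is chosen, and ties go to the smallest terminal id so that every terminal applying the same rule reaches the same answer. The names `frame_energy_db` and `determine_main_speaker` are hypothetical.

```python
import math


def frame_energy_db(samples):
    """Mean-square energy of one audio frame in dB (assumed feature, not from the abstract)."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")


def determine_main_speaker(local_id, local_feature, remote_features):
    """Return the id of the terminal with the highest feature value.

    local_feature   -- first feature information, computed from the local audio input
    remote_features -- dict mapping terminal id to the second feature information
                       received from that terminal
    Ties are broken by the smallest terminal id (assumed rule).
    """
    candidates = dict(remote_features)
    candidates[local_id] = local_feature
    # max() returns the first maximal item, so iterating ids in sorted
    # order makes the tie-break deterministic across terminals.
    return max(sorted(candidates), key=lambda tid: candidates[tid])


if __name__ == "__main__":
    loud = [0.5, -0.5, 0.5, -0.5]
    quiet = [0.01, -0.01, 0.01, -0.01]
    remote = {"terminal-B": frame_energy_db(quiet)}
    print(determine_main_speaker("terminal-A", frame_energy_db(loud), remote))
```

Because each terminal runs the same deterministic rule over the same set of feature values, no central server is needed to agree on the result, which matches the distributed setting the abstract describes.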
Abstract:
Provided are a method and an apparatus for encoding and decoding an audio signal. A method for encoding an audio signal includes: receiving a transformed audio signal; dividing the transformed audio signal into a plurality of subbands; performing a first sinusoidal pulse coding operation on the subbands; determining, among the subbands and on the basis of coding information of the first sinusoidal pulse coding operation, the region in which a second sinusoidal pulse coding operation is to be performed; and performing the second sinusoidal pulse coding operation on the determined region, wherein the first sinusoidal pulse coding operation is performed variably according to the coding information. Accordingly, when an audio signal is encoded or decoded in an upper layer by a layered sinusoidal pulse coding scheme, taking the sinusoidal pulse coding of the lower layer into account can further improve the quality of the synthesized signal.
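The two-pass structure described above can be sketched roughly as follows. The abstract does not specify how pulses are selected or how the coding information of the first pass determines the second-pass region, so this sketch makes its own assumptions: each pass codes one pulse per subband by picking the largest-magnitude coefficient, and the second pass runs on the subband whose residual energy after the first pass is highest. Amplitude quantization and the variable behaviour of the first pass are not modelled, and all function names are hypothetical.

```python
def split_subbands(coeffs, n_subbands):
    """Divide the transformed signal into equal-width subbands (length assumed divisible)."""
    size = len(coeffs) // n_subbands
    return [list(coeffs[i * size:(i + 1) * size]) for i in range(n_subbands)]


def code_pulse(band):
    """Code one pulse: the largest-magnitude coefficient in the band.

    Returns ((position, value), residual_band), where the residual has
    the coded coefficient set to zero.
    """
    pos = max(range(len(band)), key=lambda i: abs(band[i]))
    residual = list(band)
    value = residual[pos]
    residual[pos] = 0.0
    return (pos, value), residual


def layered_pulse_coding(coeffs, n_subbands):
    """First pass on every subband, second pass on one region chosen from first-pass results."""
    bands = split_subbands(coeffs, n_subbands)

    # First sinusoidal pulse coding operation: one pulse per subband.
    first_pass = [code_pulse(band) for band in bands]
    first_pulses = [pulse for pulse, _ in first_pass]
    residuals = [residual for _, residual in first_pass]

    # Assumed selection rule: the region for the second pass is the
    # subband that still holds the most energy after the first pass.
    residual_energy = [sum(x * x for x in r) for r in residuals]
    region = max(range(n_subbands), key=lambda k: residual_energy[k])

    # Second sinusoidal pulse coding operation on the chosen region only.
    second_pulse, _ = code_pulse(residuals[region])
    return first_pulses, region, second_pulse


if __name__ == "__main__":
    spectrum = [1.0, -4.0, 0.5, 0.0, 3.0, 2.5, 0.0, 0.1]
    print(layered_pulse_coding(spectrum, 2))
```

The point of the sketch is only the data flow: the second pass does not choose its region independently but reuses what the first pass left behind, which is the dependency between layers that the abstract credits with improving the synthesized signal.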
Abstract:
Disclosed is an apparatus for coding a signal in a communication system, the apparatus including: a coding unit configured to code voice and audio signals based on a code excited linear prediction (CELP) coding method; a residual signal calculation unit configured to calculate residual signals of the voice and audio signals; a frequency transform unit configured to transform the residual signals into signals in a frequency domain; an energy calculation unit configured to use frequency coefficients of the residual signals to calculate frequency energy of the residual signals; an energy concentration calculation unit configured to calculate an energy concentration for each vector dimension of the residual signals from the frequency energy of the residual signals; and a vector dimension determination unit configured to compare the energy concentrations of the vector dimensions to determine targeted vector dimensions of the residual signals.
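The last three units (energy calculation, energy concentration calculation, vector dimension determination) can be illustrated with a small sketch. The abstract does not define "energy concentration" or the comparison rule, so both are assumptions here: the concentration for a candidate dimension is taken to be the share of total energy held by the strongest block of that many adjacent frequency coefficients, and the chosen dimension is the smallest candidate whose concentration reaches a threshold, falling back to the largest candidate. The function names and the `threshold` parameter are hypothetical.

```python
def frequency_energy(coeffs):
    """Energy of each frequency coefficient of the residual (real or complex)."""
    return [abs(c) ** 2 for c in coeffs]


def energy_concentration(energy, dim):
    """Share of total energy in the strongest block of `dim` adjacent coefficients (assumed definition)."""
    total = sum(energy)
    if total == 0:
        return 0.0
    blocks = [sum(energy[i:i + dim]) for i in range(0, len(energy), dim)]
    return max(blocks) / total


def choose_vector_dimension(coeffs, candidate_dims, threshold=0.5):
    """Compare concentrations across candidate dimensions and pick one.

    Assumed rule: take the smallest dimension whose concentration is at
    least `threshold`; if none qualifies, use the largest candidate.
    Returns (chosen_dimension, {dimension: concentration}).
    """
    energy = frequency_energy(coeffs)
    concentration = {d: energy_concentration(energy, d) for d in candidate_dims}
    eligible = [d for d in sorted(candidate_dims) if concentration[d] >= threshold]
    chosen = eligible[0] if eligible else max(candidate_dims)
    return chosen, concentration


if __name__ == "__main__":
    peaky = [0, 0, 3, 0, 0, 0, 0, 0]   # energy in one coefficient
    flat = [1, 1, 1, 1, 1, 1, 1, 1]    # energy spread evenly
    print(choose_vector_dimension(peaky, [2, 4, 8]))
    print(choose_vector_dimension(flat, [2, 4, 8]))
```

Under these assumptions, a residual whose energy sits in a narrow band selects a small vector dimension, while a flat residual needs a larger one before any single block holds enough of the energy.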