Abstract:
A method for processing an audio signal using a beamforming technique in three-dimensional (3D) space is disclosed. The method may include generating a beamforming signal on a horizontal plane related to a sound source in the 3D space, and modulating the beamforming signal so that it is directed from the sound source toward a listener in the 3D space.
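The horizontal-plane stage above can be illustrated with classic delay-and-sum steering: each array element gets a delay so that the wavefronts reinforce in a target azimuth. This is a minimal sketch, not the claimed method; the array geometry, sample rate, integer-sample delays, and the helper names `steering_delays`/`steer_source` are all illustrative assumptions.

```python
import numpy as np

def steering_delays(xy, azimuth_deg, c=343.0):
    """Per-element delays (s) that steer a horizontal-plane array toward
    `azimuth_deg` (delay-and-sum; applicable to capture or playback)."""
    theta = np.deg2rad(azimuth_deg)
    unit = np.array([np.cos(theta), np.sin(theta)])
    d = xy @ unit / c          # propagation delay of each element
    return d - d.min()         # shift so all delays are non-negative

def steer_source(mono, fs, xy, azimuth_deg):
    """Delay a mono source per array element so the emitted (or summed)
    wavefronts align in the target direction; integer-sample delays
    are used for simplicity."""
    shifts = np.round(steering_delays(xy, azimuth_deg) * fs).astype(int)
    n = len(mono)
    out = np.zeros((len(xy), n))
    for k, s in enumerate(shifts):
        out[k, s:] = mono[: n - s]   # delay channel k by s samples
    return out
```

Extending the sketch toward a listener at an arbitrary 3D position would add an elevation-dependent term to the steering vector, which the abstract describes as a separate modulation step.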
Abstract:
An apparatus and method for generating a multichannel audio signal exhibiting a three-dimensional (3D) effect by editing an existing multichannel audio signal are disclosed. The apparatus may include an output location determiner to determine a location at which to output an object audio signal using editing information of the object audio signal, and a multichannel object audio signal generator to generate a multichannel object audio signal by allocating the object audio signal to channels corresponding to the determined output location.
Abstract:
The present invention relates to a method and apparatus for encoding a displacement video using image tiling. A method for encoding multi-dimensional data according to an embodiment of the present disclosure may comprise: converting the multi-dimensional data into one or more frames with two-dimensional characteristics; generating one or more frame groups by grouping the one or more frames in units of a pre-configured number; reconstructing the frames belonging to each frame group into a tiled frame; and generating a bitstream by encoding the tiled frame. Here, the tiled frame may be constructed with one or more blocks, and each block may be constructed by rearranging the pixels existing at the same location in the frames.
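The pixel rearrangement in the last sentence can be sketched as follows: the block for each (row, col) location gathers that pixel from every frame in the group. The square group size and the NumPy block layout are illustrative assumptions, since the abstract does not fix a block geometry.

```python
import numpy as np

def tile_frame_group(frames):
    """Rearrange a group of 2D frames into one tiled frame.

    Each bh x bw block of the tiled frame collects the pixels at one
    (row, col) location across all frames in the group. Assumption:
    the group size is a perfect square (bh == bw)."""
    g, h, w = frames.shape
    bh = int(np.sqrt(g))
    assert bh * bh == g, "sketch assumes a square group size"
    bw = bh
    # frames[a*bw + b, i, j] -> tiled[i*bh + a, j*bw + b]
    return (frames.reshape(bh, bw, h, w)
                  .transpose(2, 0, 3, 1)
                  .reshape(h * bh, w * bw))
```

Because pixels at the same location across consecutive frames tend to be correlated, each block is locally smooth, which is what makes the tiled frame friendly to a conventional 2D video encoder.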
Abstract:
A method of processing an immersive video according to the present disclosure includes performing pruning on an input image, generating an atlas based on patches generated by the pruning, and generating a cropped atlas by removing a background region of the atlas.
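One simple way to realize the background-removal step is to crop the atlas to the bounding box of its non-background pixels. This is only an illustrative sketch; the background value and the bounding-box rule are assumptions, not details from the abstract.

```python
import numpy as np

def crop_atlas(atlas, background=0):
    """Crop a 2D atlas to the bounding box of its non-background pixels
    (one possible way to remove the background region)."""
    mask = atlas != background
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:                      # atlas is entirely background
        return atlas[:0, :0]
    return atlas[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Shrinking the atlas this way reduces the number of coded pixels without touching the patches themselves.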
Abstract:
The present invention provides a method for image preprocessing, the method including: identifying a plurality of input images on which to perform image registration; selecting at least one transformation method from among histogram equalization and gamma transformation in consideration of the hue and brightness values of the plurality of input images; processing the histogram equalization on the plurality of input images in response to the histogram equalization method being selected; and processing the gamma transformation on the plurality of input images in response to the gamma transformation method being selected.
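The two transformations, and one possible selection rule, can be sketched for 8-bit grayscale images as follows. The contrast-based threshold is an illustrative assumption: the abstract selects based on hue and brightness but does not give a concrete rule.

```python
import numpy as np

def histogram_equalize(img):
    """Histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

def gamma_transform(img, gamma=0.8):
    """Gamma transformation for an 8-bit grayscale image."""
    return (255 * (img / 255.0) ** gamma).astype(np.uint8)

def preprocess(img, low_contrast_threshold=40.0):
    """Illustrative selection rule (assumption, not from the abstract):
    equalize low-contrast images, gamma-correct the rest."""
    if img.std() < low_contrast_threshold:
        return histogram_equalize(img)
    return gamma_transform(img)
```

Normalizing intensity distributions this way makes feature matching during registration less sensitive to exposure differences between the input images.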
Abstract:
Disclosed herein is an image deep learning model training method. The method includes sampling a twin negative comprising a first negative sample and a second negative sample, by selecting, as the first negative sample, the sample with the highest similarity to the anchor sample and positive sample constituting a matching pair in each class, and by selecting, as the second negative sample, the sample with the highest similarity to the first negative sample; and training a model to minimize the loss of a loss function in each class by utilizing the anchor sample, the positive sample, and the first and second negative samples for each class. The first negative sample is selected from a class different from the class comprising the matching pair, and the second negative sample is selected from a class different from the classes comprising the matching pair and the first negative sample.
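The twin-negative sampling step can be sketched on embedding vectors as follows. Cosine similarity and the averaged (anchor, positive) embedding used as the pair representative are illustrative assumptions; the abstract does not specify the similarity measure.

```python
import numpy as np

def sample_twin_negative(anchor, positive, candidates, candidate_classes, pair_class):
    """Pick a 'twin negative' (index pair) from candidate embeddings.

    First negative: the out-of-class candidate most similar to the
    matching (anchor, positive) pair. Second negative: the candidate
    most similar to the first negative, from yet another class."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pair = (anchor + positive) / 2      # stand-in for pair similarity (assumption)
    mask1 = candidate_classes != pair_class
    idx1 = max(np.flatnonzero(mask1), key=lambda i: cos(candidates[i], pair))
    mask2 = mask1 & (candidate_classes != candidate_classes[idx1])
    idx2 = max(np.flatnonzero(mask2), key=lambda i: cos(candidates[i], candidates[idx1]))
    return idx1, idx2
```

Choosing the hardest out-of-class sample and its nearest neighbor from a third class gives the loss two informative negatives per matching pair instead of one.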
Abstract:
Disclosed are a multi-channel audio signal processing method and a multi-channel audio signal processing apparatus. The multi-channel audio signal processing method may generate N channel output signals from N/2 channel downmix signals based on an N−N/2−N structure.
Abstract:
Disclosed are a method and apparatus for speech signal training. A speech signal training apparatus of the present disclosure may include a target speaker speech database storing a target speaker speech signal; a multi-speaker speech database storing multi-speaker speech signals; a target speaker acoustic parameter extracting unit extracting an acoustic parameter of a training subject speech signal from the target speaker speech signal; a similar speaker acoustic parameter determining unit extracting at least one similar speaker speech signal from the multi-speaker speech signals and determining an auxiliary speech feature of the similar speaker speech signal; and an acoustic parameter model training unit determining an acoustic parameter model by performing model training on the relation between the acoustic parameter and text using the acoustic parameter and the auxiliary speech feature, and setting mapping information on the relation between the acoustic parameter model and the text.
Abstract:
Provided are an encoding method of a multichannel signal, an encoding apparatus to perform the encoding method, a decoding method of a multichannel signal, and a decoding apparatus to perform the decoding method. The decoding method may include identifying an N/2-channel downmix signal derived from an N-channel input signal; and generating an N-channel output signal from the identified N/2-channel downmix signal using a plurality of one-to-two (OTT) boxes. If a low frequency effect (LFE) channel is absent from the output signal, the number of OTT boxes may be equal to N/2, where N/2 denotes the number of channels of the downmix signal.
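An OTT box splits one downmix channel into two output channels; with one box per downmix channel, N/2 boxes yield N outputs. A minimal sketch, assuming a single channel level difference (CLD) parameter per box and omitting decorrelation and residual signals, is:

```python
import numpy as np

def ott_box(mono, cld_db):
    """One-to-two (OTT) upmix: split one channel into two using a
    channel level difference (CLD) in dB. Energy-preserving gains;
    decorrelation is omitted for simplicity."""
    r = 10 ** (cld_db / 20)            # level ratio between the two outputs
    g1 = r / np.sqrt(1 + r * r)
    g2 = 1 / np.sqrt(1 + r * r)
    return g1 * mono, g2 * mono

def upmix(downmix, clds):
    """N/2-channel downmix -> N-channel output, one OTT box per channel."""
    out = []
    for ch, cld in zip(downmix, clds):
        out.extend(ott_box(ch, cld))
    return np.stack(out)
```

Since g1² + g2² = 1 for any CLD, each box preserves the energy of its input channel, which is the usual design goal for parametric upmixing.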
Abstract:
An encoder and an encoding method for a multi-channel signal, and a decoder and a decoding method for a multi-channel signal are disclosed. A multi-channel signal may be efficiently processed by consecutive downmixing or upmixing.