Abstract:
A method and apparatus for processing an audio signal are disclosed. According to an example embodiment, a method of processing an audio signal may include: acquiring a final audio signal for an initial audio signal using a plurality of neural network models that generate output audio signals by encoding and decoding input audio signals; calculating a difference between the initial audio signal and the final audio signal in a time domain; converting the initial audio signal and the final audio signal into Mel-spectra; calculating a difference between the Mel-spectra of the initial audio signal and the final audio signal in a frequency domain; training the plurality of neural network models based on the results calculated in the time domain and the frequency domain; and generating, from the initial audio signal, a new final audio signal distinguished from the final audio signal using the trained neural network models.
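The two differences described above can be illustrated with a small NumPy sketch. It is not the disclosed training procedure: the mean-squared-error measure, the log compression, the triangular Mel filterbank, the weighting factor `mel_weight`, and every function name and default parameter are assumptions chosen for illustration, and the neural network models themselves are omitted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrum(signal, n_fft=256, hop=128, n_mels=20, sample_rate=16000):
    """Mel power spectrum per frame, shape (n_frames, n_mels). Defaults are assumed values."""
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return power @ mel_filterbank(n_mels, n_fft, sample_rate).T

def combined_loss(initial, final, mel_weight=1.0):
    """Return (total, time_loss, freq_loss); MSE in both domains is an assumption."""
    time_loss = np.mean((initial - final) ** 2)
    log_mel_initial = np.log(mel_spectrum(initial) + 1e-8)
    log_mel_final = np.log(mel_spectrum(final) + 1e-8)
    freq_loss = np.mean((log_mel_initial - log_mel_final) ** 2)
    return time_loss + mel_weight * freq_loss, time_loss, freq_loss
```

In a training loop, the total returned by the hypothetical `combined_loss` would be the quantity minimized with respect to the model parameters; an automatic-differentiation framework would be needed for that step.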
Abstract:
Disclosed are a method for coding a residual signal of LPC coefficients based on collaborative quantization and a computing device for performing the method. The residual signal coding method may include: generating coded LPC coefficients and an LPC residual signal by performing LPC analysis and quantization on an input speech; determining a predicted LPC residual signal by applying the LPC residual signal to cross-module residual learning; performing LPC synthesis using the coded LPC coefficients and the predicted LPC residual signal; and determining an output speech, which is the synthesized output resulting from the LPC synthesis.
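The analysis and synthesis steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the autocorrelation method, the uniform coefficient quantizer, and all function names are hypothetical choices, and the cross-module residual learning stage (a learned predictor in the abstract) is not modeled; the residual is passed to synthesis unchanged.

```python
import numpy as np

def lpc_coefficients(speech, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    n = len(speech)
    autocorr = np.array([np.dot(speech[:n - k], speech[k:]) for k in range(order + 1)])
    R = np.array([[autocorr[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * autocorr[0] * np.eye(order)  # small ridge keeps the solve well-posed
    return np.linalg.solve(R, autocorr[1:])

def quantize(values, step):
    """Uniform scalar quantizer (an assumed stand-in for the actual LPC quantization)."""
    return np.round(values / step) * step

def lpc_residual(speech, coeffs):
    """Analysis filter: e[n] = s[n] - sum_k a[k] * s[n - k], with zero initial history."""
    order = len(coeffs)
    padded = np.concatenate([np.zeros(order), speech])
    residual = np.empty(len(speech))
    for i in range(len(speech)):
        past = padded[i:i + order][::-1]  # s[n-1], ..., s[n-order]
        residual[i] = speech[i] - np.dot(coeffs, past)
    return residual

def lpc_synthesis(residual, coeffs):
    """Synthesis filter: s[n] = e[n] + sum_k a[k] * s[n - k], with zero initial history."""
    order = len(coeffs)
    out = np.zeros(len(residual) + order)
    for i in range(len(residual)):
        past = out[i:i + order][::-1]
        out[i + order] = residual[i] + np.dot(coeffs, past)
    return out[order:]
```

Because analysis and synthesis share the same quantized coefficients, feeding the exact residual back through `lpc_synthesis` reproduces the input; any loss in a real codec would come from how the residual itself is coded or predicted.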
Abstract:
Disclosed are a method of encoding a high band of audio, a method of decoding a high band of audio, and an encoder and a decoder for performing the methods. The method of decoding a high band of audio, performed by a decoder, includes identifying a parameter extracted through a first neural network, identifying side information extracted through a second neural network, and restoring the high band of the audio by applying the parameter and the side information to a third neural network.
Abstract:
Provided are an apparatus and a method for block-based audio encoding and decoding. A method of encoding an audio signal may include dividing each frame of an input signal constituting the audio signal into a plurality of subframes; transforming the subframes to a frequency domain; determining a two-dimensional (2D) intra block using the subframes transformed to the frequency domain; and encoding the 2D intra block. The 2D intra block may be a block that represents the frequency coefficients of the transformed subframes two-dimensionally, along a time axis and a frequency axis.
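The construction of the 2D intra block can be illustrated with the short NumPy sketch below. The abstract does not name the transform, so an orthonormal DCT-II is assumed here purely for illustration; the function names, the frame length, and the subframe count are likewise hypothetical, and the subsequent encoding of the block is not shown.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    basis[0] /= np.sqrt(2.0)
    return basis

def split_into_frames(signal, frame_len):
    """Cut the signal into whole frames, dropping any trailing partial frame."""
    n_frames = len(signal) // frame_len
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

def frame_to_intra_block(frame, n_subframes):
    """Rows index subframes (time), columns index frequency coefficients."""
    if len(frame) % n_subframes != 0:
        raise ValueError("frame length must be a multiple of n_subframes")
    subframes = frame.reshape(n_subframes, -1)
    return subframes @ dct_matrix(subframes.shape[1]).T

def intra_block_to_frame(block):
    """Inverse of frame_to_intra_block; valid because the DCT basis is orthonormal."""
    return (block @ dct_matrix(block.shape[1])).reshape(-1)
```

For example, a 64-sample frame split into 4 subframes yields a 4 x 16 block, which a block-based coder could then treat much like an image tile.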
Abstract:
Provided are a method and an apparatus for encoding and decoding an audio signal. A method for encoding an audio signal includes receiving a transformed audio signal, dividing the transformed audio signal into a plurality of subbands, performing a first sinusoidal pulse coding operation on the subbands, determining a performance region of a second sinusoidal pulse coding operation among the subbands on the basis of coding information of the first sinusoidal pulse coding operation, and performing the second sinusoidal pulse coding operation on the determined performance region, wherein the first sinusoidal pulse coding operation is performed variably according to the coding information. Accordingly, it is possible to further improve the quality of a synthesized signal by considering the sinusoidal pulse coding of a lower layer when encoding or decoding an audio signal in an upper layer by a layered sinusoidal pulse coding scheme.
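A layered scheme of this kind can be illustrated with the toy NumPy sketch below. It is only one possible reading, not the disclosed method: here the first layer codes the highest-energy subbands, the "coding information" is simply the set of subbands the first layer covered, and the second layer is applied to the remaining subbands. A "pulse" is modeled as a retained largest-magnitude coefficient. All names and selection rules are assumptions.

```python
import numpy as np

def code_pulses(band, n_pulses):
    """Keep the n_pulses largest-magnitude coefficients of a subband, zero the rest."""
    coded = np.zeros_like(band)
    if n_pulses <= 0:
        return coded
    strongest = np.argsort(np.abs(band))[-n_pulses:]
    coded[strongest] = band[strongest]
    return coded

def layered_pulse_coding(spectrum, n_subbands, first_layer_bands, pulses_per_band):
    """Return (first-layer output, two-layer output, first region, second region)."""
    bands = spectrum.reshape(n_subbands, -1)
    energy = np.sum(bands ** 2, axis=1)

    # First layer (assumed rule): code the subbands with the most energy.
    first_region = np.sort(np.argsort(energy)[-first_layer_bands:])
    coded = np.zeros_like(bands)
    for b in first_region:
        coded[b] = code_pulses(bands[b], pulses_per_band)
    first_layer = coded.copy()

    # Second layer (assumed rule): its region is derived from first-layer coding
    # information, here the subbands the first layer left uncoded.
    second_region = np.setdiff1d(np.arange(n_subbands), first_region)
    for b in second_region:
        coded[b] = code_pulses(bands[b], pulses_per_band)

    return first_layer.reshape(-1), coded.reshape(-1), first_region, second_region
```

Because the second region depends only on what the first layer coded, a decoder that has the first-layer information could derive the same region without extra side information, which is the kind of benefit the abstract attributes to considering the lower layer.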