Abstract:
A method of generating a residual signal performed by an encoder includes identifying an input signal including an audio sample, generating a first residual signal from the input signal using linear predictive coding (LPC), generating a second residual signal containing less information than the first residual signal by transforming the first residual signal, transforming the second residual signal into a frequency domain, and generating a third residual signal containing less information than the second residual signal from the transformed second residual signal using frequency-domain prediction (FDP) coding.
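As a rough illustration of the first step, the sketch below computes a residual from a linear prediction. A first-order predictor with a fixed coefficient `a` stands in for full LPC analysis; the coefficient and sample values are illustrative assumptions, not taken from the abstract.

```python
# Hedged sketch: residual generation via linear prediction.
# A first-order predictor stands in for full LPC; the coefficient
# 'a' and the sample values are illustrative, not from the patent.

def lpc_residual(samples, a=0.9):
    """Residual: each sample minus its linear prediction."""
    residual = [samples[0]]  # first sample has no predecessor to predict from
    for i in range(1, len(samples)):
        residual.append(samples[i] - a * samples[i - 1])
    return residual

signal = [1.0, 0.9, 0.81, 0.729]   # strongly correlated input
res = lpc_residual(signal)         # residual is near zero after the first sample
```

For a well-predicted signal the residual is close to zero, which is the sense in which each successive residual carries less information.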
Abstract:
An audio signal encoding/decoding device and method using a filter bank are disclosed. The audio signal encoding method includes generating a plurality of first audio signals by performing filtering on an input audio signal using an analysis filter bank, generating a plurality of second audio signals by performing downsampling on the first audio signals, and outputting a bitstream by encoding and quantizing the second audio signals.
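The filtering-then-downsampling steps can be sketched with a toy two-band filter bank. The Haar-style sum/difference filters and the downsampling factor are illustrative choices, not the filter bank specified by the abstract.

```python
# Hedged sketch of a two-band analysis filter bank followed by
# downsampling; the sum/difference filters are an assumption.

def analysis_filter_bank(x):
    """Split x into low-band and high-band signals (the 'first audio signals')."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(len(x) - 1)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(len(x) - 1)]
    return low, high

def downsample(band, factor=2):
    """Keep every factor-th sample (the 'second audio signals')."""
    return band[::factor]

x = [1, 3, 5, 7, 9, 11]
low, high = analysis_filter_bank(x)
low_ds, high_ds = downsample(low), downsample(high)
```

The downsampled band signals would then be quantized and encoded into the bitstream.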
Abstract:
An audio metadata providing apparatus and method and a multichannel audio data playback apparatus and method to support a dynamic format conversion are provided. Dynamic format conversion information may include information about a plurality of format conversion schemes that are used to convert a first format set by a writer of multichannel audio data into a second format that is based on a playback environment of the multichannel audio data, and that are set for each playback period of the multichannel audio data. The audio metadata providing apparatus may provide audio metadata including the dynamic format conversion information. The multichannel audio data playback apparatus may identify the dynamic format conversion information from the audio metadata, may convert the first format of the multichannel audio data into the second format based on the identified dynamic format conversion information, and may play back the multichannel audio data with the second format.
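A minimal sketch of per-period scheme selection from such dynamic format conversion information follows. The field names (`period`, `scheme`) and the scheme labels are hypothetical; the abstract does not define a concrete metadata layout.

```python
# Hedged sketch: look up the format conversion scheme set for the
# playback period containing time t. Field names are assumptions.

metadata = [
    {"period": (0, 10),  "scheme": "downmix_5_1_to_stereo"},
    {"period": (10, 20), "scheme": "passthrough"},
]

def scheme_for(t, conv_info):
    """Return the conversion scheme set for the period containing t, if any."""
    for entry in conv_info:
        start, end = entry["period"]
        if start <= t < end:
            return entry["scheme"]
    return None
```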
Abstract:
An emotional speech generating method and apparatus capable of adjusting an emotional intensity are disclosed. The emotional speech generating method includes generating emotion groups by grouping weight vectors representing the same emotion into the same emotion group, determining an internal distance between weight vectors included in the same emotion group, determining an external distance between weight vectors included in the same emotion group and weight vectors included in another emotion group, determining a representative weight vector of each of the emotion groups based on the internal distance and the external distance, generating a style embedding by applying the representative weight vector of each of the emotion groups to a style token including prosodic information for expressing an emotion, and generating an emotional speech expressing the emotion using the style embedding.
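The representative-vector step can be sketched as follows: score each weight vector by its mean distance to vectors in the same group (internal) versus vectors in other groups (external), and keep the vector that is close to its own group while far from the others. The scoring rule (external minus internal) and the toy vectors are assumptions, not the patent's criterion.

```python
# Hedged sketch of representative weight vector selection from
# internal and external distances; the scoring rule is an assumption.

def dist(a, b):
    """Euclidean distance between two weight vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def representative(groups):
    """Pick one representative weight vector per emotion group."""
    reps = {}
    for name, vecs in groups.items():
        others = [v for g, vs in groups.items() if g != name for v in vs]
        def score(v):
            internal = sum(dist(v, u) for u in vecs) / len(vecs)
            external = sum(dist(v, u) for u in others) / len(others)
            return external - internal  # far from other groups, close to own
        reps[name] = max(vecs, key=score)
    return reps

groups = {
    "joy":   [(1.0, 0.0), (0.9, 0.1), (0.5, 0.5)],
    "anger": [(0.0, 1.0), (0.1, 0.9)],
}
```

The representative vector would then be applied to the style token to form the style embedding.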
Abstract:
Provided is an encoding apparatus for integrally encoding and decoding a speech signal and an audio signal, and it may include: an input signal analyzer to analyze a characteristic of an input signal; a stereo encoder to downmix the input signal to a mono signal when the input signal is a stereo signal, and to extract stereo sound image information; a frequency band expander to expand a frequency band of the input signal; a sampling rate converter to convert a sampling rate; a speech signal encoder to encode the input signal using a speech encoding module when the input signal is a speech characteristic signal; an audio signal encoder to encode the input signal using an audio encoding module when the input signal is an audio characteristic signal; and a bitstream generator to generate a bitstream.
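The analyzer-driven routing between the two encoding modules can be sketched as below. The zero-crossing heuristic and the placeholder "encoders" are assumptions; the abstract does not specify how the signal characteristic is determined.

```python
# Hedged sketch: route the input to the speech or audio encoding
# module based on a simple characteristic. The zero-crossing
# heuristic is an illustrative stand-in for the input signal analyzer.

def zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)

def encode(x, zc_threshold=3):
    """Select an encoding module from the analyzed characteristic."""
    module = "speech" if zero_crossings(x) >= zc_threshold else "audio"
    return module, x  # a real encoder would emit a bitstream here
```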
Abstract:
Disclosed is a content processing method including receiving content including broadcast data and advertisement data into which additional information is inserted, extracting the additional information from the advertisement data, identifying the advertisement data from the content based on the extracted additional information, and extracting the broadcast data excluding the advertisement data identified from the content, wherein the additional information is inserted at at least one optimal interval determined based on test additional information inserted at a plurality of analysis intervals of an audio signal associated with the advertisement data.
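The optimal-interval determination can be sketched as ranking the analysis intervals by how well their test insertions were detected. The detection scores here are illustrative inputs; the abstract does not specify the detection algorithm or the selection count.

```python
# Hedged sketch: choose optimal insertion intervals from detection
# scores measured for test additional information. Scores are assumed.

def optimal_intervals(scores, top_k=2):
    """scores maps an analysis-interval start to its test-insertion score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:top_k])  # keep the best-detected intervals

scores = {0: 0.4, 15: 0.9, 30: 0.7, 45: 0.95}
```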
Abstract:
An audio signal identification method and apparatus are provided. The audio signal identification method includes generating an amplitude map from an input audio signal, determining whether a portion of the amplitude map is a target portion corresponding to a target signal, using a pre-trained model, extracting feature data from the target portion, and identifying the audio signal based on the feature data.
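The amplitude-map and target-portion steps can be sketched as follows; a simple energy threshold stands in for the pre-trained model mentioned in the abstract, and the frame length is an assumption.

```python
# Hedged sketch: build a frame-wise amplitude map and flag candidate
# target frames. A threshold replaces the pre-trained model here.

def amplitude_map(signal, frame_len=4):
    """Peak absolute amplitude per frame."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    return [max(abs(s) for s in f) for f in frames]

def target_frames(amp_map, threshold=0.5):
    """Indices of frames whose amplitude suggests a target portion."""
    return [i for i, a in enumerate(amp_map) if a >= threshold]

sig = [0.0, 0.1, 0.0, 0.1,  0.9, 0.8, 0.7, 0.9,  0.1, 0.0, 0.1, 0.0]
```

Feature data would then be extracted from the flagged portion to identify the signal.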
Abstract:
Disclosed is a method and an apparatus for embedding data in an audio signal based on a time domain, and a method and an apparatus for extracting data from an audio signal based on a time domain. The method for embedding data in an audio signal based on a time domain may include generating a time-domain insertion sequence from original data based on a weighting element, embedding the insertion sequence in a host audio signal, and transmitting the host audio signal in which the insertion sequence is embedded. The method for extracting data from an audio signal based on a time domain may include receiving a time-domain audio signal in which data is embedded, extracting a codeword from the audio signal, and synchronizing the audio signal based on the codeword.
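The embedding side can be sketched as mapping data bits to a bipolar insertion sequence scaled by the weighting element and adding it to the host samples. The weighting value and the bit mapping are illustrative assumptions.

```python
# Hedged sketch of time-domain embedding: bits become a +/- weighted
# insertion sequence added to the host samples. Values are assumed.

def insertion_sequence(bits, weight=0.01):
    """Generate a time-domain insertion sequence from original data bits."""
    return [weight if b else -weight for b in bits]

def embed(host, seq):
    """Embed the insertion sequence in the host audio signal."""
    return [h + s for h, s in zip(host, seq)]
```

A small weighting element keeps the embedded sequence inaudible relative to the host signal.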
Abstract:
A system and method for synchronizing an audio signal and a video signal are provided. A decoding method in the system may include decoding an audio signal and a video signal received from an encoding apparatus, extracting first unique information of the audio signal from the decoded video signal, generating second unique information of the audio signal based on the decoded audio signal, determining a delay between the audio signal and the video signal by comparing the first unique information to the second unique information, and synchronizing the audio signal and the video signal based on the delay. The first unique information may be generated based on an audio signal that is not encoded by the encoding apparatus, and may be inserted into the video signal.
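The delay determination can be sketched as comparing the two pieces of unique information at candidate offsets and keeping the offset where they best agree. The match-count score and the toy fingerprints are assumptions; the abstract does not fix a comparison metric.

```python
# Hedged sketch: estimate the audio/video delay by aligning the first
# unique information (from video) with the second (from decoded audio).

def estimate_delay(first, second, max_delay=3):
    """Offset d maximizing agreement between first[i] and second[i + d]."""
    best, best_score = 0, -1
    for d in range(-max_delay, max_delay + 1):
        score = sum(1 for i in range(len(first))
                    if 0 <= i + d < len(second) and first[i] == second[i + d])
        if score > best_score:
            best, best_score = d, score
    return best

first = [1, 0, 1, 1, 0, 0, 1]
second = [0, 0] + first[:-2]   # audio fingerprint lagging by two frames
```

The estimated delay would then be used to shift one stream and restore synchronization.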
Abstract:
A method and an apparatus for transmitting a watermark robust to an acoustic channel distortion are disclosed. The method of transmitting the watermark may include extracting a watermark from a first audio signal including the watermark; modifying the extracted watermark based on a state of an acoustic channel; and embedding the modified watermark into the first audio signal to output a second audio signal.
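One way the modification step could look is adapting the embedding gain to the measured channel distortion before re-embedding, so the watermark stays detectable over a noisy acoustic channel. The gain rule, bounds, and bipolar embedding are illustrative assumptions, not the patent's scheme.

```python
# Hedged sketch: channel-adaptive watermark strength. The gain rule
# and the clipping bounds are assumptions for illustration only.

def adapt_gain(base_gain, channel_noise, floor=0.01, ceil=0.2):
    """Raise the embedding gain when the acoustic channel is noisier."""
    gain = base_gain * (1.0 + channel_noise)
    return min(max(gain, floor), ceil)

def embed_watermark(audio, mark_bits, gain):
    """Add the +/- gain-scaled watermark bits to the audio samples."""
    return [a + (gain if b else -gain) for a, b in zip(audio, mark_bits)]
```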