Abstract:
A frequency-domain audio codec is provided with the ability to additionally support a certain transform length in a backward-compatible manner, as follows: the frequency-domain coefficients of a respective frame are transmitted in an interleaved manner irrespective of the signaling for the frames as to which transform length actually applies, and additionally the frequency-domain coefficient extraction and the scale factor extraction operate independently of the signaling. By this measure, legacy frequency-domain audio coders/decoders that are insensitive to the signaling are nevertheless able to operate without faults while reproducing reasonable quality. At the same time, frequency-domain audio coders/decoders able to support the additional transform length offer even better quality despite the backward compatibility. The coding efficiency penalties incurred by coding the frequency-domain coefficients in a manner transparent to older decoders are comparatively minor owing to the interleaving.
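The interleaving idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that a frame is split into two half-length transforms and that their coefficients are interleaved one by one; the function names `interleave` and `deinterleave` are hypothetical, and the abstract does not specify the actual bitstream layout.

```python
from typing import List, Sequence, Tuple


def interleave(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Interleave the coefficients of two half-length transforms.

    Assumption: coefficient k of each transform lands at positions 2k and
    2k + 1, so a decoder unaware of the split still sees one full-length
    coefficient vector.
    """
    if len(first) != len(second):
        raise ValueError("both transforms must have the same length")
    frame: List[float] = []
    for a, b in zip(first, second):
        frame.extend((a, b))
    return frame


def deinterleave(frame: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Recover the two half-length transforms in a split-aware decoder."""
    if len(frame) % 2 != 0:
        raise ValueError("frame length must be even")
    return list(frame[0::2]), list(frame[1::2])
```

In this sketch a legacy decoder would process the interleaved vector as a single long transform, while a decoder that understands the signaling would call `deinterleave` before applying two short inverse transforms.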
Abstract:
An audio decoder for providing at least four bandwidth-extended channel signals on the basis of an encoded representation is configured to provide a first downmix signal and a second downmix signal on the basis of a jointly encoded representation of the first downmix signal and the second downmix signal using a multi-channel decoding. The audio decoder is configured to provide at least a first audio channel signal and a second audio channel signal on the basis of the first downmix signal using a multi-channel decoding. The audio decoder is configured to provide at least a third audio channel signal and a fourth audio channel signal on the basis of the second downmix signal using a multi-channel decoding. The audio decoder is configured to perform a multi-channel bandwidth extension on the basis of the first audio channel signal and the third audio channel signal, to obtain a first bandwidth-extended channel signal and a third bandwidth-extended channel signal. The audio decoder is configured to perform a multi-channel bandwidth extension on the basis of the second audio channel signal and the fourth audio channel signal, to obtain a second bandwidth-extended channel signal and a fourth bandwidth-extended channel signal. An audio encoder uses a related concept.
Abstract:
An audio analyzer is configured to obtain spectral domain representations of two or more input audio signals. Additionally, the audio analyzer is configured to obtain directional information associated with spectral bands of the spectral domain representations and to obtain loudness information associated with different directions as an analysis result. Contributions to the loudness information are determined in dependence on the directional information.
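A minimal sketch of how per-band contributions might be accumulated into direction-dependent loudness. The assumptions here are not from the abstract: band energy stands in for loudness, each band carries a single direction value normalized to [0, 1], and each band contributes entirely to its nearest direction bin. The function name `directional_loudness` is hypothetical.

```python
from typing import List, Sequence


def directional_loudness(
    band_energy: Sequence[float],
    band_direction: Sequence[float],
    num_directions: int,
) -> List[float]:
    """Accumulate band energies into direction bins.

    Assumption: band_direction holds values in [0, 1] (for example a panning
    index from fully left to fully right); values outside are clamped.
    """
    if len(band_energy) != len(band_direction):
        raise ValueError("one direction value is required per band")
    if num_directions < 1:
        raise ValueError("num_directions must be positive")
    loudness = [0.0] * num_directions
    for energy, direction in zip(band_energy, band_direction):
        clamped = min(max(direction, 0.0), 1.0)
        # Map the normalized direction to a bin index; 1.0 falls in the last bin.
        index = min(int(clamped * num_directions), num_directions - 1)
        loudness[index] += energy
    return loudness
```

A real analyzer could instead spread each band over neighboring directions or apply a perceptual loudness model; the hard assignment here is chosen only to keep the dependence on directional information visible.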
Abstract:
A multi-channel audio decoder for providing at least two output audio signals on the basis of an encoded representation is configured to perform a weighted combination of a downmix signal, a decorrelated signal and a residual signal, to obtain one of the output audio signals. The multi-channel audio decoder is configured to determine a weight describing a contribution of the decorrelated signal in the weighted combination in dependence on the residual signal. A multi-channel audio encoder for providing an encoded representation of a multi-channel audio signal is configured to obtain a downmix signal on the basis of the multi-channel audio signal, to provide parameters describing dependencies between the channels of the multi-channel audio signal, and to provide a residual signal. The multi-channel audio encoder is configured to vary an amount of residual signal included in the encoded representation in dependence on the multi-channel audio signal.
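The decoder-side combination can be sketched as follows. The weighting rule is an assumption made for illustration, not the rule claimed in the patent: the decorrelated signal's weight shrinks as the residual energy grows relative to the downmix energy. The names `decorrelator_weight` and `combine` are hypothetical.

```python
from typing import List, Sequence


def decorrelator_weight(
    residual: Sequence[float], downmix: Sequence[float], max_weight: float = 1.0
) -> float:
    """Hypothetical rule: the more residual energy, the less decorrelated signal."""
    residual_energy = sum(x * x for x in residual)
    downmix_energy = sum(x * x for x in downmix)
    total = residual_energy + downmix_energy
    if total == 0.0:
        return max_weight
    return max_weight * (1.0 - residual_energy / total)


def combine(
    downmix: Sequence[float],
    decorrelated: Sequence[float],
    residual: Sequence[float],
) -> List[float]:
    """Weighted combination of downmix, decorrelated and residual signals."""
    if not (len(downmix) == len(decorrelated) == len(residual)):
        raise ValueError("all signals must have the same length")
    weight = decorrelator_weight(residual, downmix)
    return [
        d + weight * q + r for d, q, r in zip(downmix, decorrelated, residual)
    ]
```

With an all-zero residual the sketch falls back to a purely parametric reconstruction (downmix plus full-weight decorrelated signal); as the residual carries more of the signal, the decorrelated contribution fades.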
Abstract:
An apparatus for generating loudspeaker signals is provided. The apparatus comprises an object metadata processor (110) and an object renderer (120). The object renderer (120) is configured to receive an audio object. The object metadata processor (110) is configured to receive metadata, comprising an indication of whether the audio object is screen-related, and further comprising a first position of the audio object. The object metadata processor (110) is configured to calculate a second position of the audio object depending on the first position of the audio object and depending on a size of a screen, if the audio object is indicated in the metadata as being screen-related. The object renderer (120) is configured to generate the loudspeaker signals depending on the audio object and depending on position information. The object metadata processor (110) is configured to feed the first position of the audio object as the position information into the object renderer (120), if the audio object is indicated in the metadata as being not screen-related. The object metadata processor (110) is configured to feed the second position of the audio object as the position information into the object renderer (120), if the audio object is indicated in the metadata as being screen-related.
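The selection between first and second position can be sketched as below. Several things are assumptions for illustration only: the position is reduced to an azimuth in degrees, the screen size is expressed as a half-width angle, and the remapping is a piecewise-linear stretch from a reference screen to the actual screen. The names `ObjectMetadata`, `remap_azimuth` and `position_for_renderer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ObjectMetadata:
    azimuth_deg: float      # first position, reduced to azimuth for this sketch
    screen_related: bool    # indication carried in the metadata


def remap_azimuth(azimuth_deg: float, ref_half_width: float, screen_half_width: float) -> float:
    """Hypothetical piecewise-linear remapping from a reference screen to the actual one.

    Angles inside the reference screen are scaled to the actual screen; angles
    outside are scaled so that +/-180 degrees stays fixed.
    """
    sign = -1.0 if azimuth_deg < 0 else 1.0
    magnitude = abs(azimuth_deg)
    if magnitude <= ref_half_width:
        mapped = magnitude * screen_half_width / ref_half_width
    else:
        mapped = screen_half_width + (magnitude - ref_half_width) * (
            180.0 - screen_half_width
        ) / (180.0 - ref_half_width)
    return sign * mapped


def position_for_renderer(meta: ObjectMetadata, ref_half_width: float, screen_half_width: float) -> float:
    """Return the second position for screen-related objects, else the first."""
    if meta.screen_related:
        return remap_azimuth(meta.azimuth_deg, ref_half_width, screen_half_width)
    return meta.azimuth_deg
```

For example, with a 30 degree reference half-width and a 20 degree actual half-width, a screen-related object at 15 degrees would be fed to the renderer at 10 degrees, while an object not marked as screen-related keeps its original 15 degrees.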
Abstract:
An audio encoder for encoding audio input data (101) to obtain audio output data (501) comprises an input interface (100) for receiving a plurality of audio channels, a plurality of audio objects and metadata related to one or more of the plurality of audio objects; a mixer (200) for mixing the plurality of objects and the plurality of channels to obtain a plurality of pre-mixed channels, each pre-mixed channel comprising audio data of a channel and audio data of at least one object; a core encoder (300) for core encoding core encoder input data; and a metadata compressor (400) for compressing the metadata related to the one or more of the plurality of audio objects, wherein the audio encoder is configured to operate in at least one mode of the group of two modes comprising a first mode, in which the core encoder is configured to encode the plurality of audio channels and the plurality of audio objects received by the input interface as core encoder input data, and a second mode, in which the core encoder (300) is configured for receiving, as the core encoder input data, the plurality of pre-mixed channels generated by the mixer (200).
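The two operating modes can be sketched as a selection of the core encoder input. This is an illustrative sketch under assumptions not stated in the abstract: signals are plain lists of samples of equal length, and the mixer uses a per-object, per-channel gain table (`gains[o][c]`) that would in practice be derived from the object metadata. The names `premix` and `core_encoder_input` are hypothetical.

```python
from typing import List, Sequence

Signal = List[float]


def premix(
    channels: Sequence[Signal],
    objects: Sequence[Signal],
    gains: Sequence[Sequence[float]],
) -> List[Signal]:
    """Mix every object into the channels; gains[o][c] is an assumed gain table."""
    mixed = [list(channel) for channel in channels]
    for obj, obj_gains in zip(objects, gains):
        for ch_index, gain in enumerate(obj_gains):
            for n, sample in enumerate(obj):
                mixed[ch_index][n] += gain * sample
    return mixed


def core_encoder_input(
    channels: Sequence[Signal],
    objects: Sequence[Signal],
    gains: Sequence[Sequence[float]],
    mode: int,
) -> List[Signal]:
    """Select what the core encoder receives in each of the two modes."""
    if mode == 1:
        # First mode: channels and objects are encoded as separate signals.
        return [list(s) for s in channels] + [list(s) for s in objects]
    if mode == 2:
        # Second mode: only the pre-mixed channels reach the core encoder.
        return premix(channels, objects, gains)
    raise ValueError("mode must be 1 or 2")
```

The second mode trades object flexibility at the decoder for fewer signals to encode, since the number of core-coded signals equals the number of channels.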
Abstract:
An audio decoder for decoding encoded audio data, comprising: an input interface (1100) for receiving the encoded audio data, the encoded audio data comprising a plurality of encoded channels or a plurality of encoded objects or compressed metadata related to the plurality of objects; a core decoder (1300) for decoding the plurality of encoded channels and the plurality of encoded objects; a metadata decompressor (1400) for decompressing the compressed metadata; an object processor (1200) for processing the plurality of decoded objects using the decompressed metadata to obtain a number of output channels (1205) comprising audio data from the objects and the decoded channels; and a post-processor (1700) for converting the number of output channels (1205) into an output format, wherein the audio decoder is configured to bypass the object processor and to feed a plurality of decoded channels into the post-processor (1700), when the encoded audio data does not contain any audio objects and to feed the plurality of decoded objects and the plurality of decoded channels into the object processor (1200), when the encoded audio data comprises encoded channels and encoded objects.
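The bypass behavior amounts to a routing decision after core decoding, which can be sketched as follows. The object processor and post-processor are passed in as callables because their internals are not described here; the function name `route_decoded_audio` and the list-of-samples signal type are assumptions for illustration.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def route_decoded_audio(
    decoded_channels: Sequence[Signal],
    decoded_objects: Sequence[Signal],
    object_processor: Callable[[List[Signal], List[Signal]], List[Signal]],
    post_processor: Callable[[List[Signal]], List[Signal]],
) -> List[Signal]:
    """Bypass the object processor when the stream carries no audio objects."""
    if not decoded_objects:
        # Channels only: feed them straight into the post-processor.
        return post_processor(list(decoded_channels))
    # Channels and objects: the object processor produces the output channels,
    # which are then converted to the output format.
    output_channels = object_processor(list(decoded_channels), list(decoded_objects))
    return post_processor(output_channels)
```

Keeping the routing separate from the processing stages makes it easy to see that channel-only content never touches the object processor.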