Abstract:
To produce a fingerprint of an audio signal, information defining a plurality of predetermined fingerprint modes is used. All of the fingerprint modes relate to the same type of fingerprint but yield fingerprints that differ in data volume, on the one hand, and in their strength in characterizing the audio signal, on the other. The modes are predetermined such that a fingerprint produced in a mode with a first characterizing strength can be converted to a fingerprint in a mode with a second characterizing strength without using the audio signal. One of the predetermined fingerprint modes is set and then used to compute a fingerprint from the audio signal. Because fingerprints produced in different modes are convertible, the compromise between data volume and characterizing strength can be set flexibly for a given application without regenerating a fingerprint database each time the fingerprint mode changes. Fingerprint representations scaled with regard to time or frequency may readily be converted to a different fingerprint mode.
Abstract:
In a method for characterizing a signal representing audio content, a measure of the tonality of the signal is determined, and a statement about the audio content of the signal is then made on the basis of that measure. The tonality measure is derived from a quotient whose numerator is the mean of the summed values of spectral components of the signal raised to a first power and whose denominator is the mean of the summed values of spectral components raised to a second power, the first and second powers differing from each other. The tonality measure used for the content analysis is robust to signal distortion, caused for example by MP3 coding, and correlates highly with the content of the analyzed signal.
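The quotient described in this abstract can be illustrated with a short Python sketch. The function name, the default powers (2 and 1), and the outer roots applied to each mean are assumptions made here for illustration; the abstract only states that the two powers differ. The roots are added so that the quotient does not change with the overall signal level.

```python
def tonality_measure(magnitudes, first_power=2.0, second_power=1.0):
    """Quotient of two power means of the spectral magnitudes (illustrative only)."""
    if first_power == second_power:
        raise ValueError("the two powers must differ")
    n = len(magnitudes)
    # Mean of the spectral components raised to the first and second power.
    mean_first = sum(m ** first_power for m in magnitudes) / n
    mean_second = sum(m ** second_power for m in magnitudes) / n
    if mean_second == 0.0:
        return 0.0  # silent frame: no meaningful quotient
    # Assumption: take the matching root of each mean so that scaling all
    # magnitudes by a constant leaves the quotient unchanged.
    return mean_first ** (1.0 / first_power) / mean_second ** (1.0 / second_power)


flat_spectrum = [1.0] * 16            # noise-like: energy spread over all bins
peaky_spectrum = [0.0] * 15 + [4.0]   # tone-like: energy in a single bin
```

With these defaults a flat spectrum gives a quotient of 1, and a spectrum with a single dominant bin gives a larger value, so the quotient grows as the spectrum becomes more tonal.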
Abstract:
For synthesizing at least three output channels from two stereo input channels, the stereo input channels are analyzed to detect signal components occurring in both input channels. A signal generator introduces at least a part of the detected signal components into the second channel, which is associated with a second speaker positioned between a first and a third speaker in an intended speaker scheme. If, however, feeding the complete detected signal components into the second channel would cause clipping, only a part of them is fed into the second channel as a real center channel, and the remainder is placed in the first and third channels as a phantom center channel.
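The split between a real and a phantom center channel can be sketched for a single sample as follows. This is a minimal Python illustration under stated assumptions: the common component is taken as already detected, `left` and `right` are the input channels with that component removed, the clipping limit is a symmetric `full_scale`, and the remainder is divided equally between the outer channels. None of these details is specified in the abstract.

```python
def distribute_center(common, left, right, full_scale=1.0):
    """Return (left_out, center_out, right_out) for one sample (illustrative only)."""
    # Feed as much of the common component into the real center channel
    # as fits without exceeding the clipping limit.
    center_out = max(-full_scale, min(full_scale, common))
    # Whatever does not fit stays in the outer channels as a phantom center.
    remainder = common - center_out
    # Assumption: the remainder is split equally between left and right.
    return left + 0.5 * remainder, center_out, right + 0.5 * remainder
```

When the common component fits below the limit, the remainder is zero and the outer channels pass through unchanged.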
Abstract:
An integer transform, which provides integer output values, carries out the time-domain aliasing cancellation (TDAC) function of a modified discrete cosine transform (MDCT) in the time domain before the forward transform. With overlapping windows, this results in a Givens rotation that can be represented by lifting matrices: time-discrete sampled values of an audio signal are first grouped in pairs to form vectors, and the lifting matrices are then applied to each vector in sequence. After each multiplication of a vector by a lifting matrix, a rounding step is carried out so that only integers result at the output. Transforming the windowed integer sampled values with an integer transform yields a spectral representation with integer spectral values. The inverse mapping with an inverse rotation matrix and the corresponding inverse lifting matrices results in an exact reconstruction.
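The lifting idea can be illustrated with a short Python sketch of a single Givens rotation split into three lifting steps, each followed by rounding. It uses the standard factorization of a rotation into three shear matrices. The function names are hypothetical, and the sketch covers only the rotation of one pair of integers, not the windowing or the transform itself.

```python
import math


def _lifting_coefficients(angle):
    # Shear coefficients of the three-step factorization of a rotation.
    # The angle must not be a multiple of pi, since sin(angle) is a divisor.
    s = math.sin(angle)
    t = (math.cos(angle) - 1.0) / s
    return t, s


def lifting_rotate(x, y, angle):
    """Rotate the integer pair (x, y) by `angle` using rounded lifting steps."""
    t, s = _lifting_coefficients(angle)
    x = x + round(t * y)   # first lifting step, rounded to an integer
    y = y + round(s * x)   # second lifting step
    x = x + round(t * y)   # third lifting step
    return x, y


def lifting_rotate_inverse(x, y, angle):
    """Undo lifting_rotate exactly by reversing the steps with opposite sign."""
    t, s = _lifting_coefficients(angle)
    x = x - round(t * y)
    y = y - round(s * x)
    x = x - round(t * y)
    return x, y
```

The inverse is exact because each step adds a rounded value that depends only on the other, unchanged coordinate, so the same rounded value can be recomputed and subtracted. The particular rounding rule does not matter as long as it is deterministic.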
Abstract:
Prior to embedding a watermark in an audio signal, a spectral representation of the audio signal and a spectral representation of the watermark signal are determined. The spectral representation of the watermark signal is then processed on the basis of a psychoacoustic masking threshold of the audio signal. The processed watermark signal is combined with the audio signal to obtain an audio signal bearing a watermark. The spectral representation of the watermark signal is processed iteratively as follows: first, a predetermined watermark initial value is selected; next, the interference introduced into the spectral representation of the audio signal after quantization of that spectral representation is determined; then, if the interference introduced by the watermark initial value exceeds a predetermined interference threshold, the watermark initial value is modified progressively until the resulting interference after quantization is less than or equal to the predetermined interference threshold. The modified watermark initial value at the end of the iteration is used as the processed watermark signal to be combined with the audio signal. As a result, it is no longer possible for a watermark to be quantized out. Instead, full control over the energy of the watermark is achieved. A watermark can therefore be embedded in an audio signal to provide either the best possible degree of watermark detectability or the best possible audio quality.
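The iteration can be sketched in Python under several assumptions that the abstract does not state: a uniform quantizer with step size `step`, interference measured as the squared difference between the quantized marked spectrum and the quantized unmarked spectrum, and a "progressive modification" that simply scales the watermark down by a fixed factor `shrink`. All names and parameters are hypothetical.

```python
def quantize(values, step):
    """Uniform quantizer (assumed for illustration)."""
    return [step * round(v / step) for v in values]


def fit_watermark(audio_spectrum, watermark_spectrum, step, threshold,
                  shrink=0.5, min_gain=1e-6):
    """Scale the watermark down until its interference after quantization fits."""
    reference = quantize(audio_spectrum, step)
    gain = 1.0  # start from the predetermined initial watermark value
    while True:
        scaled = [gain * w for w in watermark_spectrum]
        marked = quantize([a + w for a, w in zip(audio_spectrum, scaled)], step)
        # Assumption: interference is the squared deviation from the
        # quantized, unmarked spectrum.
        interference = sum((m - r) ** 2 for m, r in zip(marked, reference))
        if interference <= threshold or gain < min_gain:
            return scaled, interference
        gain *= shrink  # modify the watermark value and try again
```

The loop returns the first scaled watermark whose interference after quantization does not exceed the threshold, together with that interference value.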
Abstract:
On the encoder side, a multi-channel input signal is analyzed to obtain smoothing control information. A decoder-side multi-channel synthesis uses this information to smooth quantized transmitted parameters, or values derived from the quantized transmitted parameters, which improves subjective audio quality, in particular for slowly moving point sources and for rapidly moving point sources having tonal material such as fast-moving sinusoids.
Abstract:
In processing a multi-channel audio signal having at least three original channels, a first downmix channel and a second downmix channel are provided, which are derived from the original channels. For a selected original channel of the original channels, channel side information is calculated such that a downmix channel, or a combined downmix channel including the first and the second downmix channels, when weighted using the channel side information, results in an approximation of the selected original channel. The channel side information and the first and second downmix channels form output data to be transmitted to a decoder, which, in the case of a low-level decoder, decodes only the first and second downmix channels or, in the case of a high-level decoder, provides a full multi-channel audio signal based on the downmix channels and the channel side information.
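One simple way to picture channel side information is as a gain that, applied to a downmix channel, approximates the selected original channel. The Python sketch below assumes an energy-matching gain computed over one block of samples. The abstract does not say how the side information is computed, so both function names and the energy-ratio rule are illustrative assumptions.

```python
import math


def channel_gain(original, downmix):
    """Gain that matches the downmix energy to the original channel's energy."""
    energy_original = sum(s * s for s in original)
    energy_downmix = sum(s * s for s in downmix)
    if energy_downmix == 0.0:
        return 0.0  # nothing to weight
    return math.sqrt(energy_original / energy_downmix)


def approximate_channel(downmix, gain):
    """Weight the downmix with the side information to approximate the original."""
    return [gain * s for s in downmix]
```

A decoder that ignores the gain still has the two downmix channels, while a decoder that applies it obtains an approximation of each original channel.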
Abstract:
An apparatus for analyzing an audio signal with regard to rhythm information of the audio signal by using an autocorrelation function (ACF) comprises a filter bank for separating the audio signal into at least two sub-band signals. The sub-band signals are examined for periodicities by an autocorrelation function to obtain rhythm raw-information for the at least two sub-band signals. To reduce or eliminate the ambiguities of the autocorrelation function for periodic signals, the rhythm raw-information is postprocessed to obtain postprocessed rhythm raw-information for each sub-band signal. The rhythm information of the audio signal is established on the basis of the postprocessed rhythm raw-information. Because the ACF postprocessing is performed per sub-band, ACF ambiguities are eliminated where they originate, and rhythm portions are added at double tempi, which autocorrelation processing does not normally provide, so that the rhythm information of the audio signal is determined more robustly.
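The ambiguity in question is that a pulse train with period T produces ACF peaks at T and also at 2T, 3T, and so on. The Python sketch below shows one possible postprocessing step for a single sub-band envelope: subtracting versions of the ACF spread by integer factors and clipping at zero. The function names, the choice of factors, and the clipping are assumptions for illustration, and the step that adds rhythm portions at double tempo is not covered.

```python
def autocorrelation(envelope, max_lag):
    """Unnormalized autocorrelation of one sub-band envelope for lags 0..max_lag."""
    n = len(envelope)
    return [sum(envelope[i] * envelope[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def suppress_multiples(acf, factors=(2, 3)):
    """Subtract ACF versions spread by each factor, clipping the result at zero."""
    post = []
    for lag, value in enumerate(acf):
        # A peak at lag L caused by a true period L / f is cancelled by
        # subtracting the ACF value found at L / f.
        spread = sum(acf[lag // f] for f in factors if lag % f == 0)
        post.append(max(0.0, value - spread))
    return post


# Envelope with one pulse every 4 samples.
envelope = [1.0 if i % 4 == 0 else 0.0 for i in range(32)]
raw = autocorrelation(envelope, 16)
post = suppress_multiples(raw)
```

In this example the raw ACF has peaks at lags 4, 8, 12 and 16, while the postprocessed ACF keeps only the peak at lag 4.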
Abstract:
A method for coding or decoding an audio signal combines the advantages of temporal noise shaping (TNS) processing and noise substitution. A time-discrete audio signal is first transformed to the frequency domain to obtain spectral values of the temporal audio signal. A prediction of the spectral values over frequency is then carried out to obtain spectral residual values. Within the spectral residual values, areas are detected that contain spectral residual values with noise properties. The spectral residual values in these noise areas are noise-substituted, and information about the noise areas and the noise substitution is incorporated into the side information of the coded audio signal. Considerable bit savings can thus be achieved for transient signals.
Abstract:
In processing a multi-channel audio signal having at least three original channels, a first downmix channel and a second downmix channel are provided, which are derived from the original channels. For a selected original channel, channel side information is calculated such that a downmix channel, or a combined downmix channel including the first and the second downmix channels, when weighted using the channel side information, results in an approximation of the selected original channel. The channel side information and the first and second downmix channels form output data to be transmitted to a decoder, which, in the case of a low-level decoder, decodes only the first and second downmix channels or, in the case of a high-level decoder, provides a full multi-channel audio signal based on the downmix channels and the channel side information.