摘要:
For producing a fingerprint of an audio signal, use is made of information defining a plurality of predetermined fingerprint modi, all of the fingerprint modi relating to the same type of fingerprint, the fingerprint modi, however, providing different fingerprints differing from each other with regard to their data volume, on the one hand, and to their characterizing strength for characterizing the audio signal, on the other hand, the fingerprint modi being pre-determined such that a fingerprint in accordance with a fingerprint modus having a first characterizing strength is convertible to a fingerprint in accordance with a fingerprint modus having a second characterizing strength, without using the audio signal. A predetermined fingerprint modus of the plurality of predetermined fingerprint modi is set and subsequently used for computing a fingerprint using the audio signal. The convertibility feature of the fingerprints having been produced by the different fingerprint modi enables setting a flexible compromise between the data volume and the characterizing strength for certain applications without having to re-generate a fingerprint database with each change of the fingerprint modus. Fingerprint representations scaled with regard to time or frequency may readily be converted to a different fingerprint modus.
摘要:
In a method for characterizing a signal representing an audio content a measure is determined for a tonality of the signal, whereupon a statement is made about the audio content of the signal on the basis of the measure for the tonality of the signal. The measure for the tonality is derived from a quotient whose numerator is the mean of the summed values of spectral components of the signal exponentiated with a first power and whose denominator is the mean of the summed values of spectral components exponentiated with a second power, the first and second powers differing from each other. The measure for the tonality of the signal for the content analysis is robust in relation to a signal distortion, due e.g. to MP3 coding, and has a high correlation with the content of the analyzed signal.
摘要:
An apparatus for producing a fingerprint signal from an audio signal includes a means for calculating energy values for frequency bands of segments of the audio signal which are successive in time, so as to obtain, from the audio signal, a sequence of vectors of energy values, a means for scaling the energy values to obtain a sequence of scaled vectors, and a means for temporal filtering of the sequence of scaled vectors to obtain a filtered sequence which represents the fingerprint, or from which the fingerprint may be derived. Thus, a fingerprint is produced which is robust against disturbances due to problems associated with coding or with transmission channels, and which is especially suited for mobile radio applications.
摘要:
For analyzing an information signal having a sequence of blocks of information units, wherein a plurality of consecutive blocks of the sequence of blocks represents an information entity, using a sequence of fingerprints for the sequence of blocks, identification results are provided for consecutive fingerprints, wherein an identification result represents an association of a block of information units with a predetermined information entity. Then at least two hypotheses are formed from the identification results for the consecutive fingerprints, wherein a first hypothesis is an assumption for the association of the sequence of blocks with a first information entity, and wherein the second hypothesis is an assumption for the association of the sequence of blocks with the second information entity. Then various hypotheses are examined to obtain an examination result on the basis of which there is then made a statement on the information signal. This achieves a meaningful and reliable time-continuous analysis of an information signal.
摘要:
For analyzing an information signal having a sequence of blocks of information units, wherein a plurality of consecutive blocks of the sequence of blocks represents an information entity, using a sequence of fingerprints for the sequence of blocks, identification results are provided for consecutive fingerprints, wherein an identification result represents an association of a block of information units with a predetermined information entity. Then at least two hypotheses are formed from the identification results for the consecutive fingerprints, wherein a first hypothesis is an assumption for the association of the sequence of blocks with a first information entity, and wherein the second hypothesis is an assumption for the association of the sequence of blocks with the second information entity. Then various hypotheses are examined to obtain an examination result on the basis of which there is then made a statement on the information signal. This achieves a meaningful and reliable time-continuous analysis of an information signal.
摘要:
An apparatus for producing a fingerprint signal from an audio signal includes a means for calculating energy values for frequency bands of segments of the audio signal which are successive in time, so as to obtain, from the audio signal, a sequence of vectors of energy values, a means for scaling the energy values to obtain a sequence of scaled vectors, and a means for temporal filtering of the sequence of scaled vectors to obtain a filtered sequence which represents the fingerprint, or from which the fingerprint may be derived. Thus, a fingerprint is produced which is robust against disturbances due to problems associated with coding or with transmission channels, and which is especially suited for mobile radio applications.
摘要:
In order to generate a multi-channel signal having a number of output channels greater than a number of input channels, a mixer is used for upmixing the input signal to form at least a direct channel signal and at least an ambience channel signal. A speech detector is provided for detecting a section of the input signal, the direct channel signal or the ambience channel signal in which speech portions occur. Based on this detection, a signal modifier modifies the input signal or the ambience channel signal in order to attenuate speech portions in the ambience channel signal, whereas such speech portions in the direct channel signal are attenuated to a lesser extent or not at all. A loudspeaker signal outputter then maps the direct channel signals and the ambience channel signals to loudspeaker signals which are associated to a defined reproduction scheme, such as, for example, a 5.1 scheme.
摘要:
According to the present invention, multiple parametrically encoded audio signals can be efficiently combined using an audio signal generator, which generates an audio output signal by combining the down-mix channels and the associated parameters of the audio signals directly within the parameter domain, i.e. without reconstructing or decoding the individual input audio signals prior to the generation of the audio output signal. This is achieved by direct mixing of the associated down-mix channels of the individual input signals. It is one key feature of the present invention that the combination of the down-mix channels is achieved by simple, computationally inexpensive arithmetic operations.
摘要:
In order to generate a multi-channel signal having a number of output channels greater than a number of input channels, a mixer is used for upmixing the input signal to form at least a direct channel signal and at least an ambience channel signal. A speech detector is provided for detecting a section of the input signal, the direct channel signal or the ambience channel signal in which speech portions occur. Based on this detection, a signal modifier modifies the input signal or the ambience channel signal in order to attenuate speech portions in the ambience channel signal, whereas such speech portions in the direct channel signal are attenuated to a lesser extent or not at all. A loudspeaker signal outputter then maps the direct channel signals and the ambience channel signals to loudspeaker signals which are associated to a defined reproduction scheme, such as, for example, a 5.1 scheme.
摘要:
According to the present invention, multiple parametrically encoded audio signals can be efficiently combined using an audio signal generator, which generates an audio output signal by combining the down-mix channels and the associated parameters of the audio signals directly within the parameter domain, i.e. without reconstructing or decoding the individual input audio signals prior to the generation of the audio output signal. This is achieved by direct mixing of the associated down-mix channels of the individual input signals. It is one key feature of the present invention that the combination of the down-mix channels is achieved by simple, computationally inexpensive arithmetic operations.