摘要:
An input multi-channel representation is converted into a different output multi-channel representation of a spatial audio signal, in that an intermediate representation of the spatial audio signal is derived, the intermediate representation having direction parameters indicating a direction of origin of a portion of the spatial audio signal; and in that the output multi-channel representation of the spatial audio signal is generated using the intermediate representation of the spatial audio signal.
摘要:
The present invention is based on the finding that a reconstructed output channel, reconstructed with a multi-channel reconstructor using at least one downmix channel derived by downmixing a plurality of original channels and using a parameter representation including additional information on a temporal fine structure of an original channel can be reconstructed efficiently with high quality, when a generator for generating a direct signal component and a diffuse signal component based on the downmix channel is used. The quality can be essentially enhanced, if only the direct signal component is modified such that the temporal fine structure of the reconstructed output channel is fitting a desired temporal fine structure, indicated by the additional information on the temporal fine structure transmitted.
摘要:
For classifying different segments of a signal which has segments of at least a first type and second type, e.g. audio and speech segments, the signal is short-term classified on the basis of the at least one short-term feature extracted from the signal and a short-term classification result is delivered. The signal is also long-term classified on the basis of the at least one short-term feature and at least one long-term feature extracted from the signal and a long-term classification result is delivered. The short-term classification result and the long-term classification result are combined to provide an output signal indicating whether a segment of the signal is of the first type or of the second type.
摘要:
A selected channel of a multi-channel signal represented by frames composed from sampling values having a high time resolution is provided that can be encoded with higher quality when a wave form parameter representation representing a wave form of an intermediate resolution representation of the selected channel is derived. The wave form parameter representation with the intermediate resolution can be used to shape a reconstructed channel to retrieve a channel having a signal envelope close to a selected original channel. The time scale on which the shaping is performed is shorter than the time scale of a framewise processing, thus enhancing the quality of the reconstructed channel. On the other hand, the shaping time scale is larger than the time scale of the sampling values, significantly reducing the amount of data needed by the wave form parameter representation.
摘要:
An audio encoder, an audio decoder or an audio processor includes a filter for generating a filtered audio signal, the filter having a variable warping characteristic, the characteristic being controllable in response to a time-varying control signal, the control signal indicating a small or no warping characteristic or a comparatively high warping characteristic. Furthermore, a controller is connected for providing the time-varying control signal, which depends on the audio signal. The filtered audio signal can be introduced to an encoding processor having different encoding algorithms, one of which is a coding algorithm adapted to a specific signal pattern. Alternatively, the filter is a post-filter receiving a decoded audio signal.
摘要:
In a device for processing a stereo audio signal having a first channel and a second channel the stereo signal is at first analyzed to obtain a measure for a quantity of bits required by a coder to code the stereo audio signal using a coding algorithm. The first channel and the second channel are then modified when the measure for the quantity of bits is larger than a predetermined value, the modification being performed in such a way that the energy of a sum signal of the first and the second modified channel is in a predetermined relation to the energy of a sum signal of the first and the second channel and that a difference signal of the first and the second modified channel is attenuated in contrast to the difference signal of the first and the second channel. Especially for audio coders requiring a constant output bit rate the side channel is attenuated in the case of stereo audio signals, the coding of which cannot meet the output bit rate of the coder, by which a stereo channel separation is abandoned for the benefit of an increased audio bandwidth or a reduction of quantizing disturbances, respectively.
摘要:
An apparatus for processing a multi-channel signal includes a means for determining a similarity between a first one of two channels and a second one of the two channels. Furthermore, a means for performing a prediction filtering of the spectral coefficients is provided, which is formed to perform a prediction filtering with only a single prediction filter for both channels in case of high similarity between the first and the second channel, and to perform a prediction filtering with two separate prediction filters in case of a dissimilarity between the first and the second channel. With this, an introduction of stereo artifacts and a deterioration of the coding gain in stereo coding techniques are avoided.
摘要:
In a method for characterizing a signal, which represents an audio content, a measure for a tonality of the signal is determined, whereupon a statement is made about the audio content of the signal based on the measure for the tonality of the signal. The measure for the tonality of the signal for the content analysis is robust against a signal distortion, such as by MP3 encoding, and has a high correlation to the content of the examined signal.
摘要:
An apparatus for analyzing a sound signal is based on an ear model for deriving, for a number of inner hair cells, an estimate for a time-varying concentration of transmitter substance inside a cleft between an inner hair cell and an associated auditory nerve from the sound signal so that an estimated inner hair cell cleft contents map over time is obtained. This map is analyzed by means of a pitch analyzer to obtain a pitch line over time, the pitch line indicating a pitch of the sound signal for respective time instants. A rhythm analyzer is operative for analyzing envelopes of estimates for selected inner hair cells, the inner hair cells being selected in accordance with the pitch line, so that segmentation instants are obtained, wherein a segmentation instant indicates an end of the preceding note or a start of a succeeding note. Thus, a human-related and reliable sound signal analysis can be obtained.
摘要:
An apparatus for extracting a direct and/or ambience signal from a downmix signal and spatial parametric information, the downmix signal and the spatial parametric information representing a multi-channel audio signal having more channels than the downmix signal, wherein the spatial parametric information has inter-channel relations of the multi-channel audio signal, is described. The apparatus has a direct/ambience estimator and a direct/ambience extractor. The direct/ambience estimator is configured for estimating a level information of a direct portion and/or an ambient portion of the multi-channel audio signal based on the spatial parametric information. The direct/ambience extractor is configured for extracting a direct signal portion and/or an ambient signal portion from the downmix signal based on the estimated level information of the direct portion or the ambient portion.