Abstract:
A method and a device are described for processing a stereo signal obtained from an encoder, which encodes an N-channel audio signal into spatial parameters (P) and a stereo down-mix comprising first and second stereo signals (L0, R0). A first signal and a third signal are added to obtain a first output signal (L0w), wherein the first signal (L0wL) comprises the first stereo signal (L0) modified by a first complex function (g1), and the third signal (L0wR) comprises the second stereo signal (R0) modified by a third complex function (g3). A second signal and a fourth signal are added to obtain a second output signal (R0w). The fourth signal (R0wR) comprises the second stereo signal (R0) modified by a fourth complex function (g4), and the second signal (R0wL) comprises the first stereo signal (L0) modified by a second complex function (g2). The complex functions (g1, g2, g3, g4) are functions of the spatial parameters (P) and are chosen such that the energy value of the difference (L0wL−R0wL) between the first signal and the second signal is larger than or equal to the energy value of the sum (L0wL+R0wL) of the first and the second signal, and the energy value of the difference (R0wR−L0wR) between the fourth signal and the third signal is larger than or equal to the energy value of the sum (R0wR+L0wR) of the fourth signal and the third signal.
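As a rough illustration of the mixing and the energy condition in this abstract, the sketch below treats g1 to g4 as constant complex gains applied to complex sub-band samples of the two down-mix signals. That simplification and the helper names `energy` and `process_stereo` are assumptions made for illustration; the abstract only states that the functions depend on the spatial parameters (P).

```python
def energy(samples):
    """Sum of squared magnitudes of a sequence of (complex) samples."""
    return sum(abs(s) ** 2 for s in samples)


def process_stereo(l0, r0, g1, g2, g3, g4):
    """Apply four complex gains to a stereo down-mix (hypothetical helper).

    Assumption: g1..g4 are constant complex gains for one sub-band.
    Returns the two output signals and whether both energy conditions hold.
    """
    first = [g1 * x for x in l0]    # L0wL: first stereo signal times g1
    second = [g2 * x for x in l0]   # R0wL: first stereo signal times g2
    third = [g3 * x for x in r0]    # L0wR: second stereo signal times g3
    fourth = [g4 * x for x in r0]   # R0wR: second stereo signal times g4

    # Output signals: first + third, and second + fourth.
    l0w = [a + b for a, b in zip(first, third)]
    r0w = [a + b for a, b in zip(second, fourth)]

    # Energy of the difference must be at least the energy of the sum.
    left_ok = energy([a - b for a, b in zip(first, second)]) >= energy(
        [a + b for a, b in zip(first, second)])
    right_ok = energy([a - b for a, b in zip(fourth, third)]) >= energy(
        [a + b for a, b in zip(fourth, third)])
    return l0w, r0w, left_ok and right_ok
```

Under the constant-gain assumption and for a non-silent input, the first condition reduces to |g1 − g2| ≥ |g1 + g2|, so a pair such as g1 = 1, g2 = −0.5 satisfies it while g1 = 1, g2 = 0.5 does not.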
Abstract:
A method for headphone reproduction of at least two input channel signals is proposed. For each pair of input channel signals from said at least two input channel signals, said method comprises the following steps. First, a common component, an estimated desired position corresponding to said common component, and two residual components corresponding to the two input channel signals in said pair are determined. Said determining is based on said pair of input channel signals. Each of said residual components is derived from its corresponding input channel signal by subtracting a contribution of the common component. Said contribution is related to the estimated desired position of the common component. Second, a main virtual source comprising said common component at the estimated desired position, and two further virtual sources each comprising a respective one of said residual components at respective predetermined positions, are synthesized.
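The first step of this abstract can be sketched as follows. The abstract does not say how the position or the common component is estimated, so everything specific here is an assumption for illustration: the position is an angle derived from the RMS levels of the two channels, the panning weights are the cosine and sine of that angle, and `decompose_pair` is a hypothetical helper name. The synthesis of the virtual sources is not shown.

```python
import math


def decompose_pair(left, right):
    """Split a channel pair into a common component and two residuals.

    Assumptions (not taken from the abstract): the position is an angle in
    [0, pi/2] estimated from the RMS levels, and the panning weights are
    cos(angle) for the left channel and sin(angle) for the right channel.
    """
    rms_l = math.sqrt(sum(x * x for x in left) / len(left))
    rms_r = math.sqrt(sum(x * x for x in right) / len(right))
    angle = math.atan2(rms_r, rms_l)  # 0 = fully left, pi/2 = fully right
    w_l, w_r = math.cos(angle), math.sin(angle)

    # Common component: projection of the pair onto the panning direction.
    common = [w_l * l + w_r * r for l, r in zip(left, right)]
    # Residuals: each input minus its position-dependent share of the common part.
    res_l = [l - w_l * c for l, c in zip(left, common)]
    res_r = [r - w_r * c for r, c in zip(right, common)]
    return common, angle, res_l, res_r
```

With these weights, two identical channels give an angle of 45 degrees and residuals that vanish, which matches the intuition that a centre-panned source is entirely "common".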
Abstract:
An audio encoder comprises a multi-channel receiver (401) which receives an M-channel audio signal where M>2. A down-mix processor (403) down-mixes the M-channel audio signal to a first stereo signal and associated parametric data and a spatial processor (407) modifies the first stereo signal to generate a second stereo signal in response to the associated parametric data and spatial parameter data for a binaural perceptual transfer function, such as a Head Related Transfer Function (HRTF). The second stereo signal is a binaural signal and may specifically be a (3D) virtual spatial signal. An output data stream comprising the encoded data and the associated parametric data is generated by an encode processor (411) and an output processor (413). The HRTF processing may allow the generation of a (3D) virtual spatial signal by conventional stereo decoders. A multi-channel decoder may reverse the process of the spatial processor (407) to generate an improved quality multi-channel signal.
Abstract:
The invention describes a method of deriving a set of features (S) of an audio input signal (M), which method comprises identifying a number of first-order features (f1, f2, . . . , ff) of the audio input signal (M), generating a number of correlation values (ρ1, ρ2, . . . , ρI) from at least part of the first-order features (f1, f2, . . . , ff), and compiling the set of features (S) for the audio input signal (M) using the correlation values (ρ1, ρ2, . . . , ρI). The invention further describes a method of classifying an audio input signal (M) into a group, and a method of comparing audio input signals (M, M′) to determine a degree of similarity between the audio input signals (M, M′). The invention also describes a system (1) for deriving a set of features (S) of an audio input signal (M), a classifying system (4) for classifying an audio input signal (M) into a group, and a comparison system (5) for comparing audio input signals (M, M′) to determine a degree of similarity between the audio input signals (M, M′).
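One way to picture the feature derivation is shown below: per-frame first-order features are correlated pairwise, and the resulting values form the feature set. The use of Pearson correlation, the dictionary layout, and the names `pearson` and `correlation_feature_set` are assumptions for illustration; the abstract does not specify which correlation measure or which first-order features are used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_feature_set(first_order):
    """Build a feature set from pairwise correlations of first-order features.

    `first_order` maps a hypothetical feature name to its per-frame values.
    """
    names = sorted(first_order)
    return {(a, b): pearson(first_order[a], first_order[b])
            for a, b in combinations(names, 2)}
```

Two signals could then be compared by measuring the distance between their correlation feature sets, although the abstract leaves the comparison measure open.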
Abstract:
A spatial decoder unit (23) is arranged for transforming one or more audio channels (s; l, r) into a pair of binaural output channels (lb, rb). The device comprises a parameter conversion unit (234) for converting the spatial parameters (sp) into binaural parameters (bp) containing binaural information. The device additionally comprises a spatial synthesis unit (232) for transforming the audio channels (L, R) into a pair of binaural signals (Lb, Rb) while using the binaural parameters (bp). The spatial synthesis unit (232) preferably operates in a transform domain, such as the QMF domain.
Abstract:
A method for processing a stereo signal includes encoding an N-channel audio signal into a stereo signal (Lo, Ro) and spatial parameters (wl, wr), and processing the stereo signal using the spatial parameters to generate a processed stereo signal (Low, Row). The matrix of the processed stereo signal is expressed as the matrix of the stereo signal multiplied by a filter matrix (H) whose elements are filter functions (H1, H2, H3, H4) operating with the spatial parameters (wl, wr) and a constant (a). The filter functions are time invariant and are selected so that the filter matrix is invertible.
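The role of invertibility can be illustrated with a small sketch. It assumes that the four filter functions are evaluated at a single frequency, giving four complex numbers arranged row by row as ((H1, H2), (H3, H4)). That layout and the helper names are assumptions for illustration, and the dependence of H1 to H4 on the spatial parameters and the constant is not modelled.

```python
def apply_matrix(h, lo, ro):
    """Multiply the stereo vector (lo, ro) by the 2x2 matrix h = ((H1, H2), (H3, H4))."""
    (h1, h2), (h3, h4) = h
    return h1 * lo + h2 * ro, h3 * lo + h4 * ro


def invert_matrix(h):
    """Return the inverse of a 2x2 matrix, or raise ValueError if it is singular."""
    (h1, h2), (h3, h4) = h
    det = h1 * h4 - h2 * h3
    if det == 0:
        raise ValueError("filter matrix is not invertible")
    return ((h4 / det, -h2 / det), (-h3 / det, h1 / det))
```

When the determinant H1·H4 − H2·H3 is non-zero, applying `invert_matrix(h)` to the processed pair recovers the original stereo pair, which is what allows a receiver to undo the processing.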
Abstract:
An audio system comprises a receiver (301) for receiving an audio signal, such as an audio object or a signal of a channel of a spatial multi-channel signal. A binaural circuit (303) generates a binaural output signal by processing the audio signal. The processing is representative of a binaural transfer function providing a virtual sound source position for the audio signal. A measurement circuit (307) generates measurement data indicative of a characteristic of the acoustic environment, and a determining circuit (311) determines an acoustic environment parameter in response to the measurement data. The acoustic environment parameter may typically be a reverberation parameter, such as a reverberation time. An adaptation circuit (313) adapts the binaural transfer function in response to the acoustic environment parameter. For example, the adaptation may modify a reverberation parameter to more closely resemble the reverberation characteristics of the acoustic environment.
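The reverberation example at the end of this abstract can be sketched as follows. The sketch assumes that the binaural transfer function is available as an impulse response whose tail decays exponentially with a known reverberation time, and that a target reverberation time for the room has already been measured. The function name and its parameters are hypothetical and not taken from the abstract.

```python
def adapt_reverb_tail(tail, sample_rate, t60_current, t60_target):
    """Reshape the decay of an impulse-response tail to a target T60.

    An amplitude envelope with reverberation time T60 decays as
    10 ** (-3 * t / T60), i.e. by 60 dB after T60 seconds. Multiplying by the
    ratio of the target and current envelopes changes the decay rate.
    """
    rate_change = 3.0 * (1.0 / t60_target - 1.0 / t60_current)
    return [x * 10.0 ** (-rate_change * n / sample_rate)
            for n, x in enumerate(tail)]
```

A shorter target time makes the tail decay faster and a longer one stretches it; the early part of the response near n = 0 is left almost unchanged.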
Abstract:
A multi-channel audio encoder (10) for encoding a multi-channel audio signal (101), e.g. a 5.1 channel audio signal, into a spatial down-mix (102), e.g. a stereo signal, and associated parameters (104, 105). The encoder (10) comprises first and second units (110, 120). The first unit (110) encodes the multi-channel audio signal (101) into the spatial down-mix (102) and parameters (104). These parameters (104) enable a multi-channel decoder (20) to reconstruct the multi-channel audio signal (203) from the spatial down-mix (102). The second unit (120) generates, from the spatial down-mix (102), parameters (105) that enable the decoder to reconstruct the spatial down-mix (202) from an alternative down-mix (103), e.g. a so-called artistic down-mix that has been manually mixed in a sound studio. In this way, the decoder (20) can efficiently deal with a situation in which an alternative down-mix (103) is received instead of the regular spatial down-mix (102). In the decoder (20), first the spatial down-mix (202) is reconstructed from the alternative down-mix (103) and the parameters (105). Next, the spatial down-mix (202) is decoded into the multi-channel audio signal (203).
Abstract:
An encoding device (1) and method convert a set of signals (l, r) into a dominant signal (m) containing most signal energy, a residual signal (s) containing a remainder of the signal energy, and signal parameters (IID, ICC) associated with the conversion. The dominant signal (m) and selected parts of the residual signal (s) are encoded. Selecting parts of the residual signal involves a residual signal (s′) passing perceptually relevant parts of the residual signal (s), attenuating perceptually less relevant parts of the residual signal and suppressing least relevant parts of the residual signal. An associated decoding device (2) and method decode the encoded dominant signal and the encoded residual signal so as to produce a decoded dominant signal (m′u) and a decoded residual signal (s′mod) respectively. A synthetic residual signal (s′syn) is derived from the decoded dominant signal (m′u) and is attenuated so as to produce an attenuated synthetic residual signal (s′syn,mod). The attenuated synthetic residual signal (s′syn,mod) and the decoded residual signal (s′mod) are combined to produce a reconstructed residual signal (s′). The decoded dominant signal (m′) and the reconstructed residual signal (s′) are then converted into a set of output signals (l′, r′).
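The conversion of (l, r) into a dominant and a residual signal, and the final conversion back, can be pictured as a rotation. The sketch below assumes a principal-component rotation, which is one common way to concentrate energy in a single signal; the abstract does not state that this is the conversion used. The helper names are hypothetical, and the parameter extraction (IID, ICC), the selective coding of the residual, and the synthetic residual are not shown.

```python
import math


def rotate_to_dominant(left, right):
    """Rotate (l, r) into a dominant signal m and a residual signal s.

    Assumption: the conversion is a principal-component rotation whose
    angle maximises the energy of m.
    """
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    cross = sum(l * r for l, r in zip(left, right))
    alpha = 0.5 * math.atan2(2.0 * cross, e_l - e_r)
    c, sn = math.cos(alpha), math.sin(alpha)
    m = [c * l + sn * r for l, r in zip(left, right)]
    s = [-sn * l + c * r for l, r in zip(left, right)]
    return m, s, alpha


def rotate_back(m, s, alpha):
    """Inverse rotation: recover the output signals (l', r') from m and s."""
    c, sn = math.cos(alpha), math.sin(alpha)
    left = [c * a - sn * b for a, b in zip(m, s)]
    right = [sn * a + c * b for a, b in zip(m, s)]
    return left, right
```

For strongly correlated channels almost all of the energy ends up in m, which is why the residual can be coded selectively or partly replaced by a synthetic signal at the decoder.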