Abstract:
A volume leveler controller and a controlling method are disclosed. In one embodiment, a volume leveler controller includes an audio content classifier for identifying the content type of an audio signal in real time, and an adjusting unit for adjusting a volume leveler in a continuous manner based on the identified content type. The adjusting unit may be configured to positively correlate the dynamic gain of the volume leveler with informative content types of the audio signal, and to negatively correlate the dynamic gain of the volume leveler with interfering content types of the audio signal.
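The correlation described above can be illustrated with a minimal sketch. It assumes the classifier outputs a confidence in [0, 1] per content type; the type names ("speech", "music", "noise", "background"), the combining rule, and the smoothing coefficient are hypothetical choices and are not taken from the disclosure.

```python
def dynamic_gain_scale(confidences,
                       informative=("speech", "music"),
                       interfering=("noise", "background")):
    """Return a scale in [0, 1] for the leveler's dynamic gain.

    The scale rises with the strongest informative confidence and
    falls with the strongest interfering confidence (assumed rule).
    """
    info = max((confidences.get(name, 0.0) for name in informative), default=0.0)
    interf = max((confidences.get(name, 0.0) for name in interfering), default=0.0)
    return min(1.0, max(0.0, info * (1.0 - interf)))


def smooth(previous, target, alpha=0.9):
    """One-pole smoothing so the adjustment changes continuously over time."""
    return alpha * previous + (1.0 - alpha) * target


# Example: the scale drifts toward the new target rather than jumping to it.
scale = 0.0
for frame in [{"speech": 0.9, "noise": 0.1}] * 3:
    scale = smooth(scale, dynamic_gain_scale(frame))
```

The smoothing step stands in for the "continuous manner" of adjustment: an abrupt change in the classifier output only gradually changes the gain scale.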
Abstract:
Diffuse or spatially large audio objects may be identified for special processing. A decorrelation process may be performed on audio signals corresponding to the large audio objects to produce decorrelated large audio object audio signals. These decorrelated large audio object audio signals may be associated with object locations, which may be stationary or time-varying locations. For example, the decorrelated large audio object audio signals may be rendered to virtual or actual speaker locations. The output of such a rendering process may be input to a scene simplification process. The decorrelation, associating and/or scene simplification processes may be performed prior to a process of encoding the audio data.
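As a rough sketch of the flow above, the following identifies large objects by a size threshold and produces one decorrelated feed per stationary speaker location. The object layout (a dict with "signal" and "size"), the threshold, and the use of distinct per-speaker delays as a stand-in decorrelator are all assumptions made for illustration; a practical decorrelator would typically be more elaborate. All signals are assumed to have equal length.

```python
import math


def delay(signal, n):
    """Delay a signal by n samples while keeping its length."""
    n = max(0, min(n, len(signal)))
    return [0.0] * n + list(signal[:len(signal) - n])


def render_large_objects(objects, speaker_positions, size_threshold=0.5):
    """Split objects by size and render the large ones to speaker locations.

    Returns (feeds, small_objects). Each feed pairs a speaker position with
    a crudely decorrelated signal; small objects pass through unchanged.
    """
    large = [o for o in objects if o["size"] > size_threshold]
    small = [o for o in objects if o["size"] <= size_threshold]
    gain = 1.0 / math.sqrt(len(speaker_positions))  # keep total power roughly constant
    feeds = []
    for index, position in enumerate(speaker_positions):
        feed = None
        for obj in large:
            # Hypothetical decorrelator: a different delay for each speaker.
            shifted = delay(obj["signal"], 2 * index)
            scaled = [gain * v for v in shifted]
            feed = scaled if feed is None else [a + b for a, b in zip(feed, scaled)]
        if feed is not None:
            feeds.append({"position": position, "signal": feed})
    return feeds, small
```

The returned feeds and the untouched small objects could then be passed together to a scene simplification stage before encoding.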
Abstract:
Example embodiments disclosed herein relate to signal processing. A method for decomposing a plurality of audio signals from at least two different channels is disclosed. The method comprises obtaining a set of weakly correlated components generated based on the plurality of audio signals. The method further comprises extracting a feature from the set of components, and determining a set of gains associated with the set of components based at least in part on the extracted feature, each of the gains indicating the proportion of a diffuse part in the associated component. The method further comprises decomposing the plurality of audio signals by applying the set of gains to the set of components. A corresponding system and computer program product are also disclosed.
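The final step, applying the gains, can be sketched as follows. The sketch assumes each gain lies in [0, 1] and treats the remainder of each component as its direct part; the function name and data layout are hypothetical, and the feature extraction that produces the gains is not shown.

```python
def apply_diffuse_gains(components, gains):
    """Split each component into diffuse and direct parts.

    components: list of sample lists, one per weakly correlated component.
    gains: one value in [0, 1] per component, read as the diffuse proportion.
    Returns (diffuse_parts, direct_parts) with the same layout as components.
    """
    if len(components) != len(gains):
        raise ValueError("one gain is required per component")
    diffuse = [[g * v for v in comp] for comp, g in zip(components, gains)]
    direct = [[(1.0 - g) * v for v in comp] for comp, g in zip(components, gains)]
    return diffuse, direct
```

With this convention the diffuse and direct parts of a component sum back to the component itself.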
Abstract:
The present document describes a method (100) for extracting audio sources (301) from audio channels (302). The method (100) includes updating (102) a Wiener filter matrix based on a mixing matrix from a source matrix and based on a power matrix of the audio sources (301). Furthermore, the method (100) includes updating (103) a cross-covariance matrix of the audio channels (302) and of the audio sources (301) and an auto-covariance matrix of the audio sources (301), based on the updated Wiener filter matrix and based on an auto-covariance matrix of the audio channels (302). In addition, the method (100) includes updating (104) the mixing matrix and the power matrix based on the updated cross-covariance matrix of the audio channels (302) and of the audio sources (301), and/or based on the updated auto-covariance matrix of the audio sources (301).
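One plausible form of these updates is written out below, for a single time-frequency tile. The symbols are assumptions introduced here, not notation from the abstract: A is the mixing matrix, Σ_S the (diagonal) power matrix of the sources, Ω the Wiener filter matrix, R_XX the auto-covariance matrix of the channels, R_XS the cross-covariance matrix of channels and sources, and R_SS the auto-covariance matrix of the sources. The noise power matrix Σ_B is an added assumption that the abstract does not mention.

```latex
\begin{align}
\Omega &= \Sigma_S A^{H} \left( A \Sigma_S A^{H} + \Sigma_B \right)^{-1}
  && \text{(update 102: Wiener filter matrix)} \\
R_{XS} &= R_{XX}\, \Omega^{H}, \qquad
R_{SS} = \Omega\, R_{XX}\, \Omega^{H}
  && \text{(update 103: covariance matrices)} \\
A &= R_{XS}\, R_{SS}^{-1}, \qquad
\Sigma_S = \operatorname{diag}\!\left( R_{SS} \right)
  && \text{(update 104: mixing and power matrices)}
\end{align}
```

Iterating the three updates until A and Σ_S stabilize would yield source estimates of the form Ŝ = ΩX, where X denotes the channel signals.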
Abstract:
Example embodiments disclosed herein relate to audio object extraction. A method for audio object extraction from audio content is disclosed. The method comprises determining a sub-band object probability for a sub-band of an audio signal in a frame of the audio content, the sub-band object probability indicating a probability that the sub-band of the audio signal contains an audio object. The method further comprises splitting the sub-band of the audio signal into an audio object portion and a residual audio portion based on the determined sub-band object probability. A corresponding system and computer program product are also disclosed.
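A minimal sketch of the splitting step, assuming the probability is used directly as a soft mask on the sub-band samples. The linear split and the function name are illustrative assumptions; how the probability itself is determined is not shown.

```python
def split_subband(subband, object_probability):
    """Split sub-band samples into an object portion and a residual portion.

    object_probability is expected in [0, 1]; it scales the object portion,
    and the residual receives the remainder (assumed soft-mask rule).
    """
    if not 0.0 <= object_probability <= 1.0:
        raise ValueError("object_probability must be in [0, 1]")
    object_part = [object_probability * v for v in subband]
    residual_part = [(1.0 - object_probability) * v for v in subband]
    return object_part, residual_part
```

For example, a probability of 0.75 assigns three quarters of each sample's amplitude to the object portion and the rest to the residual.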
Abstract:
Embodiments are directed to a method for processing an input audio signal, comprising: splitting the input audio signal into at least two components, in which a first component is characterized by fast fluctuations in the input signal envelope and a second component is relatively stationary over time; processing the second, stationary component by a decorrelation circuit; and constructing an output signal by combining the output of the decorrelation circuit with the input signal and/or the first component signal.
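The three steps can be sketched as follows. The split uses a fast and a slow envelope follower, and a plain delay stands in for the decorrelation circuit; the coefficients, the gain rule, and the function names are assumptions chosen for illustration only.

```python
def one_pole(values, coeff):
    """One-pole low-pass filter; a larger coeff gives a slower response."""
    out, state = [], 0.0
    for v in values:
        state = coeff * state + (1.0 - coeff) * v
        out.append(state)
    return out


def split_transient_stationary(x, fast=0.5, slow=0.99):
    """Split x into a fast-fluctuating part and a relatively stationary part."""
    magnitude = [abs(v) for v in x]
    env_fast = one_pole(magnitude, fast)
    env_slow = one_pole(magnitude, slow)
    transient, stationary = [], []
    for v, ef, es in zip(x, env_fast, env_slow):
        # Gain near 1 when the fast envelope jumps above the slow one.
        g = max(0.0, (ef - es) / ef) if ef > 0.0 else 0.0
        transient.append(g * v)
        stationary.append((1.0 - g) * v)
    return transient, stationary


def process(x, delay_samples=3, mix=0.5):
    """Decorrelate only the stationary part, then combine with the input."""
    _, stationary = split_transient_stationary(x)
    n = min(delay_samples, len(x))
    decorrelated = [0.0] * n + stationary[:len(x) - n]  # stand-in decorrelator
    return [xi + mix * d for xi, d in zip(x, decorrelated)]
```

Because the two parts sum back to the input, sharp onsets bypass the decorrelator almost entirely and are not smeared in time.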
Abstract:
Embodiments for measuring content coherence and embodiments for measuring content similarity are described. Content coherence between a first audio section and a second audio section is measured. For each audio segment in the first audio section, a predetermined number of audio segments in the second audio section are determined, namely those whose content similarity to the audio segment in the first audio section is higher than that of all other audio segments in the second audio section. An average of the content similarity between the audio segment in the first audio section and the determined audio segments is calculated. The content coherence is calculated as the average, the maximum, or the minimum of the averages calculated for the audio segments in the first audio section. The content similarity may be calculated based on a Dirichlet distribution.
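The coherence calculation described above can be sketched as below. The similarity measure is supplied by the caller, since the Dirichlet-based similarity is not specified here; the function name, the parameter k (the predetermined number of segments), and the toy similarity in the example are hypothetical. Both sections are assumed to be non-empty.

```python
def content_coherence(first, second, similarity, k=2, reduce="mean"):
    """Coherence between two sections given a pairwise similarity function.

    For each segment in `first`, the k most similar segments in `second`
    are selected and their similarities averaged; the per-segment averages
    are then reduced with the mean, maximum, or minimum.
    """
    averages = []
    for a in first:
        top = sorted((similarity(a, b) for b in second), reverse=True)[:k]
        averages.append(sum(top) / len(top))
    if reduce == "mean":
        return sum(averages) / len(averages)
    if reduce == "max":
        return max(averages)
    if reduce == "min":
        return min(averages)
    raise ValueError("reduce must be 'mean', 'max', or 'min'")


# Toy example: segments are scalars and similarity decays with distance.
toy_similarity = lambda a, b: 1.0 / (1.0 + abs(a - b))
coherence = content_coherence([0.0, 10.0], [0.0, 1.0, 10.0], toy_similarity, k=2)
```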