Abstract:
Example embodiments disclosed herein relate to audio object clustering based on renderer-aware perceptual difference. A method of processing audio objects is provided. The method includes obtaining renderer-related information indicating a configuration of a renderer. The method also includes determining, based on the obtained renderer-related information, a rendering difference between a first audio object and a second audio object among the audio objects with respect to the renderer. The method further includes clustering the audio objects at least in part based on the rendering difference. Corresponding system, device, and computer program product are also disclosed.
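The clustering step described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the disclosed method: the renderer configuration is modeled as a list of loudspeaker positions, `render_gains` is a hypothetical inverse-distance panner, the rendering difference is taken as the Euclidean distance between two objects' gain vectors, and the clustering is a simple greedy threshold merge.

```python
import math

def render_gains(position, speakers):
    """Hypothetical inverse-distance panner returning unit-power speaker gains."""
    raw = [1.0 / (math.dist(position, s) + 1e-6) for s in speakers]
    norm = math.sqrt(sum(g * g for g in raw))
    return [g / norm for g in raw]

def rendering_difference(pos_a, pos_b, speakers):
    """Assumed metric: distance between the two objects' rendered gain vectors."""
    return math.dist(render_gains(pos_a, speakers), render_gains(pos_b, speakers))

def cluster_objects(positions, speakers, threshold):
    """Greedy clustering: an object joins the first cluster whose first member
    renders similarly enough on this speaker layout, else starts a new cluster."""
    clusters = []  # each cluster is a list of object indices
    for i, pos in enumerate(positions):
        for cluster in clusters:
            representative = positions[cluster[0]]
            if rendering_difference(pos, representative, speakers) < threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Under these assumptions, objects that are spatially distinct can still fall into one cluster when the renderer cannot reproduce the distinction: with a single loudspeaker every object yields the same gain vector, so the rendering difference is zero regardless of position.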
Abstract:
Example embodiments disclosed herein relate to upmixing of audio signals. A method of upmixing an audio signal is described. The method includes decomposing the audio signal into a diffuse signal and a direct signal; generating an audio bed at least in part based on the diffuse signal, the audio bed including a height channel; extracting an audio object from the direct signal; estimating metadata of the audio object, the metadata including height information of the audio object; and rendering the audio bed and the audio object as an upmixed audio signal, wherein the audio bed is rendered to a predefined position and the audio object is rendered according to the metadata. Corresponding system and computer program product are described as well.
Abstract:
Example embodiments disclosed herein relate to audio object extraction. A method for audio object extraction from audio content is disclosed. The method comprises determining a sub-band object probability for a sub-band of an audio signal in a frame of the audio content, the sub-band object probability indicating a probability of the sub-band of the audio signal containing an audio object. The method further comprises splitting the sub-band of the audio signal into an audio object portion and a residual audio portion based on the determined sub-band object probability. Corresponding system and computer program product are also disclosed.
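One way to picture the splitting step is as a soft mask. The sketch below assumes, purely for illustration, that the sub-band object probability is applied as a scalar gain, so the object portion is the probability times the sub-band samples and the residual is whatever remains. The function name `split_sub_band` and the scalar-gain rule are hypothetical; the abstract does not state how the split is computed.

```python
def split_sub_band(sub_band, object_probability):
    """Split sub-band samples into (object portion, residual portion).

    Assumption: the probability acts as a soft mask, so the two portions
    always sum back to the original sub-band samples.
    """
    if not 0.0 <= object_probability <= 1.0:
        raise ValueError("object_probability must lie in [0, 1]")
    object_part = [object_probability * x for x in sub_band]
    residual_part = [x - o for x, o in zip(sub_band, object_part)]
    return object_part, residual_part
```

A soft split of this kind keeps the decomposition lossless: a probability of 1 sends the whole sub-band to the object portion, and a probability of 0 leaves it entirely in the residual.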
Abstract:
Embodiments are directed to a method of rendering object-based audio comprising determining an initial spatial position of audio objects having object audio data and associated metadata, determining a perceptual importance of the objects, and grouping the audio objects into a number of clusters based on the determined perceptual importance of the objects, such that a spatial error caused by moving an object from an initial spatial position to a second spatial position in a cluster is minimized for objects with a relatively high perceptual importance. The perceptual importance is based at least in part on a partial loudness of an object and content semantics of the object.
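The grouping described above can be sketched in a simplified form. The sketch assumes, hypothetically, that importance is partial loudness multiplied by a boost for dialog (standing in for content semantics), that the most important objects become the cluster positions, and that every object is assigned to its nearest cluster. The names `perceptual_importance`, `cluster_by_importance`, and `dialog_boost` are illustrative only and do not come from the source.

```python
import math

def perceptual_importance(partial_loudness, is_dialog, dialog_boost=2.0):
    """Assumed importance score: partial loudness, boosted for dialog content."""
    return partial_loudness * (dialog_boost if is_dialog else 1.0)

def cluster_by_importance(objects, num_clusters):
    """Use the most important objects as cluster positions, then assign every
    object to its nearest cluster. Each object is a dict with the keys
    'position' (an (x, y, z) tuple) and 'importance' (a float)."""
    ranked = sorted(range(len(objects)),
                    key=lambda i: objects[i]["importance"], reverse=True)
    centroids = [objects[i]["position"] for i in ranked[:num_clusters]]
    assignment = [
        min(range(len(centroids)),
            key=lambda c: math.dist(obj["position"], centroids[c]))
        for obj in objects
    ]
    return centroids, assignment
```

Because the highest-ranked objects define the cluster positions, they are not moved at all in this sketch, so their spatial error is zero; only the less important objects are relocated.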