摘要:
Systems and methods for preserving headphone rendering mode (HRM) in object clustering are described. In an embodiment, an object-based audio data processing system includes a processor configured to receive a plurality of audio objects, wherein an audio object of the plurality of audio objects is associated with respective object metadata that indicates respective spatial position information and an HRM; determine a plurality of cluster positions by applying an extended hybrid distance metric to a spatial coding algorithm to calculate a partial loudness for each of the audio objects; render the audio objects to the cluster positions to form a plurality of clusters by applying the extended hybrid distance metric to the spatial coding algorithm to calculate object-to-cluster gains; and transmit the clusters to a spatial reproduction system.
摘要:
An audio processing system and method which calculates, based on spatial metadata of the audio object, a panning coefficient for each of the audio objects in relation to each of a plurality of predefined channel coverage zones. Converts the audio signal into submixes in relation to the predefined channel coverage zones based on the calculated panning coefficients and the audio objects. Each of the submixes indicating a sum of components of the plurality of the audio objects in relation to one of the predefined channel coverage zones. Generating a submix gain by applying an audio processing to each of the submix and controls an object gain applied to each of the audio objects. The object gain being as a function of the panning coefficients for each of the audio objects and the submix gains in relation to each of the predefined channel coverage zones.
摘要:
A method of processing audio content including a plurality of audio elements comprises: clustering the plurality of audio elements into a plurality of clusters of audio elements; and for a cluster among the plurality of clusters: for each audio element in the cluster, determining a measure of energy that the audio element contributes to the cluster; for at least one audio element in the cluster, determining a compensation gain based at least in part on the measures of energy for the audio elements in the cluster; and applying the compensation gain to the at least one audio element in the cluster.
摘要:
Example embodiments disclosed herein relate to audio source separation with source direction determined based on iterative weighted component analysis. A method of separating audio sources in audio content is disclosed. The audio content includes a plurality of channels. The method includes obtaining multiple data samples from multiple time-frequency tiles of the audio content. The method also includes analyzing the data samples to generate multiple components in a plurality of iterations, wherein each of the components indicates a direction with a variance of the data samples, and wherein in each of the plurality of iterations, each of the data samples is weighted with a weight that is determined based on a selected component from the multiple components. The method further includes determining a source direction of the audio content based on the selected component for separating an audio source from the audio content. Corresponding system and computer program product of separating audio sources in audio content are also disclosed.
摘要:
Apparatus and methods for audio classifying and processing are disclosed. In one embodiment, an audio processing apparatus includes an audio classifier for classifying an audio signal into at least one audio type in real time; an audio improving device for improving experience of audience; and an adjusting unit for adjusting at least one parameter of the audio improving device in a continuous manner based on the confidence value of the at least one audio type.
摘要:
Example embodiments disclosed herein relate to audio object clustering with single channel quality preservation. A method of clustering audio objects is disclosed. The method includes determining cluster positions based on object positions of the audio objects and a reference speaker layout, the reference speaker layout indicating speakers located at different speaker positions. The method also includes determining object-to-cluster gains based on the determined cluster positions, the object positions and the reference speaker layout, an object-to-cluster gain defining a proportion of the respective audio object that is assigned to a cluster associated with one of the determined cluster positions. The method further includes clustering the audio objects based on the object-to-cluster gains and the cluster positions for generating cluster signals. Corresponding system, computer program product and device for clustering audio objects are also disclosed.
摘要:
Diffuse or spatially large audio objects may be identified for special processing. A decorrelation process may be performed on audio signals corresponding to the large audio objects to produce decorrelated large audio object audio signals. These decorrelated large audio object audio signals may be associated with object locations, which may be stationary or time-varying locations. For example, the decorrelated large audio object audio signals may be rendered to virtual or actual speaker locations. The output of such a rendering process may be input to a scene simplification process. The decorrelation, associating and/or scene simplification processes may be performed prior to a process of encoding the audio data.
摘要:
Embodiments of the present invention relate to audio object extraction. A method for audio object extraction from audio content of a format based on a plurality of channels is disclosed. The method comprises applying audio object extraction on individual frames of the audio content at least partially based on frequency spectral similarities among the plurality of channels. The method further comprises performing audio object composition across the frames of the audio content, based on the audio object extraction on the individual frames, to generate a track of at least one audio object. Corresponding system and computer program product are also disclosed.
摘要:
Audio processing method and audio processing apparatus, and training method are described. According to embodiments of the application, an accent identifier is used to identify accent frames from a plurality of audio frames, resulting in an accent sequence comprised of probability scores of accent and/or non-accent decisions with respect to the plurality of audio frames. Then a tempo estimator is used to estimate a tempo sequence of the plurality of audio frames based on the accent sequence. The embodiments can be well adaptive to the change of tempo, and can be further used to tracking beats properly.
摘要:
A method of audio processing includes classifying an audio signal as noise or as non-noise using a first model. For a noise signal. the audio signal is classified as user-generated content (UGC) noise or as professionally-generated content (PGC) noise using a second model. For a non-noise signal or PGC noise. the audio signal is processed using a first audio processing process. For UGC noise. the audio signal is processed using a second audio processing process.