Abstract:
A spatial audio signal decoder is provided that includes a processor and storage media that includes instructions that when executed cause the processor to: receive input spatial audio signals including a set of channels having an input spatial format; partition the set of channels into at least a first channel subset and a second channel subset; determine an estimate of a number and directions of arrival of directional audio sources represented in at least a portion of the set of channels; determine one of the active and passive components of the first channel subset signals, based at least in part on the estimated number and directions of arrival of directional audio sources; determine the other of the active and passive components of the first channel subset signals, based upon the determined one of the active and passive components of the first channel subset signals; decode the components to an output signal.
Abstract:
A method comprises receiving input audio and target audio having a target audio characteristic. The method includes estimating key parameters that represent the target audio characteristic based on one or more of the target audio and the input audio. The method further comprises configuring a neural network, trained to be configured by the key parameters, with the key parameters to cause the neural network to perform a signal transformation of the input audio, to produce output audio having an output audio characteristic that corresponds to and matches the target audio characteristic.
Abstract:
There are disclosed methods and apparatus for decomposing a signal having a plurality of channels into direct and diffuse components. The correlation coefficient between each pair of signals from the plurality of signals may be estimated. A linear system of equations relating the estimated correlation coefficients and direct energy fractions of each of the plurality of channels may be constructed. The linear system may be solved to estimate the direct energy fractions. A direct component output signal and a diffuse component output signal may be generated based in part on the direct energy fractions.
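As an illustration of the linear-system step, the following is a minimal sketch under an assumed signal model that the abstract does not state: the direct components are fully correlated across channels and the diffuse components are mutually uncorrelated, so that the correlation coefficient of channels i and j equals sqrt(phi_i * phi_j), where phi_i is the direct energy fraction of channel i. Taking logarithms makes the equations linear in log(phi_i). The function names, the three-channel closed-form solution, and the energy-preserving split are hypothetical and chosen only for illustration.

```python
import math

def direct_energy_fractions(rho12, rho13, rho23):
    """Estimate three direct energy fractions from pairwise correlations.

    Assumed model (not from the source): rho_ij = sqrt(phi_i * phi_j),
    with every rho strictly between 0 and 1. In the log domain this is
    2*log(rho_ij) = log(phi_i) + log(phi_j), a 3x3 linear system.
    """
    a = 2.0 * math.log(rho12)  # x1 + x2
    b = 2.0 * math.log(rho13)  # x1 + x3
    c = 2.0 * math.log(rho23)  # x2 + x3
    x1 = (a + b - c) / 2.0
    x2 = (a - b + c) / 2.0
    x3 = (b + c - a) / 2.0
    # Clamp to 1.0 because noisy estimates can push a fraction above unity.
    return [min(1.0, math.exp(x)) for x in (x1, x2, x3)]

def split_sample(sample, phi):
    """Hypothetical energy-preserving split of one sample into (direct, diffuse)."""
    return math.sqrt(phi) * sample, math.sqrt(1.0 - phi) * sample

# Example: correlations generated from fractions 0.81, 0.64 and 0.49.
phis = direct_energy_fractions(0.72, 0.63, 0.56)
```

With more than three channels there are more pair equations than unknowns, and a least-squares solve would take the place of the closed form used here.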
Abstract:
A method comprises: receiving input audio and target audio having a target audio characteristic; using a first neural network, trained to generate key parameters that represent the target audio characteristic based on one or more of the target audio and the input audio, generating the key parameters; and configuring a second neural network, trained to be configured by the key parameters, with the key parameters to cause the second neural network to perform a signal transformation of the input audio, to produce output audio having an output audio characteristic that corresponds to and matches the target audio characteristic.
Abstract:
Systems and methods include audio encoders having improved coding of harmonic signals. The audio encoders can be implemented as transform-based codecs with frequency coefficients quantized using spectral weights. The frequency coefficients can be quantized by use of the generated spectral weights applied to the frequency coefficients prior to the quantization or by use of the generated spectral weights in computation of error within a vector quantization that performs the quantization. Additional apparatus, systems, and methods are disclosed.
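The two uses of spectral weights named above can be sketched as follows. This is only an illustration: how the weights are generated is not described in the abstract, so they are passed in as plain numbers, and the uniform scalar quantizer, the squared-error measure, and all function names are assumptions.

```python
def quantize_weighted(coeffs, weights, step):
    """Apply spectral weights before uniform scalar quantization (assumed scheme)."""
    return [round(w * x / step) for x, w in zip(coeffs, weights)]

def dequantize_weighted(indices, weights, step):
    """Invert quantize_weighted: a larger weight gives a finer effective step."""
    return [q * step / w for q, w in zip(indices, weights)]

def weighted_vq_search(coeffs, weights, codebook):
    """Return the index of the codevector with the smallest weighted squared error."""
    def weighted_error(codevector):
        return sum(w * (x - c) ** 2
                   for x, c, w in zip(coeffs, codevector, weights))
    return min(range(len(codebook)), key=lambda i: weighted_error(codebook[i]))

# The weights decide which coefficient the vector quantizer protects.
codebook = [[1.0, 0.0], [0.0, 1.0]]
favor_first = weighted_vq_search([1.0, 1.0], [4.0, 1.0], codebook)
favor_second = weighted_vq_search([1.0, 1.0], [1.0, 4.0], codebook)
```

In the first function the weights shape the quantization noise directly; in the search they leave the codebook untouched and only change which codevector is judged closest.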
Abstract:
Systems and methods are described for processing data from a sequential series of groups of frames to achieve a target average processing bit rate for a particular group of frames in the series. In an example, a look-ahead buffer circuit can be populated with a number of frames from a particular group of frames, and a bit allocation can be determined for a frame in the look-ahead buffer circuit using bit request information about all of the frames in the buffer. The look-ahead buffer circuit can be populated with streaming frame information in a first-in-first-out manner, and bit allocation processing can be performed for each frame, in a particular group of frames, based on a frame position in the look-ahead buffer circuit and further based on bit requests associated with other frames in the look-ahead buffer circuit.
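A minimal sketch of the look-ahead idea follows. The allocation rule is an assumption, not the disclosed method: the oldest frame in a first-in-first-out buffer receives a share of the window's bit budget proportional to its bit request relative to all requests currently buffered. The function names and parameters are hypothetical, and this simple rule does not track accumulated surplus or deficit, so it only approximates the target average.

```python
from collections import deque

def _allocate_oldest(buffer, target_bits_per_frame):
    """Share of the window budget for the oldest frame, by relative request."""
    budget = target_bits_per_frame * len(buffer)
    total_requested = sum(buffer)
    if total_requested == 0:
        return 0
    return budget * buffer[0] // total_requested

def allocate_bits(bit_requests, lookahead, target_bits_per_frame):
    """Return one allocation per frame using a FIFO look-ahead window."""
    buffer = deque()
    allocations = []
    for request in bit_requests:
        buffer.append(request)            # newest frame enters the window
        if len(buffer) == lookahead:      # window full: serve the oldest frame
            allocations.append(_allocate_oldest(buffer, target_bits_per_frame))
            buffer.popleft()
    while buffer:                         # end of group: drain remaining frames
        allocations.append(_allocate_oldest(buffer, target_bits_per_frame))
        buffer.popleft()
    return allocations

# A demanding frame (300) borrows bits from its easier neighbours.
example = allocate_bits([100, 300, 200, 200], lookahead=2,
                        target_bits_per_frame=100)
```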
Abstract:
A post-encoding bitrate reduction system and method for generating one or more scaled compressed bitstreams from a single encoded plenary file. The plenary file contains multiple audio object files that were encoded separately using a scalable encoding process having fine-grained scalability. Activity in the data frames of the encoded audio object files at a time period is compared across the files to obtain a data frame activity comparison. Bits from an available bitpool are assigned to all of the data frames based on the data frame activity comparison and corresponding hierarchical metadata. The plenary file is scaled down by truncating bits in the data frames to conform to the bit allocation. In some embodiments, frame activity is compared to a silence threshold; if the frame activity is less than or equal to the threshold, the data frame is treated as containing silence and minimal bits are used to represent it.
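The allocation and truncation steps can be sketched roughly as below. Several things here are assumptions made for illustration: activity is a single non-negative number per frame, bits are shared in proportion to activity, the hierarchical metadata is ignored, and a frame is modelled as a string of bits whose prefix remains decodable (standing in for fine-grained scalability). All names and default values are hypothetical.

```python
def allocate_bitpool(activities, full_sizes, bitpool,
                     silence_threshold=0.0, silent_bits=8):
    """Assign bits to the frames of one time period (assumed proportional rule)."""
    allocation = [0] * len(activities)
    active = []
    remaining = bitpool
    for i, activity in enumerate(activities):
        if activity <= silence_threshold:
            # Silent frame: spend only a minimal number of bits on it.
            allocation[i] = min(silent_bits, full_sizes[i])
            remaining -= allocation[i]
        else:
            active.append(i)
    remaining = max(0, remaining)
    total_activity = sum(activities[i] for i in active)
    for i in active:
        share = int(remaining * activities[i] / total_activity)
        # Never allocate more than the frame holds in the plenary file.
        allocation[i] = min(share, full_sizes[i])
    return allocation

def truncate_frame(frame_bits, n_bits):
    """Scale a frame down by keeping only its first n_bits bits."""
    return frame_bits[:n_bits]

# Three object frames at one time period; the first one is silent.
bits = allocate_bitpool([0.0, 3.0, 1.0], [1000, 1000, 1000], bitpool=408)
```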
Abstract:
There are disclosed automatic mixers and methods for creating a surround audio mix. A set of rules may be stored in a rule base. A rule engine may select a subset of the set of rules based, at least in part, on metadata associated with a plurality of stems. A mixing matrix may mix the plurality of stems in accordance with the selected subset of rules to provide three or more output channels.
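A toy sketch of the rule base, rule engine, and mixing matrix follows. The rule format, the "role" metadata key, the gain values, and the five-channel layout are all hypothetical, since the abstract does not specify any of them.

```python
# Hypothetical rule base: each rule matches stem metadata and sets channel gains.
RULE_BASE = [
    {"when": {"role": "dialog"}, "gains": {"C": 1.0}},
    {"when": {"role": "music"}, "gains": {"L": 0.7, "R": 0.7}},
    {"when": {"role": "ambience"}, "gains": {"Ls": 0.5, "Rs": 0.5}},
]
CHANNELS = ["L", "R", "C", "Ls", "Rs"]

def select_rules(rule_base, stem_metadata):
    """Rule engine: keep the rules whose conditions all match the metadata."""
    return [rule for rule in rule_base
            if all(stem_metadata.get(key) == value
                   for key, value in rule["when"].items())]

def build_mixing_matrix(stems_metadata, rule_base=RULE_BASE, channels=CHANNELS):
    """Build a channels-by-stems gain matrix from the selected rules."""
    matrix = [[0.0] * len(stems_metadata) for _ in channels]
    for s, metadata in enumerate(stems_metadata):
        for rule in select_rules(rule_base, metadata):
            for channel, gain in rule["gains"].items():
                matrix[channels.index(channel)][s] = gain
    return matrix

def mix(matrix, stem_samples):
    """Mix one sample per stem into one sample per output channel."""
    return [sum(gain * x for gain, x in zip(row, stem_samples))
            for row in matrix]

stems = [{"role": "dialog"}, {"role": "music"}]
matrix = build_mixing_matrix(stems)
output = mix(matrix, [1.0, 2.0])  # one sample from each stem
```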
Abstract:
Devices and methods are adapted to characterize a multi-channel loudspeaker configuration, to correct loudspeaker room delay, gain and frequency response or to configure sub-band domain correction filters. In an embodiment for characterizing a multi-channel loudspeaker configuration, a broadband probe signal is supplied to each audio output of a preamplifier of which a plurality are coupled to loudspeakers in a multi-channel configuration in a listening environment. The loudspeakers convert the probe signal to acoustic responses that are transmitted in non-overlapping time slots separated by silent periods as sound waves into the listening environment. For each audio output that is probed, sound waves are received by a multi-microphone array that converts the acoustic responses to broadband electric response signals.
Abstract:
A frequency-domain long-term prediction system and method for estimating and applying an optimum long-term predictor. Embodiments of the system and method include determining parameters of a single-tap predictor using a frequency-domain analysis having an optimality criterion based on a spectral flatness measure. Embodiments of the system and method also include determining parameters of the long-term predictor by accounting for the performance of the vector quantizer in quantizing the various subbands. In some embodiments other encoder metrics (such as signal tonality) are used as well. Other embodiments of the system and method include determining the optimal parameters of the long-term predictor by accounting for some of the decoder operation. Other embodiments of the system and method include extending a 1-tap predictor to a k-th order predictor by convolving the 1-tap predictor with a pre-set filter and selecting from a table of such pre-set filters based on a minimum energy criterion.
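Two of the ingredients named above can be sketched in a few lines. The spectral flatness measure is computed here with its common definition, the ratio of the geometric mean to the arithmetic mean of a power spectrum; whether the disclosed system uses exactly this form is an assumption. The predictor extension assumes an odd-length pre-set filter centred on the original lag and a time-domain residual energy; the function names and the example table are hypothetical.

```python
import math

def spectral_flatness(power_spectrum, floor=1e-12):
    """Geometric mean over arithmetic mean: 1.0 for flat, near 0 for peaky."""
    values = [max(p, floor) for p in power_spectrum]
    log_mean = sum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean) / (sum(values) / len(values))

def extend_predictor(gain, lag, preset):
    """Convolve a 1-tap predictor (gain at lag) with an odd-length pre-set filter."""
    half = len(preset) // 2
    return {lag - half + j: gain * c for j, c in enumerate(preset)}

def residual_energy(signal, taps):
    """Energy of signal[n] minus its prediction from past samples only."""
    energy = 0.0
    for n in range(len(signal)):
        prediction = sum(c * signal[n - d]
                         for d, c in taps.items() if 0 < d <= n)
        energy += (signal[n] - prediction) ** 2
    return energy

def pick_preset(signal, gain, lag, table):
    """Index of the pre-set filter giving the minimum residual energy."""
    return min(range(len(table)),
               key=lambda i: residual_energy(
                   signal, extend_predictor(gain, lag, table[i])))

# A pulse train with period 4 is best served by the identity pre-set.
pulses = [1.0, 0.0, 0.0, 0.0] * 4
table = [[0.0, 1.0, 0.0], [0.25, 0.5, 0.25]]
best = pick_preset(pulses, gain=1.0, lag=4, table=table)
```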