Abstract:
An audio processing method may involve receiving output signals from each microphone of a plurality of microphones in an audio environment, the output signals corresponding to a current utterance of a person and determining, based on the output signals, one or more aspects of context information relating to the person, including an estimated current proximity of the person to one or more microphone locations. The method may involve selecting two or more loudspeaker-equipped audio devices based, at least in part, on the one or more aspects of the context information, determining one or more types of audio processing changes to apply to audio data being rendered to loudspeaker feed signals for the audio devices and causing one or more types of audio processing changes to be applied. In some examples, the audio processing changes have the effect of increasing a speech to echo ratio at one or more microphones.
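As a rough illustration of the last sentence, the sketch below computes a speech to echo ratio in decibels and shows how attenuating ("ducking") playback on the devices nearest the talker raises it. The function names, the nearest-device selection rule, the example distances, and the 12 dB ducking amount are assumptions made for this example and are not taken from the abstract.

```python
import math

def speech_to_echo_ratio_db(speech_power, echo_power):
    """Ratio of speech power to echo power at a microphone, in dB."""
    return 10.0 * math.log10(speech_power / echo_power)

def select_nearest_devices(distances_m, count=2):
    """Pick the `count` devices closest to the talker (hypothetical rule)."""
    return sorted(distances_m, key=distances_m.get)[:count]

def duck_selected(playback_gains, selected, duck_db=-12.0):
    """Attenuate playback on the selected devices by `duck_db` decibels."""
    factor = 10.0 ** (duck_db / 20.0)
    return {name: gain * factor if name in selected else gain
            for name, gain in playback_gains.items()}

# Hypothetical distances from the talker to each loudspeaker-equipped device.
distances = {"kitchen": 1.0, "sofa": 3.0, "tv": 5.0}
selected = select_nearest_devices(distances)
gains = duck_selected({name: 1.0 for name in distances}, selected)

# If the echo at a microphone is dominated by a ducked device, echo power
# scales with the square of the amplitude gain, so the ratio rises by 12 dB.
before = speech_to_echo_ratio_db(1.0, 0.5)
after = speech_to_echo_ratio_db(1.0, 0.5 * gains["kitchen"] ** 2)
```

In a real room the echo at a microphone is a mix of all loudspeakers, so the improvement would generally be smaller than the ducking amount.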
Abstract:
Systems and methods for providing forward error correction for a multi-channel audio signal are described. Blocks of an audio stream are buffered into a frame. A transformation can be applied that compacts the energy of each block into a plurality of transformed channels. The energy compaction transform may compact the most energy of a block into the first transformed channel and decreasing amounts of energy into each subsequent transformed channel. The transformed frame may be encoded using any suitable codec and transmitted in a packet over a network. Improved forward error correction may be provided by attaching a low bit rate encoding of the first transformed channel to a subsequent packet. To reconstruct a lost packet, the low bit rate encoding of the first channel for the lost packet may be combined with a packet loss concealment version of the other channels, constructed from a previously-received packet.
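The scheme can be sketched for the two-channel case as follows. This is a minimal illustration under stated assumptions, not the described implementation: the energy compaction transform is taken to be a principal-axis rotation, coarse quantization stands in for the low bit rate encoding, concealment simply repeats the previous packet's second channel, and the rotation angle is assumed to travel with each packet. All function names and packet fields are hypothetical.

```python
import math

def compaction_angle(ch0, ch1):
    """Rotation angle aligning the first output channel with the
    direction of greatest energy (two-channel principal axis)."""
    e00 = sum(x * x for x in ch0)
    e11 = sum(y * y for y in ch1)
    e01 = sum(x * y for x, y in zip(ch0, ch1))
    return 0.5 * math.atan2(2.0 * e01, e00 - e11)

def rotate(ch0, ch1, theta):
    """Apply a 2x2 rotation; rotating by -theta undoes rotating by theta."""
    c, s = math.cos(theta), math.sin(theta)
    first = [c * x + s * y for x, y in zip(ch0, ch1)]
    second = [-s * x + c * y for x, y in zip(ch0, ch1)]
    return first, second

def coarse_quantize(samples, step=0.25):
    """Stand-in for a low bit rate encoding (hypothetical)."""
    return [round(v / step) * step for v in samples]

def make_packets(frames):
    """Each packet carries its own transformed frame plus a coarse copy
    of the previous frame's first transformed channel."""
    packets, previous_fec = [], None
    for ch0, ch1 in frames:
        theta = compaction_angle(ch0, ch1)
        first, second = rotate(ch0, ch1, theta)
        packets.append({"theta": theta, "first": first,
                        "second": second, "fec": previous_fec})
        previous_fec = {"theta": theta, "first": coarse_quantize(first)}
    return packets

def recover_lost_frame(packet_before, packet_after):
    """Rebuild a lost frame from the coarse first channel attached to the
    following packet and a concealed second channel."""
    fec = packet_after["fec"]
    concealed_second = packet_before["second"]  # naive concealment: repeat
    return rotate(fec["first"], concealed_second, -fec["theta"])

# Three identical frames with fully correlated channels; the middle one is lost.
tone = [1.0, -1.0, 0.5, 0.0]
frames = [(tone, tone), (tone, tone), (tone, tone)]
packets = make_packets(frames)
recovered_ch0, recovered_ch1 = recover_lost_frame(packets[0], packets[2])
```

Because most of the energy sits in the first transformed channel, the recovered frame stays close to the original even though only that channel was protected.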
Abstract:
A method for interactive and user guided manipulation of multichannel audio content, the method including the steps of: providing a content preview facility for replay and review of multichannel audio content by a user; providing a user interface for the user selection of a segment of multichannel audio content having unsatisfactory audio content; processing the audio content to include associated audio object activity spatial or signal space regions, to create a time line of activity where one or more spatial or signal space regions are active at any given time; matching the user's gesture input against at least one of the active spatial or signal space regions; signal processing the audio emanating from the selected active spatial or signal space region using a number of differing techniques to determine at least one processed alternative; providing the user with an interactive playback facility to listen to the processed alternative.
Abstract:
Teleconference audio data including a plurality of individual uplink data packet streams may be received during a teleconference. Each uplink data packet stream may correspond to a telephone endpoint used by one or more teleconference participants. The teleconference audio data may be analyzed to determine a plurality of suppressive gain coefficients, which may be applied to first instances of the teleconference audio data during the teleconference, to produce first gain-suppressed audio data provided to the telephone endpoints during the teleconference. Second instances of the teleconference audio data, as well as gain coefficient data corresponding to the plurality of suppressive gain coefficients, may be sent to a memory system as individual uplink data packet streams. The second instances of the teleconference audio data may be less gain-suppressed than the first gain-suppressed audio data.
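The two paths can be sketched for a single uplink stream as follows. The energy-based gain rule, the thresholds, and the record layout are assumptions chosen for illustration; the abstract does not specify how the suppressive gain coefficients are computed or stored.

```python
def suppressive_gain(frame, noise_floor=0.01, min_gain=0.1):
    """Hypothetical rule: pass frames above the noise floor unchanged and
    attenuate quieter frames, never below `min_gain`."""
    energy = sum(s * s for s in frame) / len(frame)
    if energy >= noise_floor:
        return 1.0
    return max(min_gain, energy / noise_floor)

def process_uplink(frames):
    """Return (live, recorded): gain-suppressed audio for the endpoints,
    and unsuppressed audio stored alongside the gain that was applied."""
    live, recorded = [], []
    for frame in frames:
        gain = suppressive_gain(frame)
        live.append([gain * s for s in frame])                  # first instance
        recorded.append({"audio": list(frame), "gain": gain})   # second instance
    return live, recorded

loud = [0.5, -0.5]
quiet = [0.01, 0.01]
live, recorded = process_uplink([loud, quiet])
```

Storing the gain next to the unsuppressed audio means the suppression can be reapplied, relaxed, or recomputed when the recording is played back.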
Abstract:
Example embodiments disclosed herein relate to the estimation of reverberant energy components from audio sources. A method of estimating a reverberant energy component from an active audio source (100) is disclosed. The method comprises determining a correspondence between the active audio source and a plurality of sample sources by comparing one or more spatial features of the active audio source with one or more spatial features of the plurality of sample sources, each of the sample sources being associated with an adaptive filtering model (101); obtaining an adaptive filtering model for the active audio source based on the determined correspondence (102); and estimating the reverberant energy component from the active audio source over time based on the adaptive filtering model (103). A corresponding system (800) and computer program product (900) are also disclosed.
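The three steps can be sketched as follows, under strong simplifying assumptions: spatial features are plain numeric vectors compared by Euclidean distance, and each filtering model is reduced to a fixed pair of hypothetical parameters (`decay`, `coupling`) driving a first-order recursion. The abstract describes adaptive models and does not prescribe this form; a real system would update the parameters over time.

```python
import math

def closest_sample_source(active_features, sample_sources):
    """Step 101: match the active source to the sample source whose
    spatial features are nearest (Euclidean distance, an assumption)."""
    return min(sample_sources,
               key=lambda s: math.dist(active_features, s["features"]))

def estimate_reverb_energy(frame_energies, model):
    """Step 103: first-order recursive estimate. Each frame's reverberant
    energy is a decayed sum of energy emitted in earlier frames."""
    reverb, estimates = 0.0, []
    for energy in frame_energies:
        estimates.append(reverb)
        reverb = model["decay"] * reverb + model["coupling"] * energy
    return estimates

# Hypothetical sample sources with per-source model parameters.
samples = [
    {"name": "near_door", "features": (0.0, 1.0),
     "model": {"decay": 0.5, "coupling": 0.2}},
    {"name": "near_window", "features": (1.0, 0.0),
     "model": {"decay": 0.8, "coupling": 0.4}},
]
match = closest_sample_source((0.1, 0.9), samples)
model = match["model"]                                    # step 102
reverb = estimate_reverb_energy([1.0, 0.0, 0.0], model)
```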
Abstract:
Apparatus comprising an interface for receiving a respective uplink data stream from each of three or more further apparatuses, and for transmitting a respective downlink data stream to each of the further apparatuses; and a logic system in communication with the interface. The logic system is configured: to receive first data in the uplink data stream received from a first one of the further apparatuses; and in a first mode, to include at least some of the first data in the respective downlink data streams transmitted to every other one of the further apparatuses, or, in a second mode, to include at least some of the first data in the downlink data stream transmitted to a second one of the further apparatuses and to omit or attenuate substantially all of the first data in the downlink data stream transmitted to at least a third one of the further apparatuses. Corresponding methods and computer readable media are disclosed.
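The two modes can be sketched as a function that builds the downlink payload for every endpoint other than the sender. The mode names, the `recipients` argument, and the representation of data as a list of audio samples are assumptions made for this example.

```python
def build_downlinks(endpoints, sender, samples, mode="broadcast",
                    recipients=(), attenuation=0.0):
    """Return {endpoint: samples} for every endpoint except the sender.

    mode="broadcast": every other endpoint receives the sender's data.
    mode="selective": only `recipients` receive it at full level; for the
    rest it is scaled by `attenuation` (0.0 omits it entirely).
    """
    downlinks = {}
    for endpoint in endpoints:
        if endpoint == sender:
            continue  # only the other endpoints receive the sender's data
        if mode == "broadcast" or endpoint in recipients:
            gain = 1.0
        else:
            gain = attenuation
        downlinks[endpoint] = [gain * s for s in samples]
    return downlinks

endpoints = ["a", "b", "c"]
everyone = build_downlinks(endpoints, "a", [0.5, -0.5])
private = build_downlinks(endpoints, "a", [0.5, -0.5],
                          mode="selective", recipients={"b"})
```

In the selective case endpoint "b" hears the sender while endpoint "c" receives silence for that contribution, which corresponds to a side conversation within the call.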
Abstract:
Systems and methods are described for modifying one of far-end signal playback and capture of local audio on an audio device. Frames of both a far-end audio stream and a near-end audio stream may be analyzed using a measure of voice activity, the analyzing producing voice data associated with each frame. Based on the voice data, a conference state may be determined, and one of playback of the far-end audio stream and capture of local audio on an audio device may be modified based on the determined conference state. By associating the likely intent with a predefined state, the device may further cull or remove unwanted or unlikely content from the device input and output. This may have a substantial advantage in allowing for full duplex operation in the case of more meaningful and continuing voice activity, particularly in the case where there are many connected endpoints.
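A minimal sketch of the state logic follows. The energy-threshold voice activity measure, the four state names, and the gain values are all assumptions for illustration; the abstract does not define the states or how playback and capture are modified.

```python
def frame_energy(frame):
    return sum(s * s for s in frame) / len(frame)

def has_voice(frame, threshold=0.01):
    """Crude energy-based voice activity measure (an assumption)."""
    return frame_energy(frame) > threshold

def conference_state(far_frame, near_frame):
    """Map per-frame voice activity on both streams to a named state."""
    far, near = has_voice(far_frame), has_voice(near_frame)
    if far and near:
        return "double_talk"
    if far:
        return "far_end_talking"
    if near:
        return "near_end_talking"
    return "idle"

# (playback_gain, capture_gain) per state; values are illustrative only.
STATE_GAINS = {
    "double_talk": (1.0, 1.0),        # keep full duplex
    "far_end_talking": (1.0, 0.2),    # cull local capture
    "near_end_talking": (0.2, 1.0),   # cull far-end playback
    "idle": (0.5, 0.5),
}

def apply_state(far_frame, near_frame):
    """Scale playback and capture according to the determined state."""
    state = conference_state(far_frame, near_frame)
    playback_gain, capture_gain = STATE_GAINS[state]
    playback = [playback_gain * s for s in far_frame]
    capture = [capture_gain * s for s in near_frame]
    return state, playback, capture

loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.001, 0.001, 0.001, 0.001]
state, playback, capture = apply_state(loud, quiet)
```

A practical system would likely smooth the state over several frames so that the gains do not switch on every pause in speech.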
Abstract:
The present application provides an acoustic echo mitigation apparatus and method, an audio processing apparatus and a voice communication terminal. According to an embodiment, an acoustic echo mitigation apparatus is provided, including: an acoustic echo canceller for cancelling estimated acoustic echo from a microphone signal and outputting an error signal; a residual echo estimator for estimating residual echo power; and an acoustic echo suppressor for further suppressing residual echo and noise in the error signal based on the residual echo power and noise power. Here, the residual echo estimator is configured to be continuously adaptive to power change in the error signal. According to the embodiments of the present application, the acoustic echo mitigation apparatus and method can, at least, adapt well to changes in the power of the error signal after the AEC processing, such as those caused by changes in double-talk status, echo path properties, or noise level.
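The canceller and suppressor stages can be sketched as follows. The abstract does not name the algorithms, so this example substitutes two common choices: a normalized LMS (NLMS) adaptive filter as the echo canceller and a spectral-subtraction-style gain with a floor as the suppressor. The residual echo estimator itself is not modeled; its output is simply passed in as `residual_echo_power`. All names and parameter values are assumptions.

```python
def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from the microphone signal
    and return the error signal (NLMS, used here as a stand-in)."""
    weights = [0.0] * taps
    history = [0.0] * taps
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]          # most recent sample first
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm
                   for w, h in zip(weights, history)]
        errors.append(error)
    return errors

def suppression_gain(error_power, residual_echo_power, noise_power,
                     floor=0.1):
    """Gain applied to the error signal: the fraction of its power not
    attributed to residual echo or noise, limited below by `floor`."""
    if error_power <= 0.0:
        return floor
    wanted = error_power - residual_echo_power - noise_power
    return max(floor, wanted / error_power)

# Half the error power is attributed to residual echo and noise.
gain = suppression_gain(1.0, 0.3, 0.2)
```

When the estimated residual echo and noise exceed the error power, the gain sits at the floor, which limits how deeply the suppressor can attenuate the signal.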
Abstract:
A method, an apparatus, and logic to post-process raw gains determined by input processing to generate post-processed gains, comprising using one or both of delta gain smoothing and decision-directed gain smoothing. The delta gain smoothing comprises applying a smoothing filter to the raw gain with a smoothing factor that depends on the gain delta: the absolute value of the difference between the raw gain for the current frame and the post-processed gain for a previous frame. The decision-directed gain smoothing comprises converting the raw gain to a signal-to-noise ratio, applying a smoothing filter with a smoothing factor to the signal-to-noise ratio to calculate a smoothed signal-to-noise ratio, and converting the smoothed signal-to-noise ratio to determine a second smoothed gain, with the smoothing factor possibly dependent on the gain delta.
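Both smoothing methods can be sketched with first-order filters. Several choices here are assumptions not stated in the abstract: the smoothing factor grows linearly with the gain delta (so large changes are tracked faster), and gain and signal-to-noise ratio are related by the Wiener rule gain = snr / (1 + snr). The function and parameter names are hypothetical.

```python
def delta_gain_smooth(raw_gains, base_alpha=0.2, slope=1.0,
                      initial_gain=1.0):
    """First-order smoothing whose factor depends on the gain delta,
    the absolute difference between the current raw gain and the
    previous post-processed gain."""
    smoothed, previous = [], initial_gain
    for raw in raw_gains:
        delta = abs(raw - previous)
        alpha = min(1.0, base_alpha + slope * delta)  # assumed mapping
        previous = previous + alpha * (raw - previous)
        smoothed.append(previous)
    return smoothed

def snr_domain_smooth(raw_gains, alpha=0.5, initial_snr=1.0,
                      max_gain=0.999):
    """Convert each raw gain to an SNR, smooth the SNR, convert back.
    Uses the Wiener relation gain = snr / (1 + snr) as an assumption."""
    smoothed, snr_previous = [], initial_snr
    for raw in raw_gains:
        gain = min(max(raw, 0.0), max_gain)   # keep the SNR finite
        snr = gain / (1.0 - gain)
        snr_previous = snr_previous + alpha * (snr - snr_previous)
        smoothed.append(snr_previous / (1.0 + snr_previous))
    return smoothed

small_step = delta_gain_smooth([0.9])   # small delta: heavy smoothing
large_step = delta_gain_smooth([0.0])   # large delta: tracks immediately
steady = snr_domain_smooth([0.5, 0.5])
```

With these defaults a small change in the raw gain is only partly followed, while a change of 0.8 or more is followed in a single frame.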