摘要:
In a device configurable to encode speech performing an closed loop re-decision may comprise representing a speech signal by amplitude components and phase components for a current frame and a past frame. In a first closed loop stage, a first set of compressed components and a first set of uncompressed components for a current frame may be generated. A first set of features may be generated by comparing current and past frame amplitude and/or phase components. In a second closed loop stage, a second set of compressed components for the current frame may be generated by compressing the first set of compressed components and compressing the first set of uncompressed components. Generation of a second set of features may be based on the second set of compressed components from the current frame and a combination of amplitude and/or phase components from the past frame.
摘要:
An enhanced blind source separation technique is provided to improve separation of highly correlated signal mixtures. A beamforming algorithm is used to precondition correlated first and second input signals in order to avoid indeterminacy problems typically associated with blind source separation. The beamforming algorithm may apply spatial filters to the first signal and second signal in order to amplify signals from a first direction while attenuating signals from other directions. Such directionality may serve to amplify a desired speech signal in the first signal and attenuate the desired speech signal from the second signal. Blind source separation is then performed on the beamformer output signals to separate the desired speech signal and the ambient noise and reconstruct an estimate of the desired speech signal. To enhance the operation of the beamformer and/or blind source separation, calibration may be performed at one or more stages.
摘要:
Multiple microphone noise suppression apparatus and methods are described herein. The apparatus and methods implement a variety of noise suppression techniques and apparatus that can be selectively applied to signals received using multiple microphones. The microphone signals received at each of the multiple microphones can be independently processed to cancel echo signal components that can be generated from a local audio source. The echo cancelled signals may be processed by some or all modules within a signal separator that operates to separate or otherwise isolate a speech signal from noise signals. The signal separator can include a pre-processing de-correlator followed by a blind source separator. The output of the blind source separator can be post filtered to provide post separation de-correlation. The separated speech and noise signals can be non-linearly processed for further noise reduction, and additional post processing can be implemented following the non-linear processing.
摘要:
This disclosure describes audio mixing techniques that intelligently combine two or more audio signals into an output signal. The techniques allow audio to be combined, yet create perceptual differentiation between the different audio signals. The result is that a user is able to hear both audio signals in a combined output, but the different audio signals do not perceptually interfere with one another. The techniques are relatively simple to implement and are well suited for radio telephones.
摘要:
In a digital system with more than one clock source, lack of synchronization between the clock sources may cause overflow or underflow in sample buffers, also called sample slipping. Sample slipping may lead to undesirable artifacts in the processed signal due to discontinuities introduced by the addition or removal of extra samples. To smooth out discontinuities caused by sample slipping, samples are filtered to when a buffer overflow condition occurs, and the samples are interpolated to produce additional samples when a buffer underflow condition occurs. The interpolated samples may also be filtered. The filtering and interpolation operations can be readily implemented without adding significant burden to the computational complexity of a real-time digital system.
摘要:
Voice activity detection using multiple microphones can be based on a relationship between an energy at each of a speech reference microphone and a noise reference microphone. The energy output from each of the speech reference microphone and the noise reference microphone can be determined. A speech to noise energy ratio can be determined and compared to a predetermined voice activity threshold. In another embodiment, the absolute value of the autocorrelation of the speech and noise reference signals are determined and a ratio based on autocorrelation values is determined. Ratios that exceed the predetermined threshold can indicate the presence of a voice signal. The speech and noise energies or autocorrelations can be determined using a weighted average or over a discrete frame size.
摘要:
An executable is downloaded to an audio output device over a communications link. The executable may configure the audio output device to decode audio encoded in a specified format. The executable may also or alternatively include other audio processing software. The audio may include voice and/or audio playback, e.g., music playback. The ability to download an audio executable allows dynamic provisioning of various decoding and/or audio process capabilities to an audio output device. This may eliminate the need to transcode digitized audio for playback at the audio output device, and may also allow the audio output device to decode multiple audio formats without having multiple audio decoders permanently residing within the audio output device.
摘要:
In a device configurable to encode speech performing an open loop re-decision may comprise representing a speech signal by amplitude components and phase components for a current frame and a past frame. During the current frame, there may be an extraction of uncompressed amplitude components and uncompressed phase components. The amplitude components and the phase components from the past frame may then be retrieved. A set of features may be generated based on the uncompressed amplitude components from the current frame, the uncompressed phase components from the current frame, the amplitude components from the past frame, and the phase components from the past frame. The set of features may be checked as part of the open loop re-decision, and determining a final encoding decision based on the checking may be performed. The final encoding decision may be an encoding mode and/or encoding rate.
摘要:
This disclosure describes signal processing techniques that can improve the performance of blind source separation (BSS) techniques. In particular, the described techniques propose pre-processing steps that can help to de-correlate the different signals from one another prior to execution of the BSS techniques. In addition, the described techniques also propose optional post-processing steps that can further de-correlate the different signals following execution of the BSS techniques. The techniques may be particularly useful for improving BSS performance with highly correlated audio signals, e.g., from two microphones that are in close spatial proximity to one another.
摘要:
This disclosure describes techniques for processing audio files that comply with the musical instrument digital interface (MIDI) format. In particular, various tasks associated with MIDI file processing are delegated between software operating on a general purpose processor, firmware associated with a digital signal processor (DSP), and dedicated hardware that is specifically designed for MIDI file processing. Alternatively, a multi-threaded DSP may be used instead of a general purpose processor and the DSP. In one aspect, this disclosure provides a method comprising parsing MIDI files and scheduling MIDI events associated with the MIDI files using a first process, processing the MIDI events using a second process to generate MIDI synthesis parameters, and generating audio samples using a hardware unit based on the synthesis parameters.