Abstract:
A modified synchronized overlap add (SOLA) algorithm for performing high-quality, low-complexity audio time scale modification (TSM) is described. The algorithm produces good output audio quality with a very low complexity and without producing additional audible distortion during dynamic change of the audio playback speed. The algorithm may achieve complexity reduction by performing the maximization of normalized cross-correlation using decimated signals. By updating the input buffer and the output buffer in a precise sequence with careful checking of the appropriate array bounds, the algorithm may also achieve seamless audio playback during dynamic speed change with a minimal requirement on memory usage.
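The decimated correlation search mentioned above can be illustrated with a minimal sketch. It is not the patented procedure: the function name `best_overlap_offset`, the decimation factor of 4, the plain sample-dropping decimation (no anti-alias filter), and the coarse search step equal to the decimation factor are all assumptions made for illustration.

```python
import math

def best_overlap_offset(out_tail, in_frame, max_offset, decim=4):
    """Pick the offset k in [0, max_offset] at which in_frame best matches
    out_tail, using normalized cross-correlation on decimated samples.

    Hypothetical helper: decimation here simply keeps every `decim`-th
    sample, and candidate offsets are also stepped by `decim`.
    """
    n = len(out_tail)
    ref = out_tail[::decim]                      # decimated reference
    ref_energy = sum(a * a for a in ref)
    best_k, best_score = 0, -math.inf
    for k in range(0, max_offset + 1, decim):
        seg = in_frame[k:k + n:decim]            # decimated candidate segment
        if len(seg) < len(ref):
            break                                # ran past the end of in_frame
        den = math.sqrt(ref_energy * sum(b * b for b in seg))
        num = sum(a * b for a, b in zip(ref, seg))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Usage: a sinusoid with a 40-sample period; the output tail is 8 samples
# ahead of the input frame, so the best alignment is at offset 8.
in_frame = [math.sin(2 * math.pi * i / 40) for i in range(200)]
out_tail = [math.sin(2 * math.pi * (i + 8) / 40) for i in range(80)]
offset = best_overlap_offset(out_tail, in_frame, max_offset=32)
```

Working on every fourth sample cuts the multiply-accumulate count of each correlation by roughly the decimation factor, at the cost of a coarser offset grid; a real implementation would presumably filter before decimating.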
Abstract:
A technique is described for concealing the effect of a lost frame in a series of frames representing an encoded audio signal in a sub-band predictive coding system. In accordance with the technique, a first synthesized sub-band audio signal is synthesized, wherein synthesizing the first synthesized sub-band audio signal comprises performing waveform extrapolation based on a stored first sub-band decoded audio signal. A second synthesized sub-band audio signal is also synthesized, wherein synthesizing the second synthesized sub-band audio signal comprises performing waveform extrapolation based on the stored second sub-band decoded audio signal. The first synthesized sub-band audio signal and the second synthesized sub-band audio signal are combined to generate a synthesized full-band output audio signal corresponding to a lost frame.
Abstract:
A technique is described herein for updating a state of a decoder configured to decode a series of frames representing an encoded audio signal. In accordance with the technique, an output audio signal associated with a lost frame in the series of frames is synthesized. The decoder state is set to align with the synthesized output audio signal at a frame boundary. An extrapolated signal is generated based on the synthesized output audio signal. A time lag is calculated between the extrapolated signal and a decoded audio signal associated with a first received frame after the lost frame in the series of frames, wherein the time lag represents a phase difference between the extrapolated signal and the decoded audio signal. The decoder state is then reset based on the time lag.
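As a rough illustration of the time-lag step, the sketch below searches for the shift that maximizes the normalized cross-correlation between the two signals. The function name `time_lag`, the search range, and the sign convention (a lag of `d` means `extrapolated[i + d]` lines up with `decoded[i]`) are assumptions, not details taken from the abstract.

```python
import math

def time_lag(extrapolated, decoded, max_lag):
    """Return the lag in [-max_lag, max_lag] that best aligns the
    extrapolated signal with the decoded signal.

    Hypothetical helper: a positive result d means that
    extrapolated[i + d] matches decoded[i].
    """
    win = min(len(extrapolated), len(decoded)) - 2 * max_lag
    if win <= 0:
        raise ValueError("signals too short for the requested lag range")
    ref = decoded[max_lag:max_lag + win]         # fixed analysis window
    ref_energy = sum(x * x for x in ref)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        start = max_lag + lag
        seg = extrapolated[start:start + win]    # shifted candidate window
        den = math.sqrt(ref_energy * sum(x * x for x in seg))
        num = sum(a * b for a, b in zip(ref, seg))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Usage: the extrapolated waveform trails the decoded one by 3 samples.
decoded = [math.sin(2 * math.pi * i / 50) for i in range(120)]
extrapolated = [math.sin(2 * math.pi * (i - 3) / 50) for i in range(120)]
lag = time_lag(extrapolated, decoded, max_lag=10)
```

How the decoder state is then reset from this lag is not specified in the abstract and is not modeled here.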
Abstract:
A technique is described herein for reducing audible artifacts in an audio output signal generated by decoding a received frame in a series of frames representing an encoded audio signal in a predictive coding system. In accordance with the technique, it is determined if the received frame is one of a predefined number of received frames that follow a lost frame in the series of the frames. Responsive to determining that the received frame is one of the predefined number of received frames, at least one parameter or signal associated with the decoding of the received frame is altered from a state associated with normal decoding. The received frame is then decoded in accordance with the at least one parameter or signal to generate a decoded audio signal. The audio output signal is then generated based on the decoded audio signal.
Abstract:
A technique for concealing the effect of a lost frame in a series of frames representing an encoded audio signal in a sub-band predictive coding system is provided. In accordance with the technique, one or more received frames in the series of frames are decoded to generate a full-band output audio signal, wherein the full-band output audio signal comprises a combination of at least a first sub-band decoded audio signal and a second sub-band decoded audio signal. The full-band output audio signal corresponding to the one or more received frames is stored. Then, a full-band output audio signal corresponding to the lost frame is synthesized, wherein synthesizing the full-band output audio signal corresponding to the lost frame comprises performing waveform extrapolation based on the stored full-band output audio signal corresponding to the one or more received frames.
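One simple form of waveform extrapolation is periodic repetition of the most recent pitch cycle of the stored signal. The sketch below assumes that form; the abstract does not say which extrapolation method is used, and the function name `extrapolate` and the externally supplied `pitch` value are hypothetical.

```python
def extrapolate(history, pitch, n):
    """Synthesize n samples by periodically repeating the last `pitch`
    samples of the stored output signal.

    Hypothetical helper: `pitch` (in samples) is assumed to come from a
    separate pitch estimator.
    """
    if not 0 < pitch <= len(history):
        raise ValueError("pitch must be in 1..len(history)")
    buf = list(history)
    for _ in range(n):
        buf.append(buf[-pitch])                  # copy the sample one period back
    return buf[len(history):]

# Usage: stored full-band output with a 3-sample period.
stored = [1, 2, 3, 4, 5, 6]
concealed = extrapolate(stored, pitch=3, n=5)
```

Because each new sample is read from the growing buffer, the extrapolation can run for longer than one pitch period without extra bookkeeping.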
Abstract:
Systems and methods are described for performing packet loss concealment using an extrapolation of an excitation waveform in a sub-band predictive speech coder, such as an ITU-T Recommendation G.722 wideband speech coder. The systems and methods are useful for concealing the quality-degrading effects of packet loss in a sub-band predictive coder and address some sub-band architectural issues when applying excitation extrapolation techniques to such sub-band predictive coders.
Abstract:
A method and system are described for refining a pitch period estimate, starting from a coarse pitch, that are useful for performing frame loss concealment in an audio decoder as well as for other applications. A normalized correlation at the coarse pitch lag is computed and used as the current best candidate. The normalized correlation is then evaluated at the midpoint of the refinement pitch range on either side of the current best candidate. If the normalized correlation at either midpoint is greater than that at the current best lag, the midpoint with the maximum correlation is selected as the current best lag. After each iteration, the refinement range is decreased by a factor of two and centered on the current best lag. This bisectional search continues until the pitch has been refined to an acceptable tolerance or until the refinement range has been exhausted. During each step of the bisectional pitch refinement, the signal is decimated to reduce the complexity of computing the normalized correlation.
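The bisectional search described above can be sketched as follows. This is an illustration under stated assumptions rather than the claimed method: the names `norm_corr` and `refine_pitch`, the choice to tie the decimation factor to the current step size, and decimation by plain sample dropping are all hypothetical.

```python
import math

def norm_corr(x, lag, win, decim=1):
    """Normalized correlation between the last `win` samples of x and the
    `win` samples located `lag` earlier, using every `decim`-th sample."""
    end = len(x)
    a = x[end - win:end:decim]
    b = x[end - win - lag:end - lag:decim]
    den = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / den if den > 0 else 0.0

def refine_pitch(x, coarse_lag, half_range, win, tol=1):
    """Refine coarse_lag within +/- half_range by repeated halving."""
    best_lag, r = coarse_lag, half_range
    while r // 2 >= tol:
        step = r // 2                            # midpoint of each half-range
        decim = max(1, step)                     # assumption: coarser step, coarser signal
        best_score = norm_corr(x, best_lag, win, decim)
        for cand in (best_lag - step, best_lag + step):
            if cand > 0 and win + cand <= len(x):
                score = norm_corr(x, cand, win, decim)
                if score > best_score:
                    best_lag, best_score = cand, score
        r = step                                 # halve the range, recenter on best_lag
    return best_lag

# Usage: true period 37 samples, coarse estimate 32, search range +/- 8.
signal = [math.sin(2 * math.pi * i / 37) for i in range(400)]
pitch = refine_pitch(signal, coarse_lag=32, half_range=8, win=148)
```

With a range of plus or minus 8, the search evaluates at most two new lags per iteration over three iterations, instead of all 17 lags in the range.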
Abstract:
Systems, methods and apparatuses are described for deriving and updating user attribute information about users of a communications system. A communications network is then used to transfer the user attribute information to communication terminals, which use the user attribute information to configure a speech codec to operate in a speaker-dependent manner during a communication session, thereby improving speech coding efficiency. In a network-assisted model, the user attribute information is stored on the communications network and selectively transmitted to the communication terminals, while in a peer-assisted model, the user attribute information is derived by and transferred between communication terminals.
Abstract:
A communications network is used to transfer user attribute information about participants in a communication session to their respective communication terminals for storage and use thereon to configure a speech codec to operate in a speaker-dependent manner, thereby improving speech coding efficiency. In a network-assisted model, the user attribute information is stored on the communications network and selectively transmitted to the communication terminals, while in a peer-assisted model, the user attribute information is derived by and transferred between communication terminals.
Abstract:
A bit error concealment (BEC) system and method are described herein that detect and conceal click-like artifacts in an audio signal caused by bit errors introduced during transmission of the audio signal within an audio communications system. A particular embodiment of the present invention utilizes a low-complexity design that introduces no added delay and that is particularly well-suited for applications such as Bluetooth® wireless audio devices, which have low cost and low power dissipation requirements.