Abstract:
Systems and methods are described that utilize dynamic time scale modification (TSM) to achieve reduced bit rate audio coding. In accordance with embodiments, different levels of TSM compression are selectively applied to segments of an input speech signal prior to encoding by an encoder. Encoded TSM-compressed segments are received at a decoder, which decodes the segments and then applies an appropriate level of TSM decompression to each based on information received from the encoder. By selectively applying different levels of TSM compression to segments of an input speech signal prior to encoding, a coding bit rate associated with the encoder/decoder is reduced. Furthermore, by selecting a level of TSM compression for each segment of the input speech signal that takes into account certain local characteristics of that signal, such bit rate reduction is provided without introducing unacceptable levels of distortion into an output speech signal produced by the decoder.
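The per-segment flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the segment length, the three compression levels, the energy-change rule in `pick_level`, and all function names are hypothetical and do not come from the abstract. The length change is done with plain linear interpolation as a stand-in, which is not a true TSM algorithm (real TSM preserves pitch, typically with overlap-add methods). The speech encoder and decoder themselves are omitted.

```python
from typing import List, Tuple

SEGMENT_LEN = 100        # assumed fixed segment length used by the encoder side
KEEP = (1.0, 0.8, 0.6)   # hypothetical levels: fraction of samples kept


def pick_level(segment: List[float]) -> int:
    """Assumed rule: compress steady segments harder than changing ones."""
    half = len(segment) // 2
    e1 = sum(x * x for x in segment[:half]) + 1e-12
    e2 = sum(x * x for x in segment[half:]) + 1e-12
    change = max(e1, e2) / min(e1, e2)
    if change > 4.0:     # sharp energy change, e.g. an onset: leave as is
        return 0
    if change > 1.5:
        return 1
    return 2


def resize(samples: List[float], new_len: int) -> List[float]:
    """Stand-in for TSM: linear interpolation to new_len samples (new_len >= 2)."""
    step = (len(samples) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        k = min(int(pos), len(samples) - 2)
        frac = pos - k
        out.append(samples[k] * (1.0 - frac) + samples[k + 1] * frac)
    return out


def compress(signal: List[float]) -> List[Tuple[int, List[float]]]:
    """Encoder side: return (level, compressed segment) pairs."""
    packets = []
    for start in range(0, len(signal) - SEGMENT_LEN + 1, SEGMENT_LEN):
        segment = signal[start:start + SEGMENT_LEN]
        level = pick_level(segment)
        target = round(SEGMENT_LEN * KEEP[level])
        packets.append((level, resize(segment, target)))
    return packets


def decompress(packets: List[Tuple[int, List[float]]]) -> List[float]:
    """Decoder side: use the signalled level to undo the length change."""
    out = []
    for level, segment in packets:
        target = round(len(segment) / KEEP[level])
        out.extend(resize(segment, target))
    return out
```

In this sketch the level index travels with each segment as side information, so the decoder needs no knowledge of how the encoder chose it.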
Abstract:
A technique is described for concealing the effect of a lost frame in a series of frames representing an encoded audio signal in a sub-band predictive coding system. In accordance with the technique, a first synthesized sub-band audio signal is synthesized, wherein synthesizing the first synthesized sub-band audio signal comprises performing waveform extrapolation based on a stored first sub-band decoded audio signal. A second synthesized sub-band audio signal is also synthesized, wherein synthesizing the second synthesized sub-band audio signal comprises performing waveform extrapolation based on a stored second sub-band decoded audio signal. The first synthesized sub-band audio signal and the second synthesized sub-band audio signal are combined to generate a synthesized full-band output audio signal corresponding to the lost frame.
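A minimal sketch of the two-band flow in this abstract is given below. It assumes periodic waveform extrapolation (repeating the last detected cycle of each stored sub-band signal) and combines the bands with a trivial Haar sum/difference synthesis bank. Both choices are assumptions made for illustration: the abstract does not specify the extrapolation method or the synthesis filter bank, and all function names and lag limits are hypothetical.

```python
from typing import List


def find_period(history: List[float], min_lag: int, max_lag: int) -> int:
    """Return the lag with the highest correlation over the most recent samples."""
    start = max_lag                      # keeps history[i - lag] in range
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(history[i] * history[i - lag]
                   for i in range(start, len(history)))
        if corr > best_corr:             # strict: ties keep the shorter lag
            best_lag, best_corr = lag, corr
    return best_lag


def extrapolate(history: List[float], n: int,
                min_lag: int = 2, max_lag: int = 20) -> List[float]:
    """Periodic waveform extrapolation: repeat the last detected cycle."""
    period = find_period(history, min_lag, max_lag)
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(n)]


def combine(low: List[float], high: List[float]) -> List[float]:
    """Stand-in synthesis bank (Haar): interleave sum and difference."""
    full = []
    for lo, hi in zip(low, high):
        full.append(lo + hi)
        full.append(lo - hi)
    return full


def conceal_lost_frame(low_history: List[float], high_history: List[float],
                       n_per_band: int) -> List[float]:
    """Extrapolate each sub-band separately, then build the full-band frame."""
    low = extrapolate(low_history, n_per_band)
    high = extrapolate(high_history, n_per_band)
    return combine(low, high)
```

Each band is extrapolated from its own stored history, so the two bands may use different periods, and a frame of `n_per_band` samples per band yields `2 * n_per_band` full-band samples.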
Abstract:
Systems and methods are described for performing packet loss concealment using an extrapolation of an excitation waveform in a sub-band predictive speech coder, such as an ITU-T Recommendation G.722 wideband speech coder. The systems and methods are useful for concealing the quality-degrading effects of packet loss in a sub-band predictive coder and address sub-band architectural issues that arise when applying excitation extrapolation techniques to such coders.
Abstract:
An audio decoding system performs frame loss concealment (FLC) when portions of a bit stream representing an audio signal are lost within the context of a digital communication system. The audio decoding system employs two different FLC methods: one designed to perform well for music, and the other designed to perform well for speech. When a frame is deemed lost, the audio decoding system analyzes a previously decoded audio signal corresponding to previously decoded frames of an audio bit stream. Based on the results of the analysis, the lost frame is classified as either speech or music. Using this classification, other signal analysis, and knowledge of the employed FLC methods, the audio decoding system selects the appropriate FLC method, which then performs FLC on the lost frame.
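The classify-then-dispatch structure described here might look like the sketch below. Everything specific in it is an assumption: the abstract does not say which features separate speech from music, so `classify` uses a hypothetical block-energy rule, and the two concealment functions are placeholders rather than the methods the abstract refers to. The sketch also selects on the classification alone and leaves out the additional signal analysis mentioned in the abstract.

```python
from typing import Callable, Dict, List, Tuple

BLOCK = 80  # assumed analysis block length in samples


def classify(history: List[float], threshold: float = 10.0) -> str:
    """Hypothetical rule: large block-energy swings suggest speech.

    Expects at least BLOCK samples of previously decoded signal.
    """
    energies = [
        sum(x * x for x in history[i:i + BLOCK]) + 1e-9
        for i in range(0, len(history) - BLOCK + 1, BLOCK)
    ]
    return "speech" if max(energies) / min(energies) > threshold else "music"


def flc_for_speech(history: List[float], n: int) -> List[float]:
    """Placeholder: repeat the last n samples with a linear fade-out."""
    return [x * (1.0 - i / n) for i, x in enumerate(history[-n:])]


def flc_for_music(history: List[float], n: int) -> List[float]:
    """Placeholder: repeat the last n samples unchanged."""
    return list(history[-n:])


FLC_METHODS: Dict[str, Callable[[List[float], int], List[float]]] = {
    "speech": flc_for_speech,
    "music": flc_for_music,
}


def conceal(history: List[float], n: int) -> Tuple[str, List[float]]:
    """Classify the decoded history, then run the matching FLC method."""
    label = classify(history)
    return label, FLC_METHODS[label](history, n)
```

Keeping the methods in a lookup table means the classifier and the concealment routines can be replaced independently.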
Abstract:
A technique is described herein for reducing audible artifacts in an audio output signal generated by decoding a received frame in a series of frames representing an encoded audio signal in a predictive coding system. In accordance with the technique, it is determined if the received frame is one of a predefined number of received frames that follow a lost frame in the series of frames. Responsive to determining that the received frame is one of the predefined number of received frames, at least one parameter or signal associated with the decoding of the received frame is altered from a state associated with normal decoding. The received frame is then decoded in accordance with the at least one parameter or signal to generate a decoded audio signal. The audio output signal is then generated based on the decoded audio signal.
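The frame-counting logic in this abstract can be sketched as a small state tracker. The class name, the mode labels, the default of two guarded frames, and the `adaptation_rate` parameter are all hypothetical: the abstract does not say which parameter or signal is altered or for how many frames.

```python
from typing import Dict

NORMAL_PARAMS: Dict[str, float] = {"adaptation_rate": 1.0}
GUARD_PARAMS: Dict[str, float] = {"adaptation_rate": 0.5}  # hypothetical altered value


class PostLossTracker:
    """Tracks whether a received frame falls inside the window after a loss."""

    def __init__(self, guard_frames: int = 2) -> None:
        self.guard_frames = guard_frames  # assumed predefined number of frames
        self.remaining = 0                # guarded frames still to come

    def on_frame(self, lost: bool) -> str:
        """Return the decoding mode for the current frame."""
        if lost:
            self.remaining = self.guard_frames
            return "conceal"
        if self.remaining > 0:
            self.remaining -= 1
            return "guarded"
        return "normal"


def params_for(mode: str) -> Dict[str, float]:
    """Pick the parameter set a decoder would use for a received frame."""
    return GUARD_PARAMS if mode == "guarded" else NORMAL_PARAMS
```

A new loss during the guarded window restarts the count, so the altered parameters always cover the full predefined number of frames after the most recent loss.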
Abstract:
A technique is described for use in a decoder configured to decode a series of frames representing an encoded audio signal. The technique is for transitioning between a lost frame and one or more received frames following the lost frame in the series of frames. In accordance with the technique, an output audio signal associated with the lost frame is synthesized. An extrapolated signal is generated based on the synthesized output audio signal. A time lag is calculated between the extrapolated signal and a decoded audio signal associated with the received frame(s), wherein the time lag represents a phase difference between the extrapolated signal and the decoded audio signal. The decoded audio signal is time-warped based on the time lag, wherein time-warping the decoded audio signal comprises stretching or shrinking the decoded audio signal in the time domain.
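The lag search and time-warping steps can be illustrated with the sketch below. It assumes the lag is found by maximizing normalized cross-correlation and that warping is done by linear-interpolation resampling; the abstract specifies neither, and the function names, the search range, and the sign convention (a positive lag stretches, a negative lag shrinks) are choices made only for this example. Both input signals are assumed to have the same length.

```python
import math
from typing import List


def find_time_lag(extrapolated: List[float], decoded: List[float],
                  max_lag: int) -> int:
    """Return d in [-max_lag, max_lag] where decoded[i + d] best matches extrapolated[i]."""
    lo, hi = max_lag, len(extrapolated) - max_lag
    ref = extrapolated[lo:hi]
    ref_energy = sum(x * x for x in ref)
    best_lag, best_score = 0, float("-inf")
    for d in range(-max_lag, max_lag + 1):
        cand = decoded[lo + d:hi + d]
        energy = sum(y * y for y in cand)
        if energy == 0 or ref_energy == 0:
            continue                     # no usable signal at this lag
        score = sum(x * y for x, y in zip(ref, cand)) / math.sqrt(ref_energy * energy)
        if score > best_score:
            best_lag, best_score = d, score
    return best_lag


def time_warp(signal: List[float], lag: int) -> List[float]:
    """Stretch (lag > 0) or shrink (lag < 0) the signal by |lag| samples."""
    new_len = len(signal) + lag
    step = (len(signal) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        k = min(int(pos), len(signal) - 2)
        frac = pos - k
        out.append(signal[k] * (1.0 - frac) + signal[k + 1] * frac)
    return out
```

A decoder following this outline would call `find_time_lag` on the extrapolated signal and the decoded signal of the first received frame or frames, then pass the result to `time_warp`. How the warped samples are merged back into the output is not covered by this sketch.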
Abstract:
A technique is described herein for updating a state of a decoder configured to decode a series of frames representing an encoded audio signal. In accordance with the technique, an output audio signal associated with a lost frame in the series of frames is synthesized. The decoder state is set to align with the synthesized output audio signal at a frame boundary. An extrapolated signal is generated based on the synthesized output audio signal. A time lag is calculated between the extrapolated signal and a decoded audio signal associated with a first received frame after the lost frame in the series of frames, wherein the time lag represents a phase difference between the extrapolated signal and the decoded audio signal. The decoder state is then reset based on the time lag.