摘要:
Encoding techniques and devices are based on a sinusoidal speech representation model. In one aspect of the invention, a pitch-adaptive channel encoding technique for amplitude coding varies the channel spacing in accordance with the pitch of the speaker's voice. In another aspect of the invention, a phase synthesis technique locks rapidly-varying phases into synchrony with the phase of the fundamental. Phase coding techniques which introduce a voice-dependent random phase and a pitch-adaptive quadratic phase dispersion are also performed.
摘要:
A sinusoidal model for acoustic waveforms is applied to develop a new analysis/synthesis technique which characterizes a waveform by the amplitudes, frequencies, and phases of component sine waves. These parameters are estimated from a short-time Fourier transform. Rapid changes in the highly-resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. The component values are interpolated from one frame to the next to yield a respresentation that is applied to a sine wave generator. The resulting synthetic waveform preserves the general waveform shape and is perceptually indistinguishable from the original. Furthermore, in the presence of noise the perceptual characteristics of the waveform as well as the noise are maintained. The method and devices are particularly useful in speech coding, time-scale modification, frequency scale modification and pitch modification.
摘要:
A sinusoidal model for acoustic waveforms is applied to develop a new analysis/synthesis technique which characterizes a waveform by the amplitudes, frequencies, and phases of component sine waves. These parameters are estimated from a short-time Fourier transform. Rapid changes in the highly-resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. The component values are interpolated from one frame to the next to yield a representation that is applied to a sine wave generator. The resulting synthetic waveform preserves the general waveform shape and is perceptually indistinguishable from the original. Furthermore, in the presence of noise the perceptual characteristics of the waveform as well as the noise are maintained. The method and devices are particularly useful in speech coding, time-scale modification, frequency scale modification and pitch modification.
摘要:
A lower threshold for dynamic range compression and clipping is allowed by sinusoidal estimation and phase adjustment of the original speech signal to obtain a lower Peak to RMS ratio. A sinusoidal speech representation system is applied to the problem of speech dispersion by pre-processing the waveform prior to transmission to reduce the peak-to-RMS ratio of the waveform. The sinusoidal system first estimates and then removes the natural phase dispersion in the frequency components of the speech signal. Artificial dispersion based on pulse compression techniques is then introduced with little change in speech quality. The new phase dispersion allocation serves to preprocess the waveform prior to dynamic range compression and clipping, allowing considerably deeper thresholding than can be tolerated on the original waveform.
摘要:
A system and method for processing of audio and speech signals is disclosed, which provide compatibility over a range of communication devices operating at different sampling frequencies and/or bit rates. The analyzer of the system divides the input signal in different portions, at least one of which carries information sufficient to provide intelligible reconstruction of the input signal. The analyzer also encodes separate information about other portions of the signal in an embedded manner, so that a smooth transition can be achieved from low bit-rate to high bit-rate applications. Accordingly, communication devices operating at different sampling rates and/or bit-rates can extract corresponding information from the output bit stream of the analyzer. In the present invention embedded information generally relates to separate parameters of the input signal, or to additional resolution in the transmission of original signal parameters. Non-linear techniques for enhancing the overall performance of the system are also disclosed. Also disclosed is a novel method of improving the quantization of signal parameters. In a specific embodiment the input signal is processed in two or more modes dependent on the state of the signal in a frame. When the signal is determined to be in a transition state, the encoder provides phase information about N sinusoids, which the decoder end uses to improve the quality of the output signal at low bit rates.
摘要:
Methods and apparatus for reducing discontinuities between frames of sinusoidally modeled acoustic waveforms, such as speech, which occur when sampling at low frame rates. A Fast Fourier Transform-based overlap-add technique is applied to amplitude, frequency and phase components of sinusoidal waves after frame-to-frame sine wave matching has been performed. Matched sine wave amplitudes and frequencies are linearly interpolated and mid-point phase is estimated such that the mid-frame sine wave is best fit to the most recent half-frame segments of the lagging and leading sine waves. Synthetic mid-frame sine waves are generated using the interpolated amplitude and frequency and estimated phase values. Synthesized acoustic waveforms of high quality from original source waveforms can be produced in sinusoidal analysis/synthesis operations at coding frame rates of 50 Hz and lower. The methods and devices disclosed herein are particularly useful for computationally efficient coding and synthesis of speech waveforms.
摘要:
A system and method for processing of audio and speech signals is disclosed, which provide compatibility over a range of communication devices operating at different sampling frequencies and/or bit rates. The analyzer of the system divides the input signal in different portions, at least one of which carries information sufficient to provide intelligible reconstruction of the input signal. The analyzer also encodes separate information about other portions of the signal in an embedded manner, so that a smooth transition can be achieved from low bit-rate to high bit-rate applications. Accordingly, communication devices operating at different sampling rates and/or bit-rates can extract corresponding information from the output bit stream of the analyzer. In the present invention embedded information generally relates to separate parameters of the input signal, or to additional resolution in the transmission of original signal parameters. Non-linear techniques for enhancing the overall performance of the system are also disclosed. Also disclosed is a novel method of improving the quantization of signal parameters. In a specific embodiment the input signal is processed in two or more modes dependent on the state of the signal in a frame. When the signal is determined to be in a transition state, the encoder provides phase information about N sinusoids, which the decoder end uses to improve the quality of the output signal at low bit rates.