Abstract:
A system and method are provided that employ a frequency domain interpolative CODEC system for low bit rate coding of speech. The system comprises a linear prediction (LP) front end adapted to process an input signal, providing LP parameters which are quantized and encoded over predetermined intervals and used to compute an LP residual signal. An open loop pitch estimator adapted to process the LP residual signal, a pitch quantizer, and a pitch interpolator together provide a pitch contour within the predetermined intervals. A voice activity detector adapted to process the LP parameters and the open loop pitch contour over the predetermined intervals is also provided, as well as a signal processor responsive to the LP residual signal and the pitch contour and adapted to perform the following functions: extract a prototype waveform (PW) from the LP residual and the open loop pitch contour for a number of equal sub-intervals within the predetermined intervals; normalize the PW by a gain value of the PW; encode a magnitude of the PW; and provide a voicing measure, where the voicing measure characterizes a degree of voicing of the input speech signal and is derived from several input parameters that are correlated to degrees of periodicity of the signal over the predetermined intervals. The voicing measure is provided for the purposes of regenerating a PW phase at a decoder and providing improved quantization of the PW magnitude at an encoder. The voicing measure is encoded jointly with a PW nonstationarity measure vector using a spectrally weighted vector quantizer having a codebook partitioned based on a voiced and unvoiced mode.
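As a rough illustration of how a pitch interpolator might produce a pitch contour over the equal sub-intervals of a frame, the sketch below interpolates between the quantized pitch values at the two frame edges. Linear interpolation, the function name `pitch_contour`, and the choice of eight sub-intervals in the usage line are assumptions made for illustration; the abstract does not state which interpolation rule is used.

```python
def pitch_contour(prev_pitch, curr_pitch, num_subintervals):
    """Return one pitch value per sub-interval of the current frame.

    Assumption: straight-line interpolation from the pitch at the end of the
    previous frame to the pitch at the end of the current frame.
    """
    step = (curr_pitch - prev_pitch) / num_subintervals
    return [prev_pitch + step * (m + 1) for m in range(num_subintervals)]


# Hypothetical usage: pitch period moves from 40 to 48 samples over 8 sub-intervals.
contour = pitch_contour(40.0, 48.0, 8)
```

The last contour value equals the current frame's quantized pitch, so consecutive frames join without a jump.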
Abstract:
A drift-free hybrid method of performing video stitching is provided. The method includes decoding a plurality of video bitstreams and storing prediction information. The decoded bitstreams form video images, spatially composed into a combined image. The image comprises frames of an ideal stitched video sequence. The method uses prediction information in conjunction with previously generated frames to predict pixel blocks in the next frame. A stitched predicted block in the next frame is subtracted from a corresponding block in a corresponding frame to create a stitched raw residual block. The raw residual block is forward transformed, quantized, entropy encoded and added to the stitched video bitstream along with the prediction information. Also, the quantized residual block is dequantized and inverse transformed to create a stitched decoded residual block. The decoded residual block is added to the predicted block to generate the stitched reconstructed block in the next frame of the sequence.
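The residual loop described above can be sketched on a single block as follows. This is a minimal sketch under stated assumptions, not the patented method: the forward and inverse transforms are replaced by an identity stand-in, entropy coding is omitted, and the function names and the quantizer step of 4 are hypothetical. It shows why the scheme is drift-free: the encoder builds its reconstruction from the same quantized residual the decoder will receive.

```python
def quantize(residual, step):
    # Uniform scalar quantizer (assumed; stands in for transform + quantization).
    return [round(r / step) for r in residual]


def dequantize(levels, step):
    return [q * step for q in levels]


def encode_block(ideal_block, predicted_block, step=4):
    """Return (quantized levels, reconstructed block) for one stitched block."""
    raw_residual = [i - p for i, p in zip(ideal_block, predicted_block)]
    levels = quantize(raw_residual, step)
    decoded_residual = dequantize(levels, step)
    reconstructed = [p + d for p, d in zip(predicted_block, decoded_residual)]
    return levels, reconstructed


def decode_block(levels, predicted_block, step=4):
    """Decoder side: same prediction plus the same decoded residual."""
    decoded_residual = dequantize(levels, step)
    return [p + d for p, d in zip(predicted_block, decoded_residual)]


# Hypothetical pixel values for one 4-sample block.
ideal = [100, 104, 97, 90]
predicted = [98, 98, 98, 98]
levels, reconstructed = encode_block(ideal, predicted)
```

Because later frames are predicted from `reconstructed` rather than from `ideal`, encoder and decoder reference frames stay identical.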
Abstract:
Encoding of prototype waveform components applicable to GeoMobile and Telephony Earth Station (TES) systems provides improved voice quality, enabling a dual-channel mode of operation which permits more users to communicate over the same physical channel. A prototype waveform (PW) gain is vector quantized using a vector quantizer (VQ) that explicitly populates the codebook with representative steady state and transient vectors of PW gain for tracking the abrupt variations in speech levels during onsets and other non-stationary events, while maintaining the accuracy of the speech level during stationary conditions. The rapidly evolving waveform (REW) and slowly evolving waveform (SEW) component vectors are converted to magnitude-phase form. The variable dimension SEW magnitude vector is quantized using a hierarchical approach, i.e., a fixed dimension SEW mean vector is computed by sub-band averaging of the SEW magnitude spectrum, and only the REW magnitude is explicitly encoded. The REW magnitude vector sequence is normalized to unity RMS value, resulting in a REW magnitude shape vector and a REW gain vector. The normalized REW magnitude vectors are modeled by a multi-band sub-band model which converts the variable dimension REW magnitude shape vectors to fixed dimension vectors, e.g., six dimensional REW sub-band vectors. The sub-band vectors are averaged over time, resulting in a single average REW sub-band vector for each frame. At the decoder, the full-dimension REW magnitude shape vector is obtained from the REW sub-band vector by a piecewise-constant construction. The REW phase vector is regenerated at the decoder based on the received REW gain vector and the voicing measure, which determines a weighted mixture of the SEW component and random noise that is passed through a high pass filter to generate the REW component. The high pass filter poles are adjusted based on the voicing measure to control the REW component characteristics.
At the output of the filter, the magnitude of the REW component is scaled to match the received REW magnitude vector.
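The REW magnitude path described above (unity-RMS normalization, sub-band averaging, averaging over time, and piecewise-constant reconstruction at the decoder) can be sketched as follows. This is only an illustration under assumptions: the band edges are taken as uniform, the input vector is assumed non-empty, and all function names are hypothetical. The abstract does not specify the band layout.

```python
import math


def normalize_to_unit_rms(magnitudes):
    """Split a magnitude vector into (gain, shape) where shape has RMS 1."""
    gain = math.sqrt(sum(m * m for m in magnitudes) / len(magnitudes))
    if gain == 0.0:
        return 0.0, list(magnitudes)
    return gain, [m / gain for m in magnitudes]


def band_edges(num_harmonics, num_bands):
    # Assumption: bands of (nearly) equal width over the harmonic index.
    return [b * num_harmonics // num_bands for b in range(num_bands + 1)]


def to_subbands(shape, num_bands=6):
    """Variable-dimension shape vector -> fixed-dimension sub-band vector."""
    edges = band_edges(len(shape), num_bands)
    out = []
    for lo, hi in zip(edges, edges[1:]):
        band = shape[lo:hi]
        out.append(sum(band) / len(band) if band else 0.0)
    return out


def average_over_time(subband_vectors):
    """One average sub-band vector per frame from its sub-frame vectors."""
    n = len(subband_vectors)
    return [sum(column) / n for column in zip(*subband_vectors)]


def from_subbands(subbands, num_harmonics):
    """Decoder: piecewise-constant expansion back to full dimension."""
    edges = band_edges(num_harmonics, len(subbands))
    full = []
    for value, lo, hi in zip(subbands, edges, edges[1:]):
        full.extend([value] * (hi - lo))
    return full
```

For example, `from_subbands(to_subbands(shape), len(shape))` returns a vector of the original length in which every harmonic inside a band carries that band's average.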
Abstract:
In many applications involving the coding and processing of speech signals, the relevant codebook is one which may be termed a sparse codebook. That is, the majority of elements in the codebook are zero valued. The searching of such a sparse codebook is accelerated in accordance with the present invention by generating auxiliary information defining the sparse nature of the codebook and using this information to assist and speed up searches of the codebook. In a particular method of searching, the calculation of the distance between a target vector and a stored codebook vector is enhanced by use of a distortion metric derived from energy terms and correlation terms of the codebook entries. Calculation of these energy and correlation terms is accelerated by exploiting the sparseness of the codebook entries. The non-zero elements (NZE) of the sparse codebook are each identified and are defined by their offset from a reference point.
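A minimal sketch of such a search is given below, assuming plain squared Euclidean distance as the distortion metric and the start of each vector as the reference point for offsets; the abstract does not fix either choice, and the function names are hypothetical. Expanding the distance as ||t||^2 - 2<t, c> + ||c||^2 shows that only the correlation <t, c> and the energy ||c||^2 depend on the codebook entry, and both can be computed from the non-zero elements alone.

```python
def to_sparse(vector):
    """Auxiliary information: (offset, value) pairs for the non-zero elements."""
    return [(offset, value) for offset, value in enumerate(vector) if value != 0]


def search_sparse(target, sparse_codebook):
    """Return the index of the entry closest to target in squared distance."""
    best_index, best_score = -1, float("inf")
    for index, nze in enumerate(sparse_codebook):
        energy = sum(value * value for _, value in nze)
        correlation = sum(target[offset] * value for offset, value in nze)
        # ||t||^2 is the same for every entry, so it is left out of the score.
        score = energy - 2.0 * correlation
        if score < best_score:
            best_index, best_score = index, score
    return best_index


# Hypothetical toy codebook with mostly zero-valued elements.
codebook = [[0, 0, 1, 0], [0, -1, 0, 0], [1, 0, 0, 1]]
sparse_codebook = [to_sparse(entry) for entry in codebook]
best = search_sparse([0.9, 0.1, 0.0, 0.8], sparse_codebook)
```

The per-entry cost is proportional to the number of non-zero elements rather than to the vector dimension, and the energy terms could be precomputed once since they do not depend on the target.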
Abstract:
The present invention relates to a device and a method for communicating in a mobile communication system. The method provides a carrier signal having a plurality of frames. Each frame has a plurality of time slots, and each time slot comprises a plurality of transmission bits. A group of time slots is assigned to a communication channel. A traffic burst signal having a plurality of traffic symbols is transmitted over the communication channel by transmitting a first preamble over one of the assigned time slots, and transmitting a second preamble and at least one of the traffic symbols over at least one of the other assigned time slots. The second preamble occupies fewer transmission bits than the first preamble. The apparatus for transmitting a telephony signal over an RF channel includes a modem receiving a digitized PCM telephony signal and producing a traffic burst signal, and a transmitting unit in communication with the modem for transmitting an FDMA/TDMA signal carrying a plurality of traffic burst signals. At least one of the traffic burst signals carries a limited preamble message including a header field and a unique word field and at least one digitized voice message associated with a telephone call. Another traffic burst signal carries at least one signal acquisition message including a unique word field.
Abstract:
The present invention provides a multi-mode CELP encoding and decoding method and device for digitized speech signals, providing improvements over prior art codecs and coding methods by selectively utilizing backward prediction for the short-term predictor parameters and fixed codebook gain of a speech signal. In order to achieve these improvements, the present invention provides a coding method comprising the steps of classifying a segment of the digitized speech signal as one of a plurality of predetermined modes, determining a set of unquantized line spectral frequencies to represent the short-term predictor parameters for that segment, and quantizing the determined set of unquantized line spectral frequencies using a mode-specific combination of scalar quantization and vector quantization, which utilizes backward prediction for modes with voiced speech signals. Furthermore, backward prediction is selectively applied to the fixed codebook gain in the modes that are free of transients so that it may be used in the fixed codebook search and fixed codebook gain quantization in those modes.
Abstract:
A method, system, and software product for transmitting TTY/TDD signals in a system employing low bit-rate voice compression are disclosed. The method includes receiving an input signal and generating a teletypewriter (TTY) indicator signal from the input signal. Whether or not the input signal is a TTY signal including a TTY character is determined based on the TTY indicator signal. A TTY packet including the TTY character of the TTY signal is constructed and transmitted if the input signal is determined to be a TTY signal. A method, system, and software product for receiving and decoding TTY/TDD signals are also disclosed.
Abstract:
Encoding of prototype waveform components applicable to telecommunication systems provides improved voice quality, enabling a dual-channel mode of operation which permits more users to communicate over the same physical channel. A prototype waveform (PW) gain is vector quantized using a vector quantizer (VQ) that explicitly populates a codebook with representative steady state and transient vectors of PW gain for tracking the abrupt variations in speech levels during onsets and other non-stationary events, while maintaining the accuracy of the speech level during stationary conditions.
Abstract:
An integrated circuit for processing a speech signal in accordance with a CELP standard includes a plurality of processing elements coupled to a data bus in parallel. Each processing element includes a multiplier and an accumulator. The integrated circuit further includes an auxiliary processing element, which is also coupled to the data bus and has a division unit and a comparator. The plurality of processing elements and the auxiliary processing element are also coupled in a pipeline formation.
Abstract:
A speech mode based multi-stage vector quantizer is disclosed which quantizes and encodes line spectral frequency (LSF) vectors that were obtained by transforming the short-term predictor filter coefficients in a speech codec that utilizes linear predictive techniques. The quantizer includes a mode classifier that classifies each speech frame of a speech signal as being one of a voiced, spectrally stationary (Mode A) speech frame, a voiced, spectrally non-stationary (Mode B) speech frame and an unvoiced (Mode C) speech frame. A converter converts each speech frame of the speech signal into an LSF vector, and an LSF vector quantizer includes a 12-bit, two-stage, backward predictive vector encoder that encodes the Mode A speech frames and a 22-bit, four-stage, backward predictive vector encoder that encodes the Mode B and the Mode C speech frames.
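To make the idea of a multi-stage, backward predictive vector encoder concrete, the sketch below quantizes the prediction error of a vector in successive stages, each stage coding what the previous stages left over. It is an illustration under assumptions only: the predictor is a single hypothetical coefficient `alpha` applied to the previous quantized vector, the two toy codebooks are invented, and the mode-dependent stage counts and bit allocations from the abstract are not modeled.

```python
def nearest(codebook, vector):
    """Index of the codebook entry with the smallest squared error."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], vector)),
    )


def msvq_encode(vector, stages):
    """Each stage quantizes the residual left by the stages before it."""
    residual = list(vector)
    indices = []
    for codebook in stages:
        k = nearest(codebook, residual)
        indices.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return indices


def msvq_decode(indices, stages):
    out = [0.0] * len(stages[0][0])
    for k, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[k])]
    return out


def encode_frame(lsf, prev_quantized, stages, alpha=0.5):
    """Backward prediction: predict from the previous *quantized* vector."""
    predicted = [alpha * p for p in prev_quantized]
    error = [x - p for x, p in zip(lsf, predicted)]
    indices = msvq_encode(error, stages)
    quantized = [p + e for p, e in zip(predicted, msvq_decode(indices, stages))]
    return indices, quantized


def decode_frame(indices, prev_quantized, stages, alpha=0.5):
    predicted = [alpha * p for p in prev_quantized]
    return [p + e for p, e in zip(predicted, msvq_decode(indices, stages))]


# Hypothetical two-stage codebooks for 2-dimensional vectors.
stages = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]
indices, quantized = encode_frame([1.2, 0.8], [0.0, 0.0], stages)
```

Because the prediction uses only previously quantized vectors, the decoder can form the same prediction without any side information.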