Abstract:
According to the invention, the handling of audio data can be simplified, for example with regard to combining individual audio data streams into multi-channel audio data streams, or with regard to handling an audio data stream in general. To this end, a data block is modified (56) in an audio data stream (10) that is divided into data blocks (10a, 10b) each having a determining block (14, 16) and data block audio data (18), for example by supplementing, adding to, or replacing a part of the block, so that it contains a length indication expressing a data amount or length of the data block audio data, or a data amount or length of the data block, in order to obtain a second audio data stream with modified data blocks. Alternatively, an audio data stream (10) with pointers in determining blocks (14, 16), which point to the determining block audio data (44, 46) allocated to those determining blocks but distributed over various data blocks, is converted into an audio data stream in which the determining block audio data (44, 46) are combined into contiguous determining block audio data (48). The contiguous determining block audio data (48) can then be contained, together with their determining block (14, 16), in a self-contained channel element (52a).
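A minimal Python sketch of the first variant described above: each data block's determining block is extended with an explicit length indicator stating the amount of data block audio data. The two-byte header and the big-endian four-byte length field are illustrative assumptions, not the patent's actual layout.

```python
import struct

def add_length_fields(blocks):
    """Return a second audio data stream whose data blocks carry an explicit
    length indicator for their data block audio data (hypothetical layout)."""
    out = bytearray()
    for header, payload in blocks:
        out += header                           # original determining block
        out += struct.pack(">I", len(payload))  # added length indicator
        out += payload                          # data block audio data
    return bytes(out)

blocks = [(b"\x00\x01", b"audio-A"), (b"\x00\x02", b"audio-BB")]
print(add_length_fields(blocks).hex())
```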
Abstract:
Speech signal classification and encoding systems and methods are disclosed herein. The signal classification is done in three steps, each of which discriminates a specific signal class. First, a voice activity detector (VAD) discriminates between active and inactive speech frames. If an inactive speech frame is detected (background noise signal), the classification chain ends and the frame is encoded with comfort noise generation (CNG). If an active speech frame is detected, the frame is subjected to a second classifier dedicated to discriminating unvoiced frames. If the classifier classifies the frame as an unvoiced speech signal, the classification chain ends and the frame is encoded using a coding method optimized for unvoiced signals. Otherwise, the speech frame is passed through to the "stable voiced" classification module. If the frame is classified as a stable voiced frame, it is encoded using a coding method optimized for stable voiced signals. Otherwise, the frame is likely to contain a non-stationary speech segment such as a voiced onset or a rapidly evolving voiced speech signal. In this case, a general-purpose speech coder is used at a high bit rate to sustain good subjective quality.
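A minimal Python sketch of the three-stage classification cascade described above; the classifier functions (is_active, is_unvoiced, is_stable_voiced) are hypothetical stand-ins for the VAD and the two dedicated classifiers, and the returned labels merely name the coding method that would be selected.

```python
def classify_and_select_coder(frame, is_active, is_unvoiced, is_stable_voiced):
    if not is_active(frame):          # VAD: inactive frame (background noise)
        return "CNG"                  # comfort noise generation
    if is_unvoiced(frame):            # second classifier: unvoiced frames
        return "unvoiced-optimized"
    if is_stable_voiced(frame):       # third classifier: stable voiced frames
        return "stable-voiced-optimized"
    # Non-stationary segment (e.g. a voiced onset): fall back to the
    # general-purpose coder at a high bit rate.
    return "generic-high-rate"

# Example with trivial stand-in classifiers:
print(classify_and_select_coder(
    frame=None,
    is_active=lambda f: True,
    is_unvoiced=lambda f: False,
    is_stable_voiced=lambda f: True))   # -> "stable-voiced-optimized"
```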
Abstract:
Vector quantization techniques reduce the effective bit rate to 600 bps while maintaining intelligible speech. Four frames of speech are combined into one frame. The system uses mixed excitation linear prediction (MELP) speech model parameters to quantize the frame and achieve a fixed rate of 600 bps. The system allows voice communication over bandwidth-constrained channels.
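A back-of-the-envelope sketch of the bit budget implied above, assuming the standard 22.5 ms MELP analysis frame (an assumption; the patent's frame size may differ): four frames form one 90 ms superframe, leaving 54 bits per superframe to vector-quantize at 600 bps.

```python
FRAME_MS = 22.5            # assumed MELP analysis frame length
FRAMES_PER_BLOCK = 4       # four frames combined into one frame
RATE_BPS = 600             # target fixed rate

block_ms = FRAME_MS * FRAMES_PER_BLOCK          # 90 ms superframe
bits_per_block = RATE_BPS * block_ms / 1000.0   # bits available per superframe
print(f"{block_ms} ms superframe -> {bits_per_block:.0f} bits")  # -> 54 bits
```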
Abstract:
An apparatus and method for mapping CELP parameters between a source codec and a destination codec. The apparatus includes an LSP mapping module, an adaptive codebook mapping module coupled to the LSP mapping module, and a fixed codebook mapping module coupled to the LSP mapping module and the adaptive codebook mapping module. The LSP mapping module includes an LP overflow module and an LSP parameter modification module. The adaptive codebook mapping module includes a first pitch gain codebook. The fixed codebook mapping module includes a first target processing module, a pulse search module, a fixed codebook gain estimation module, and a pulse position searching module.
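A structural Python sketch of the module wiring named above; the abstract specifies the modules and their couplings but not their internals, so the class bodies are hypothetical placeholders.

```python
class LSPMappingModule:
    def __init__(self):
        self.lp_overflow_module = object()          # LP overflow module
        self.lsp_parameter_modification = object()  # LSP parameter modification module

class AdaptiveCodebookMappingModule:
    def __init__(self, lsp_mapping):
        self.lsp_mapping = lsp_mapping              # coupled to the LSP mapping module
        self.first_pitch_gain_codebook = []         # first pitch gain codebook

class FixedCodebookMappingModule:
    def __init__(self, lsp_mapping, acb_mapping):
        self.lsp_mapping = lsp_mapping              # coupled to the LSP mapping module
        self.acb_mapping = acb_mapping              # ...and the adaptive codebook module
        self.submodules = ["first_target_processing",  # as named in the abstract
                           "pulse_search",
                           "fixed_codebook_gain_estimation",
                           "pulse_position_searching"]

lsp = LSPMappingModule()
acb = AdaptiveCodebookMappingModule(lsp)
fcb = FixedCodebookMappingModule(lsp, acb)
```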
Abstract:
There is provided transcoding of speech in a packet network environment. A decoder is configured to receive a first bit-stream encoded according to a first coding scheme. The decoder decodes the bit-stream according to the first coding scheme, generates a plurality of first speech samples, and extracts a plurality of first speech parameters, which may include spectral characteristics, energy, pitch and/or pitch gain. A converter then converts the plurality of first speech samples and the plurality of first speech parameters to a plurality of second speech samples and a plurality of second speech parameters for use according to a second coding scheme. The first and second coding schemes may be, for example, G.711, G.723.1, G.726 or G.729, and may be parametric or non-parametric. An encoder receives the plurality of second speech samples and the plurality of second speech parameters and generates a second bit-stream according to the second coding scheme.
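A minimal Python sketch of the decode/convert/encode pipeline described above, with a hypothetical stub codec standing in for real G.711/G.723.1/G.726/G.729 implementations.

```python
class StubCodec:
    """Hypothetical codec stand-in; decode yields samples plus extracted
    parameters (spectral characteristics, energy, pitch, pitch gain)."""
    def decode(self, bitstream):
        return [0.0] * 80, {"pitch": 50, "energy": 0.1}
    def encode(self, samples, params):
        return bytes(10)

def transcode(bitstream, src_codec, dst_codec, convert):
    samples, params = src_codec.decode(bitstream)   # first coding scheme
    samples2, params2 = convert(samples, params)    # convert samples/parameters
    return dst_codec.encode(samples2, params2)      # second coding scheme

out = transcode(b"\x01", StubCodec(), StubCodec(), lambda s, p: (s, p))
print(len(out))
```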
Abstract:
Transliteration architectures reduce the number of encoding/decoding steps required to transmit telephony data. Reducing the number of encoding/decoding steps improves the quality of the transmitted data, since encoding and decoding each have significant adverse effects on the data. The reduction is accomplished using a transliteration device or by bypassing the transliteration device. A universal vocoder is proposed that allows the vocoding element to encode or decode data according to any desired vocoder format. Network routing considerations allow optimal decisions about which vocoder formats to use, and network routing decisions can be based on the vocoder formats used.
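A minimal Python sketch of the routing idea: when both legs of a call share a vocoder format, the transliteration device is bypassed and no decode/encode step is incurred; otherwise a single transliteration is applied. Names and signatures are hypothetical.

```python
def route(payload, src_format, dst_format, transliterate):
    # Bypass: identical vocoder formats on both legs need no conversion,
    # avoiding the quality loss of a tandem decode/encode.
    if src_format == dst_format:
        return payload
    # Otherwise a single transliteration step converts between formats.
    return transliterate(payload, src_format, dst_format)

print(route(b"\x7f", "G.729", "G.729", lambda p, s, d: p))  # bypassed unchanged
```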
Abstract:
The disclosed embodiments provide a method and apparatus for interoperability between CTX and DTX communications systems during transmissions of silence or background noise. Continuous eighth-rate encoded noise frames are translated to discontinuous SID frames for transmission to DTX systems (402-410). Discontinuous SID frames are translated to continuous eighth-rate encoded noise frames for decoding by a CTX system (602-606). Applications of CTX-to-DTX interoperability comprise CDMA and GSM interoperability (narrowband voice transmission systems), interoperability of the CDMA next-generation vocoder (the Selectable Mode Vocoder) with the new ITU-T 4 kbps vocoder operating in DTX mode for Voice over IP applications, future voice transmission systems that share a common speech encoder/decoder but operate in differing CTX or DTX modes during speech non-activity, and interoperability of CDMA wideband voice transmission systems with other wideband voice transmission systems that have common wideband vocoders but different modes of operation (DTX or CTX) during voice non-activity.
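A minimal Python sketch of the two translation directions described above, assuming a fixed SID update interval of one frame in eight (an illustrative choice; real DTX systems send SID updates adaptively, and a SID frame also carries averaged noise parameters).

```python
def ctx_to_dtx(noise_frames, sid_interval=8):
    """Thin a run of continuous eighth-rate noise frames down to sparse SID
    frames for a DTX receiver (cf. steps 402-410)."""
    return [f for i, f in enumerate(noise_frames) if i % sid_interval == 0]

def dtx_to_ctx(sid_frames, frames_per_sid=8):
    """Expand each SID frame back into continuous eighth-rate noise frames so
    a CTX decoder receives a frame every period (cf. steps 602-606)."""
    out = []
    for sid in sid_frames:
        out.extend([sid] * frames_per_sid)  # hold the last SID parameters
    return out

frames = list(range(32))                    # 32 continuous noise frames
print(len(ctx_to_dtx(frames)))              # -> 4 SID frames
print(len(dtx_to_ctx(ctx_to_dtx(frames))))  # -> 32 continuous frames again
```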
Abstract:
A method and apparatus for CELP-based to CELP-based vocoder packet translation. The apparatus includes a formant parameter translator and an excitation parameter translator. The formant parameter translator includes a model order converter and a time base converter. The method includes the steps of translating the formant filter coefficients of the input packet from the input CELP format to the output CELP format and translating the pitch and codebook parameters of the input speech packet from the input CELP format to the output CELP format. The step of translating the formant filter coefficients includes the steps of converting the model order of the formant filter coefficients from the model order of the input CELP format to the model order of the output CELP format and converting the time base of the resulting coefficients from the input CELP format time base to the output CELP format time base.
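A minimal Python sketch of the two formant-parameter steps named above, using simple linear interpolation as a stand-in for both the model order conversion and the time base conversion; the patent's actual conversion methods are not given in the abstract.

```python
import numpy as np

def convert_model_order(lsf, out_order):
    """Resample an LSF vector from the input format's model order to the
    output format's model order (linear-interpolation stand-in)."""
    x_in = np.linspace(0.0, 1.0, num=len(lsf))
    x_out = np.linspace(0.0, 1.0, num=out_order)
    return np.interp(x_out, x_in, lsf)

def convert_time_base(lsf_prev, lsf_cur, out_times):
    """Interpolate between consecutive LSF sets of the input time base at the
    (normalized) sample times of the output time base (stand-in)."""
    return [lsf_prev + t * (lsf_cur - lsf_prev) for t in out_times]

lsf10 = np.sort(np.random.rand(10))        # e.g. 10th-order input CELP format
lsf16 = convert_model_order(lsf10, 16)     # e.g. 16th-order output CELP format
sub = convert_time_base(lsf10, lsf10, [0.25, 0.75])
print(lsf16.shape, len(sub))
```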