Abstract:
Speech frames of a first speech coding scheme are utilized as speech frames of a second speech coding scheme, where the speech coding schemes use similar core compression schemes for the speech frames, preferably bit stream compatible. An occurrence of a state mismatch in an energy parameter between the first speech coding scheme and the second speech coding scheme is identified, preferably either by determining an occurrence of a predetermined speech evolution, such as a speech type transition, e.g. an onset of speech following a period of speech inactivity, or by tentative decoding of the energy parameter in the two encoding schemes followed by a comparison. Subsequently, the energy parameter in at least one frame of the second speech coding scheme following the occurrence of the state mismatch is adjusted. The present invention also presents transcoders and communications systems providing such transcoding functionality.
Abstract:
Disclosed is an adaptive sound source vector quantization device that improves the quantization accuracy of adaptive sound source vector quantization while suppressing the increase in calculation amount in CELP sound encoding, which performs encoding in sub-frame units. In the device, a search adaptive sound source vector generation unit (103) cuts out an adaptive sound source vector of frame length (n) from an adaptive sound source codebook (102); a search impulse response matrix generation unit (105) generates an n × n search impulse response matrix by using the impulse response matrix for each sub-frame inputted from a synthesis filter (104); a search target vector generation unit (106) adds the target vector of each sub-frame so as to generate a search target vector of frame length (n); and an evaluation scale calculation unit (107) calculates the evaluation scale of the adaptive sound source vector quantization by using the search adaptive sound source vector, the search impulse response matrix, and the search target vector.
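The frame-length search above can be illustrated with a minimal sketch. The abstract does not state the formula for the evaluation scale; the code assumes the conventional CELP criterion (x^T H p)^2 / ||H p||^2, and the names `evaluation_measure`, `cut_adaptive_vector`, and `best_pitch_lag` are hypothetical. The matrix H and target x are taken as given, since the abstract does not say how the per-sub-frame quantities are combined.

```python
def mat_vec(H, v):
    """Multiply an n x n matrix (list of rows) by a length-n vector."""
    return [sum(h * s for h, s in zip(row, v)) for row in H]


def evaluation_measure(p, H, x):
    """Assumed CELP criterion: squared correlation between the target x and
    the filtered candidate H p, normalized by the energy of H p."""
    y = mat_vec(H, p)
    energy = sum(s * s for s in y)
    if energy == 0.0:
        return 0.0  # a silent candidate cannot match any target
    correlation = sum(a * b for a, b in zip(x, y))
    return correlation * correlation / energy


def cut_adaptive_vector(history, lag, n):
    """Cut n samples from the past excitation, starting `lag` samples back.
    If lag < n, the segment is repeated periodically (an assumption)."""
    start = len(history) - lag
    return [history[start + (i % lag)] for i in range(n)]


def best_pitch_lag(history, H, x, lags):
    """Return the candidate lag that maximizes the evaluation measure."""
    n = len(x)
    return max(
        lags,
        key=lambda lag: evaluation_measure(cut_adaptive_vector(history, lag, n), H, x),
    )
```

Because the measure is evaluated once per candidate over the whole frame, the search cost grows with the number of candidate lags rather than with the number of sub-frames.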
Abstract:
A coding device is provided in which optimum coding in a higher layer is carried out flexibly based on the coding result of a lower layer, so that a quality audio signal is delivered to users under limited circumstances. In this coding device, a basic layer coding unit codes an input signal to generate a basic layer information source code and outputs a linear prediction coefficient (LPC) and a quantized LPC, which are parameters calculated during coding, to an expanded layer control unit. A basic layer decoding unit decodes the basic layer information source code. An adding unit reverses the polarity of the basic layer decoded signal, adds it to the input signal, and thereby calculates a difference signal. The expanded layer control unit generates expanded layer mode information indicative of a coding mode in an expanded layer based on the LPC and the quantized LPC. An expanded layer coding unit codes the difference signal obtained from the adding unit under control of the expanded layer control unit.
Abstract:
A speech communication system provides a speech encoder that generates a set of coded parameters representative of the desired speech signal characteristics. The speech communication system also provides a speech decoder that receives the set of coded parameters to generate reconstructed speech. The speech decoder includes an equalizer that computes a matching set of parameters from the reconstructed speech generated by the speech decoder, undoes the set of characteristics corresponding to the computed set of parameters, and imposes the set of characteristics corresponding to the coded set of parameters, thereby producing equalized reconstructed speech.
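As a minimal illustration of the equalizer's three steps (measure, undo, impose), the sketch below handles a single characteristic, the mean frame energy. The abstract speaks of a general set of parameters; restricting it to energy, and the name `equalize_energy`, are assumptions made only for this example.

```python
import math


def equalize_energy(reconstructed, coded_energy):
    """Rescale a reconstructed frame so its mean energy matches the coded value.

    Step 1: compute the matching parameter from the reconstructed speech.
    Step 2: undo it (normalize the frame to unit mean energy).
    Step 3: impose the coded parameter (scale to the transmitted energy).
    Steps 2 and 3 collapse into a single gain factor.
    """
    measured = sum(s * s for s in reconstructed) / len(reconstructed)
    if measured == 0.0:
        return list(reconstructed)  # nothing to rescale in a silent frame
    gain = math.sqrt(coded_energy / measured)
    return [gain * s for s in reconstructed]
```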
Abstract:
The present invention relates to a data processing apparatus capable of obtaining high-quality sound data. A tap generation section 121 generates a prediction tap used for a process in a prediction section 125 by extracting decoded speech data in a predetermined positional relationship with subject data of interest within the decoded speech data obtained by decoding coded data with a CELP method, and by extracting an I code located in a subframe according to a position of the subject data in the subject subframe. Similarly to the tap generation section 121, a tap generation section 122 generates a class tap used for a process in a classification section 123. The classification section 123 performs classification on the basis of the class tap, and a coefficient memory 124 outputs a tap coefficient corresponding to the classification result. The prediction section 125 performs a linear prediction computation by using the prediction tap and the tap coefficient and outputs high-quality decoded speech data. The present invention can be applied to mobile phones for transmitting and receiving speech.
Abstract:
A method of tracking formants defines a formant search space comprising sets of formants to be searched. Formants are identified for a first frame in the speech utterance by searching the entirety of the formant search space using the codebook, and for the remaining frames by searching the same space using both the codebook and the continuity constraint across adjacent frames. Under one embodiment, the formants are identified by mapping sets of formants into feature vectors and applying the feature vectors to a model. Formants are also identified by applying dynamic programming to search for the best sequence that optimally satisfies the continuity constraint required by the model.
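The dynamic-programming search with a continuity constraint can be sketched as a Viterbi-style pass over the candidate formant sets. This is only an illustration: the per-frame costs are assumed to come from some model that is not shown, the squared frequency jump is an assumed form of the continuity penalty, and the names `track_formants` and `weight` are hypothetical.

```python
def track_formants(local_costs, candidates, weight=1.0):
    """Pick one candidate formant set per frame by dynamic programming.

    local_costs[t][k]: cost of candidate k at frame t (from a model, not shown).
    candidates[k]: tuple of formant frequencies for candidate k.
    weight: strength of the assumed continuity penalty between adjacent frames.
    """
    def jump(a, b):
        return sum((fa - fb) ** 2 for fa, fb in zip(a, b))

    count = len(candidates)
    # The first frame has no predecessor, so only its local costs apply.
    best = list(local_costs[0])
    back = []
    for frame_costs in local_costs[1:]:
        new_best, pointers = [], []
        for k in range(count):
            # Cheapest way to arrive at candidate k from the previous frame.
            totals = [best[j] + weight * jump(candidates[j], candidates[k])
                      for j in range(count)]
            j_best = min(range(count), key=totals.__getitem__)
            new_best.append(totals[j_best] + frame_costs[k])
            pointers.append(j_best)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path backwards from the final frame.
    k = min(range(count), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[k] for k in path]
```

With `weight=0` each frame is decided independently; a positive weight lets a smooth trajectory win over an isolated frame whose locally cheapest candidate lies far from its neighbours.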
Abstract:
A process that allocates bits for quantizing spectral components in a perceptual coding system is performed more efficiently by obtaining an accurate estimate of the optimal value for one or more coding parameters that are used in the bit allocation process. In one implementation for a perceptual audio coding system, an accurate estimate of an offset from a calculated psychoacoustic masking curve is derived by selecting an initial value for the offset, calculating the number of bits that would be allocated if the initial offset were used for coding, and estimating the optimum value of the offset from a difference between this calculated number and the number of bits that are actually available for allocation.
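The offset estimate can be illustrated with a deliberately simplified sketch. It assumes each band receives one bit per 6 dB of signal level above the offset masking curve, ignores rounding to whole bits, and estimates the new offset by a linear rule. None of these details comes from the abstract, and `allocated_bits`, `estimate_offset`, and `DB_PER_BIT` are hypothetical names.

```python
DB_PER_BIT = 6.0  # assumed: roughly 6 dB of signal-to-mask ratio per bit


def allocated_bits(snr_db, offset_db):
    """Bits allocated when the masking curve is shifted by offset_db.

    snr_db[b] is the level of band b above the unshifted masking curve, in dB.
    Bands that fall below the shifted curve receive no bits.
    """
    return sum(max(0.0, (s - offset_db) / DB_PER_BIT) for s in snr_db)


def estimate_offset(snr_db, initial_offset_db, bits_available):
    """Estimate the offset at which the allocation meets the bit budget,
    from the gap between the bits used at the initial offset and the budget."""
    used = allocated_bits(snr_db, initial_offset_db)
    active = sum(1 for s in snr_db if s > initial_offset_db)
    if active == 0:
        return initial_offset_db  # no band receives bits; nothing to trade
    # Raising the offset by DB_PER_BIT removes about one bit per active band.
    return initial_offset_db + DB_PER_BIT * (used - bits_available) / active
```

In this simplified model the estimate is exact as long as the same bands stay active before and after the shift; a real allocator with integer bit counts would presumably still need a short refinement around the estimate.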
Abstract:
A hybrid linear predictive speech coding system with phase alignment predictive quantization and zero-phase alignment of speech prior to waveform coding aligns synthesized speech frames of a waveform coder with frames synthesized with a parametric coder. Inter-frame interpolation of LP coefficients suppresses artifacts in the resultant synthesized speech frames.
Abstract:
Coding of a signal is provided, wherein a first set of values is provided related to subsequent times in a first time interval of the signal, and a second set of values is provided related to subsequent times in a second time interval of the signal, the first time interval having an overlap with the second time interval, the overlap including at least two subsequent times of the second interval. At least one of the values of the second set related to the at least two subsequent times in the overlap is encoded with reference to a value of the first set which is closer in time to the at least one value of the second set than any other value in the second set.
Abstract:
The invention relates to a computer device comprising a memory 108 for storing audio signals 114, in part pre-recorded, each corresponding to a defined source, by means of spatial position data 116, and a processing module 110 for processing these audio signals in real time as a function of the spatial position data. The processing module 110 allows for the instantaneous power level parameters to be calculated on the basis of audio signals 114, the corresponding sources being defined by instantaneous power level parameters. The processing module 110 comprises a selection module 120 for regrouping certain of the audio signals into a variable number of audio signal groups, and the processing module 110 is capable of calculating spatial position data which is representative of a group of audio signals as a function of the spatial position data 116 and instantaneous power level parameters for each corresponding source.
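One plausible way to derive a representative position for a group of sources is a power-weighted centroid, sketched below. The abstract only says that the group position is a function of the source positions and their instantaneous power levels; the centroid rule and the name `group_position` are assumptions made for illustration.

```python
def group_position(positions, powers):
    """Power-weighted centroid of a group of sources.

    positions: list of equal-length coordinate tuples, one per source.
    powers: instantaneous power level of each source (non-negative).
    """
    dims = len(positions[0])
    total = sum(powers)
    if total == 0.0:
        # All sources are silent: fall back to the plain average position.
        return tuple(sum(p[i] for p in positions) / len(positions)
                     for i in range(dims))
    return tuple(sum(w * p[i] for p, w in zip(positions, powers)) / total
                 for i in range(dims))
```

Weighting by power pulls the representative position toward the loudest sources in the group.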