Abstract:
A speech/audio encoding device selectively allocates bits for higher-precision encoding. The speech/audio encoding device receives a time-domain speech/audio input signal, transforms the speech/audio input signal into a frequency domain, and quantizes an energy envelope corresponding to an energy level for a frequency spectrum of the speech/audio input signal. The speech/audio encoding device further groups quantized energy envelopes into a plurality of groups, determines a perceptually significant group including one or more significant bands and a local-peak frequency, and allocates bits to a plurality of subbands corresponding to the grouped quantized energy envelopes, in which each of the subbands is obtained by splitting the frequency spectrum of the speech/audio input signal. The speech/audio encoding device encodes the frequency spectrum using the bits allocated to the subbands.
Abstract:
A threshold amplitude is calculated for each subband obtained by splitting an extension band. For each subband, an amplitude of transform coefficients is compared with the threshold amplitude to extract a transform coefficient having an amplitude larger than the threshold amplitude as a representative transform coefficient. When a number of the extracted representative transform coefficients is less than a predetermined number, the threshold amplitude is updated in accordance with an amount by which the number of the representative transform coefficients is less than the predetermined number. A transform coefficient is extracted again using the updated threshold amplitude. For each of the subbands, a value of correlation is calculated between the representative transform coefficient and a normalized core encoded low-band transform coefficient. A subband having a largest value of correlation is selected when the number of the extracted representative transform coefficients reaches the predetermined number.
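The extraction loop described above can be sketched as follows. This is a minimal illustration, not the claimed method: the abstract does not give the update formula, so the rule used here (shrinking the threshold by a fraction `step` per missing coefficient) and the names `extract_representatives`, `step`, and `max_updates` are assumptions.

```python
def extract_representatives(coeffs, threshold, min_count, step=0.1, max_updates=50):
    """Pick indices whose amplitude exceeds the threshold, relaxing it if too few are found."""

    def select(th):
        return [i for i, c in enumerate(coeffs) if abs(c) > th]

    picked = select(threshold)
    for _ in range(max_updates):
        shortfall = min_count - len(picked)
        if shortfall <= 0:
            break
        # Assumed update rule: lower the threshold more when more coefficients are missing.
        threshold = max(0.0, threshold * (1.0 - step * shortfall))
        picked = select(threshold)
    return picked, threshold


subband = [0.1, -0.9, 0.4, 0.05, 0.6, -0.3]
indices, final_threshold = extract_representatives(subband, threshold=0.8, min_count=3)
print(indices, final_threshold)
```

The `max_updates` cap is only a safeguard for the sketch, so the loop stops even when a subband holds fewer non-zero coefficients than `min_count`.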
Abstract:
A coding apparatus includes a processor and a memory that stores instructions that, when executed, cause the processor to perform operations including encoding a first band of an input audio signal to be a first spectrum and dividing the first spectrum into a plurality of sub-bands. The operations also include searching for the largest amplitude value of the divided first spectrum in each of the plurality of sub-bands, and normalizing the divided first spectrum in each of the plurality of sub-bands. The operations further include emphasizing a harmonic structure in the normalized first spectrum, and searching for a best band that has the largest correlation value between each divided band of a second band spectrum and the first spectrum in which the harmonic structure is emphasized. The second band spectrum is encoded using lag information identifying the best band, and the lag information is transmitted to a decoder side.
Abstract:
A speech/audio coding apparatus includes a receiver that receives a time-domain speech input signal. The apparatus also includes a processor that transforms the time-domain speech input signal into a frequency-domain spectrum, and divides a frequency region of the spectrum in an extended band into a plurality of bands. The processor sets a limited band for each divided band in the current frame, a width of the limited band in the current frame being narrower than the divided band and the limited band including a first frequency. The processor further encodes the spectrum in the limited band within each divided band in the current frame, wherein the width of the limited band is predetermined and is set to 31.
Abstract:
An audio/speech encoding method is provided that includes transforming a time-domain input signal to a frequency spectrum, and dividing the frequency spectrum into a plurality of bands. The method also includes calculating a level of energy for each band, quantizing the energy for each band, and calculating differential indices. The method additionally includes modifying a range of the differential indices for the Nth band when N is an integer of 2 or more, and replacing the differential index with the modified differential index, and not modifying a range of the differential indices for the Nth band when N is 1. The method further includes encoding the differential indices using a Huffman table selected based on a minimum value and a maximum value of the differential indices, and transmitting the encoded differential indices and a flag signal for indicating the selected Huffman table.
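The table-selection step can be illustrated with a small sketch. The abstract does not specify the range-modification rule or the actual tables, so the modification step is omitted here, and `TABLE_RANGES` with its four index ranges is a purely hypothetical stand-in for the codec's Huffman tables. The sketch assumes that each differential index is the difference between the quantized energy indices of adjacent bands.

```python
def differential_indices(quantized):
    """Differences between quantized energy indices of adjacent bands (assumed definition)."""
    return [b - a for a, b in zip(quantized, quantized[1:])]


# Hypothetical tables: table id -> (lowest, highest) differential index it can encode.
TABLE_RANGES = {0: (-2, 2), 1: (-4, 4), 2: (-8, 8), 3: (-16, 16)}


def select_table(diffs, table_ranges=TABLE_RANGES):
    """Choose the narrowest table whose range covers the min and max differential index."""
    lo, hi = min(diffs), max(diffs)
    fitting = [
        (r_hi - r_lo, table_id)
        for table_id, (r_lo, r_hi) in table_ranges.items()
        if r_lo <= lo and hi <= r_hi
    ]
    if not fitting:
        raise ValueError("no table covers the observed range")
    return min(fitting)[1]


diffs = differential_indices([10, 12, 11, 11, 14])
table_flag = select_table(diffs)
print(diffs, table_flag)
```

In this sketch `table_flag` plays the role of the flag signal: the decoder would need it to know which table to use before it can decode the indices.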
Abstract:
Provided are a voice audio encoding device, voice audio decoding device, voice audio encoding method, and voice audio decoding method that efficiently perform bit distribution and improve sound quality. A dominant frequency band identification unit identifies a dominant frequency band having a norm factor value that is the maximum value within the spectrum of an input voice audio signal. Dominant group determination units and a non-dominant group determination unit group all sub-bands into a dominant group that contains the dominant frequency band and a non-dominant group that contains no dominant frequency band. A group bit distribution unit distributes bits to each group on the basis of the energy and norm variance of each group. A sub-band bit distribution unit redistributes the bits that have been distributed to each group to each sub-band in accordance with the ratio of the norm to the energy of the groups.
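A rough sketch of the grouping and the sub-band redistribution follows. Several details are assumptions, not taken from the abstract: the dominant group is grown outward from the peak while the norms keep descending, the group energy is taken as the sum of the norms in the group, and leftover bits from rounding go to the band with the largest norm. The group-level distribution based on energy and norm variance is not modeled.

```python
def split_groups(norms):
    """Return (dominant, non_dominant) lists of sub-band indices."""
    peak = max(range(len(norms)), key=norms.__getitem__)
    lo = peak
    while lo > 0 and norms[lo - 1] <= norms[lo]:
        lo -= 1  # extend left while norms descend away from the peak (assumed rule)
    hi = peak
    while hi < len(norms) - 1 and norms[hi + 1] <= norms[hi]:
        hi += 1  # extend right under the same assumed rule
    dominant = list(range(lo, hi + 1))
    non_dominant = [i for i in range(len(norms)) if i < lo or i > hi]
    return dominant, non_dominant


def redistribute(group_bits, group_norms):
    """Share a group's bits among its sub-bands in proportion to norm / group energy."""
    total = sum(group_norms)
    if total == 0:
        return [0] * len(group_norms)  # nothing to weight by in this sketch
    bits = [group_bits * n // total for n in group_norms]
    leftover = group_bits - sum(bits)
    bits[group_norms.index(max(group_norms))] += leftover
    return bits


norms = [3, 5, 9, 6, 7, 2]
dominant, non_dominant = split_groups(norms)
dominant_bits = redistribute(40, [norms[i] for i in dominant])
print(dominant, non_dominant, dominant_bits)
```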
Abstract:
A coding apparatus, including a processor that performs operations including encoding a first band of an input audio signal to be a first spectrum, dividing the first spectrum, at equal intervals, into a plurality of subbands each including a predetermined number of samples, for flattening the first spectrum, searching for a largest amplitude value of the divided first spectrum in each of the subbands, normalizing the divided first spectrum with the largest amplitude values searched in each of the subbands, searching for best bands among each normalized divided first spectrum which has a largest correlation value between each divided band of a second band spectrum and each normalized divided first spectrum, the second band spectrum being higher than a predetermined frequency, and encoding the second band spectrum using lag information identifying the best bands for transmitting the lag information to a decoder side.
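The normalization and lag search can be sketched as below. This is an illustration under assumptions: the correlation measure (cross-correlation divided by the square root of the candidate's energy) is one common choice and is not stated in the abstract, and the function names are hypothetical.

```python
import math


def normalize_subbands(spectrum, width):
    """Flatten the spectrum by dividing each equal-width subband by its largest amplitude."""
    out = []
    for start in range(0, len(spectrum), width):
        band = spectrum[start:start + width]
        peak = max(abs(x) for x in band)
        out.extend(x / peak if peak > 0 else 0.0 for x in band)
    return out


def best_lag(low_norm, high_band):
    """Return (lag, correlation) of the low-band segment that best matches high_band."""
    n = len(high_band)
    best_pos, best_corr = None, float("-inf")
    for lag in range(len(low_norm) - n + 1):
        cand = low_norm[lag:lag + n]
        energy = sum(c * c for c in cand)
        if energy == 0:
            continue  # an all-zero candidate carries no shape information
        corr = sum(c * h for c, h in zip(cand, high_band)) / math.sqrt(energy)
        if corr > best_corr:
            best_pos, best_corr = lag, corr
    return best_pos, best_corr


low = [4.0, 2.0, 0.0, -1.0, 0.5, 1.0, 0.25, -0.5]
flat = normalize_subbands(low, width=4)
lag, corr = best_lag(flat, [1.0, 2.0, 0.5, -1.0])
print(flat, lag, corr)
```

Only `lag` would need to be sent to the decoder, which can rebuild the same normalized low-band spectrum from the core-layer data and copy the segment at that position.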
Abstract:
A coding apparatus encodes a first band of an input audio signal, normalizes a first spectrum included in each sub-band of the first band using a spectrum power envelope, and performs a clipping process on the normalized first spectrum, the clipping process comparing a predetermined threshold with the absolute value of an amplitude of the spectrum and replacing the amplitude value of the spectrum with the threshold if the absolute value of the amplitude of the spectrum exceeds the threshold. The coding apparatus calculates a correlation between a spectrum in each divided band of a second band and a spectrum in a plurality of candidate bands containing the clipped normalized first spectrum, the second spectrum being higher than a predetermined frequency, identifies the best bands of the plurality of candidate bands, and encodes the second spectrum using lag information identifying the best band for transmitting the lag information to a decoder.
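The clipping process is simple enough to show directly. One detail is an assumption: the abstract says the amplitude is replaced with the threshold but does not say what happens to the sign, so this sketch keeps the sign of the original coefficient.

```python
import math


def clip_spectrum(spectrum, threshold):
    """Limit each coefficient's magnitude to the threshold, keeping its sign (assumed)."""
    return [
        math.copysign(threshold, x) if abs(x) > threshold else x
        for x in spectrum
    ]


clipped = clip_spectrum([0.2, -1.5, 0.9, 3.0], threshold=1.0)
print(clipped)
```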
Abstract:
A speech/audio coding apparatus is provided that includes a receiver that receives a time-domain speech input signal and a processor. The processor transforms the time-domain speech input signal into a frequency-domain spectrum, and divides a frequency region of the spectrum in an extended band into a plurality of bands. The processor also sets a limited band for each divided band in the current frame, when a difference between a first frequency with a first maximum amplitude in a spectrum of the divided band in a preceding frame and a second frequency with a second maximum amplitude in a spectrum of the divided band in a current frame is below a threshold. The processor further encodes the spectrum in the limited band within each divided band in the current frame, and does not encode a spectrum outside the limited band within each divided band in the current frame.
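The decision of when to set a limited band can be sketched as follows. The placement of the limited band (centered on the preceding frame's peak and shifted to stay inside the divided band) and the fallback of encoding the whole band when the peaks differ too much are assumptions made for illustration.

```python
def limited_band(prev_band, cur_band, peak_threshold, width):
    """Return the (start, end) bin range to encode within one divided band."""
    n = len(cur_band)
    f_prev = max(range(len(prev_band)), key=lambda i: abs(prev_band[i]))
    f_cur = max(range(n), key=lambda i: abs(cur_band[i]))
    if abs(f_prev - f_cur) >= peak_threshold or width >= n:
        return 0, n  # peaks moved too far: encode the whole divided band (assumed fallback)
    # Assumed placement: center on the previous peak, clamped to the band edges.
    start = min(max(f_prev - width // 2, 0), n - width)
    return start, start + width


prev = [0, 1, 0, 0, 5, 0, 0, 0]
stable = limited_band(prev, [0, 0, 0, 0, 1, 6, 0, 0], peak_threshold=2, width=3)
moved = limited_band(prev, [7, 0, 0, 0, 1, 0, 0, 0], peak_threshold=2, width=3)
print(stable, moved)
```

Bins outside the returned range would simply not be encoded in the current frame.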
Abstract:
A coding apparatus normalizes a low-frequency spectrum included in each of sub-bands obtained by dividing a low band part, using a largest amplitude value among the low-frequency spectrum included in each sub-band, obtains a normalized low-frequency spectrum by decoding first encoded data, and calculates a correlation between each divided band of a high-frequency spectrum and a plurality of candidate bands of the normalized low-frequency spectrum. The best bands of a plurality of candidate bands are identified, each candidate band having a starting frequency position with non-zero amplitude in the normalized low-frequency spectrum, the high-frequency spectrum being in a high band part of the input audio signal that is higher than a predetermined frequency, and the high-frequency spectrum is encoded using lag information identifying the best band for transmitting the lag information to a decoder.
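The restriction on candidate bands can be illustrated in a few lines. This sketch only enumerates the admissible starting positions; the function name and the exact comparison against zero are assumptions.

```python
def candidate_starts(low_norm, band_len):
    """Starting positions whose first coefficient is non-zero and where a full band fits."""
    return [
        i for i in range(len(low_norm) - band_len + 1)
        if low_norm[i] != 0.0
    ]


starts = candidate_starts([0.0, 0.5, 0.0, 1.0, 0.0, 0.0], band_len=3)
print(starts)
```

Skipping positions that start on a zero coefficient shortens the correlation search, and because the decoder sees the same normalized low-frequency spectrum it can enumerate the same candidates.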