Abstract:
A method comprises analyzing each frame of a plurality of frames of a speech signal to determine one or more speech parameters for the speech signal; deciding, for each frame, based on the one or more speech parameters, to select one of a plurality of encoding modes, including a first encoding mode and a second encoding mode, for encoding that frame; and encoding each frame according to the encoding mode selected for it in the deciding. The first encoding mode supports a first encoding rate and the second encoding mode supports a second encoding rate, wherein the first encoding rate is the same as the second encoding rate.
Abstract:
In accordance with one aspect of the invention, a selector supports the selection of a first encoding scheme or a second encoding scheme based upon the detection or absence of a triggering characteristic in an interval of an input speech signal. The first encoding scheme has a pitch pre-processing procedure for processing the input speech signal to form a revised speech signal biased toward an ideal voiced and stationary characteristic. The pre-processing procedure allows the encoder to fully capture the benefits of a bandwidth-efficient, long-term predictive procedure for a greater amount of speech components of the input speech signal than would otherwise be possible. In accordance with another aspect of the invention, the second encoding scheme entails a long-term prediction mode for encoding the pitch on a sub-frame by sub-frame basis. The long-term prediction mode is tailored to cases where the generally periodic component of the speech is not stationary or is less than completely periodic and requires more frequent updates from the adaptive codebook to achieve a desired perceptual quality of the reproduced speech under a long-term predictive procedure.
Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech through CELP (code excited linear prediction) and other associated modeling parameters is generated for higher quality decoding and reproduction. A speech encoder employs various encoding schemes based upon parameters including an available transmission bit rate. In addition, the speech encoder is operable to identify and apply an optimal encoding scheme for a given speech signal. The speech encoder may apply code-excited linear prediction when the available bit rate is above a predetermined upper threshold. Pitch preprocessing, including continuous warping, may be applied when the available bit rate is below a predetermined lower threshold. The encoder considers varying characteristics of the speech signal, including the long term prediction mode of a previous frame, a spectral difference between the line spectral frequencies of a current and a previous frame, a predicted pitch lag, an open loop pitch lag, a closed loop pitch lag, a pitch gain, and a pitch correlation.
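The rate-dependent choice in this abstract can be sketched as a simple threshold test. This is only an illustration under stated assumptions: the threshold values, the names `Scheme` and `select_scheme`, and the `SIGNAL_DEPENDENT` outcome for rates between the two thresholds are hypothetical. The abstract states only what may happen above the upper threshold and below the lower one.

```python
from enum import Enum


class Scheme(Enum):
    CELP = "celp"
    PITCH_PREPROCESSING = "pitch_preprocessing"
    # Assumption: between the thresholds the choice is left to the
    # speech-signal characteristics; the abstract does not say.
    SIGNAL_DEPENDENT = "signal_dependent"


# Hypothetical thresholds in bits per second; the abstract gives no values.
UPPER_THRESHOLD_BPS = 8000
LOWER_THRESHOLD_BPS = 5000


def select_scheme(available_bps,
                  upper=UPPER_THRESHOLD_BPS,
                  lower=LOWER_THRESHOLD_BPS):
    """Pick an encoding scheme from the available transmission bit rate."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if available_bps > upper:
        return Scheme.CELP
    if available_bps < lower:
        return Scheme.PITCH_PREPROCESSING
    return Scheme.SIGNAL_DEPENDENT
```

In the intermediate case an encoder could fall back on the listed signal characteristics (pitch lag, pitch gain, pitch correlation, and so on) to make the decision.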
Abstract:
Provided is a method and computer program product for producing an enhanced audio signal for an output device from audio signals received by two or more microphones in close proximity to each other. For example, one embodiment of the present invention comprises the steps of receiving a first input audio signal from a first microphone, digitizing the first input audio signal to produce a first digitized audio input signal, receiving a second input audio signal from a second microphone, digitizing the second input audio signal to produce a second digitized audio input signal, using the first digitized audio input signal as a reference signal to an adaptive prediction filter, using the second digitized audio input signal as input to said adaptive prediction filter, and finally adding a prediction result signal from the adaptive prediction filter to the first digitized audio input signal to produce the enhanced audio signal. In other embodiments, any number of microphones can be used, and in all embodiments there is no requirement to detect or locate the source or direction of arrival of the input audio signals.
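The two-microphone embodiment can be sketched with a normalized LMS (NLMS) filter standing in for the adaptive prediction filter. The abstract does not name the adaptation algorithm, so NLMS, the function name `enhance`, and the parameters `taps`, `mu`, and `eps` are all assumptions made for illustration.

```python
def enhance(first, second, taps=8, mu=0.5, eps=1e-8):
    """Add an adaptive prediction of `first` (made from `second`) to `first`.

    first  -- digitized samples from the first microphone (reference signal)
    second -- digitized samples from the second microphone (filter input)
    """
    if len(first) != len(second):
        raise ValueError("both channels must have the same length")
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent second-mic samples, newest first
    enhanced = []
    for reference, sample in zip(first, second):
        history = [sample] + history[:-1]
        # Predict the reference sample from the second microphone's history.
        prediction = sum(w * h for w, h in zip(weights, history))
        error = reference - prediction
        # NLMS update: step size normalized by the input power.
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power
                   for w, h in zip(weights, history)]
        # Enhanced output is the reference plus the prediction result.
        enhanced.append(reference + prediction)
    return enhanced
```

Because the filter only predicts what the two channels have in common, the sketch needs no estimate of the source location or direction of arrival, which matches the last sentence of the abstract.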
Abstract:
There is provided a method of detecting and reporting poor voice quality for use by a gateway device. The method comprises facilitating a connection between a telephone and a remote telephone via a network, and detecting a poor voice quality indicator during the connection. The method further comprises capturing, for a pre-determined period of time, telephone voice data being exchanged between the gateway and the telephone, network voice data being exchanged between the gateway and the network, and gateway parameters. The method also comprises packetizing the telephone voice data, the network voice data and the gateway parameters into a plurality of packets having a network address of a network storage, and transmitting the plurality of packets destined for the network storage via the network. In one aspect, the poor voice quality indicator may be generated by a user of the telephone in response to a poor voice quality of the connection.
Abstract:
A multi-channel speech processor for encoding speech in a packet network environment is disclosed. In one illustrative aspect, a complexity resource manager (CRM) is executed by a controller or processor. The CRM manages the level of encoding complexity used by a signal processing unit (SPU) to convert the speech signal into packet data. In general, the CRM determines the level of encoding complexity from a calculated complexity budget, which is based on the time required to process prior speech signal channels and the time available to process the remaining channels. In this way, the CRM controls the overall complexity of the speech processor by signaling the SPU, under certain conditions, to encode the speech signal in a complexity-reduced mode based on the calculated complexity budget.
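One way to read the complexity budget is as the time left in the processing interval divided evenly among the channels still to be encoded. The sketch below follows that reading. The even split, the two-level "full" and "reduced" modes, the fixed per-mode costs, and all function names are assumptions; a real CRM would measure elapsed time rather than simulate it.

```python
def complexity_budget(deadline_ms, elapsed_ms, channels_remaining):
    """Time available per remaining channel, in milliseconds."""
    if channels_remaining <= 0:
        raise ValueError("channels_remaining must be positive")
    return max(deadline_ms - elapsed_ms, 0.0) / channels_remaining


def choose_mode(budget_ms, full_cost_ms):
    """Signal the reduced-complexity mode when the budget is too small."""
    return "full" if budget_ms >= full_cost_ms else "reduced"


def schedule(deadline_ms, full_cost_ms, reduced_cost_ms, channels):
    """Simulate one processing interval and return the mode per channel."""
    elapsed = 0.0
    modes = []
    for index in range(channels):
        budget = complexity_budget(deadline_ms, elapsed, channels - index)
        mode = choose_mode(budget, full_cost_ms)
        modes.append(mode)
        # Assumption: each mode has a fixed, known cost in this simulation.
        elapsed += full_cost_ms if mode == "full" else reduced_cost_ms
    return modes
```

For example, with a 10 ms interval, four channels, and costs of 3 ms (full) and 1 ms (reduced), the first channel is encoded in the reduced mode, which frees enough time for the remaining three to run in the full mode.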
Abstract:
There are provided speech coding methods and systems for estimating a plurality of speech parameters of a speech signal for coding the speech signal using one of a plurality of speech coding algorithms, where the plurality of speech parameters includes pitch information and is calculated using a plurality of thresholds. An example method includes estimating a background noise level in the speech signal to determine a signal to noise ratio (SNR) for the speech signal, adjusting one or more of the plurality of thresholds based on the SNR to generate one or more SNR adjusted thresholds, analyzing the speech signal to extract the pitch information using the one or more SNR adjusted thresholds, and repeating the estimating, the adjusting and the analyzing to code the speech signal using one of the plurality of speech coding algorithms.
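The estimate, adjust, and analyze steps can be sketched as follows. This is an illustration only: the linear relaxation rule, its direction (lower threshold at lower SNR), the default constants, and the use of normalized autocorrelation for pitch extraction are assumptions, since the abstract does not specify any of them.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power estimates."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def adjust_threshold(base, snr, clean_snr_db=30.0, max_relaxation=0.2):
    """Relax `base` linearly as the SNR falls from clean_snr_db to 0 dB."""
    shortfall = min(max(clean_snr_db - snr, 0.0), clean_snr_db) / clean_snr_db
    return base - max_relaxation * shortfall


def normalized_correlation(frame, lag):
    """Correlation between the frame and itself delayed by `lag` samples."""
    current = frame[lag:]
    past = frame[:len(frame) - lag]
    energy = math.sqrt(sum(c * c for c in current) * sum(p * p for p in past))
    if energy == 0.0:
        return 0.0
    return sum(c * p for c, p in zip(current, past)) / energy


def extract_pitch_lag(frame, min_lag, max_lag, threshold):
    """Return the best lag, or None if its correlation misses the threshold."""
    best_lag, best_corr = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        corr = normalized_correlation(frame, lag)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag if best_corr >= threshold else None
```

A caller would compute `snr_db` from the running noise estimate, pass the result through `adjust_threshold`, and hand the adjusted value to `extract_pitch_lag` for each frame.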
Abstract:
A flexible variable rate vocoder and related method of operation. The vocoder selects a target average data rate responsive to at least one network parameter and at least one external parameter.
Abstract:
A fully backward compatible intelligent discontinued transmission (DTX) and comfort noise generation (CNG) scheme that is operable in pulse code modulation (PCM) speech coding systems. The scheme, for example, provides a speech encoder comprising speech signal analysis circuitry configured to calculate a predetermined plurality of parameters from a speech signal, a voice activity detector configured to determine voice activity in the speech signal, where the speech encoder enters a discontinued transmission mode if the voice activity detector does not detect voice activity, and a transmitter configured to transmit one or more speech samples of the speech signal after the speech encoder enters the discontinued transmission mode, where the one or more speech samples are capable of use by a remote speech decoder to extract a parameter from the one or more speech samples in order to generate a background noise based on the parameter.
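A minimal sketch of the encoder and decoder roles described here is given below. It assumes an energy-based voice activity detector, the RMS level as the parameter the decoder extracts, and Gaussian noise as the generated background. None of these choices comes from the abstract, and all names are hypothetical.

```python
import math
import random


def is_voice_active(frame, energy_threshold):
    """Assumed VAD: mean sample energy above a fixed threshold."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > energy_threshold


def encode_frame(frame, energy_threshold, noise_samples=8):
    """Send the full frame for speech, or a few raw samples in DTX mode."""
    if is_voice_active(frame, energy_threshold):
        return ("speech", list(frame))
    # DTX: only a handful of ordinary PCM samples are transmitted, so a
    # decoder that ignores DTX can still play them back unchanged.
    return ("dtx", list(frame[:noise_samples]))


def comfort_noise(samples, length, rng):
    """Decoder side: extract the RMS level and synthesize matching noise."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return [rng.gauss(0.0, rms) for _ in range(length)]
```

A decoder would call `comfort_noise(payload, frame_length, random.Random())` on each `"dtx"` payload and play the received samples directly on each `"speech"` payload.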
Abstract:
A speech encoding comb codebook structure for providing good quality reproduced low bit-rate speech signals in a speech encoding system. The codebook structure requires minimal training, if any, and allows for reduced complexity and memory requirements. The codebook includes a first sub-codebook and at least one additional sub-codebook, each having a plurality of code-vectors. The codebook may be randomly populated. All even elements may be set to zero in a first codebook, and all odd elements may be set to zero in a second codebook. The resulting comb codebook includes combinations of the code-vectors from the sub-codebooks. In certain embodiments, the code-vectors of the sub-codebooks may contain zero valued elements. In other embodiments, where the code-vectors of the sub-codebooks contain only non-zero elements, zero valued elements may be inserted between the non-zero elements of the sub-codebooks when the resultant comb codebook is formed. In such an embodiment, the memory requirements are further reduced because the zero valued elements need not be stored.
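The memory-saving embodiment, in which only non-zero elements are stored and zeros are inserted when the comb codebook is formed, can be sketched like this. The sketch assumes two sub-codebooks, zero-based indexing for "even" and "odd", Gaussian random population, and element-wise addition as the way code-vectors are combined. All function names are hypothetical.

```python
import random
from itertools import product


def make_sub_codebook(size, nonzero_count, rng):
    """Randomly populate a sub-codebook, storing non-zero elements only."""
    return [[rng.gauss(0.0, 1.0) for _ in range(nonzero_count)]
            for _ in range(size)]


def expand(compact, phase, length):
    """Place stored values at indices phase, phase + 2, ...; zeros elsewhere."""
    if 2 * (len(compact) - 1) + phase >= length:
        raise ValueError("compact vector does not fit in the target length")
    vector = [0.0] * length
    for i, value in enumerate(compact):
        vector[2 * i + phase] = value
    return vector


def comb_codebook(first, second, length):
    """Form every pairwise sum of the expanded code-vectors.

    `first` is non-zero at odd indices (even elements are zero) and
    `second` is non-zero at even indices (odd elements are zero).
    """
    return [[a + b for a, b in zip(expand(u, 1, length), expand(v, 0, length))]
            for u, v in product(first, second)]
```

Two sub-codebooks of sizes M and N yield M × N comb code-vectors while storing only (M + N) × length / 2 values, which is where the memory reduction comes from.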