摘要:
A pitch lag coding device and method using interframe correlation inherent in pitch lag values to reduce coding bit requirements. A pitch lag value is extracted for a given speech frame, and then refined for each subframe. For every speech frame having N samples of speech, LPC analysis and vector quantization are performed for the whole coding frame. The LPC residual obtained for each frame is then processed such that pitch lag values for all subframes within the coding frame are analyzed concurrently. The remaining coding parameters, i.e., the codebook search, gain parameters, and excitation signal, are then analyzed sequentially according to their respective subframes.
摘要:
A pitch lag coding device and method using interframe correlation inherent in pitch lag values to reduce coding bit requirements. A pitch lag value is extracted for a given speech frame, and then refined for each subframe. For every speech frame having N samples of speech, LPC analysis and vector quantization are performed for the whole coding frame. The LPC residual obtained for each frame is then processed such that pitch lag values for all subframes within the coding frame are analyzed concurrently. The remaining coding parameters, i.e., the codebook search, gain parameters, and excitation signal, are then analyzed sequentially according to their respective subframes.
摘要:
Provided is a method and computer program product for producing an enhanced audio signal for an output device from audio signals received by 2 or more microphones in close proximity to each other. For example, one embodiment of the present invention comprises the steps of receiving a first input audio signal from the first microphone, digitizing the first input audio signal to produce a first digitized audio input signal, receiving a second input audio input signal from the second microphone, digitizing the second input audio input signal to produce a second digitized audio input signal, using the first digitized audio input signal as a reference signal to an adaptive prediction filter, using the second digitized audio input signal as input to said adaptive prediction filter and finally adding a prediction result signal from the adaptive prediction filter to the first digitized audio input signal to produce the enhanced audio signal. In other embodiments, any number of microphones can be used, and in all embodiments there is no requirement to detect or locate the source or direction of arrival of the input audio signals.
摘要:
There is provided a method of detecting and reporting poor voice quality for use by a gateway device. The method comprises facilitating a connection between a telephone and a remote telephone via a network, and detecting a poor voice quality indictor during the connection. The method further comprises capturing, for a pre-determined period of time, telephone voice data being exchanged between the gateway and the telephone, network voice data being exchanged between the gateway and the network, and gateway parameters. The method also comprises packetizing the telephone voice data, the network voice data and the gateway parameters into a plurality packets having a network address of a network storage, and transmitting the plurality packets destined for the network storage via the network. In one aspect, the poor voice quality indictor may be generated by a user of the telephone in response to a poor voice quality of the connection.
摘要:
There is provided a method of selecting a pitch lag value for a portion of a speech signal, the method comprising: computing a weighted correlation function of the portion of the speech signal for a range of delay times, wherein the weighting of the correlation function depends on both the delay time and a characteristic of one or more previous portions of the speech signal; and selecting the pitch lag value based on a delay time from the range of delay times that maximizes the weighted correlation function.
摘要:
There is provided a method of selecting a pitch lag value from a plurality of pitch lag candidates for coding a speech signal. The method comprises identifying the plurality of pitch lag candidates from a frame of the speech signal using correlation; classifying the speech signal to obtain a voice classification; determining whether one or more of the plurality of pitch lag candidates are in a temporal neighborhood of one or more previous pitch lag values; favoring the one or more of the plurality of pitch lag candidates determined to be in the temporal neighborhood of the one or more previous pitch lag values, by adaptive weighting, over other ones of the plurality of pitch lag candidates; and selecting the pitch lag value based on the voice classification and the one or more of the plurality of pitch lag candidates favored by the adaptive weighting.
摘要:
A multi-channel speech processor for encoding speech in a packet network environment is disclosed. In one illustrative aspect, a complexity resource manager (CRM) is executed by a controller or processor. The CRM manages the level of complexity of encoding which is used by a signal processing unit (SPU) to convert the speech signal into packet data. In general, the CRM determines the level of complexity of encoding based on a calculated complexity budget, where the complexity budget is determined based on the time required to process prior speech signal channels and the time available to process the remaining channels. In this way, the CRM is able to control the overall complexity of the speech processor through its ability to signal the SPU to encode speech signal in a complexity reduced mode based on the calculated complexity budget under certain conditions.
摘要:
There are provided speech coding methods and systems for estimating a plurality of speech parameters of a speech signal for coding the speech signal using one of a plurality of speech coding algorithms, the plurality of speech parameters includes pitch information, the plurality of speech parameters is calculated using a plurality of thresholds. An example method includes estimating a background noise level in the speech signal to determine a signal to noise ratio (SNR) for the speech signal, adjusting one or more of the plurality of thresholds based on the SNR to generate one or more SNR adjusted thresholds, analyzing the speech signal to extract the pitch information using the one or more SNR adjusted thresholds, and repeating the estimating, the adjusting and the analyzing to code the speech signal using one the plurality of speech coding algorithms.
摘要:
A flexible variable rate vocoder and related method of operation. The vocoder selects a target average data rate responsive to at least one network parameter and at least one external parameter.
摘要:
A fully backward compatible intelligent discontinued transmission (DTX) and comfort noise generation (CNG) scheme that is operable in pulse code modulation (PCM) speech coding systems. The scheme, for example, provides a speech encoder comprising a speech signal analysis circuitry configured to calculates a predetermined plurality of parameters from the speech signal, a voice activity detector configured to determine voice activity in the speech signal, where the speech encoder enters a discontinued transmission mode of the voice activity detector does not detect voice activity, and a transmitter configured to transmit one or more speech samples of the speech signal after the speech encoder enters the discontinued transmission mode, where the one or more speech samples are capable of use by a remote speech decoder to extract a parameter from the one or more speech samples in order generate a background noise base on the parameter.