摘要:
A technique is described herein for reducing audible artifacts in an audio output signal generated by decoding a received frame in a series of frames representing an encoded audio signal in a predictive coding system. In accordance with the technique, it is determined if the received frame is one of a predefined number of received frames that follow a lost frame in the series of the frames. Responsive to determining that the received frame is one of the predefined number of received frames, at least one parameter or signal associated with the decoding of the received frame is altered from a state associated with normal decoding. The received frame is then decoded in accordance with the at least one parameter or signal to generate a decoded audio signal. The audio output signal is then generated based on the decoded audio signal.
摘要:
A system and method for performing speaker localization is described. The system and method utilizes speaker recognition to provide an estimate of the direction of arrival (DOA) of speech sound waves emanating from a desired speaker with respect to a microphone array included in the system. Candidate DOA estimates may be preselected or generated by one or more other DOA estimation techniques. The system and method is suited to support steerable beamforming as well as other applications that utilize or benefit from DOA estimation. The system and method provides robust performance even in systems and devices having small microphone arrays and thus may advantageously be implemented to steer a beamformer in a cellular telephone or other mobile telephony terminal featuring a speakerphone mode.
摘要:
A technique is described herein for reducing audible artifacts in an audio output signal generated by decoding a received frame in a series of frames representing an encoded audio signal in a predictive coding system. In accordance with the technique, it is determined if the received frame is one of a predefined number of received frames that follow a lost frame in the series of the frames. Responsive to determining that the received frame is one of the predefined number of received frames, at least one parameter or signal associated with the decoding of the received frame is altered from a state associated with normal decoding. The received frame is then decoded in accordance with the at least one parameter or signal to generate a decoded audio signal. The audio output signal is then generated based on the decoded audio signal.
摘要:
A communication device receives an encoded signal from a primary input source. The encoded signal includes periods of speech and periods of non-speech. The communication device includes a decoder to decode the received signal to produce a decoded signal. A detector of the communication device detects the periods of speech and the periods of non-speech within the decoded signal. A controller of the communication device provides the decoded signal to an output of the communication device during the periods of speech. The controller interrupts the decoded signal during the periods of non-speech and provides an alternate input from a secondary input source to the communication device.
摘要:
A noise feedback coding (NFC) system and method that utilizes a simple and relatively inexpensive general structural configuration, but achieves improved flexibility with respect to controlling the shape of coding noise. The NFC system and method utilizes an all-zero noise feedback filter that is configured to approximate the response of a pole-zero noise feedback filter.
摘要:
A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.
摘要:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech through CELP (code excited linear prediction) and other associated modeling parameters are generated for higher quality decoding and reproduction. To support lower bit rate encoding modes, a variety of techniques are applied many of which involve the classification of the input signal. The encoder utilizes gain normalization wherein LPC (linear predictive coding) gain provides a smoothing factor for combining both open and closed loop gains. The lower the LPC gain, the greater the open loop gain contribution to a gain normalization factor. The greater the LPC gain, the greater the closed loop gain contribution. For background noise, the smaller of the closed and open loop gains are used as the normalization factor. The normalization factor is limited by the LPC gain to prevent influencing the coding quality.
摘要:
A method of coding speech under background noise conditions or during noise-like speech periods wherein during active voice speech segments an analysis-by-synthesis method is used. However, when a background noise segment or noise-like speech segment is detected, an adaptive code book (pitch prediction) contribution is used as a source of a pseudo-random sequence in order to provide a better representation of the background noise or the noise-like speech. An improved gain quantization scheme is also employed when a background noise segment is detected, wherein energy of the total excitation with quantized gains is matched to the energy of total excitation with unquantized gains.
摘要:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech through CELP (code excited linear prediction) and other associated modeling parameters are generated for higher quality decoding and reproduction. The encoder produces a series of LSF (line spectral frequencies) vectors. For filter stability, each LSF vector comprises an ascending sequence of LSF values. Occasionally, pairs of LSF values are produced (or become through an introduction of channel error) out of ascending order. In response, the encoder performs frame erasure, LSF concealment or pair flipping. With a relatively large number of out of order pairs, frame erasure is applied. With a single out of order pair, within the LSF vector, the pair are flipped. Likewise, with two pairs out of order, the previous LSF vector is used to generate the current LSF vector using concealment. The performance of frame erasure, LSF concealment or pair flipping is also performed in the decoder as well as in the encoder in some embodiments.
摘要:
A technique for suppressing non-stationary noise, such as wind noise, in an audio signal is described. In accordance with the technique, a series of frames of the audio signal is analyzed to detect whether the audio signal comprises non-stationary noise. If it is detected that the audio signal comprises non-stationary noise, a number of steps are performed. In accordance with these steps, a determination is made as to whether a frame of the audio signal comprises non-stationary noise or speech and non-stationary noise. If it is determined that the frame comprises non-stationary noise, a first filter is applied to the frame and if it is determined that the frame comprises speech and non-stationary noise, a second filter is applied to the frame.