摘要:
Apparatus is provided that includes at least one entity for transmitting speech signals in a discontinuous transmission mode including transmitting speech frames interspersed with frames including comfort noise parameters during periods of speech pauses. The entit(ies) include a first entity for estimating a current noise value. In addition, the apparatus includes a second entity for selectively controlling a rate at which the frames including comfort noise parameters are transmitted during the periods of speech pauses based upon the estimated current noise value.
摘要:
Apparatus is provided that includes at least one entity for transmitting speech signals in a discontinuous transmission mode including transmitting speech frames interspersed with frames including comfort noise parameters during periods of speech pauses. The entit(ies) include a first entity for estimating a current noise value. In addition, the apparatus includes a second entity for selectively controlling a rate at which the frames including comfort noise parameters are transmitted during the periods of speech pauses based upon the estimated current noise value.
摘要:
The present invention relates to a method and device for improving concealment of frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder (106) to a decoder (110), and for accelerating recovery of the decoder after non erased frames of the encoded sound signal have been received. For that purpose, concealment/recovery parameters are determined in the encoder or decoder. When determined in the encoder (106), the concealment/recovery parameters are transmitted to the decoder (110). In the decoder, erasure frame concealment and decoder recovery is conducted in response to the concealment/recovery parameters. The concealment/recovery parameters may be selected from the group consisting of: a signal classification parameter, an energy information parameter and a phase information parameter. The determination of the concealment/recovery parameters comprises classifying the successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset, and this classification is determined on the basis of at least a part of the following parameters: a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter, and a zero crossing parameter.
摘要:
A method and device for concealing frame erasures caused by frames of an encoded sound signal erased during transmission from an encoder to a decoder and for recovery of the decoder after frame erasures comprise, in the encoder, determining concealment/recovery parameters including at least phase information related to frames of the encoded sound signal. The concealment/recovery parameters determined in the encoder are transmitted to the decoder and, in the decoder, frame erasure concealment is conducted in response to the received concealment/recovery parameters. The frame erasure concealment comprises resynchronizing, in response to the received phase information, the erasure-concealed frames with corresponding frames of the sound signal encoded at the encoder. When no concealment/recovery parameters are transmitted to the decoder, a phase information of each frame of the encoded sound signal that has been erased during transmission from the encoder to the decoder is estimated in the decoder. Also, frame erasure concealment is conducted in the decoder in response to the estimated phase information, wherein the frame erasure concealment comprises resynchronizing, in response to the estimated phase information, each erasure-concealed frame with a corresponding frame of the sound signal encoded at the encoder.
摘要:
The present invention relates to a method and device for improving concealment of frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder (106) to a decoder (110), and for accelerating recovery of the decoder after non erased frames of the encoded sound signal have been received. For that purpose, concealment/recovery parameters are determined in the encoder or decoder. When determined in the encoder (106), the concealment/recovery parameters are transmitted to the decoder (110). In the decoder, erasure frame concealment and decoder recovery is conducted in response to the concealment/recovery parameters. The concealment/recovery parameters may be selected from the group consisting of: a signal classification parameter, an energy information parameter and a phase information parameter. The determination of the concealment/recovery parameters comprises classifying the successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset, and this classification is determined on the basis of at least a part of the following parameters: a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter, and a zero crossing parameter.
摘要:
For determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, a feature of the sound signal is located in a previous frame, a corresponding feature of the sound signal is located in a current frame, and the long-term-prediction delay parameter is determined for the current frame while mapping, with the long term prediction, the signal feature of the previous frame with the corresponding signal feature of the current frame. In a signal modification method for implementation into a technique for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, each frame of the sound signal is partitioned into a plurality of signal segments, and at least a part of the signal segments of the frame are warped while constraining the warped signal segments inside the frame. For searching pitch pulses in a sound signal, a residual signal is produced by filtering the sound signal through a linear prediction analysis filter, a weighted sound signal is produced by processing the sound signal through a weighting filter, the weighted sound signal being indicative of signal periodicity, a synthesized weighted sound signal is produced by filtering a synthesized speech signal produced during a last subframe of a previous frame of the sound signal through the weighting filter, a last pitch pulse of the sound signal of the previous frame is located from the residual signal, a pitch pulse prototype of given length is extracted around the position of the last pitch pulse of the sound signal of the previous frame using the synthesized weighted sound signal, and the pitch pulses are located in a current frame using the pitch pulse prototype.
摘要:
Speech signal classification and encoding systems and methods are disclosed herein. The signal classification is done in three steps each of them discriminating a specific signal class. First, a voice activity detector (VAD) discriminates between active and inactive speech frames. If an inactive speech frame is detected (background noise signal) then the classification chain ends and the frame is encoded with comfort noise generation (CNG). If an active speech frame is detected, the frame is subjected to a second classifier dedicated to discriminate unvoiced frames. If the classifier classifies the frame as unvoiced speech signal, the classification chain ends, and the frame is encoded using a coding method optimized for unvoiced signals. Otherwise, the speech frame is passed through to the “stable voiced” classification module. If the frame is classified as stable voiced frame, then the frame is encoded using a coding method optimized for stable voiced signals. Otherwise, the frame is likely to contain a non-stationary speech segment such as a voiced onset or rapidly evolving voiced speech signal. In this case a general-purpose speech coder is used at a high bit rate for sustaining good subjective quality.
摘要:
In one aspect thereof the invention provides a method for noise suppression of a speech signal that includes, for a speech signal having a frequency domain representation dividable into a plurality of frequency bins, determining a value of a scaling gain for at least some of said frequency bins and calculating smoothed scaling gain values. Calculating smoothed scaling gain values includes, for the at least some of the frequency bins, combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain. In another aspect a method partitions the plurality of frequency bins into a first set of contiguous frequency bins and a second set of contiguous frequency bins having a boundary frequency there between, where the boundary frequency differentiates between noise suppression techniques, and changes a value of the boundary frequency as a function of the spectral content of the speech signal.
摘要:
In the method and device for interoperating a first station using a first communication scheme and comprising a first coder and a first decoder with a second station using a second communication scheme and comprising a second coder and a second decoder, communication between the first and second stations is conducted by transmitting signal-coding parameters related to a sound signal from the coder of one of the first and second stations to the decoder of the other station. The sound signal is classified to determine whether the signal-coding parameters should be transmitted from the coder of one station to the decoder of the other station using a first communication mode in which full bit rate is used for transmission of the signal-coding parameters. When classification of the sound signal determines that the signal-coding parameters should be transmitted using the first communication mode and when a request to transmit the signal-coding parameters from the coder of one station to the decoder of the other station using a second communication mode designed to reduce bit rate during transmission of the signal-coding parameters is received, a portion of the signal-coding parameters from the coder one station is dropped and the remaining signal-coding parameters are transmitting to the decoder of the other station using the second communication mode. The dropped portion of the signal-coding parameters are regenerated before the decoder of the other station decodes the signal-coding parameters.
摘要:
The exemplary embodiments of the invention provide at least a method and an apparatus to perform operations including dividing a sound signal into a series of successive frames, dividing each frame into a number of subframes, producing a residual signal by filtering the sound signal through a linear prediction analysis filter, locating a last pitch pulse of the sound signal of a previous frame from the residual signal, extracting a pitch pulse prototype of given length around a position of the last pitch pulse of the previous frame using the residual signal, and locating pitch pulses in a current frame using the pitch pulse prototype.