Abstract:
There is provided a method of reducing the effect of noise-producing artifacts in silence areas of a speech signal, for use by a speech decoding system. The method comprises obtaining a plurality of incoming samples of a speech subframe; summing an absolute value of an energy level for each of the plurality of incoming samples to generate a total input level (gain_in); smoothing the total input level to generate a smoothed level (Level_in_sm); determining that the speech subframe is in a silence area based on the total input level, the smoothed level and a spectral tilt parameter; defining a gain using k1*(Level_in_sm/1024)+(1−k1), where k1 is a function of the spectral tilt parameter; and modifying an energy level of the speech subframe using the gain.
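The gain computation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives the gain formula and the constant 1024, but not the smoothing rule, the silence test, or the mapping from spectral tilt to k1, so `smooth_level`, `k1_from_tilt`, the thresholds and the value `alpha` are all hypothetical. The total input level is interpreted here as the sum of absolute sample values.

```python
def smooth_level(prev_sm, gain_in, alpha=0.75):
    """One-pole smoothing of the total input level (alpha is an assumed constant)."""
    return alpha * prev_sm + (1.0 - alpha) * gain_in


def k1_from_tilt(tilt):
    """Hypothetical mapping from spectral tilt to k1, clamped to [0, 1]."""
    return min(max(1.0 - tilt, 0.0), 1.0)


def silence_gain(samples, prev_sm, tilt, level_threshold=1024.0, tilt_threshold=0.5):
    """Return (gain, new_smoothed_level) for one subframe."""
    # Total input level (gain_in): sum of absolute sample values.
    gain_in = sum(abs(s) for s in samples)
    level_in_sm = smooth_level(prev_sm, gain_in)

    # Assumed silence test: both levels are low and the tilt is small.
    in_silence = (gain_in < level_threshold
                  and level_in_sm < level_threshold
                  and tilt < tilt_threshold)
    if not in_silence:
        return 1.0, level_in_sm  # leave the subframe unchanged

    # Gain formula from the abstract: k1*(Level_in_sm/1024) + (1 - k1).
    k1 = k1_from_tilt(tilt)
    gain = k1 * (level_in_sm / 1024.0) + (1.0 - k1)
    return gain, level_in_sm


def apply_gain(samples, gain):
    """Scale the subframe samples by the gain."""
    return [gain * s for s in samples]
```

With these assumed settings, a quiet subframe such as `[1, -2, 3, -4]` with `tilt=0.0` is attenuated strongly, while a loud subframe or one with a large tilt is passed through with a gain of 1.0.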
Abstract:
The invention improves the encoding and decoding of speech by focusing the encoding on the perceptually important characteristics of speech. The system analyzes selected features of an input speech signal and first performs a common frame-based speech coding of the input speech signal. The system then performs speech coding based on either a first speech coding mode or a second speech coding mode. The selection of a mode is based on characteristics of the input speech signal. The first speech coding mode uses a first framing structure and the second speech coding mode uses a second framing structure.
Abstract:
There is provided a method of updating a noise state of a voice activity detector (VAD) for indicating an active voice mode and an inactive voice mode. The method comprises receiving an input signal having a plurality of frames, determining an elapsed time since the last update of the noise state, updating the noise state of the VAD if the elapsed time exceeds a predetermined time, determining an average minimum energy based on two or more of the plurality of frames, determining a current minimum energy based on a current frame of the plurality of frames, updating the noise state of the VAD if the average minimum energy is less than the current minimum energy, and updating the noise state of the VAD if the average minimum energy is greater than the current minimum energy plus a first predetermined value.
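The three update conditions listed in this abstract can be sketched as a single decision function. This is a minimal illustration, not the patented implementation: the function name, argument names and the use of a plain arithmetic mean for the average minimum energy are assumptions.

```python
def should_update_noise_state(elapsed, max_elapsed, frame_min_energies,
                              current_min_energy, margin):
    """Return True if any of the three update conditions holds.

    elapsed            -- time since the last noise-state update
    max_elapsed        -- the predetermined time
    frame_min_energies -- minimum energies of two or more frames
    current_min_energy -- minimum energy of the current frame
    margin             -- the first predetermined value
    """
    if len(frame_min_energies) < 2:
        raise ValueError("need minimum energies from two or more frames")

    # Condition 1: too much time has passed since the last update.
    if elapsed > max_elapsed:
        return True

    # Average minimum energy (assumed here to be an arithmetic mean).
    avg_min = sum(frame_min_energies) / len(frame_min_energies)

    # Condition 2: the average minimum is below the current minimum.
    if avg_min < current_min_energy:
        return True

    # Condition 3: the average minimum exceeds the current minimum plus a margin.
    return avg_min > current_min_energy + margin
```

Under this reading, the noise state is left alone only when the average minimum energy lies between the current minimum energy and the current minimum energy plus the margin, and the update timer has not expired.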
Abstract:
A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.
Abstract:
A method and apparatus are disclosed for encoding speech for communication to a decoder for reproduction of the speech, in which the speech signal is classified into steady state voiced (harmonic), stationary unvoiced, and “transitory” or “transition” speech, and a particular type of coding scheme is used for each class. Harmonic coding is used for steady state voiced speech, “noise-like” coding is used for stationary unvoiced speech, and a special coding mode is used for transition speech, designed to capture the location, the structure, and the strength of the local time events that characterize the transition portions of the speech. The compression schemes can be applied to the speech signal or to the LP residual signal.
Abstract:
Silence description coding is disclosed for multi-rate speech coding systems that employ discontinuous transmission. Speech coding systems include multi-rate speech codecs having an encoder and a decoder. The silence description coding is performed in either the encoder or the decoder of the multi-rate speech codec. It may also be performed in a distributed manner, partially in the encoder and partially in the decoder. The silence description coding is performed on a speech signal having a substantially non-speech-like characteristic. Voice activity detection classifies the speech signal as being either substantially speech-like or substantially non-speech-like. The silence description coding is selected from a plurality of coding modes. In certain embodiments of the invention, the silence description coding is a source coding mode that operates at a bit rate that fits within a bit rate budget as determined by all of the available source coding modes within the plurality of coding modes. The silence description coding is also accompanied by signaling coding and channel coding of the speech signal. Error checking is performed using an unused portion of a bandwidth of the multi-rate speech codec's bit rate. In certain embodiments of the invention, this error checking involves majority voting.
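As a generic illustration of majority voting, the sketch below recovers a bit pattern from several received copies. The abstract does not say how the unused bits are laid out or how many copies are sent, so the repetition scheme and the function name are assumptions.

```python
def majority_vote(copies):
    """Bitwise majority across an odd number of equal-length bit lists."""
    if len(copies) % 2 == 0:
        raise ValueError("an odd number of copies is needed to avoid ties")
    n = len(copies)
    # For each bit position, output 1 if more than half of the copies have a 1.
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*copies)]
```

For example, `majority_vote([[1, 0, 1], [1, 1, 1], [0, 0, 1]])` votes on each of the three bit positions independently, so a single corrupted copy per position is outvoted.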
Abstract:
A method is disclosed for generating frame voicing decisions for an incoming speech signal having periods of active voice and non-active voice for a speech encoder in a speech communication system. The method first extracts a predetermined set of parameters from the incoming speech signal for each frame and then makes a frame voicing decision of the incoming speech signal for each frame according to a set of difference measures extracted from the predetermined set of parameters. The predetermined set of extracted parameters comprises a description of the spectrum of the incoming speech signal based on line spectral frequencies ("LSF"). Additional parameters may include full band energy, low band energy and zero crossing rate. The frame voicing decision for each frame is made from the set of difference measures by finding a union of sub-spaces, each sub-space being described by a linear function of at least a pair of parameters from the predetermined set of parameters.
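A decision over a union of sub-spaces, each bounded by a linear function of a pair of parameters, can be sketched as follows. The parameter names, the coefficients and the choice that the union marks active voice are assumptions made for illustration; the abstract does not give them.

```python
def frame_is_active(diffs, regions):
    """Return True if the difference measures fall in any sub-space.

    diffs   -- dict mapping a parameter name to its difference measure
    regions -- list of (name_p, name_q, a, b, c); each sub-space is the
               half-plane a*diffs[name_p] + b*diffs[name_q] + c > 0
    """
    # The union of sub-spaces is a logical OR over the linear tests.
    return any(a * diffs[p] + b * diffs[q] + c > 0
               for (p, q, a, b, c) in regions)


# Hypothetical sub-spaces over pairs of difference measures.
EXAMPLE_REGIONS = [
    ("lsf", "energy", 1.0, -2.0, 0.5),
    ("zcr", "energy", 1.0, 1.0, -3.0),
]
```

A frame is then classified by calling `frame_is_active({"lsf": ..., "energy": ..., "zcr": ...}, EXAMPLE_REGIONS)` with the difference measures computed for that frame.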
Abstract:
This invention describes an interactive computerized toy that provides entertaining light, audio and vibration patterns in response to touch stimuli from all directions, and in which the light patterns may be displayed in all directions. In particular, the invention describes an interactive computerized toy in the shape of a six-faced cube that is uniquely versatile in providing endless programming options, in a fashion similar to loading and playing different games on the screens of handheld devices such as smartphones.
Abstract:
The invention discloses a multi-stage quantization method that includes the following steps: obtaining a reference codebook according to a previous stage codebook; obtaining a current stage codebook according to the reference codebook and a scaling factor; and quantizing an input vector by using the current stage codebook. The invention also discloses a multi-stage quantization device. With the invention, the current stage codebook may be obtained from the previous stage codebook by using the correlation between the two codebooks. As a result, the current stage codebook does not require independent codebook storage, which saves storage space and improves resource usage efficiency.
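The idea of deriving the current stage codebook from the previous one can be sketched for two stages. This is an illustration under assumptions: the reference codebook is taken to be the previous stage codebook itself, the second stage quantizes the first-stage residual (as in conventional multi-stage vector quantization), and the function names and scaling value are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the codevector closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))


def two_stage_quantize(vec, stage1, scale):
    """Quantize vec with a stored stage-1 codebook and a derived stage-2 codebook."""
    # Stage 1: search the stored codebook.
    i1 = nearest(stage1, vec)
    residual = [v - c for v, c in zip(vec, stage1[i1])]

    # Stage 2: derive the codebook from stage 1 instead of storing it.
    # Assumption: reference codebook == previous stage codebook.
    stage2 = [[scale * c for c in codevec] for codevec in stage1]
    i2 = nearest(stage2, residual)

    # Reconstruction is the sum of the selected codevectors.
    recon = [a + b for a, b in zip(stage1[i1], stage2[i2])]
    return i1, i2, recon
```

Only `stage1` and `scale` need to be stored; the second codebook is rebuilt on demand, which is the storage saving the abstract refers to.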