摘要:
A method and apparatus for coding a quasi-periodic speech signal. The speech signal is represented by a residual signal generated by filtering the speech signal with a Linear Predictive Coding (LPC) analysis filter. The residual signal is encoded by extracting a prototype period from a current frame of the residual signal. A first set of parameters is calculated which describes how to modify a previous prototype period to approximate the current prototype period. One or more codevectors are selected which, when summed, approximate the error between the current prototype period and the modified previous prototype. A multi-stage codebook is used to encode this error signal. A second set of parameters describe these selected codevectors. The decoder synthesizes an output speech signal by reconstructing a current prototype period based on the first and second set of parameters, and the previous reconstructed prototype period. The residual signal is then interpolated over the region between the current and previous reconstructed prototype periods. The decoder synthesizes output speech based on the interpolated residual signal.
摘要:
A method and apparatus for the variable rate coding of a speech signal. An input speech signal is classified and an appropriate coding mode is selected based on this classification. For each classification, the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction is selected. Low average bit rates are achieved by only employing high fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) during portions of the speech where this fidelity is required for acceptable output. Lower bit rate modes are used during portions of speech where these modes produce acceptable output. Input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. Various coding modes are applied to active speech, depending upon the required level of fidelity. Coding modes may be utilized according to the strengths and weaknesses of each particular mode. The apparatus dynamically switches between these modes as the properties of the speech signal vary with time. And where appropriate, regions of speech are modeled as pseudo-random noise, resulting in a significantly lower bit rate. This coding is used in a dynamic fashion whenever unvoiced speech or background noise is detected.
摘要:
A method and apparatus for the variable rate coding of a speech signal. An input speech signal is classified and an appropriate coding mode is selected based on this classification. For each classification, the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction is selected. Low average bit rates are achieved by only employing high fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) during portions of the speech where this fidelity is required for acceptable output. Lower bit rate modes are used during portions of speech where these modes produce acceptable output. Input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. Various coding modes are applied to active speech, depending upon the required level of fidelity. Coding modes may be utilized according to the strengths and weaknesses of each particular mode. The apparatus dynamically switches between these modes as the properties of the speech signal vary with time. And where appropriate, regions of speech are modeled as pseudo-random noise, resulting in a significantly lower bit rate. This coding is used in a dynamic fashion whenever unvoiced speech or background noise is detected.
摘要:
A method and apparatus for the variable rate coding of a speech signal. An input speech signal is classified and an appropriate coding mode is selected based on this classification. For each classification, the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction is selected. Low average bit rates are achieved by only employing high fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) during portions of the speech where this fidelity is required for acceptable output. Lower bit rate modes are used during portions of speech where these modes produce acceptable output. Input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. Various coding modes are applied to active speech, depending upon the required level of fidelity. Coding modes may be utilized according to the strengths and weaknesses of each particular mode. The apparatus dynamically switches between these modes as the properties of the speech signal vary with time. And where appropriate, regions of speech are modeled as pseudo-random noise, resulting in a significantly lower bit rate. This coding is used in a dynamic fashion whenever unvoiced speech or background noise is detected.
摘要:
A method and apparatus for the variable rate coding of a speech signal. An input speech signal is classified and an appropriate coding mode is selected based on this classification. For each classification, the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction is selected. Low average bit rates are achieved by only employing high fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) during portions of the speech where this fidelity is required for acceptable output. Lower bit rate modes are used during portions of speech where these modes produce acceptable output. Input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. Various coding modes are applied to active speech, depending upon the required level of fidelity. Coding modes may be utilized according to the strengths and weaknesses of each particular mode. The apparatus dynamically switches between these modes as the properties of the speech signal vary with time. And where appropriate, regions of speech are modeled as pseudo-random noise, resulting in a significantly lower bit rate. This coding is used in a dynamic fashion whenever unvoiced speech or background noise is detected.
摘要:
An adaptive Intra-refresh (IR) technique for digital video encoding adjusts IR rate based on video content, or a combination of video content and channel condition. The IR rate may be applied at the frame level or macroblock (MB) level. At the frame level, the IR rate specifies the percentage of MBs to be Intra-coded within the frame. At the MB level, the IR rate defines a statistical probability that a particular MB is to be Intra-coded. The IR rate is adjusted in proportion to a combined metric that weighs estimated channel loss probability, frame-to-frame variation, and texture information. The IR rate can be determined using a close-form solution that requires relatively low implementation complexity. For example, such a close-form does not require iteration or an exhaustive search. In addition, the IR rate can be determined from parameters that are available before motion estimation and compensation are performed.
摘要:
A stereo 3D video frame includes left and right components that are combined to produce a stereo image. For a given amount of distortion, the left and right components may have different impacts on perceptual visual quality of the stereo image due to asymmetry in the distortion response of the human eye. A 3D video encoder adjusts an allocation of coding bits between left and right components of the 3D video based on a frame-level bit budget and a weighting between the left and right components. The video encoder may generate the bit allocation in the rho (ρ) domain. The weighted bit allocation may be derived based on a quality metric that indicates overall quality produced by the left and right components. The weighted bit allocation compensates for the asymmetric distortion response to reduce overall perceptual distortion in the stereo image and thereby enhance or maintain visual quality.
摘要:
Error concealment is used to hide the effects of errors detected within digital video information. A complex error concealment mode decision is disclosed to determine whether spatial error concealment (SEC) or temporal error concealment (TEC) should be used. The error concealment mode decision system uses different methods depending on whether the damaged frame is an intra-frame or an inter-frame. If the video frame is an intra-frame then a similarity metric is used to determine if the intra-frame represents a scene-change or not. If the video frame is an intra-frame, a complex multi-termed equation is used to determine whether SEC or TEC should be used. A novel spatial error concealment technique is disclosed for use when the error concealment mode decision determines that spatial error concealment should be used for reconstruction. The novel spatial error concealment technique divides a corrupt macroblock into four different regions, a corner region, a row adjacent to the corner region, a column adjacent to the corner region, and a remainder main region. Those regions are then reconstructed in that order and information from earlier reconstructed regions may be used in later reconstructed regions. Finally, a macroblock refreshment technique is disclosed for preventing error propagation from harming non-corrupt inter-blocks. Specifically, an inter-macroblock may be ‘refreshed’ using spatial error concealment if there has been significant error caused damage that may cause the inter-block to propagate the errors.
摘要:
Systems, methods, and apparatus described include waveform alignment operations in which a single set of evaluated cosines and sines is used to calculate cross-correlations of two periodic waveforms at two different phase shifts.
摘要:
Methods and apparatus are presented for determining the type of acoustic signal and the type of frequency spectrum exhibited by the acoustic signal in order to selectively delete parameter information before vector quantization. The bits that would otherwise be allocated to the deleted parameters can then be re-allocated to the quantization of the remaining parameters, which results in an improvement of the perceptual quality of the synthesized acoustic signal. Alternatively, the bits that would have been allocated to the deleted parameters are dropped, resulting in an overall bit-rate reduction.