摘要:
Systems, methods, and apparatus described include waveform alignment operations in which a single set of evaluated cosines and sines is used to calculate cross-correlations of two periodic waveforms at two different phase shifts.
摘要:
Methods and apparatus are provided for achieving an arbitrary average data rate for a variable rate coder. One method includes selecting a set (e.g., a pair) of initial composite rates surrounding the arbitrary average data rate. A reallocation fraction is then calculated based on the initial composite rates. The reallocation fraction is used to reassign a number of frames from one component rate of an initial composite rate to another in order to achieve the arbitrary average data rate. Such a method may be configured such that selecting an initial composite rate on one side of (e.g., less than) the arbitrary average data rate implicitly selects the initial composite rate on the other side of the arbitrary average data rate.
摘要:
A method and apparatus for subsampling phase spectrum information includes a speech coder for analyzing and reconstructing a prototype of a frame by using intelligent subsampling of phase spectrum information of the prototype. To analyze the prototype, the speech coder produces a phase parameters of a reference prototype, generates phase parameters of a current prototype, and correlates the phase parameters of the current prototype with the phase parameters of the reference prototype in multiple frequency bands. To reconstruct the prototype using linear phase shift values, the speech coder produces a phase parameters of the reference prototype, generates a set of linear phase shift values associated with the prototype, and composes a phase vector from the phase parameters and the linear phase shift values across multiple frequency bands. To reconstruct the prototype using circular rotation values, the speech coder produces a set of circular rotation values associated with the prototype, generates a set of bandpass waveforms in multiple frequency bands, the bandpass waveforms being associated with the phase parameters of the reference prototype, and modifes the bandpass waveforms based upon the circular rotation values.
摘要:
A method and apparatus for interleaving line spectral information quantization methods in a speech coder includes quantizing line spectral information with two vector quantization techniques, the first technique being a non-moving-average prediction-based technique, and the second technique being a moving-average prediction-based technique. A line spectral information vector is vector quantized with the first technique. Equivalent moving average codevectors for the first technique are computed. A memory of a moving average codebook of codevectors is updated with the equivalent moving average codevectors for a predefined number of frames that were previously processed by the speech coder. A target quantization vector for the second technique is calculated based on the updated moving average codebook memory. The target quantization vector is vector quantized with the second technique to generate a quantized target codevector. The memory of the moving average codebook is updated with the quantized target codevector. Quantized line spectral information vectors are derived from the quantized target codevector.
摘要:
The disclosure describes FGS video coding techniques that use cycle-aligned fragments (CAFs). The techniques may perform cycle-based coding of FGS video data block coefficients and syntax elements, and encapsulate cycles in fragments for transmission. The fragments may be cycle-aligned such that a start of a payload of each of the fragments substantially coincides with a start of one of the cycles. In this manner, cycles can be readily accessed via individual fragments. Some cycles may be controlled with a vector mode to scan to a predefined position within a block before moving to another block. In this manner, the number of cycles can be reduced, reducing the number of fragments and associated overhead. The CAFs may be entropy coded independently of one another so that each fragment may be readily accessed and decoded without waiting for decoding of other fragments. Independent entropy coding may permit parallel decoding and simultaneous processing of fragments.
摘要:
Methods and apparatus are presented for reducing the number of bits needed to represent an excitation waveform. An acoustic signal in an analysis frame is analyzed to determine whether it is a band-limited signal. A sub-sampled sparse codebook is used to generate the excitation waveform if the acoustic signal is a band-limited signal. The sub-sampled sparse codebook is generated by decimating permissible pulse locations from the codebook track in accordance with the frequency characteristic of the acoustic signal.
摘要:
A method and apparatus for the variable rate coding of a speech signal. An input speech signal is classified and an appropriate coding mode is selected based on this classification. For each classification, the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction is selected. Low average bit rates are achieved by only employing high fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) during portions of the speech where this fidelity is required for acceptable output. Lower bit rate modes are used during portions of speech where these modes produce acceptable output. Input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. Various coding modes are applied to active speech, depending upon the required level of fidelity. Coding modes may be utilized according to the strengths and weaknesses of each particular mode. The apparatus dynamically switches between these modes as the properties of the speech signal vary with time. And where appropriate, regions of speech are modeled as pseudo-random noise, resulting in a significantly lower bit rate. This coding is used in a dynamic fashion whenever unvoiced speech or background noise is detected.
摘要:
A method and apparatus for predictively quantizing voiced speech includes a parameter generator and a quantizer. The parameter generator is configured to extract parameters from frames of predictive speech such as voiced speech, and to transform the extracted information to a frequency-domain representation. The quantizer is configured to subtract a weighted sum of the parameters for previous frames from the parameter for the current frame. The quantizer is configured to quantize the difference value. A prototype extractor may be added to first extract a pitch period prototype to be processed by the parameter generator.
摘要:
Techniques for complexity-adaptive and automatic two-dimensional (2D) to three-dimensional (3D) image and video conversion which classifies a frame of a 2D input into one of a flat image class and a non-flat image class are described. The flat image class frame is directly converted into 3D stereo for display. The frame that is classified as a non-flat image class is further processed automatically and adaptively, based on complexity, to create a depth map estimate. Thereafter, the non-flat image class frame is converted into a 3D stereo image using the depth map estimate or an adjusted depth map. The adjusted depth map is processed based on the complexity.
摘要:
A monoscopic low-power mobile device is capable of creating real-time stereo images and videos from a single captured view. The device uses statistics from an autofocusing process to create a block depth map of a single capture view. Artifacts in the block depth map are reduced and an image depth map is created. Stereo three-dimensional (3D) left and right views are created from the image depth map using a Z-buffer based 3D surface recover process and a disparity map which is a function of the geometry of binocular vision.