Abstract:
A pitch estimating method, device, and program for estimating the weight of the probability density function of the fundamental frequency and the amplitudes of the harmonic components with fewer operations than conventional methods. In the improved pitch estimating method, the terms $1200\log_2 h$ and $\exp[-(x-(F+1200\log_2 h))^2/2W^2]$ of Equation 121 are computed in advance. The computation of Equation 121 is executed only for the fundamental frequencies F at which $x-(F+1200\log_2 h)$ is close to 0, and the result is stored in a memory of the computer. This greatly reduces the number of operations compared with conventional methods and shortens the computation time.
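The precomputation described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the grid spacing `step` (in cents), and the cutoff expressed as a multiple of W are all assumptions made for illustration. The sketch only shows the idea of tabulating 1200 log2 h and the Gaussian term once, then returning zero wherever x-(F+1200 log2 h) is far from 0.

```python
import math

def precompute_harmonic_offsets(num_harmonics):
    # 1200*log2(h) for h = 1..num_harmonics, computed once (offset in cents).
    return [1200.0 * math.log2(h) for h in range(1, num_harmonics + 1)]

def precompute_gaussian_table(W, step, cutoff):
    # Tabulate exp(-d^2 / (2 W^2)) for d = k*step with |d| <= cutoff*W.
    # 'step' and 'cutoff' are hypothetical parameters, not from the source.
    half = int(cutoff * W / step)
    table = [math.exp(-((k * step) ** 2) / (2.0 * W * W))
             for k in range(-half, half + 1)]
    return half, table

def gaussian_lookup(x, F, h, offsets, half, table, step):
    # d = x - (F + 1200*log2(h)); only values of d near 0 contribute.
    d = x - (F + offsets[h - 1])
    k = round(d / step)
    if abs(k) > half:
        return 0.0  # treated as negligible, so nothing is computed
    return table[k + half]
```

With this layout the exponential is never evaluated inside the estimation loop; each term is a table read, and terms far from the harmonic position are skipped entirely.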
Abstract:
A method (Fig. 9) and apparatus (500, 600) for prediction in a speech-coding system extend a first-order long-term predictor (LTP) filter, using a sub-sample resolution delay, to a multi-tap LTP filter (504, 604). From another perspective, a conventional integer-sample resolution multi-tap LTP filter is extended to use sub-sample resolution delay. Such a multi-tap LTP filter offers a number of advantages over the prior art. In particular, defining the lag with sub-sample resolution makes it possible to explicitly model delay values that have a fractional component, within the limits of resolution of the over-sampling factor used by the interpolation filter. The coefficients (β_i) of the multi-tap LTP filter are thus largely freed from modeling the effect of delays that have a fractional component. Consequently, their main function is to maximize the prediction gain of the LTP filter by modeling the degree of periodicity that is present and by imposing spectral shaping.
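As a rough illustration of a multi-tap predictor with a fractional lag, the Python sketch below forms the prediction as a weighted sum of past samples read at non-integer positions. Several things here are assumptions rather than details from the abstract: linear interpolation stands in for the interpolation filter, the taps are spaced one sample apart and centered on the delayed position, and the function names are hypothetical.

```python
import math

def fractional_sample(signal, position):
    # Read the signal at a non-integer position. Linear interpolation is an
    # assumed stand-in for the codec's interpolation filter.
    i = math.floor(position)
    frac = position - i
    return (1.0 - frac) * signal[i] + frac * signal[i + 1]

def multi_tap_ltp_predict(history, n, lag, betas):
    # Predict sample n from samples around position n - lag, where lag may
    # have a fractional part. The tap layout (centered, unit spacing) is an
    # assumption made for this sketch.
    center = len(betas) // 2
    return sum(beta * fractional_sample(history, n - lag + (i - center))
               for i, beta in enumerate(betas))
```

Because the fractional part of the delay is handled by the interpolation step, the tap weights in `betas` are left to shape the prediction rather than to approximate the fractional delay.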
Abstract:
A system, method and computer readable medium for quantizing pitch information of audio is disclosed. The method includes capturing audio representing a numbered frame of a plurality of numbered frames. The method further includes calculating a class of the frame, wherein the class is either voiced or unvoiced. If the frame is of the voiced class, a pitch is calculated for the frame (903). If the frame is an even numbered frame of the voiced class, a codeword of a first length is calculated by absolutely quantizing the frame pitch (910). If the frame is an odd numbered frame of the voiced class and a reliable frame is available, a codeword of a second length is calculated by differentially quantizing the frame pitch (905). If there is no reliable frame available, a codeword of the second length is calculated by absolutely quantizing the frame pitch.
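The branching in this abstract can be sketched in Python as follows. The codeword lengths, the pitch range, the maximum pitch difference, and the uniform quantizer are all assumed values chosen for illustration, and the abstract does not say what is emitted for unvoiced frames, so the sketch simply returns `None` for them.

```python
FIRST_LENGTH = 7      # assumed bit length of the first codeword
SECOND_LENGTH = 5     # assumed bit length of the second codeword
PITCH_MIN, PITCH_MAX = 20.0, 147.0   # assumed pitch range
MAX_DELTA = 8.0       # assumed range for the differential quantizer

def uniform_index(value, low, high, bits):
    # Hypothetical uniform quantizer: clamp, then map to 0 .. 2**bits - 1.
    levels = (1 << bits) - 1
    clamped = min(max(value, low), high)
    return round((clamped - low) / (high - low) * levels)

def encode_frame(frame_number, voiced, pitch=None, reliable_pitch=None):
    if not voiced:
        return None  # unvoiced handling is not described in the abstract
    if frame_number % 2 == 0:
        # Even voiced frame: absolute quantization, first length.
        index = uniform_index(pitch, PITCH_MIN, PITCH_MAX, FIRST_LENGTH)
        return ("absolute", index, FIRST_LENGTH)
    if reliable_pitch is not None:
        # Odd voiced frame with a reliable reference: differential, second length.
        delta = pitch - reliable_pitch
        index = uniform_index(delta, -MAX_DELTA, MAX_DELTA, SECOND_LENGTH)
        return ("differential", index, SECOND_LENGTH)
    # Odd voiced frame without a reliable reference: absolute, second length.
    index = uniform_index(pitch, PITCH_MIN, PITCH_MAX, SECOND_LENGTH)
    return ("absolute", index, SECOND_LENGTH)
```

For example, `encode_frame(1, True, 88.0, 80.0)` takes the differential branch, while the same call without `reliable_pitch` falls back to absolute quantization at the shorter length.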
Abstract:
Methods and apparatus for detecting periodicity and/or for determining the fundamental period of a signal such as speech. The methods include (104) embedding a portion of a sampled digitized signal into an m-dimensional state space to obtain a sequence of m-dimensional vectors, (106) selecting closest pairs of vectors in state space from a plurality of possible pairs of m-dimensional vectors in said sequence of m-dimensional vectors, (108) accumulating total numbers of selected closest pairs of vectors having the same time separation values to produce a histogram of accumulated numbers, and (110) locating at least a highest peak in a portion of said histogram to obtain a value indicating the fundamental period of the signal. Various embodiments are directed to speech and audio signal processing and other speech related applications. However, the methods have a general nature and can be applied to other types of periodic or quasi-periodic signals as well.
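The four steps listed above can be sketched in Python as below. This is an illustrative reading, not the patented procedure: the time-delay embedding, the use of a distance radius to decide which pairs count as "closest", and all parameter names and default values are assumptions.

```python
import math

def embed(signal, m, delay):
    # Step (104): time-delay embedding into m-dimensional vectors (assumed form).
    count = len(signal) - (m - 1) * delay
    return [tuple(signal[k + j * delay] for j in range(m)) for k in range(count)]

def estimate_period(signal, m=3, delay=2, radius=0.05, min_lag=5, max_lag=100):
    vectors = embed(signal, m, delay)
    histogram = [0] * (max_lag + 1)
    for i in range(len(vectors)):
        last = min(i + max_lag, len(vectors) - 1)
        for j in range(i + min_lag, last + 1):
            # Step (106): treat pairs closer than 'radius' as closest pairs
            # (the radius criterion is an assumption of this sketch).
            if math.dist(vectors[i], vectors[j]) <= radius:
                # Step (108): accumulate by time separation.
                histogram[j - i] += 1
    # Step (110): highest peak within the searched portion of the histogram.
    best = max(range(min_lag, max_lag + 1), key=lambda lag: histogram[lag])
    return best, histogram
```

Restricting the search to lags between `min_lag` and `max_lag` corresponds to examining only a portion of the histogram, which keeps trivially close neighbours in time from dominating the peak. The double loop is quadratic in the number of vectors, so the sketch suits short analysis windows only.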
Abstract:
The invention provides a system, apparatus, and method for compressing a speech signal by decimating or removing somewhat redundant portions of the signal while retaining reference signal portions sufficient to reconstruct the signal (170) without noticeable loss in quality, thereby permitting storage and transmission of high quality speech with minimal storage volume or transmission bandwidth requirements. Speech pitch waveform decimation is used to reduce data to produce an encoded speech signal during compression (162), and time based interpolative speech reconstruction is used on the encoded signal to reconstruct the original speech signal (160). In another aspect, an internet (180) voice electronic mail system (174) is provided which has minimal voice message storage and transmission requirements while retaining high fidelity voice quality.
Abstract:
A speech classification technique (502-530) for robust classification of varying modes of speech to enable maximum performance of multi-mode variable bit rate encoding techniques. A speech classifier accurately classifies a high percentage of speech segments for encoding at minimal bit rates, meeting lower bit rate requirements. Highly accurate speech classification produces a lower average encoded bit rate, and higher quality decoded speech. The speech classifier considers a maximum number of parameters for each frame of speech, producing numerous and accurate speech mode classifications for each frame. The speech classifier correctly classifies numerous modes of speech under varying environmental conditions. The speech classifier inputs classification parameters from external components, generates internal classification parameters from the input parameters, sets a Normalized Auto-correlation Coefficient Function threshold and selects a parameter analyzer according to the signal environment, and then analyzes the parameters to produce a speech mode classification.
Abstract:
A speech compression system with a special fixed codebook structure and a new search routine is proposed for speech coding. The system is capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech. The codebook structure uses a plurality of subcodebooks. Each subcodebook is designed to fit a specific group of speech signals. A criterion value is calculated for each subcodebook to minimize an error signal in a minimization loop as part of the coding system. An external signal sets a maximum bitstream rate for delivering encoded speech into a communications system. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. Each codec is selectively activated to encode and decode the speech signals at different bit rates to enhance overall quality of the synthesized speech at a limited average bit rate.
Abstract:
A frame erasure compensation method in a variable-rate speech coder includes quantizing, with a first encoder, a pitch lag value for a current frame and a first delta pitch lag value equal to the difference between the pitch lag value for the current frame and the pitch lag value for the previous frame. A second, predictive encoder quantizes only a second delta pitch lag value for the previous frame (equal to the difference between the pitch lag value for the previous frame and the pitch lag value for the frame prior to that frame). If the frame prior to the previous frame is processed as a frame erasure, the pitch lag value for the previous frame is obtained by subtracting the first delta pitch lag value from the pitch lag value for the current frame. The pitch lag value for the erasure frame is then obtained by subtracting the second delta pitch lag value from the pitch lag value for the previous frame. Additionally, a waveform interpolation method may be used to smooth discontinuities caused by changes in the coder pitch memory.
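The two subtractions described above amount to the following arithmetic, shown here as a small Python sketch. The function and argument names are hypothetical; the sketch assumes the current frame's pitch lag and both delta values have already been decoded.

```python
def recover_pitch_lags(current_lag, first_delta, second_delta):
    # first_delta  = lag(current)  - lag(previous)
    # second_delta = lag(previous) - lag(frame before previous, i.e. the erasure)
    previous_lag = current_lag - first_delta
    erased_lag = previous_lag - second_delta
    return previous_lag, erased_lag
```

For instance, with a current lag of 60, a first delta of 3 and a second delta of -2, the previous frame's lag is 57 and the erased frame's lag is 59.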
Abstract:
A basilar membrane model is used to receive an input signal including a target signal in step I. In successive further steps, the target signal is filtered from the input signal. After the filtering, the target signal can be used as an input for further processing, for example signal recognition or data compression. The target signal can also be applied to a substantially reverse method to obtain an improved or clean signal.