Prototype waveform phase modeling for a frequency domain interpolative speech codec system

    Publication number: US06931373B1

    Publication date: 2005-08-16

    Application number: US10073423

    Filing date: 2002-02-13

    CPC classification number: G10L19/08 G10L19/097

    Abstract: A system and method are provided that employ a frequency domain interpolative CODEC system for low bit rate coding of speech. The system comprises a linear prediction (LP) front end adapted to process an input signal and provide LP parameters, which are quantized and encoded over predetermined intervals and used to compute an LP residual signal. An open-loop pitch estimator adapted to process the LP residual signal, a pitch quantizer, and a pitch interpolator are also provided and supply a pitch contour within the predetermined intervals. Also provided is a signal processor responsive to the LP residual signal and the pitch contour and adapted to perform the following: provide a voicing measure, where the voicing measure characterizes a degree of voicing of the input speech signal and is derived from several input parameters that are correlated to degrees of periodicity of the signal over the predetermined intervals; extract a prototype waveform (PW) from the LP residual and the open-loop pitch contour for a number of equal sub-intervals within the predetermined intervals; normalize the PW by a gain value of the PW; encode a magnitude of the PW; and separate stationary and nonstationary components of the PW using a low-complexity alignment process and a filtering process that introduce no delay. The ratio of the energy of the nonstationary component of the PW to that of the stationary component of the PW is averaged across 5 subbands to compute the nonstationarity measure as a frequency-dependent vector entity. A measure of the degree of voicing of the residual is also computed using open-loop pitch gain, pitch variance, relative signal power, PW correlation and PW nonstationarity in low-frequency subbands. The nonstationarity measure and voicing measure are encoded using a 6-bit spectrally weighted vector quantization scheme using a codebook partitioned based on a voiced/unvoiced decision. At the decoder, a stationary component of the PW is reconstructed as a weighted combination of the previous PW phase vector, a random phase perturbation and a fixed phase vector obtained from a voiced pitch pulse.
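
The nonstationarity measure described in this abstract can be illustrated with a small Python sketch. It is one possible reading, not the patent's actual procedure: the function name, the equal-width grouping of harmonics into subbands, and the `eps` guard against division by zero are all assumptions made here for illustration.

```python
def nonstationarity_vector(stationary, nonstationary, num_bands=5, eps=1e-12):
    """Return one nonstationary/stationary energy ratio per subband.

    stationary, nonstationary: equal-length sequences of real or complex
    harmonic coefficients of the two PW components (hypothetical inputs).
    The harmonics are split into num_bands contiguous groups of nearly
    equal size; the abstract does not state the real band edges.
    """
    if len(stationary) != len(nonstationary):
        raise ValueError("component vectors must have equal length")
    n = len(stationary)
    if n < num_bands:
        raise ValueError("need at least one harmonic per subband")
    measure = []
    for b in range(num_bands):
        lo = b * n // num_bands
        hi = (b + 1) * n // num_bands
        e_stat = sum(abs(c) ** 2 for c in stationary[lo:hi])
        e_non = sum(abs(c) ** 2 for c in nonstationary[lo:hi])
        # eps is an assumption that keeps the ratio finite for silent bands
        measure.append(e_non / (e_stat + eps))
    return measure


# Toy input: the lower four harmonics are purely stationary.
vec = nonstationarity_vector([1.0] * 10, [0.0] * 4 + [1.0] * 6)
```

The result is a five-element vector, which matches the abstract's description of the measure as a frequency-dependent vector entity.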

    Robust pitch estimation method and device for telephone speech
    6.
    Granted invention patent
    Status: Expired

    Publication number: US5704000A

    Publication date: 1997-12-30

    Application number: US337595

    Filing date: 1994-11-10

    CPC classification number: G10L25/90

    Abstract: A pitch estimating method includes the steps of (1) determining a set of pitch candidates to estimate a pitch of a digitized speech signal at each of a plurality of time instants, wherein series of these time instants define segments of the digitized speech signal; (2) constructing a pitch contour using a pitch candidate selected from each of the sets of pitch candidates determined in the first step; and (3) selecting a representative pitch estimate for the digitized speech signal segment from the set of pitch candidates comprising the pitch contour.
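
The three steps in this abstract can be sketched in Python as follows. The abstract does not say how a candidate is selected at each instant or how the representative estimate is chosen, so both rules here are assumptions: the contour is the path with the smallest total pitch change between neighbouring instants (found by dynamic programming), and the representative is the low median of that contour. The function names are hypothetical.

```python
from statistics import median_low


def build_contour(candidate_sets):
    """Step 2: pick one candidate per time instant.

    Assumed criterion: minimise the summed absolute pitch change
    between neighbouring instants.
    """
    if not candidate_sets or any(not s for s in candidate_sets):
        raise ValueError("every time instant needs at least one candidate")
    # cost[j]: cheapest path ending at candidate j of the current instant
    cost = [0.0] * len(candidate_sets[0])
    back = []  # back[t][j]: best predecessor index at instant t
    for prev, cur in zip(candidate_sets, candidate_sets[1:]):
        new_cost, pointers = [], []
        for p in cur:
            best = min(range(len(prev)),
                       key=lambda i: cost[i] + abs(p - prev[i]))
            new_cost.append(cost[best] + abs(p - prev[best]))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [s[i] for s, i in zip(candidate_sets, path)]


def representative_pitch(contour):
    """Step 3 (assumed rule): the low median, always a contour member."""
    return median_low(contour)


# Step 1 is taken as given: toy pitch-lag candidates at four instants.
candidates = [[40, 80], [41, 82], [39, 120], [40, 79]]
contour = build_contour(candidates)
pitch = representative_pitch(contour)
```

Using `median_low` keeps the representative inside the set of candidates that make up the contour, which is what step (3) of the abstract requires.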


    Robust vector quantization of line spectral frequencies
    7.
    Granted invention patent
    Status: Expired

    Publication number: US5651026A

    Publication date: 1997-07-22

    Application number: US495148

    Filing date: 1995-06-27

    CPC classification number: G10L19/07 G06Q10/0875 G10L25/24

    Abstract: A line spectral frequency (LSF) vector quantizer, having particular application in digital cellular networks (DCN), is provided for code excited linear predictive (CELP) speech encoders. The LSF vector quantizer is efficient in terms of bits employed, robust and effective in terms of performance across speakers and handsets, moderate in terms of complexity, and accommodates effective and simple built-in transmission error detection schemes. The LSF vector quantizer employs a minimum number of bits, is of moderate complexity and incorporates built-in error detection capability in order to combat transmission errors. The LSF vector quantizer classifies unquantized line spectral frequencies into four categories, employing different vector quantization tables for each category. Each quantization table is optimized for particular types of vectors. For each category, three split vector codebooks are used with a simplified error measure to find three candidate split quantized vectors. The three sets of three split vectors are combined to produce as many as 27 vectors from each category. The quantizer then makes a final selection of optimal category using a more complex error measure to achieve the robust performance across speakers and handsets. Split vector quantization follows a two stage constrained search procedure that results in an ordered set of quantized line spectral frequencies that is "close" to the unquantized set with moderate complexity within each category. Effective and simple transmission error detection schemes at the receiver are made possible by the split nature of the vector quantization and the constrained search procedure. Only twenty-six bits are required to encode ten line spectral frequencies.
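
The candidate search within one category can be sketched in Python as below. This is a simplified illustration under stated assumptions: plain squared error stands in for both the "simplified" and the "more complex" error measures, the codebooks and split sizes are toy values, and the ordering check is only one way to obtain an ordered set of quantized frequencies. All function names are hypothetical.

```python
from itertools import product


def sq_err(a, b):
    """Squared Euclidean error (stand-in for the patent's error measures)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_candidates(target, codebook, keep=3):
    """Return the `keep` codewords closest to the target sub-vector."""
    return sorted(codebook, key=lambda cw: sq_err(target, cw))[:keep]


def quantize_split(lsf, codebooks, split_sizes, keep=3):
    """Search up to keep**len(codebooks) combinations of split candidates."""
    parts, start = [], 0
    for size in split_sizes:
        parts.append(lsf[start:start + size])
        start += size
    shortlists = [top_candidates(p, cb, keep)
                  for p, cb in zip(parts, codebooks)]
    best, best_err = None, float("inf")
    for combo in product(*shortlists):  # 3 * 3 * 3 = 27 with three splits
        vec = [x for part in combo for x in part]
        if any(a >= b for a, b in zip(vec, vec[1:])):
            continue  # reject combinations that are not strictly increasing
        err = sq_err(lsf, vec)
        if err < best_err:
            best, best_err = vec, err
    return best


# Toy example: six frequencies in three splits of two (not the real layout).
codebooks = [
    [(0.10, 0.20), (0.15, 0.25), (0.05, 0.50)],
    [(0.30, 0.40), (0.20, 0.45), (0.35, 0.38)],
    [(0.50, 0.60), (0.39, 0.70), (0.55, 0.65)],
]
result = quantize_split([0.12, 0.21, 0.31, 0.41, 0.52, 0.61],
                        codebooks, (2, 2, 2))
```

With three splits and three candidates per split, `product` enumerates at most 27 vectors, matching the count given in the abstract.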


    Comfort noise generation for digital communication systems
    8.
    Granted invention patent
    Status: Expired

    Publication number: US5630016A

    Publication date: 1997-05-13

    Application number: US614777

    Filing date: 1996-03-07

    CPC classification number: G10L19/012

    Abstract: A digital discontinuous cellular communication system has a transmitter that transmits two frames of data following detection of voice inactivity. A receiver includes a comfort noise generator that uses the two frames of data to output noise to the speaker during periods of voice inactivity. The comfort noise generator includes a synthesis codebook with samples scaled by the actual background noise and an excitation codebook with samples filtered and scaled by the background noise; the two are combined to produce comfort noise having the attributes and loudness level of the background noise received prior to interruption of transmission. The scaled signals are weighted to vary the loudness level and spectral attributes.
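
The combination of the two codebook contributions can be illustrated with a rough Python sketch. Everything specific in it is an assumption: the abstract names neither the filter nor the weighting rule, so a one-pole low-pass filter and a single mixing weight are used as stand-ins, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def scale_to_level(samples, level):
    """Scale samples so that their RMS equals `level`."""
    r = rms(samples)
    if r == 0:
        return [0.0] * len(samples)
    return [s * level / r for s in samples]


def one_pole_lowpass(samples, a=0.5):
    """Assumed stand-in for the unspecified excitation filter."""
    out, prev = [], 0.0
    for s in samples:
        prev = (1 - a) * s + a * prev
        out.append(prev)
    return out


def comfort_noise(synth_samples, excit_samples, noise_level, weight=0.5):
    """Weighted sum of the two contributions, each scaled to noise_level.

    weight=1.0 uses only the synthesis-codebook samples; weight=0.0 uses
    only the filtered excitation-codebook samples.
    """
    a = scale_to_level(synth_samples, noise_level)
    b = scale_to_level(one_pole_lowpass(excit_samples), noise_level)
    return [weight * x + (1 - weight) * y for x, y in zip(a, b)]


frame = comfort_noise([1.0, -1.0, 1.0, -1.0], [2.0, 0.0, -2.0, 0.0],
                      noise_level=0.1, weight=0.5)
```

Changing `weight` shifts the balance between the unfiltered and filtered contributions, which loosely corresponds to varying the spectral attributes, while `noise_level` sets the loudness.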


    Comfort noise generation for digital communication systems
    9.
    Granted invention patent
    Status: Expired

    Publication number: US5537509A

    Publication date: 1996-07-16

    Application number: US890747

    Filing date: 1992-05-28

    Abstract: A digital discontinuous cellular communication system has a transmitter that transmits two frames of data following detection of voice inactivity. A receiver includes a comfort noise generator that uses the two frames of data to output noise to the speaker during periods of voice inactivity. The comfort noise generator includes a synthesis codebook with samples scaled by the actual background noise and an excitation codebook with samples filtered and scaled by the background noise; the two are combined to produce comfort noise having the attributes and loudness level of the background noise received prior to interruption of transmission. The scaled signals are weighted to vary the loudness level and spectral attributes.

