Instability eradication for analysis-by-synthesis speech codecs
    1.
    Granted invention patent
    Status: in force

    Publication number: US5987406A

    Publication date: 1999-11-16

    Application number: US232274

    Filing date: 1999-01-15

    CPC classification: G10L19/005 G10L19/06

    Abstract: Instability inherent in analysis-by-synthesis speech/audio codecs is removed, in particular instability caused by channel errors during transmission of highly periodic signals such as high-frequency sine waves. Analysis-by-synthesis techniques produce two things in response to the speech/audio signal, at regular time intervals called frames: (a) a set of spectral parameters used to drive a synthesis filter that synthesizes the speech/audio signal, and (b) a pitch gain for constructing a past-excitation-signal component supplied to the synthesis filter. In the instability eradication method, the first step is to detect a set of three conditions. (i) A resonance condition is assessed from the spectral parameters. (ii) A duration condition is detected when the resonance condition has prevailed for at least the M most recent frames, M being an integer greater than 1. (iii) A gain condition evidences consistently high values of the pitch gain in the N most recent frames, N being an integer greater than 1. To eradicate the occasional instability, the pitch gain is reduced to a value lower than a given threshold whenever all three conditions are detected.
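
    The three-condition test above can be sketched as a small per-frame state machine. This is a minimal Python sketch, not the patented implementation. All of the following are hypothetical choices made for illustration: the values of M and N, the 0.9 "high gain" level, the reduced gain of 0.5, and the LSF-spacing resonance test in `is_resonant`. The abstract says only that resonance is assessed from the spectral parameters.

```python
from collections import deque

# Hypothetical tuning values: the abstract only requires M > 1 and N > 1.
M_FRAMES = 3        # resonance must persist over the M most recent frames
N_FRAMES = 4        # pitch gain must stay high over the N most recent frames
HIGH_GAIN = 0.9     # hypothetical level counted as "consistently high"
REDUCED_GAIN = 0.5  # hypothetical value below the gain threshold


def is_resonant(lsf_hz, min_gap_hz=60.0):
    """Hypothetical resonance test: two adjacent LSFs lie very close together."""
    return min(b - a for a, b in zip(lsf_hz, lsf_hz[1:])) < min_gap_hz


class InstabilityGuard:
    """Tracks the three conditions and clamps the pitch gain when all hold."""

    def __init__(self, m=M_FRAMES, n=N_FRAMES):
        self.resonance = deque(maxlen=m)
        self.gains = deque(maxlen=n)

    def process(self, resonant, pitch_gain):
        self.resonance.append(resonant)
        self.gains.append(pitch_gain)
        # (i) + (ii): resonance detected in each of the M most recent frames
        lasting = len(self.resonance) == self.resonance.maxlen and all(self.resonance)
        # (iii): pitch gain high in each of the N most recent frames
        high = len(self.gains) == self.gains.maxlen and all(g >= HIGH_GAIN for g in self.gains)
        if lasting and high:
            return min(pitch_gain, REDUCED_GAIN)  # force the gain below the threshold
        return pitch_gain


if __name__ == "__main__":
    guard = InstabilityGuard()
    print([guard.process(True, 0.95) for _ in range(5)])  # [0.95, 0.95, 0.95, 0.5, 0.5]
```

    The guard records the received (unclamped) gain, so the clamp stays active for as long as the decoder keeps receiving high gains during a sustained resonance.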

Low bit-rate speech coding system and method using voicing probability determination
    2.
    Granted invention patent
    Status: lapsed

    Publication number: US5890108A

    Publication date: 1999-03-30

    Application number: US726336

    Filing date: 1996-10-03

    Applicant: Suat Yeldener

    Inventor: Suat Yeldener

    Abstract: A modular system and method are provided for low-bit-rate encoding and decoding of speech signals using voicing probability determination. The continuous input speech is divided into time segments of a predetermined length. For each segment, the encoder computes a model signal and subtracts it from the original signal in the segment to obtain a residual excitation signal. From the excitation signal, the system computes two quantities: the signal pitch, and a parameter related to the relative content of voiced and unvoiced portions in the spectrum of the excitation signal. This parameter is expressed as a ratio Pv and defined as the voicing probability. The voiced and unvoiced portions of the excitation spectrum, as determined by Pv, are encoded using one or more parameters related to the energy of the excitation signal in a predetermined set of frequency bands. In the decoder, speech is synthesized in reverse order from the transmitted parameters, which represent the model speech, the signal pitch, the voicing probability, and the excitation levels. Boundary conditions between voiced and unvoiced segments are established to ensure amplitude and phase continuity for improved output speech quality. Perceptually smooth transitions between frames are ensured by overlap-add synthesis. LPC interpolation and post-filtering are used to obtain output speech with improved perceptual quality.
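
    The residual computation in this abstract, subtracting a model signal from the segment, can be illustrated with ordinary LPC analysis. The sketch below assumes that the model signal is the linear-prediction estimate obtained from the autocorrelation method and the Levinson-Durbin recursion; the abstract does not state this. `voiced_cutoff_hz` likewise rests on a hypothetical assumption: that Pv acts as a cutoff, with the band below Pv times the Nyquist frequency treated as voiced.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one analysis segment."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns a[0..order] (a[0] = 1) for A(z) = 1 + sum a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate segment (e.g. silence): keep the lower-order solution
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_residual(x, a):
    """Residual excitation = original segment minus the LPC model (prediction) signal."""
    p = len(a) - 1
    model = [-sum(a[k] * x[n - k] for k in range(1, p + 1) if n >= k) for n in range(len(x))]
    return [xn - mn for xn, mn in zip(x, model)], model


def voiced_cutoff_hz(pv, fs):
    """Assumption: voiced below pv * fs / 2, unvoiced above."""
    return pv * fs / 2.0
```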

Optimized pulse location in codebook searching techniques for speech processing
    3.
    Granted invention patent
    Status: lapsed

    Publication number: US5822724A

    Publication date: 1998-10-13

    Application number: US518354

    Filing date: 1995-06-14

    Applicant: Dror Nahumi

    Inventor: Dror Nahumi

    CPC classification: G10L19/10

    Abstract: Simplified methods of searching a codebook table are provided. These methods perform a codebook search for a plurality of pulses, one pulse at a time, in order of decreasing pulse significance. Pulse significance is defined as the relative contribution a given pulse makes to minimizing the mean-squared error between the source signal and the quantized sequence of pulses.
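
    One way to read the one-pulse-at-a-time search is as a greedy multipulse search. At each step, the search chooses the pulse position and amplitude that most reduce the remaining squared error, so pulses come out roughly in order of significance. This Python sketch of that generic technique makes two assumptions: a known impulse response `h` (for example, that of a weighted synthesis filter) and unquantized amplitudes. It is not claimed to be the patent's exact procedure.

```python
def shifted_response(h, pos, n):
    """Impulse response h delayed by pos samples, truncated/padded to frame length n."""
    v = [0.0] * n
    for i, hv in enumerate(h):
        if pos + i < n:
            v[pos + i] = hv
    return v


def greedy_pulse_search(target, h, num_pulses):
    """Choose pulses one at a time, each the one that most reduces the squared error."""
    n = len(target)
    residual = list(target)
    pulses = []  # (position, amplitude) in the order they were chosen
    for _ in range(num_pulses):
        best = None
        for pos in range(n):
            y = shifted_response(h, pos, n)
            energy = sum(v * v for v in y)
            if energy == 0.0:
                continue
            corr = sum(r * v for r, v in zip(residual, y))
            reduction = corr * corr / energy  # error reduction at the optimal amplitude
            if best is None or reduction > best[0]:
                best = (reduction, pos, corr / energy, y)
        if best is None:
            break
        _, pos, amp, y = best
        pulses.append((pos, amp))
        residual = [r - amp * v for r, v in zip(residual, y)]  # remove this pulse's contribution
    return pulses, residual
```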

Method and apparatus for speech excitation waveform coding using multiple error waveforms
    4.
    Granted invention patent
    Status: lapsed

    Publication number: US5809459A

    Publication date: 1998-09-15

    Application number: US651172

    Filing date: 1996-05-21

    IPC classification: G10L19/12 G10L9/00 G10L9/14

    CPC classification: G10L19/125 G10L19/24

    Abstract: A method and apparatus (100) for pitch-epoch-synchronous source-filter speech encoding by means of error component modeling methods (310) that capture fundamental orthogonal (uncorrelated) basis elements of an excitation source waveform. A periodic waveform model (318) is combined with four orthogonal error waveforms to form a complete description of the excitation. These desirably include phase error (319), ensemble error (321), standard deviation error (323), and mean error (324) waveforms. The error waveforms (319, 321, 323, 324) represent the portions of the excitation that the purely periodic model does not represent. Orthogonalizing the error components in this way isolates the perceptual effect of each element from the composite set, so each element can be encoded separately. The speech encoding method and apparatus offer high-quality, fixed-rate operation. Their identity-system capability and low complexity also make them applicable to variable-rate applications without changing the underlying modeling methods.
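
    The idea of a periodic model plus error waveforms can be illustrated by ensemble averaging over pitch epochs. This is a hypothetical sketch with the following assumptions:
    - The epochs are already length-normalized.
    - The periodic model is their sample-wise mean.
    - The "ensemble error" is each epoch minus that mean.
    - The mean and standard-deviation errors are one value per epoch.
    The phase-error component is not modeled, and the patent's actual definitions may differ.

```python
from statistics import mean, pstdev


def ensemble_decompose(epochs):
    """Hypothetical split of length-normalized pitch epochs into a periodic model
    and per-epoch error components."""
    length = len(epochs[0])
    if any(len(e) != length for e in epochs):
        raise ValueError("epochs must be length-normalized first")
    periodic = [mean(col) for col in zip(*epochs)]                       # purely periodic model
    ensemble_err = [[s - p for s, p in zip(e, periodic)] for e in epochs]  # what the model misses
    mean_err = [mean(err) for err in ensemble_err]                       # one offset per epoch
    std_err = [pstdev(err) for err in ensemble_err]                      # one spread per epoch
    return periodic, ensemble_err, mean_err, std_err
```

    By construction, `periodic[i] + ensemble_err[k][i]` reproduces every input epoch exactly. This is the sense in which the components together form a complete description.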

Frame-count-dependent smoothing filter for reducing abrupt decoder background noise variation during speech pauses in VOX
    5.
    Granted invention patent
    Status: lapsed

    Publication number: US5787388A

    Publication date: 1998-07-28

    Application number: US666124

    Filing date: 1996-06-21

    Applicant: Toshihiro Hayata

    Inventor: Toshihiro Hayata

    CPC classification: G10L19/012

    Abstract: In a speech decoding apparatus, a conversion unit converts a received encoded signal into a parameter in units of frames. During a pause interval of the speech signal, a memory repeatedly updates and stores the parameter, output from the conversion unit, that represents the pause state. A synthesis filter coefficient generation unit generates a synthesis filter coefficient from the parameter read out of the memory. A smoothed filter coefficient generation unit generates a smoothed filter coefficient from the synthesis filter coefficient. It smooths the coefficient so that it changes in accordance with a count of the frames during a predetermined period. A background noise generation unit generates background noise from the parameter read out of the memory for the pause interval of the speech signal. A smoothing filter filters the background noise from the background noise generation unit, using the smoothed filter coefficient from the smoothed filter coefficient generation unit, and outputs smoothed background noise.
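
    The frame-count-dependent smoothing can be sketched as a first-order recursive average whose weight depends on how many pause frames have elapsed. Two choices here are hypothetical: the ramp schedule in `smoothing_weight` (8 frames, weights 0.5 to 0.95) and smoothing line spectral frequencies (LSFs) rather than direct-form coefficients. The LSF choice is convenient because a weighted average of two increasing LSF vectors is still increasing, so the LSF ordering property is preserved.

```python
def smoothing_weight(frame_count, ramp_frames=8, w_min=0.5, w_max=0.95):
    """Hypothetical schedule: the smoothing weight ramps up with frames elapsed in the pause."""
    t = min(frame_count, ramp_frames) / ramp_frames
    return w_min + (w_max - w_min) * t


def smooth_lsf(prev_smoothed, current, frame_count):
    """First-order recursive smoothing of LSFs, weighted by the pause frame count."""
    w = smoothing_weight(frame_count)
    return [w * p + (1.0 - w) * c for p, c in zip(prev_smoothed, current)]


if __name__ == "__main__":
    smoothed = [300.0, 900.0, 2200.0]
    for count, received in enumerate([[320.0, 950.0, 2300.0]] * 4):
        smoothed = smooth_lsf(smoothed, received, count)
        print(count, [round(v, 1) for v in smoothed])
```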

Timing recovery scheme for packet speech in multiplexing environment of voice with data applications
    6.
    Granted invention patent
    Status: lapsed

    Publication number: US5699481A

    Publication date: 1997-12-16

    Application number: US443651

    Filing date: 1995-05-18

    Abstract: Multiple speech bit-stream frame buffers are used between the controller and the speech decoder. Whenever excessive or missing speech packets are detected, the speech decoder switches to a special corrective mode. If there are too many, the buffered frames are played out fast; if there are too few, they are played out slowly. For fast play, some speech information has to be discarded; for slow play, some speech-like information has to be synthesized. The speech may be handled in sub-frame units, for example 52 samples at a time. Low-energy, silent, or unvoiced sub-frames, which also indicate non-periodicity, are detected and manipulated. Moreover, the decoded signal is manipulated at the excitation stage, before the final LPC synthesis filter, so the manipulation is perceptually transparent. Additionally, the buffers are enlarged to eliminate the problem caused by controller asynchronicity. Further, to handle the bulk delay caused by multiplexing data and speech transmissions, the buffers hold the smallest number of speech packets needed to prevent buffer underflow during a data packet transmission. This minimizes speech delay while preserving data transmission efficiency.
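
    The fast/slow playout rule can be sketched as a pass over decoded excitation sub-frames that drops or repeats only quiet ones. In this hypothetical sketch:
    - The buffer surplus is counted in sub-frames.
    - "Quiet" means mean energy below an arbitrary 1e-3.
    - Repeating a sub-frame stands in for synthesizing speech-like material.
    Only the 52-sample sub-frame size comes from the abstract.

```python
SUBFRAME = 52  # samples per sub-frame (the abstract's example size)


def subframe_energy(sf):
    """Mean energy of one excitation sub-frame."""
    return sum(s * s for s in sf) / len(sf)


def adjust_playout(subframes, buffered, target, quiet_level=1e-3):
    """Drop quiet sub-frames when the buffer is too full (fast play) and repeat
    them when it is too empty (slow play); louder sub-frames pass untouched."""
    surplus = buffered - target  # hypothetical: measured in sub-frames
    out = []
    for sf in subframes:
        quiet = subframe_energy(sf) < quiet_level
        if surplus > 0 and quiet:
            surplus -= 1          # discard this sub-frame to catch up
            continue
        out.append(sf)
        if surplus < 0 and quiet:
            out.append(list(sf))  # stretch playout with a copy
            surplus += 1
    return out
```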

    Digital sampling instrument
    7.
    Granted invention patent
    Status: lapsed

    Publication number: US5698807A

    Publication date: 1997-12-16

    Application number: US611014

    Filing date: 1996-03-05

    Abstract: An electronic music system that imitates acoustic instruments addresses a problem with transposition: it shifts the entire audio spectrum of a recorded note in pitch. As a consequence, unnatural formant shifts occur, producing the phenomenon known in the industry as "munchkinization." The present invention eliminates munchkinization, allowing a substantially wider transposition range for a single recording. The invention also allows even shorter recordings to be used, further reducing memory requirements. An analysis stage separates and stores the formant and excitation components of sounds from an instrument. On playback, either the formant component or the excitation component may be manipulated.
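
    The formant/excitation separation can be sketched with LPC in three steps. First, inverse filtering removes the formant envelope. Second, only the excitation is resampled. Third, the original all-pole filter is applied again. This is a generic LPC-based source-filter transposition written under assumptions: order 12, linear-interpolation resampling, and a single analysis frame for the whole note. The abstract does not say which analysis the instrument uses.

```python
import random


def autocorr(x, order):
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion for A(z) = 1 + sum_k a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """Excitation e[n] = x[n] + sum_k a[k] x[n-k]: the formant envelope removed."""
    p = len(a) - 1
    return [x[n] + sum(a[k] * x[n - k] for k in range(1, p + 1) if n >= k) for n in range(len(x))]


def synthesize(e, a):
    """All-pole synthesis y[n] = e[n] - sum_k a[k] y[n-k]: the formant envelope reapplied."""
    p = len(a) - 1
    y = []
    for n, en in enumerate(e):
        y.append(en - sum(a[k] * y[n - k] for k in range(1, p + 1) if n >= k))
    return y


def resample_linear(e, ratio):
    """Read the excitation `ratio` times faster using linear interpolation."""
    out, pos = [], 0.0
    while pos <= len(e) - 1:
        i = int(pos)
        frac = pos - i
        nxt = e[i + 1] if i + 1 < len(e) else e[i]
        out.append(e[i] * (1.0 - frac) + nxt * frac)
        pos += ratio
    return out


def transpose_keep_formants(x, ratio, order=12):
    a = levinson(autocorr(x, order), order)
    excitation = inverse_filter(x, a)                         # formant / excitation separation
    return synthesize(resample_linear(excitation, ratio), a)  # only the excitation is transposed


if __name__ == "__main__":
    rng = random.Random(1)
    note = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    print(len(transpose_keep_formants(note, 2.0)))
```

    Only the excitation is resampled, so the all-pole envelope, and hence the formant positions, stays the same. Reading a periodic excitation twice as fast raises its pitch by an octave and halves the duration.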

Speech parameter encoding device which includes a dividing circuit for dividing a frame signal of an input speech signal into subframe signals and for outputting a low rate output code signal
    8.
    Granted invention patent
    Status: lapsed

    Publication number: US5625744A

    Publication date: 1997-04-29

    Application number: US193596

    Filing date: 1994-02-09

    Applicant: Kazunori Ozawa

    Inventor: Kazunori Ozawa

    CPC classification: G10L19/07

    Abstract: An LPC analyzer produces LPC parameters from at least one of the subframe signals of each frame signal of an input speech signal. To encode these parameters with the smallest possible number of bits, a divider divides them into several parameter regions. Vector codebooks holding code vectors are loaded for each parameter region. Using these codebooks, a vector quantizer quantizes the LPC parameters into the indexes of selected code vectors, which serve as the quantized codes. The vectors are selected so that a linear combination of them minimizes the quantization distortion.
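
    Dividing the LPC parameters into regions, each with its own codebook, corresponds to split vector quantization. The Python sketch below searches each region independently for the nearest codevector. The toy codebooks are hypothetical. The abstract's further step, choosing vectors whose linear combination minimizes distortion, is not modeled.

```python
def nearest(sub, book):
    """Index of the codevector in `book` closest to `sub` (squared Euclidean distance)."""
    return min(range(len(book)), key=lambda i: sum((s - c) ** 2 for s, c in zip(sub, book[i])))


def split_vq_encode(params, sizes, codebooks):
    """Quantize each parameter region with its own codebook; return one index per region."""
    if sum(sizes) != len(params):
        raise ValueError("region sizes must cover the parameter vector")
    indexes, start = [], 0
    for size, book in zip(sizes, codebooks):
        indexes.append(nearest(params[start:start + size], book))
        start += size
    return indexes


def split_vq_decode(indexes, codebooks):
    """Concatenate the selected codevectors back into a full parameter vector."""
    out = []
    for idx, book in zip(indexes, codebooks):
        out.extend(book[idx])
    return out
```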

Code-excited linear predictive coding with low delay for speech or audio signals
    9.
    Granted invention patent
    Status: lapsed

    Publication number: US5339384A

    Publication date: 1994-08-16

    Application number: US200805

    Filing date: 1994-02-22

    Applicant: Juin-Hwey Chen

    Inventor: Juin-Hwey Chen

    Abstract: A code-excited linear-predictive (CELP) coder for speech or audio transmission at compressed data rates (e.g., 16 kb/s) is adapted for low-delay coding (e.g., less than 5 ms per vector). It performs spectral analysis of at least a portion of a previous frame of simulated decoded speech to determine a synthesis filter of much higher order than is conventionally used for decoding synthesis. It then transmits only the index of the vector that produces the lowest internal error signal. Modified perceptual weighting parameters and a novel use of postfiltering greatly improve tandeming of a number of encodings and decodings while retaining high-quality reproduction.
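
    The analysis-by-synthesis loop behind this coder can be sketched as follows. The synthesis filter is derived backward from already-decoded speech. Each candidate codevector is passed through the filter, and only the index of the best match is kept. The sketch makes these assumptions:
    - The optimal gain is left unquantized.
    - Perceptual weighting and postfiltering are omitted.
    - The codebook and filter order are placeholders; the abstract says only that the order is much higher than conventional.

```python
def autocorr(x, order):
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion for A(z) = 1 + sum_k a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a


def backward_lpc(decoded_history, order):
    """Backward adaptation: the filter comes from already-decoded speech, so the
    decoder can compute the same coefficients without side information."""
    return levinson(autocorr(decoded_history, order), order)


def synthesize(excitation, a, memory):
    """All-pole filter 1/A(z), continuing from past outputs `memory` (oldest first)."""
    y = list(memory)
    start = len(y)
    for e in excitation:
        n = len(y)
        y.append(e - sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k))
    return y[start:]


def search_codebook(target, codebook, a, memory):
    """Return (index, gain) of the codevector whose synthesized output best
    matches the target in the squared-error sense."""
    zir = synthesize([0.0] * len(target), a, memory)       # ringing of the filter from the past
    t = [ti - zi for ti, zi in zip(target, zir)]
    best = (None, 0.0, float("inf"))
    for i, cv in enumerate(codebook):
        z = synthesize(cv, a, [])                          # zero-state response of this codevector
        energy = sum(v * v for v in z)
        if energy == 0.0:
            continue
        corr = sum(u * v for u, v in zip(t, z))
        err = sum(u * u for u in t) - corr * corr / energy  # error at the optimal gain
        if err < best[2]:
            best = (i, corr / energy, err)
    return best[0], best[1]
```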

    Apparatus for processing digital audio signal
    10.
    Granted invention patent
    Status: lapsed

    Publication number: US5303374A

    Publication date: 1994-04-12

    Application number: US776213

    Filing date: 1991-10-15

    CPC classification: H03M7/3046

    Abstract: A digital audio signal processing apparatus is provided with a predictive error generator, which processes input digital data with a plurality of different frequency characteristics to generate predictive error data. A selector selects one of the plural predictive error data. A requantizer requantizes the selected predictive error data. A corrector processes, with a predetermined frequency characteristic, the requantization error induced during operation of the requantizer, thereby correcting that error. A frequency characteristic controller selects at least two of the predictive error data obtained with the plural frequency characteristics. It then performs a calculation on the selected data and controls the frequency characteristic of the corrector according to the result. Specifically, the apparatus calculates the ratio or difference between at least two predictive error data obtained with different frequency characteristics and compares it with a predetermined reference value. The frequency characteristic of the corrector is then controlled according to the numerical relation between the calculated value and the reference value. As a result, two or more frequency characteristics of the corrector are selectively matched to a frequency characteristic of the predictive error generator, further improving the signal-to-noise ratio.
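
    The predictor selection and frequency-shaped requantization can be sketched as below. Every specific here is hypothetical:
    - the three fixed predictors;
    - selection by smallest error energy;
    - the first-order error-feedback corrector;
    - the rule in `choose_shaping`, which compares the energy ratio of two prediction errors with a reference of 0.5 to pick the shaping coefficient.

```python
# Hypothetical fixed predictors, each giving the prediction error a different
# frequency characteristic (coefficients apply to x[n-1], x[n-2], ...).
PREDICTORS = {"none": [], "first": [1.0], "second": [2.0, -1.0]}


def prediction_error(x, coeffs):
    return [x[n] - sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
            for n in range(len(x))]


def select_predictor(x):
    """Compute every prediction error and keep the one with the least energy."""
    errors = {name: prediction_error(x, c) for name, c in PREDICTORS.items()}
    energy = {name: sum(v * v for v in e) for name, e in errors.items()}
    best = min(energy, key=energy.get)
    return best, errors[best], energy


def choose_shaping(energy, reference=0.5):
    """Compare an energy ratio of two prediction errors with a reference value."""
    ratio = energy["first"] / energy["none"] if energy["none"] else 1.0
    return 0.9 if ratio < reference else 0.0


def requantize(e, step, shaping):
    """Error-feedback requantizer: subtracting shaping * (previous error) before
    quantizing makes the output error e_q[n] - shaping * e_q[n-1]."""
    out, prev_err = [], 0.0
    for v in e:
        u = v - shaping * prev_err
        q = step * round(u / step)
        prev_err = q - u
        out.append(q)
    return out
```

    With `shaping = 0.9`, the output requantization error becomes e[n] - 0.9 e[n-1], which pushes the noise toward high frequencies. This stands in for the corrector's controlled frequency characteristic in the abstract.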
