Low-complexity encoding/decoding of quantized MDCT spectrum in scalable speech and audio codecs
    23.
    Invention application
    Legal status: in force

    Publication No.: US20090234644A1

    Publication date: 2009-09-17

    Application No.: US12255604

    Filing date: 2008-10-21

    IPC classification: G10L19/02 G10L19/00

    CPC classification: G10L19/24 G10L19/038

    Abstract: A scalable speech and audio codec is provided that implements combinatorial spectrum encoding. A residual signal is obtained from a Code Excited Linear Prediction (CELP)-based encoding layer, where the residual signal is a difference between an original audio signal and a reconstructed version of the original audio signal. The residual signal is transformed at a Discrete Cosine Transform (DCT)-type transform layer to obtain a corresponding transform spectrum having a plurality of spectral lines. The transform spectrum spectral lines are transformed using a combinatorial position coding technique. The combinatorial position coding technique includes generating a lexicographical index for a selected subset of spectral lines, where each lexicographical index represents one of a plurality of possible binary strings representing the positions of the selected subset of spectral lines. The lexicographical index represents non-zero spectral lines in a binary string in fewer bits than the length of the binary string.
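
The lexicographical index described in this abstract can be illustrated with standard enumerative (combinatorial) coding. The sketch below is not taken from the patent text: the function names `rank_positions`, `unrank_positions`, and `index_bits` are hypothetical, and it assumes the decoder already knows the string length n and the number k of non-zero spectral lines.

```python
from math import comb


def rank_positions(bits: str) -> int:
    """Lexicographic index of `bits` among all binary strings of the same
    length that contain the same number of '1's ('0' sorts before '1')."""
    n = len(bits)
    remaining = bits.count("1")
    index = 0
    for i, bit in enumerate(bits):
        if bit == "1":
            # Strings sharing this prefix but holding '0' here sort earlier;
            # they place all `remaining` ones in the n - i - 1 later slots.
            index += comb(n - i - 1, remaining)
            remaining -= 1
    return index


def unrank_positions(index: int, n: int, k: int) -> str:
    """Inverse of rank_positions for a valid index in [0, C(n, k))."""
    remaining = k
    out = []
    for i in range(n):
        zero_first = comb(n - i - 1, remaining)
        if index >= zero_first:
            out.append("1")
            index -= zero_first
            remaining -= 1
        else:
            out.append("0")
    return "".join(out)


def index_bits(n: int, k: int) -> int:
    """Bits needed to store any index in [0, C(n, k))."""
    return (comb(n, k) - 1).bit_length()
```

For a 16-line block with 3 non-zero lines there are C(16, 3) = 560 possible position patterns, so the index fits in 10 bits instead of the 16 bits of the raw position string.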

    Method and apparatus for high performance low bit-rate coding of unvoiced speech

    Publication No.: US20050143980A1

    Publication date: 2005-06-30

    Application No.: US11066356

    Filing date: 2005-02-24

    Applicant: Pengjun Huang

    Inventor: Pengjun Huang

    Abstract: A low-bit-rate coding technique for unvoiced segments of speech, without loss of quality compared to the conventional Code Excited Linear Prediction (CELP) method operating at a much higher bit rate. A set of gains is derived from a residual signal after whitening the speech signal by a linear prediction filter. These gains are then quantized and applied to a randomly generated sparse excitation. The excitation is filtered, and its spectral characteristics are analyzed and compared to the spectral characteristics of the original residual signal. Based on this analysis, a filter is chosen to shape the spectral characteristics of the excitation to achieve optimal performance.
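
The gain step in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the gains here are plain sub-frame RMS values, gain quantization and the later filtering and spectral comparison are omitted, and the names `subframe_gains` and `synthesize_excitation` as well as the sub-frame size and sparsity are hypothetical.

```python
import math
import random
from typing import List, Sequence


def subframe_gains(residual: Sequence[float], n_sub: int) -> List[float]:
    """RMS gain of each of n_sub equal sub-frames (any tail samples that do
    not fill a whole sub-frame are ignored)."""
    size = len(residual) // n_sub
    gains = []
    for i in range(n_sub):
        chunk = residual[i * size:(i + 1) * size]
        gains.append(math.sqrt(sum(v * v for v in chunk) / size))
    return gains


def synthesize_excitation(gains: Sequence[float], size: int, keep: int,
                          rng: random.Random) -> List[float]:
    """Build a sparse random excitation: each sub-frame has `keep` non-zero
    Gaussian samples, scaled so that its RMS equals the matching gain."""
    out: List[float] = []
    for g in gains:
        positions = set(rng.sample(range(size), keep))
        chunk = [rng.gauss(0.0, 1.0) if i in positions else 0.0
                 for i in range(size)]
        rms = math.sqrt(sum(v * v for v in chunk) / size)
        factor = g / rms if rms > 0 else 0.0
        out.extend(v * factor for v in chunk)
    return out
```

Because only the gains (and not the noise samples themselves) would need to be transmitted, a decoder seeded the same way could regenerate an excitation with the same energy contour.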

    METHOD AND APPARATUS FOR VECTOR QUANTIZATION CODEBOOK SEARCH
    27.
    Invention application
    Legal status: under examination (published)

    Publication No.: US20100174539A1

    Publication date: 2010-07-08

    Application No.: US12349327

    Filing date: 2009-01-06

    IPC classification: G10L19/12

    CPC classification: G10L19/038

    Abstract: A vector quantization codebook search method and apparatus use support vector machines (“SVMs”) to compute a hyperplane, where the hyperplane is used to separate codebook elements into a plurality of bins. During execution, a controller determines which of the plurality of bins contains a desired codebook element, and then searches the determined bin. Codebook search complexity is reduced and an exhaustive codebook search is selectively avoided.
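
The binned search in this abstract can be sketched as follows. The sketch assumes a separating hyperplane (w, b) is already available; in the patent it is computed with an SVM, whereas here it is simply passed in, and all function names are hypothetical. Restricting the search to one bin can miss the true nearest code vector for inputs close to the hyperplane, which is the usual trade-off of such a scheme.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def side(w: Vector, b: float, x: Vector) -> int:
    """Return 1 if x lies on the non-negative side of w.x + b = 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


def build_bins(codebook: Sequence[Vector], w: Vector,
               b: float) -> Tuple[List[int], List[int]]:
    """Split codebook indices into two bins according to the hyperplane."""
    bins: Tuple[List[int], List[int]] = ([], [])
    for idx, code in enumerate(codebook):
        bins[side(w, b, code)].append(idx)
    return bins


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def binned_search(codebook: Sequence[Vector],
                  bins: Tuple[List[int], List[int]],
                  w: Vector, b: float, x: Vector) -> int:
    """Search only the bin on the same side of the hyperplane as x.
    Falls back to the full codebook if that bin happens to be empty."""
    candidates = bins[side(w, b, x)] or list(range(len(codebook)))
    return min(candidates, key=lambda i: sq_dist(codebook[i], x))
```

With two bins of similar size, each query computes roughly half as many distances as an exhaustive search; more hyperplanes would give more, smaller bins.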

    Method and apparatus for robust speech classification
    28.
    Granted invention patent
    Legal status: in force

    Publication No.: US07472059B2

    Publication date: 2008-12-30

    Application No.: US09733740

    Filing date: 2000-12-08

    Applicant: Pengjun Huang

    Inventor: Pengjun Huang

    IPC classification: G10L19/00 G10L11/06

    Abstract: A speech classification technique for robust classification of varying modes of speech to enable maximum performance of multi-mode variable bit rate encoding techniques. A speech classifier accurately classifies a high percentage of speech segments for encoding at minimal bit rates, meeting lower bit rate requirements. Highly accurate speech classification produces a lower average encoded bit rate, and higher quality decoded speech. The speech classifier considers a maximum number of parameters for each frame of speech, producing numerous and accurate speech mode classifications for each frame. The speech classifier correctly classifies numerous modes of speech under varying environmental conditions. The speech classifier inputs classification parameters from external components, generates internal classification parameters from the input parameters, sets a Normalized Auto-correlation Coefficient Function threshold and selects a parameter analyzer according to the signal environment, and then analyzes the parameters to produce a speech mode classification.
