Matrix quantization with vector quantization error compensation for
robust speech recognition
    11.
    Granted invention patent (status: lapsed)

    Publication number: US6070136A

    Publication date: 2000-05-30

    Application number: US957902

    Filing date: 1997-10-27

    CPC classification number: G10L15/32 G10L15/142

    Abstract: A speech recognition system utilizes both matrix and vector quantizers as front ends to a second stage speech classifier. Matrix quantization exploits input signal information in both the frequency and time domains, whereas the vector quantizer operates primarily on frequency domain information. In some circumstances, however, time domain information may be substantially limited, which may introduce error into the matrix quantization. Information derived from vector quantization may be utilized by a hybrid decision generator to error compensate information derived from matrix quantization. Additionally, fuzzy methods of quantization and robust distance measures may be introduced to enhance speech recognition accuracy. Furthermore, other speech classification stages may be used, such as hidden Markov models, which introduce probabilistic processes to further enhance speech recognition accuracy. Multiple codebooks may also be combined to form single respective codebooks for matrix and vector quantization to lessen the demand on processing resources.
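
    The error compensation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the hybrid decision generator combines the two sources, so the weighted sum below, the function name `hybrid_decision`, and the `vq_weight` parameter are all assumptions made for illustration, not the patented method.

```python
def hybrid_decision(mq_scores, vq_scores, vq_weight=0.5):
    """Pick the vocabulary word with the lowest combined score.

    mq_scores / vq_scores map each word to a distance-like score
    (lower is better) from the matrix and vector quantizer paths.
    The linear combination is an assumption, not taken from the patent.
    """
    if mq_scores.keys() != vq_scores.keys():
        raise ValueError("both quantizers must score the same vocabulary")
    combined = {
        word: mq_scores[word] + vq_weight * vq_scores[word]
        for word in mq_scores
    }
    best_word = min(combined, key=combined.get)
    return best_word, combined


# Hypothetical scores: the matrix quantizer alone slightly prefers "yes",
# but the vector quantizer evidence tips the combined decision to "no".
mq = {"yes": 1.0, "no": 1.1}
vq = {"yes": 2.0, "no": 0.5}
word, combined = hybrid_decision(mq, vq, vq_weight=0.5)
print(word, combined)
```

    Setting `vq_weight` to zero reduces the sketch to a decision based on matrix quantization alone, which makes the effect of the compensation term easy to inspect.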

    Adaptive speech recognition with selective input data to a speech
classifier
    12.
    Granted invention patent (status: lapsed)

    Publication number: US6044343A

    Publication date: 2000-03-28

    Application number: US883978

    Filing date: 1997-06-27

    CPC classification number: G10L15/063 G10L15/20

    Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ) designed with respective codebook sets at multiple signal to noise ratios. The FMQ quantizes various training words from a set of vocabulary words, produces observation sequence O output data to train hidden Markov model (HMM) processes λ_j, and produces fuzzy distance measure output data for each vocabulary word codebook. A fuzzy Viterbi algorithm is used by a processor to compute maximum likelihood probabilities Pr(O|λ_j) for each vocabulary word. The fuzzy distance measures and maximum likelihood probabilities are mixed in a variety of ways to preferably optimize speech recognition accuracy and speech recognition speed performance.
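
    To make the per-word scoring step concrete, the sketch below scores an observation sequence O against one discrete HMM per vocabulary word and picks the best word. It uses the standard (crisp) Viterbi recursion in the log domain, not the fuzzy Viterbi variant named in the abstract, whose details are not given here. The function names, the model layout, and the example numbers are assumptions.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi_log_prob(obs, log_pi, log_a, log_b):
    """Log-probability of the single best state path for a discrete HMM.

    obs    : list of observation symbol indices
    log_pi : log initial state probabilities, one per state
    log_a  : log_a[r][s] is the log transition probability r -> s
    log_b  : log_b[s][o] is the log probability of symbol o in state s
    """
    n_states = len(log_pi)
    delta = [log_pi[s] + log_b[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        delta = [
            max(delta[r] + log_a[r][s] for r in range(n_states)) + log_b[s][o]
            for s in range(n_states)
        ]
    return max(delta)


def recognize(obs, models):
    """Score obs against every word model and return the best word."""
    scores = {
        word: viterbi_log_prob(obs, *params) for word, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Two hypothetical single-state word models over a two-symbol alphabet.
models = {
    "a": ([0.0], [[0.0]], [[safe_log(0.9), safe_log(0.1)]]),
    "b": ([0.0], [[0.0]], [[safe_log(0.1), safe_log(0.9)]]),
}
best, scores = recognize([0, 0, 1], models)
print(best, scores)
```

    Working in the log domain avoids numerical underflow on long observation sequences, which is why the recursion adds log-probabilities instead of multiplying probabilities.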

    Distance measure in a speech recognition system for speech recognition
using frequency shifting factors to compensate for input signal
frequency shifts
    13.
    Granted invention patent (status: lapsed)

    Publication number: US6032116A

    Publication date: 2000-02-29

    Application number: US883980

    Filing date: 1997-06-27

    CPC classification number: G10L15/20 G10L15/02 G10L15/10

    Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ). Frames of the speech input signal are represented by a vector f of line spectral pair frequencies and are fuzzy matrix quantized to respective vector f̂ entries in a codebook of the FMQ. A distance measure between f and f̂, d(f, f̂), is defined as ##EQU1## where the constants α_1, α_2, β_1 and β_2 are set to substantially minimize quantization error, and e_i is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal. The speech recognition system may also include hidden Markov model and neural network speech classifiers, such as a multilevel perceptron neural network.
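
    As a rough illustration of how a distance measure feeds fuzzy matrix quantization, the sketch below turns distances between an input matrix and each codeword matrix into membership values. Because the patent's equation is not reproduced in this listing, a plain squared Euclidean distance summed over frames is swapped in for d(f, f̂), and the membership rule is the standard fuzzy c-means formula. The function names and the `fuzziness` parameter are assumptions.

```python
def matrix_sq_distance(frames, codeword):
    """Squared Euclidean distance between two equal-shaped frame matrices.

    This stands in for the patent's weighted measure d(f, f_hat), which
    is not reproduced in the listing.
    """
    if len(frames) != len(codeword):
        raise ValueError("input and codeword must have the same frame count")
    return sum(
        (x - y) ** 2
        for frame, code_frame in zip(frames, codeword)
        for x, y in zip(frame, code_frame)
    )


def fuzzy_memberships(sq_distances, fuzziness=2.0):
    """Fuzzy c-means style memberships from squared distances.

    Each membership is proportional to (1 / d) ** (1 / (fuzziness - 1)),
    normalized so that the memberships sum to 1.
    """
    if fuzziness <= 1.0:
        raise ValueError("fuzziness must be greater than 1")
    exact = [i for i, d in enumerate(sq_distances) if d == 0.0]
    if exact:
        # An exact codeword match takes all of the membership.
        share = 1.0 / len(exact)
        return [share if i in exact else 0.0 for i in range(len(sq_distances))]
    exponent = 1.0 / (fuzziness - 1.0)
    inverse = [(1.0 / d) ** exponent for d in sq_distances]
    total = sum(inverse)
    return [value / total for value in inverse]


# Hypothetical two-frame input and a two-entry codebook of LSP-like values.
observed = [[0.10, 0.30], [0.12, 0.33]]
codebook = [
    [[0.10, 0.31], [0.12, 0.32]],
    [[0.20, 0.40], [0.22, 0.45]],
]
distances = [matrix_sq_distance(observed, codeword) for codeword in codebook]
print(fuzzy_memberships(distances))
```

    Unlike hard quantization, which would keep only the nearest codeword, the membership vector preserves how close the input is to every codeword.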

    Line spectral frequencies and energy features in a robust signal
recognition system
    14.
    Granted invention patent (status: lapsed)

    Publication number: US6009391A

    Publication date: 1999-12-28

    Application number: US907145

    Filing date: 1997-08-06

    CPC classification number: G10L15/20 G10L15/02 G10L15/10

    Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ). Frames of the speech input signal are represented in a matrix by a vector f of line spectral pair frequencies and energy coefficients and are fuzzy matrix quantized to respective vector f̂ entries of a matrix codeword in a codebook of the FMQ. The energy coefficients include the original energy and the first and second derivatives of the original energy, which increase recognition accuracy by, for example, being generally distinctive speech input signal parameters and providing noise signal suppression, especially when the noise signal has a relatively constant energy over at least two time frame intervals. To reduce data while maintaining sufficient resolution, the energy coefficients may be normalized and logarithmically represented. A distance measure between f and f̂, d(f, f̂), is defined as ##EQU1## where the constants α_1, α_2, β_1 and β_2 are set to substantially minimize quantization error, e_i is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal, the first G LSP frequencies are most likely to be frequency shifted by noise, and the last P+3 coefficients represent the three energy coefficients. This robust distance measure can be used to enhance speech recognition performance in generally any speech recognition system using line spectral pair based distance measures.
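
    The three energy coefficients can be sketched as follows. The abstract only states that the energy and its first and second derivatives are used and that they may be normalized and logarithmically represented, so the specific choices below are assumptions: normalizing by the peak frame energy, using a base-10 logarithm, and approximating the derivatives by frame-to-frame differences of the log energy.

```python
import math


def energy_features(frames, floor=1e-10):
    """Return (log energy, first difference, second difference) per frame.

    frames : list of frames, each a list of samples
    floor  : lower bound that keeps the logarithm finite for silent frames
    """
    if not frames:
        raise ValueError("at least one frame is required")
    energies = [sum(x * x for x in frame) for frame in frames]
    peak = max(max(energies), floor)
    # Normalize by the peak energy, then take the logarithm.
    log_e = [math.log10(max(e / peak, floor)) for e in energies]
    n = len(log_e)
    # First difference, with 0.0 where no previous frame exists.
    d1 = [0.0] + [log_e[t] - log_e[t - 1] for t in range(1, n)]
    # Second difference, with 0.0 for the first two frames.
    d2 = [0.0, 0.0][:n] + [d1[t] - d1[t - 1] for t in range(2, n)]
    return list(zip(log_e, d1, d2))


# A signal with constant frame energy yields zero difference terms,
# which is the noise suppression property the abstract points to.
steady = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]
for features in energy_features(steady):
    print(features)
```

    Because a constant energy offset disappears in the difference terms, those two coefficients are insensitive to noise whose energy stays roughly constant across neighboring frames.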
