Apparatus and method of grouping utterances of a phoneme into
context-dependent categories based on sound-similarity for automatic
speech recognition
    1.
    Granted invention patent (status: no longer in force)

    Publication number: US5195167A

    Publication date: 1993-03-16

    Application number: US871600

    Filing date: 1992-04-17

    CPC classification: G10L15/063

    Abstract: Symbol feature values and contextual feature values of each event in a training set of events are measured. At least two pairs of complementary subsets of observed events are selected. In each pair of complementary subsets of observed events, one subset has contextual features with values in a set $C_n$, and the other subset has contextual features with values in a set $\bar{C}_n$, where $C_n$ and $\bar{C}_n$ are complementary sets of contextual feature values. For each subset of observed events, the similarity values of the symbol features of the observed events in the subset are calculated. For each pair of complementary subsets of observed events, a "goodness of fit" is the sum of the symbol feature value similarities of the two subsets. The sets of contextual feature values associated with the subsets of observed events having the best "goodness of fit" are identified and form context-dependent bases for grouping the observed events into two output sets.
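
    The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the similarity measure (negative squared deviation from the subset mean), the scalar symbol feature, and the names `similarity` and `best_context_split` are all assumptions introduced here.

```python
from statistics import fmean

def similarity(values):
    """Assumed similarity measure: negative sum of squared deviations from the mean."""
    if not values:
        return 0.0
    m = fmean(values)
    return -sum((v - m) ** 2 for v in values)

def best_context_split(events, candidate_sets):
    """events: list of (symbol_value, context_value) pairs.
    candidate_sets: iterable of context sets C_n; the complement is every other context."""
    best = None
    for c_n in candidate_sets:
        inside = [v for v, ctx in events if ctx in c_n]
        outside = [v for v, ctx in events if ctx not in c_n]
        if not inside or not outside:
            continue  # a split that leaves one side empty is not a real partition
        fit = similarity(inside) + similarity(outside)  # "goodness of fit" of this pair
        if best is None or fit > best[0]:
            best = (fit, frozenset(c_n))
    return best

# Toy data: the symbol feature is one number, the context is the preceding phone.
events = [(1.0, "p"), (1.2, "b"), (3.0, "s"), (3.1, "z")]
candidates = [{"p"}, {"p", "b"}, {"p", "s"}]
fit, chosen = best_context_split(events, candidates)
print(sorted(chosen), fit)
```

    With this toy data the split on {"p", "b"} wins, because it places acoustically similar utterances in the same subset.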

    Speech recognizer having a speech coder for an acoustic match based on
context-dependent speech-transition acoustic models
    2.
    Granted invention patent (status: no longer in force)

    Publication number: US5333236A

    Publication date: 1994-07-26

    Application number: US942862

    Filing date: 1992-09-10

    CPC classification: G10L19/06

    Abstract: A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal. A speech transition match score for the first feature vector signal and each speech transition comprises the best model match score for the first feature vector signal and all speech transition models representing the speech transition. The identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition are output as a coded utterance representation signal of the first feature vector signal.
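
    The scoring rule in this abstract (a transition's score is the best score among the models that represent it) can be sketched as below. As a simplifying assumption, the frame is reduced to the identifier of its single best-matching prototype; `code_frame` and the toy model tables are hypothetical.

```python
def code_frame(prototype_id, models_by_transition):
    """Return {transition_id: speech transition match score} for one frame.

    models_by_transition maps a transition identifier to a list of models;
    each model maps a prototype identifier to an output probability.
    """
    coded = {}
    for transition, models in models_by_transition.items():
        # Model match score: the output probability of the observed prototype.
        model_scores = [m.get(prototype_id, 0.0) for m in models]
        # Transition match score: the best score over all models of that transition.
        coded[transition] = max(model_scores)
    return coded

# Transition "T1" has two context-dependent models, "T2" has one.
models = {
    "T1": [{"p1": 0.7, "p2": 0.3}, {"p1": 0.2, "p2": 0.8}],
    "T2": [{"p1": 0.1, "p2": 0.9}],
}
coded = code_frame("p2", models)
print(coded)
```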

    Speech coding apparatus having speaker dependent prototypes generated
from nonuser reference data
    3.
    Granted invention patent (status: no longer in force)

    Publication number: US5278942A

    Publication date: 1994-01-11

    Application number: US802678

    Filing date: 1991-12-05

    CPC classification: G10L15/063 G10L15/02

    Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value, are stored. The feature value of a feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
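
    A rough sketch of the two ideas in this abstract: prototypes estimated from pooled synthesized and measured training vectors, and coding a frame as the identifier of its closest prototype. Euclidean distance, one mean-vector prototype per identifier, and the names `label` and `mean_vector` are assumptions made for illustration only.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))

def label(feature_vector, prototypes):
    """Return the identification value of the closest prototype (Euclidean distance assumed)."""
    return min(prototypes, key=lambda pid: math.dist(feature_vector, prototypes[pid]))

# Synthesized vectors: reference-speaker data already transformed toward the new speaker.
synthesized = {"AA": [(1.0, 1.0), (1.2, 0.8)], "IY": [(4.0, 4.0)]}
# Measured vectors: the small amount of data actually recorded from the new speaker.
measured = {"AA": [(0.8, 1.2)], "IY": [(4.2, 3.8), (3.8, 4.2)]}

# Speaker-dependent prototypes are estimated from both sources pooled together.
prototypes = {pid: mean_vector(synthesized[pid] + measured[pid]) for pid in synthesized}
print(label((1.1, 0.9), prototypes))
```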

    Context-dependent speech recognizer using estimated next word context
    4.
    Granted invention patent (status: no longer in force)

    Publication number: US5233681A

    Publication date: 1993-08-03

    Application number: US874271

    Filing date: 1992-04-24

    IPC classification: G10L15/10 G10L15/18 G10L15/28

    CPC classification: G10L15/19 G10L15/193

    Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.
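
    The two-pass pruning described in this abstract can be outlined as follows. The scoring functions are passed in as parameters because the abstract does not specify them; `recognize` and the toy score tables are hypothetical stand-ins for real acoustic and language models.

```python
def recognize(hypotheses, initial_score, predict_next, revised_score, n_initial, n_reduced):
    """Two-pass pruning: a context-independent pass, then a next-word-dependent pass."""
    # Pass 1: keep the hypotheses with the best initial scores (the initial subset).
    initial = sorted(hypotheses, key=initial_score, reverse=True)[:n_initial]
    # Pass 2: estimate the most likely next word, then rescore with a candidate
    # word model that depends on that estimated next word.
    rescored = [(revised_score(h, predict_next(h)), h) for h in initial]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in rescored[:n_reduced]]  # the reduced subset

# Toy tables standing in for the models.
hyps = [("the", "cat"), ("the", "cap"), ("a", "cat")]
init = {("the", "cat"): 0.5, ("the", "cap"): 0.6, ("a", "cat"): 0.1}
next_word = {("the", "cat"): "sat", ("the", "cap"): "sat"}
revised = {(("the", "cat"), "sat"): 0.9, (("the", "cap"), "sat"): 0.4}

result = recognize(hyps, init.get, next_word.get,
                   lambda h, w: revised[(h, w)], n_initial=2, n_reduced=1)
print(result)
```

    In this toy run "the cap" leads after the first pass, but "the cat" wins once the candidate word is rescored in the context of the estimated next word.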

    Speech coding apparatus having acoustic prototype vectors generated by
tying to elementary models and clustering around reference vectors
    6.
    Granted invention patent (status: no longer in force)

    Publication number: US5497447A

    Publication date: 1996-03-05

    Application number: US28028

    Filing date: 1993-03-08

    CPC classification: G10L15/063

    Abstract: A speech coding apparatus in which measured acoustic feature vectors are each represented by the best matched prototype vector. The prototype vectors are generated by storing a model of a training script comprising a series of elementary models. The value of at least one feature of a training utterance of the training script is measured over each of a series of successive time intervals to produce a series of training feature vectors. A first set of training feature vectors corresponding to a first elementary model in the training script is identified. The feature value of each training feature vector signal in the first set is compared to the parameter value of a first reference vector signal to obtain a first closeness score, and is compared to the parameter value of a second reference vector to obtain a second closeness score for each training feature vector. For each training feature vector in the first set, the first closeness score is compared with the second closeness score to obtain a reference match score. A first subset contains those training feature vectors in the first set having reference match scores better than a threshold Q, and a second subset contains those having reference match scores less than the threshold Q. One or more partition values are generated for a first prototype vector from the first subset of training feature vectors, and one or more additional partition values are generated for the first prototype vector from the second subset of training feature vectors.
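
    The threshold split around two reference vectors can be sketched as below. The reference match score used here (distance to the second reference minus distance to the first) and the use of subset means as partition values are assumptions; the abstract does not fix either choice.

```python
import math

def split_by_reference(training_vectors, ref1, ref2, q):
    """Split vectors by an assumed reference match score; larger means closer to ref1."""
    first, second = [], []
    for v in training_vectors:
        score = math.dist(v, ref2) - math.dist(v, ref1)
        (first if score > q else second).append(v)
    return first, second

def mean_vector(vectors):
    """Assumed partition value: the component-wise mean of a subset."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))

# Training vectors already assigned to one elementary model of the script.
vectors = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
first, second = split_by_reference(vectors, ref1=(0.0, 0.0), ref2=(10.0, 0.0), q=0.0)

# Both partitions contribute values to the same prototype vector.
partition_values = [mean_vector(first), mean_vector(second)]
print(partition_values)
```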

    Speaker-independent label coding apparatus
    7.
    Granted invention patent (status: no longer in force)

    Publication number: US5182773A

    Publication date: 1993-01-26

    Application number: US673810

    Filing date: 1991-03-22

    CPC classification: H03M7/3082 G10L19/038

    Abstract: The present invention is related to speech recognition and particularly to a new type of vector quantizer and a new vector quantization technique in which the error rate of associating a sound with an incoming speech signal is drastically reduced. To achieve this end, the present technique groups the feature vectors in a space into different prototypes, at least two of which represent a class of sound. Each of the prototypes may in turn have a number of subclasses or partitions. Each of the prototypes and their subclasses may be assigned respective identifying values. To identify an incoming speech feature vector, at least one of the feature values of the incoming feature vector is compared with the different values of the respective prototypes, or the subclasses of the prototypes. The class of sound for which the group of prototypes, or at least one of the prototypes, has a combined value that most closely matches the feature value of the feature vector is deemed to be the class corresponding to the feature vector. The feature vector is then labeled with the identifier associated with that class.
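
    One reading of this labelling rule, sketched under stated assumptions: each class of sound owns several prototypes, and a frame receives the label of the class that owns its closest prototype. Euclidean distance and the name `classify` are illustrative choices, and the "combined value" variant mentioned in the abstract is not modelled.

```python
import math

def classify(feature_vector, classes):
    """classes maps a class label to a list of prototype vectors.
    Returns the label of the class that owns the single closest prototype."""
    def closest_in_class(label):
        return min(math.dist(feature_vector, p) for p in classes[label])
    return min(classes, key=closest_in_class)

# Class "S" is represented by two well-separated prototypes, class "T" by one.
classes = {"S": [(0.0, 0.0), (10.0, 10.0)], "T": [(5.0, 5.0)]}
print(classify((9.0, 9.0), classes), classify((4.0, 6.0), classes))
```

    Letting one class own several prototypes allows a class whose vectors fall in separate regions of the feature space to be labelled consistently.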

    Labelling speech using context-dependent acoustic prototypes
    8.
    Granted invention patent (status: no longer in force)

    Publication number: US5455889A

    Publication date: 1995-10-03

    Application number: US14966

    Filing date: 1993-02-08

    CPC classification: G10L15/142

    Abstract: The present invention relates to labelling of speech in a context-dependent speech recognition system. When labelling speech using context-dependent prototypes, the phone context of a frame of speech needs to be aligned with the appropriate acoustic parameter vector. Since aligning a large amount of data is difficult if based upon arc ranks, the present invention aligns the data using context-independent acoustic prototypes. The phonetic context of each phone of the data is known. Therefore, after the alignment step, the acoustic parameter vectors are tagged with a corresponding phonetic context. Context-dependent prototype vectors exist for each label. For all labels, the context-dependent prototype vectors having the same phonetic context as the tagged acoustic parameter vector are determined. For each label, the probability of achieving the tagged acoustic parameter vector is determined given each of the context-dependent label prototype vectors having the same phonetic context as the tagged acoustic parameter vector. The label with the highest probability is associated with the context-dependent acoustic parameter vector.
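
    The final labelling step can be sketched as follows, assuming the frame has already been tagged with its phonetic context. The one-dimensional Gaussian prototypes, the function names, and the toy parameters are assumptions made for illustration.

```python
import math

def gaussian(x, mean, var):
    """Density of a one-dimensional Gaussian, used here as the assumed prototype model."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def label_frame(value, context, prototypes):
    """prototypes maps label -> {phonetic context: (mean, variance)}.
    Only prototypes sharing the frame's tagged context compete."""
    scores = {
        lab: gaussian(value, *by_context[context])
        for lab, by_context in prototypes.items()
        if context in by_context
    }
    return max(scores, key=scores.get)

prototypes = {
    "L1": {"AA": (0.0, 1.0), "IY": (5.0, 1.0)},
    "L2": {"AA": (3.0, 1.0), "IY": (0.0, 1.0)},
}
# The same acoustic value receives a different label in a different context.
print(label_frame(0.2, "AA", prototypes), label_frame(0.2, "IY", prototypes))
```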

    Feneme-based Markov models for words
    9.
    Granted invention patent (status: no longer in force)

    Publication number: US5165007A

    Publication date: 1992-11-17

    Application number: US366231

    Filing date: 1989-06-12

    IPC classification: G10L15/02 G10L15/06 G10L15/14

    CPC classification: G10L15/142 G10L2015/0631

    Abstract: An apparatus and method for modelling words with label-based Markov models in a speech recognition system are disclosed. The modelling includes: entering a first speech input, corresponding to words in a vocabulary, into an acoustic processor which converts each spoken word into a sequence of standard labels, where each standard label corresponds to a sound type assignable to an interval of time; representing each standard label as a probabilistic model which has a plurality of states, at least one transition from a state to a state, and at least one settable output probability at some transitions; entering selected acoustic inputs into an acoustic processor which converts the selected acoustic inputs into personalized labels, each personalized label corresponding to a sound type assigned to an interval of time; and setting each output probability as the probability of the standard label represented by a given model producing a particular personalized label at a given transition in the given model. The present invention addresses the problem of generating models of words simply and automatically in a speech recognition system.
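
    Setting the output probabilities can be illustrated with simple relative-frequency counting. This sketch assumes a fixed one-to-one alignment between standard and personalized labels and a single output distribution per model, which is a simplification of the per-transition probabilities the abstract describes; `estimate_output_probs` is a hypothetical name.

```python
from collections import Counter, defaultdict

def estimate_output_probs(aligned_pairs):
    """aligned_pairs: iterable of (standard_label, personalized_label) pairs.
    Returns {standard_label: {personalized_label: probability}}."""
    counts = defaultdict(Counter)
    for standard, personalized in aligned_pairs:
        counts[standard][personalized] += 1
    return {
        standard: {p: n / sum(c.values()) for p, n in c.items()}
        for standard, c in counts.items()
    }

# Standard labels come from the baseform; personalized labels from the new speaker.
pairs = [("F1", "a"), ("F1", "a"), ("F1", "b"), ("F2", "b")]
probs = estimate_output_probs(pairs)
print(probs)
```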

    Speech coding apparatus with single-dimension acoustic prototypes for a
speech recognizer
    10.
    Granted invention patent (status: no longer in force)

    Publication number: US5280562A

    Publication date: 1994-01-18

    Application number: US770495

    Filing date: 1991-10-03

    CPC classification: G10L19/038 H03M7/3082

    Abstract: In speech recognition and speech coding, the values of at least two features of an utterance are measured during a series of time intervals to produce a series of feature vector signals. A plurality of single-dimension prototype vector signals having only one parameter value are stored. At least two single-dimension prototype vector signals have parameter values representing first feature values, and at least two other single-dimension prototype vector signals have parameter values representing second feature values. A plurality of compound-dimension prototype vector signals have unique identification values and comprise one first-dimension and one second-dimension prototype vector signal. At least two compound-dimension prototype vector signals comprise the same first-dimension prototype vector signal. The feature values of each feature vector signal are compared to the parameter values of the compound-dimension prototype vector signals to obtain prototype match scores. The identification values of the compound-dimension prototype vector signals having the best prototype match scores for the feature vector signals are output as a sequence of coded representations of an utterance to be recognized. A match score, comprising an estimate of the closeness of a match between a speech unit and the sequence of coded representations of the utterance, is generated for each of a plurality of speech units. At least one speech subunit, of one or more best candidate speech units having the best match scores, is displayed.
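
    The prototype structure in this abstract can be illustrated with a small sketch in which compound prototypes are pairs of references to shared single-dimension prototypes. Squared Euclidean distance as the match score, the two-feature vectors, and all identifiers are assumptions made for this example.

```python
# Single-dimension prototypes: one parameter value each.
first_dim = {"a1": 0.0, "a2": 5.0}
second_dim = {"b1": 1.0, "b2": 9.0}

# Compound prototypes reference one prototype per dimension; P1 and P2 share "a1".
compound = {"P1": ("a1", "b1"), "P2": ("a1", "b2"), "P3": ("a2", "b2")}

def encode(feature_vector, compound, first_dim, second_dim):
    """Return the identifier of the best-matching compound prototype."""
    x, y = feature_vector

    def distance(pid):
        a, b = compound[pid]
        return (x - first_dim[a]) ** 2 + (y - second_dim[b]) ** 2

    return min(compound, key=distance)

# Coding an utterance yields one compound prototype identifier per frame.
frames = [(0.5, 8.0), (4.0, 8.5), (0.2, 1.1)]
codes = [encode(f, compound, first_dim, second_dim) for f in frames]
print(codes)
```

    Because compound prototypes only store references, each per-dimension distance could be computed once per frame and reused by every compound prototype that shares that single-dimension prototype.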
