Continuous reference adaptation in a pattern recognition system
    1.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5617486A

    Publication date: 1997-04-01

    Application No.: US563256

    Filing date: 1995-11-27

    IPC classification: G06K9/00 G10L15/06 G10L15/14

    Abstract: A pattern recognition system that continuously adapts reference patterns to more effectively recognize input data from a given source. The input data is converted to a set or series of observed vectors and is compared to a set of Markov Models. The closest matching Model is determined and is recognized as being the input data. Reference vectors that are associated with the selected Model are compared to the observed vectors and updated ("adapted") to better represent or match the observed vectors. This updating method retains the value of these observed vectors in a set of accumulation vectors in order to base future adaptations on a broader data set. When updating, the system also may factor in the values corresponding to neighboring reference vectors that are acoustically similar if the data set from the single reference vector is insufficient for an accurate calculation. Every reference vector is updated after every input; thus reference vectors neighboring an updated reference vector may also be updated. The updated reference vectors are then stored by the computer system for use in recognizing subsequent inputs.
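
The accumulation-based update described in this abstract can be illustrated with a minimal sketch. It is not the patented procedure: the `Reference` class, the `min_count` cutoff, and the simple pooled mean are assumptions made here for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """Hypothetical container for one reference vector and its history."""
    mean: list[float]                 # current reference vector
    acc: list[float]                  # accumulation vector (running sum)
    count: int = 0                    # number of observations accumulated
    neighbors: list[int] = field(default_factory=list)  # similar references


def adapt(refs, index, observed, min_count=3):
    """Accumulate `observed` into refs[index] and recompute its mean.

    If fewer than `min_count` observations exist (an assumed cutoff),
    the accumulations of neighboring references are pooled in as well.
    """
    ref = refs[index]
    ref.acc = [a + o for a, o in zip(ref.acc, observed)]
    ref.count += 1

    total, n = list(ref.acc), ref.count
    if n < min_count:
        for j in ref.neighbors:
            if refs[j].count:
                total = [t + a for t, a in zip(total, refs[j].acc)]
                n += refs[j].count

    ref.mean = [t / n for t in total]
    return ref.mean
```

Because the running sums are kept, each later adaptation is based on all earlier observations rather than only the most recent one.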


    Search engine for phrase recognition based on prefix/body/suffix architecture
    2.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5832428A

    Publication date: 1998-11-03

    Application No.: US538828

    Filing date: 1995-10-04

    Abstract: A method of constructing a language model for a phrase-based search in a speech recognition system and an apparatus for constructing and/or searching through the language model. The method includes the step of separating a plurality of phrases into a plurality of words in a prefix word, body word, and suffix word structure. Each of the phrases has a body word and optionally a prefix word and a suffix word. The words are grouped into a plurality of prefix word classes, a plurality of body word classes, and a plurality of suffix word classes in accordance with a set of predetermined linguistic rules. Each of the respective prefix, body, and suffix word classes includes a number of prefix words of the same category, a number of body words of the same category, and a number of suffix words of the same category, respectively. The prefix, body, and suffix word classes are then interconnected according to the predetermined linguistic rules. A method of organizing a phrase search based on the above-described prefix/body/suffix language model is also described. The words in each of the prefix, body, and suffix classes are organized into a lexical tree structure. A phrase start lexical tree structure is then created for the words of all the prefix classes and the body classes having a word which can start one of the plurality of phrases while still maintaining connections of these prefix and body classes within the language model.
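
The idea of a lexical tree that still remembers which word class each word came from can be sketched as below. This is an illustration only: the tree is built over letters rather than phonetic units, and the class names and the `"$classes"` marker key are hypothetical choices made here.

```python
def build_lexical_tree(words_by_class):
    """Merge the words of several classes into one prefix tree (trie).

    Each word-final node records the classes the word belongs to, so the
    connection back to the language-model classes is kept.
    """
    root = {}
    for cls, words in words_by_class.items():
        for word in words:
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            # "$classes" cannot collide with a single-character edge label.
            node.setdefault("$classes", set()).add(cls)
    return root


def lookup(tree, word):
    """Return the set of classes for `word`, or an empty set if absent."""
    node = tree
    for ch in word:
        if ch not in node:
            return set()
        node = node[ch]
    return node.get("$classes", set())
```

Words that share an initial segment, such as "please" and "play", share the nodes for "p" and "l", which is what lets a search evaluate the common part only once.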


    Method and apparatus for automatically invoking a new word module for unrecognized user input
    3.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5852801A

    Publication date: 1998-12-22

    Application No.: US538919

    Filing date: 1995-10-04

    IPC classification: G10L15/18 G10L15/22 G01L5/06

    Abstract: A method for reducing recognition errors in a speech recognition system that has a user interface, which instructs the user to invoke a new word acquisition module upon a predetermined condition, and that improves the recognition accuracy for poorly recognized words. The user interface of the present invention suggests to a user which unrecognized words may be new words that should be added to the recognition program lexicon. The user interface advises the user to enter into a new word lexicon any words that fail to appear in the alternative word list on two consecutive tries. A method to improve the recognition accuracy for poorly recognized words via language model adaptation is also provided by the present invention. The present invention increases the unigram probability of an unrecognized word in proportion to the score difference between the unrecognized word and the top-ranked word to guarantee recognition of the same word in a subsequent try. In the event that the score of the unrecognized word is unknown (i.e., the word is not in the alternative word list), the present invention increases the unigram probability of the unrecognized word in proportion to the difference between the top-ranked word's score and the smallest score in the alternative list.
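
The unigram adjustment can be sketched as follows. The abstract only states that the increase is proportional to a score difference; the multiplicative form, the `rate` constant, and the renormalization step are assumptions made here for illustration.

```python
def boost_unigram(unigram, word, top_score, word_score=None,
                  alt_scores=(), rate=0.1):
    """Return a new unigram table in which `word` has been boosted.

    `word_score` is the recognizer score of the misrecognized word. If it
    is unknown (the word was not in the alternative list), the smallest
    score in `alt_scores` is used instead, as the abstract describes.
    `rate` is a hypothetical proportionality constant.
    """
    if word_score is None:
        word_score = min(alt_scores)  # raises ValueError if the list is empty
    gap = max(top_score - word_score, 0.0)

    boosted = dict(unigram)
    boosted[word] = unigram[word] * (1.0 + rate * gap)

    # Renormalize so the table remains a probability distribution.
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}
```

For example, with `unigram = {"a": 0.5, "b": 0.5}`, a top score of 10, a score of 5 for "b", and `rate=0.2`, the probability of "b" rises to 2/3 after renormalization.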


    Continuous Mandarin Chinese speech recognition system having an integrated tone classifier
    4.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5602960A

    Publication date: 1997-02-11

    Application No.: US316257

    Filing date: 1994-09-30

    CPC classification: G10L15/04 G10L25/15

    Abstract: A speech recognition system for continuous Mandarin Chinese speech comprises a microphone, an A/D converter, a syllable recognition system, an integrated tone classifier, and a confidence score augmentor. The syllable recognition system generates N-best theories with initial confidence scores. The integrated tone classifier has a pitch estimator to estimate the pitch of the input once and a long-term tone analyzer to segment the estimated pitch according to the syllables of each of the N-best theories. The long-term tone analyzer performs long-term tonal analysis on the segmented, estimated pitch and generates a long-term tonal confidence signal. The confidence score augmentor receives the initial confidence scores and the long-term tonal confidence signals, modifies each initial confidence score according to the corresponding long-term tonal confidence signal, re-ranks the N-best theories according to the augmented confidence scores, and outputs the N-best theories.
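
The final re-ranking step can be illustrated with a short sketch. The abstract does not say how the tonal confidence modifies the initial score; the weighted sum and the `weight` parameter below are assumptions made here.

```python
def rerank(n_best, tonal_confidence, weight=0.5):
    """Re-rank N-best theories by an augmented confidence score.

    `n_best` is a list of (theory, initial_score) pairs and
    `tonal_confidence` maps each theory to its tonal confidence.
    The additive combination is an assumed rule, not the patented one.
    """
    augmented = [
        (theory, score + weight * tonal_confidence[theory])
        for theory, score in n_best
    ]
    return sorted(augmented, key=lambda pair: pair[1], reverse=True)
```

With this rule, a theory that initially ranks second can move to first place when its segmentation agrees better with the estimated pitch contour.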


    Speaker adaptation based on lateral tying for large-vocabulary continuous speech recognition
    5.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5737487A

    Publication date: 1998-04-07

    Application No.: US600859

    Filing date: 1996-02-13

    IPC classification: G10L15/06 G10L5/06

    CPC classification: G10L15/065

    Abstract: A system and method for performing speaker adaptation in a speech recognition system which includes a set of reference models corresponding to speech data from a plurality of speakers. The speech data is represented by a plurality of acoustic models and corresponding sub-events, and each sub-event includes one or more observations of speech data. A degree of lateral tying is computed between each pair of sub-events, wherein the degree of tying indicates the degree to which a first observation in a first sub-event contributes to the remaining sub-events. When adaptation data from a new speaker becomes available, a new observation from adaptation data is assigned to one of the sub-events. Each of the sub-events is then populated with the observations contained in the assigned sub-event based on the degree of lateral tying that was computed between each pair of sub-events. The reference models corresponding to the populated sub-events are then adapted to account for speech pattern idiosyncrasies of the new speaker, thereby reducing the error rate of the speech recognition system.


    Rapid tree-based method for vector quantization
    6.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5734791A

    Publication date: 1998-03-31

    Application No.: US999354

    Filing date: 1992-12-31

    IPC classification: G10L19/02 G10L3/02

    CPC classification: G10L19/038

    Abstract: The branching decision for each node in a vector quantization (VQ) binary tree is made by a simple comparison of a pre-selected element of the candidate vector with a stored threshold, resulting in a binary decision for reaching the next lower level. Each node has a preassigned element and threshold value. Conventional centroid distance training techniques (such as LBG and k-means) are used to establish code-book indices corresponding to a set of VQ centroids. The set of training vectors is used a second time to select a vector element and threshold value at each node that approximately splits the data evenly. After processing the training vectors through the binary tree using threshold decisions, a histogram is generated for each code-book index that represents the number of times a training vector belonging to a given index set appeared at each index. The final quantization is accomplished by processing and then selecting the nearest centroid belonging to that histogram. Accuracy comparable to that achieved by conventional binary tree VQ is realized but with almost a full order-of-magnitude increase in processing speed.
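
The threshold tree and histogram lookup can be sketched as below. Several details are assumptions made for this illustration rather than taken from the patent: the split element is the one with the largest variance, the threshold is the median, the tree depth is fixed, and the centroids are supplied by the caller (for example from a prior k-means run).

```python
from collections import Counter
from statistics import median, pvariance


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids, candidates):
    """Index of the closest centroid among `candidates`."""
    return min(candidates, key=lambda i: dist2(vec, centroids[i]))


def build(vectors, depth):
    """Build a tree whose nodes compare one vector element to a threshold."""
    if depth == 0 or len(vectors) < 2:
        return {"hist": Counter()}
    dims = range(len(vectors[0]))
    elem = max(dims, key=lambda d: pvariance([v[d] for v in vectors]))
    thr = median(v[elem] for v in vectors)  # splits the data roughly evenly
    low = [v for v in vectors if v[elem] <= thr]
    high = [v for v in vectors if v[elem] > thr]
    if not low or not high:
        return {"hist": Counter()}
    return {"elem": elem, "thr": thr,
            "low": build(low, depth - 1), "high": build(high, depth - 1)}


def leaf_for(tree, vec):
    """Follow threshold decisions down to a leaf (one comparison per level)."""
    node = tree
    while "hist" not in node:
        node = node["low"] if vec[node["elem"]] <= node["thr"] else node["high"]
    return node


def train(vectors, centroids, depth=2):
    """Build the tree, then histogram the true centroid index at each leaf."""
    tree = build(vectors, depth)
    everything = range(len(centroids))
    for v in vectors:
        leaf_for(tree, v)["hist"][nearest(v, centroids, everything)] += 1
    return tree


def quantize(tree, vec, centroids):
    """Search only the centroids seen at the leaf reached by `vec`."""
    candidates = list(leaf_for(tree, vec)["hist"]) or list(range(len(centroids)))
    return nearest(vec, centroids, candidates)
```

The speed gain in this sketch comes from replacing a distance computation at every tree level with a single scalar comparison, and from restricting the final distance search to the few centroids recorded in one leaf histogram.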


    Handwriting signal processing front-end for handwriting recognizers
    7.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5577135A

    Publication date: 1996-11-19

    Application No.: US204031

    Filing date: 1994-03-01

    CPC classification: G06K9/00422 G06K9/6218

    Abstract: A handwriting signal processing front-end method and apparatus for a handwriting training and recognition system which includes non-uniform segmentation and feature extraction in combination with multiple vector quantization. In a training phase, digitized handwriting samples are partitioned into segments of unequal length. Features are extracted from the segments and are grouped to form feature vectors for each segment. Groups of adjacent feature vectors are then combined to form input frames. Feature-specific vectors are formed by grouping features of the same type from each of the feature vectors within a frame. Multiple vector quantization is then performed on each feature-specific vector to statistically model the distributions of the vectors for each feature by identifying clusters of the vectors and determining the mean locations of the vectors in the clusters. Each mean location is represented by a codebook symbol and this information is stored in a codebook for each feature. These codebooks are then used to train a recognition system. In the testing phase, where the recognition system is to identify handwriting, digitized test handwriting is first processed as in the training phase to generate feature-specific vectors from input frames. Multiple vector quantization is then performed on each feature-specific vector to represent the feature-specific vector using the codebook symbols that were generated for that feature during training. The resulting series of codebook symbols effects a reduced representation of the sampled handwriting data and is used for subsequent handwriting recognition.


    Automatic method for scoring and clustering prototypes of handwritten stroke-based data
    8.
    Granted invention patent
    Legal status: Expired

    Publication No.: US6052481A

    Publication date: 2000-04-18

    Application No.: US300426

    Filing date: 1994-09-02

    CPC classification: G06K9/68 G06K9/222

    Abstract: A system and method for processing stroke-based handwriting data for the purposes of automatically scoring and clustering the handwritten data to form letter prototypes. The present invention includes a method for processing digitized stroke-based handwriting data of known character strings, where each of the character strings is represented by a plurality of mathematical feature vectors. In this method, each one of the plurality of feature vectors is labelled as corresponding to a particular character in the character strings. A trajectory is then formed for each one of the plurality of feature vectors labelled as corresponding to a particular character. After the trajectories are formed, a distance value is calculated for each pair of trajectories corresponding to the particular character using a dynamic time warping method. The trajectories which are within a sufficiently small distance of each other are grouped to form a plurality of clusters. The clusters are used to define handwriting prototypes which identify subcategories of the character.
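
The pairwise distance and grouping steps can be sketched as follows. The textbook dynamic time warping recurrence and the greedy grouping rule used here are stand-ins chosen for illustration; the abstract does not specify the exact clustering procedure, and `max_dist` is a hypothetical threshold.

```python
def dtw(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            # Euclidean distance between the two aligned points.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[-1][-1]


def cluster(trajectories, max_dist):
    """Greedy grouping: join the first cluster with a member within max_dist.

    Returns lists of trajectory indices. Clusters are never merged later,
    so this is a simplification of a full clustering pass.
    """
    clusters = []
    for idx, traj in enumerate(trajectories):
        for members in clusters:
            if any(dtw(traj, trajectories[m]) <= max_dist for m in members):
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters
```

Time warping lets two samples of the same letter written at different speeds, and therefore with different numbers of points, still receive a small distance.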


    Method and apparatus for detecting end points of speech activity
    9.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5692104A

    Publication date: 1997-11-25

    Application No.: US313430

    Filing date: 1994-09-27

    IPC classification: G10L11/02 G10L5/06

    CPC classification: G10L25/87 G10L25/09 G10L25/24

    Abstract: A method and apparatus for detecting end points of speech activity in an input signal using spectral representation vectors performs beginning point detection using spectral representation vectors for the spectrum of each sample of the input signal and a spectral representation vector for the steady state portion of the input signal. The beginning point of speech is detected when the spectrum diverges from the steady state portion of the input signal. Once the beginning point has been detected, the spectral representation vectors of the input signal are used to determine the ending point of the sound in the signal. The ending point of speech is detected when the spectrum converges towards the steady state portion of the input signal. After both the beginning and ending of the sound are detected, vector quantization distortion can be used to classify the sound as speech or noise.
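
The diverge-then-converge logic can be sketched in a few lines. The assumptions here are that the steady state is estimated as the mean of the first few frames, that divergence is measured by Euclidean distance, and that one fixed `threshold` is used for both end points; none of these details come from the abstract.

```python
def detect_endpoints(frames, noise_frames=3, threshold=1.0):
    """Return (begin, end) frame indices, or None where not found.

    `frames` is a sequence of spectral representation vectors. The first
    `noise_frames` frames are assumed to contain only background signal.
    """
    head = frames[:noise_frames]
    steady = [sum(col) / len(head) for col in zip(*head)]

    def divergence(vec):
        return sum((x - s) ** 2 for x, s in zip(vec, steady)) ** 0.5

    begin = end = None
    for i, vec in enumerate(frames):
        d = divergence(vec)
        if begin is None:
            if d > threshold:      # spectrum diverges from steady state
                begin = i
        elif d <= threshold:       # spectrum converges back to steady state
            end = i
            break
    return begin, end
```

In this sketch `end` is the index of the first frame that has returned to the steady state, so the detected sound occupies frames `begin` through `end - 1`.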


    Method and apparatus for detecting speech activity using cepstrum vectors
    10.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5596680A

    Publication date: 1997-01-21

    Application No.: US999128

    Filing date: 1992-12-31

    IPC classification: G10L11/02 G10L5/06 G10L9/00

    CPC classification: G10L25/87 G10L25/09 G10L25/24

    Abstract: A method and apparatus for detecting speech activity in an input signal. The present invention includes performing begin point detection using power/zero crossing. Once the begin point has been detected, the present invention uses the cepstrum of the input signal to determine the endpoint of the sound in the signal. After both the beginning and ending of the sound are detected, the present invention uses vector quantization distortion to classify the sound as speech or noise.
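
Begin point detection from power and zero crossings can be illustrated as below. Only the begin point step is sketched; the two thresholds and the rule that either feature alone may trigger detection are assumptions made here, and the cepstrum-based end point step is not shown.

```python
def frame_features(samples):
    """Return (mean power, zero-crossing count) for one frame of samples."""
    power = sum(s * s for s in samples) / len(samples)
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return power, crossings


def find_begin(frames, power_thr, zc_thr):
    """Index of the first frame exceeding either threshold, else None."""
    for i, frame in enumerate(frames):
        power, crossings = frame_features(frame)
        if power > power_thr or crossings > zc_thr:
            return i
    return None
```

Power responds to loud voiced sounds, while the zero-crossing count responds to quieter high-frequency sounds such as fricatives, which is why the two measures are commonly paired.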
