Method for creating and using multiple-word sound models in speech
recognition
    1.
    Granted invention patent
    Legal status: Expired

    Publication number: US4837831A

    Publication date: 1989-06-06

    Application number: US919885

    Filing date: 1986-10-15

    IPC classification: G10L15/06

    CPC classification: G10L15/063

    Abstract: A first speech recognition method receives an acoustic description of an utterance to be recognized and scores a portion of that description against each of a plurality of cluster models representing similar sounds from different words. The resulting score for each cluster is used to calculate a word score for each word represented by that cluster. Preferably these word scores are used to prefilter vocabulary words, and the description of the utterance includes a succession of acoustic descriptions which are compared by linear time alignment against a succession of acoustic models. A second speech recognition method is also provided which matches an acoustic model with each of a succession of acoustic descriptions of an utterance to be recognized. Each of these models has a probability score for each vocabulary word. The probability scores for each word associated with the matching acoustic models are combined to form a total score for that word. The preferred speech recognition method calculates two separate word scores for each currently active vocabulary word from a common succession of sounds. Preferably the first score is calculated by a time alignment method, while the second score is calculated by a time independent method. Preferably this calculation of two separate word scores is used in one of multiple word-selecting phases of a recognition process, such as in the prefiltering phase.
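
    The second, time-independent method in this abstract can be illustrated with a minimal sketch. It assumes each acoustic model is a prototype vector, that "matching" means nearest prototype by squared distance, and that per-word scores are log probabilities combined by summation. The names `time_independent_word_scores`, `models` and `word_logprobs` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List


def time_independent_word_scores(
    frames: List[List[float]],
    models: Dict[str, List[float]],
    word_logprobs: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Sum, per word, the log scores attached to the best-matching model of each frame."""
    totals: Dict[str, float] = defaultdict(float)
    for frame in frames:
        # Assumption: the matching model is the nearest prototype (squared distance).
        label = min(
            models,
            key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, models[k])),
        )
        # Each model carries one score per vocabulary word; frame order is ignored.
        for word, logprob in word_logprobs[label].items():
            totals[word] += logprob
    return dict(totals)
```

    Because the frame order never enters the sum, such a score is cheap to compute, which is consistent with its suggested use for prefiltering.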

    Method for speech analysis and speech recognition
    2.
    Granted invention patent
    Legal status: Expired

    Publication number: US4805218A

    Publication date: 1989-02-14

    Application number: US34842

    Filing date: 1987-04-03

    IPC classification: G10L15/00 G10L1/00

    CPC classification: G10L15/00

    Abstract: A method of speech analysis calculates one or more difference parameters for each of a sequence of acoustic frames, where each difference parameter is a function of the difference between an acoustic parameter in one frame and an acoustic parameter in a nearby frame. The method is used in speech recognition which compares the difference parameters of each frame against acoustic models representing speech units, where each speech-unit model has a model of the difference parameters associated with the frames of its speech unit. The difference parameters can be slope parameters or energy difference parameters. Slope parameters are derived by finding the difference between the energy of a given spectral parameter of a given frame and the energy, in a nearby frame, of a spectral parameter associated with a different frequency band. The resulting parameter indicates the extent to which the frequency of energy in the part of the spectrum represented by the given parameter is going up or going down. Energy difference parameters are calculated as a function of the difference between a given spectral parameter in one frame and a spectral parameter in a nearby frame representing the same frequency band. In one embodiment of the invention, dynamic programming compares the difference parameters of a sequence of frames to be recognized against a sequence of dynamic programming elements associated with each of a plurality of speech-unit models. In another embodiment of the invention, each speech-unit model represents one phoneme, and the speech-unit models for a plurality of phonemes are compared against individual frames, to associate with each such frame the one or more phonemes whose models compare most closely with it.
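
    The two kinds of difference parameter described above can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the "nearby frame" is taken to be the frame `lag` steps earlier, the "different frequency band" is taken to be the adjacent higher band, and all function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[float]  # one spectral energy value per frequency band


def energy_difference(frames: List[Frame], t: int, band: int, lag: int = 1) -> float:
    """Change in one band's energy between frame t and the frame `lag` earlier."""
    return frames[t][band] - frames[t - lag][band]


def slope(frames: List[Frame], t: int, band: int, lag: int = 1) -> float:
    """Energy of `band` in frame t minus energy of the next higher band `lag` frames earlier.

    Assumption: "a different frequency band" is read as the adjacent higher band.
    """
    return frames[t][band] - frames[t - lag][band + 1]


def difference_parameters(
    frames: List[Frame], lag: int = 1
) -> List[Tuple[List[float], List[float]]]:
    """Return (energy differences, slopes) for every frame that has a predecessor."""
    out = []
    for t in range(lag, len(frames)):
        bands = len(frames[t])
        deltas = [energy_difference(frames, t, b, lag) for b in range(bands)]
        slopes = [slope(frames, t, b, lag) for b in range(bands - 1)]
        out.append((deltas, slopes))
    return out
```

    The first `lag` frames have no predecessor and are skipped here; padding them instead would be an equally reasonable choice.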

    Method for representing word models for use in speech recognition
    3.
    Granted invention patent
    Legal status: Expired

    Publication number: US4903305A

    Publication date: 1990-02-20

    Application number: US328738

    Filing date: 1989-03-23

    IPC classification: G10L15/06 G10L15/14

    Abstract: A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered, so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Then each word is represented by cluster spelling representing the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.
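
    The idea of a cluster spelling can be illustrated with a small sketch. It assumes each sub-model is a diagonal-covariance Gaussian, uses the symmetric Kullback-Leibler divergence as the similarity metric, and swaps in a simple greedy threshold clustering in which the first member of a cluster serves as its representative. The abstract does not specify the clustering procedure, so all of this, including the names `cluster_spellings` and `threshold`, is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Gaussian = Tuple[List[float], List[float]]  # (means, variances), diagonal covariance


def kl(p: Gaussian, q: Gaussian) -> float:
    """KL divergence KL(p || q) between two diagonal Gaussians."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        math.log(b / a) + (a + (m - n) ** 2) / b - 1.0
        for m, a, n, b in zip(mp, vp, mq, vq)
    )


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    return kl(p, q) + kl(q, p)


def cluster_spellings(words: Dict[str, List[Gaussian]], threshold: float):
    """Greedily cluster sub-models and spell each word as a list of cluster ids."""
    centers: List[Gaussian] = []  # assumption: first member represents its cluster
    spellings: Dict[str, List[int]] = {}
    for word, submodels in words.items():
        spelling = []
        for sm in submodels:
            best = min(
                range(len(centers)),
                key=lambda i: symmetric_kl(sm, centers[i]),
                default=None,
            )
            if best is None or symmetric_kl(sm, centers[best]) > threshold:
                centers.append(sm)  # no cluster is close enough: start a new one
                best = len(centers) - 1
            spelling.append(best)
        spellings[word] = spelling
    return spellings, centers
```

    Two words that share a cluster id at some position share one acoustic model for that sound, which is what lets similar sounds from different words be scored once.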

    Speech recognition apparatus and method
    4.
    Granted invention patent
    Legal status: Expired

    Publication number: US4783803A

    Publication date: 1988-11-08

    Application number: US797249

    Filing date: 1985-11-12

    IPC classification: G10L15/00 G10L1/00

    CPC classification: G10L15/00

    Abstract: A system is disclosed for recognizing a pattern in a collection of data given a context of one or more other patterns previously identified. Preferably the system is a speech recognition system, the patterns are words and the collection of data is a sequence of acoustic frames. During the processing of each of a plurality of frames, for each word in an active vocabulary, the system updates a likelihood score representing a probability of a match between the word and the frame, combines a language model score based on one or more previously recognized words with that likelihood score, and prunes the word from the active vocabulary if the combined score is below a threshold. A rapid match is made between the frames and each word of an initial vocabulary to determine which words should originally be placed in the active vocabulary. Preferably the system enables an operator to confirm the system's best guess as to the spoken word merely by speaking another word, to indicate that an alternate guess by the system is correct by typing a key associated with that guess, and to indicate that neither the best guess nor the alternate guesses was correct by typing yet another key. The system includes other features, including ones for determining where among the frames to look for the start of speech, and a special hardware processor for computing likelihood scores.
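
    The per-frame update, combination and pruning loop described above might look roughly like the sketch below. It works in log probabilities and, as an assumption not stated in the abstract, sets the threshold relative to the best combined score (a beam). `frame_loglik`, `lm_logprob` and `beam` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Sequence


def prune_while_scoring(
    frames: Sequence[object],
    initial_vocab: Iterable[str],
    frame_loglik: Callable[[str, object], float],
    lm_logprob: Dict[str, float],
    beam: float,
) -> Dict[str, float]:
    """Return the accumulated acoustic log scores of the words that survive pruning."""
    acoustic = {w: 0.0 for w in initial_vocab}
    for frame in frames:
        if not acoustic:
            break
        # 1. Update each active word's likelihood score with this frame.
        for w in acoustic:
            acoustic[w] += frame_loglik(w, frame)
        # 2. Combine with the language model score for the current context.
        combined = {w: s + lm_logprob[w] for w, s in acoustic.items()}
        # 3. Prune words whose combined score falls below the threshold.
        #    Assumption: the threshold trails the best combined score by `beam`.
        cutoff = max(combined.values()) - beam
        acoustic = {w: s for w, s in acoustic.items() if combined[w] >= cutoff}
    return acoustic
```

    In this sketch `initial_vocab` stands in for the output of the rapid match, which is not modelled here.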

    Method for speech recognition
    5.
    Granted invention patent
    Legal status: Expired

    Publication number: US4805219A

    Publication date: 1989-02-14

    Application number: US035628

    Filing date: 1987-04-03

    IPC classification: G10L15/00 G10L5/00

    CPC classification: G10L15/00

    Abstract: A method determines if a portion of speech corresponds to a speech pattern by time aligning both the speech and a plurality of speech pattern models against a common time-aligning model. This compensates for speech variation between the speech and the pattern models. The method then compares the resulting time-aligned speech model against the resulting time-aligned pattern models to determine which of the patterns most probably corresponds to the speech. Preferably there are a plurality of time-aligning models, each representing a group of somewhat similar sound sequences which occur in different words. Each of these time-aligning models is scored for similarity against a portion of speech, and the time-aligned speech model and time-aligned pattern models produced by time alignment with the best scoring time-aligning model are compared to determine the likelihood that each speech pattern corresponds to the portion of speech. This is performed for each successive portion of speech. When a portion of speech appears to correspond to a given speech pattern model, a range of likely start times is calculated for the vocabulary word associated with that model, and a word score is calculated to indicate the likelihood of that word starting in that range. The method uses a more computationally intensive comparison between the speech and selected vocabulary words, so as to more accurately determine which words correspond with which portions of the speech. When this more intensive comparison indicates the ending of a word at a given point in the speech, the method selects the best scoring vocabulary words whose range of start times overlaps that ending time, and performs the computationally intensive comparison on those selected words starting at that point in the speech.

    Robust pattern recognition system and method using Socratic agents

    Publication number: US08331656B2

    Publication date: 2012-12-11

    Application number: US13446942

    Filing date: 2012-04-13

    Applicant: James K. Baker

    Inventor: James K. Baker

    IPC classification: G06K9/62 G06F17/00 G06N5/00

    CPC classification: G06K9/6256 G06K9/6262

    Abstract: A computer-implemented pattern recognition method, system and program product, the method comprising in one embodiment: creating electronically a linkage between a plurality of models within a classifier module within a pattern recognition system such that any one of said plurality of models may be selected as an active model in a recognition process; creating electronically a null hypothesis between at least one model of said plurality of linked models and at least a second model among said plurality of linked models; accumulating electronically evidence to accept or reject said null hypothesis until sufficient evidence is accumulated to reject said null hypothesis in favor of one of said plurality of linked models or until a stopping criterion is met; and transmitting at least a portion of the electronically accumulated evidence or a summary thereof to accept or reject said null hypothesis to a pattern classifier module.
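
    One way to picture the accumulate-until-rejected step is a sequential test between two linked models. The sketch below swaps in a simple log-likelihood-ratio accumulator in the spirit of a sequential probability ratio test; the abstract does not say this is the test used. The names `accumulate_evidence`, `reject_at` and `max_obs` are hypothetical, and the null hypothesis is taken to be "the two models perform equally well".

```python
from typing import Callable, Iterable, Optional, Tuple


def accumulate_evidence(
    observations: Iterable[float],
    loglik_a: Callable[[float], float],
    loglik_b: Callable[[float], float],
    reject_at: float,
    max_obs: int,
) -> Tuple[Optional[str], float, int]:
    """Accumulate evidence until the null is rejected or a stopping criterion is met.

    Returns (favored model or None, accumulated evidence, observations used).
    """
    total = 0.0
    used = 0
    for x in observations:
        used += 1
        total += loglik_a(x) - loglik_b(x)  # positive values favor model A
        if total >= reject_at:
            return "A", total, used  # null rejected in favor of model A
        if total <= -reject_at:
            return "B", total, used  # null rejected in favor of model B
        if used >= max_obs:
            break  # stopping criterion: observation budget exhausted
    return None, total, used  # null not rejected
```

    The returned tuple plays the role of the evidence summary that would be passed on to the classifier module.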

    Systems and methods for word recognition
    8.
    Granted invention patent
    Legal status: Expired

    Publication number: US5680511A

    Publication date: 1997-10-21

    Application number: US477287

    Filing date: 1995-06-07

    IPC classification: G10L15/18 G10L9/00

    CPC classification: G10L15/1815

    Abstract: In one aspect, the invention provides word recognition systems that operate to recognize an unrecognized or ambiguous word that occurs within a passage of words. The system can offer several words as choice words for inserting into the passage to replace the unrecognized word. The system can select the best choice word by using the choice word to extract from a reference source, sample passages of text that relate to the choice word. For example, the system can select the dictionary passage that defines the choice word. The system then compares the selected passage to the current passage, and generates a score that indicates the likelihood that the choice word would occur within that passage of text. The system can select the choice word with the best score to substitute into the passage. The passage of words being analyzed can be any word sequence including an utterance, a portion of handwritten text, a portion of typewritten text or other such sequence of words, numbers and characters. Alternative embodiments of the present invention are disclosed which function to retrieve documents from a library as a function of context.
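
    The comparison of a reference passage with the current passage can be sketched very roughly as a word-overlap count. This is a deliberately crude stand-in for whatever scoring the patent uses; the reference source is assumed to be a plain dictionary mapping each choice word to its definition, and the names `score_choice` and `best_choice` are hypothetical.

```python
from typing import Dict, List


def score_choice(choice: str, passage: List[str], reference: Dict[str, str]) -> int:
    """Count distinct words shared by the current passage and the choice word's definition."""
    context = {w.lower() for w in passage}
    definition = set(reference[choice].lower().split())
    return len(context & definition)


def best_choice(choices: List[str], passage: List[str], reference: Dict[str, str]) -> str:
    """Pick the choice word whose reference passage best matches the current passage."""
    return max(choices, key=lambda c: score_choice(c, passage, reference))
```

    A practical system would at least ignore function words and weight rare words more heavily; the sketch only shows where the reference passage enters the decision.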

    Speech recognition method
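    9.
    Granted invention patent

    Publication number: US4803729A

    Publication date: 1989-02-07

    Application number: US34843

    Filing date: 1987-04-03

    Applicant: James K. Baker

    Inventor: James K. Baker

    IPC classification: G10L15/04 G10L5/00

    CPC classification: G10L15/04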

    Abstract: Smoothed frame labeling associates phonetic frame labels with a given speech frame as a function of (a) the closeness with which the given frame compares to each of a plurality of acoustic models, (b) which frame labels correspond with a neighboring frame, and (c) transition probabilities which indicate, for the frame labels associated with the neighboring frame, which frame labels are probably associated with the given frame. The smoothed frame labeling is used to divide the speech into segments of frames having the same class of labels. The invention represents words as a collection of known diphone models, each of which models the sound before and after a boundary between segments derived by the smoothed frame labeling. At recognition time, the speech is divided into segments by smoothed frame labeling; diphone models are derived for each boundary between the resulting segments; and the resulting diphone models are compared against the known diphone models to determine which of the known diphone models match the segment boundaries in the speech. Then a combined-displaced-evidence method is used to determine which words occur in the speech. This method detects which acoustic patterns, in the form of the known diphone models, match various portions of the speech. In response to each such match, it associates with the speech an evidence score for each vocabulary word in which that pattern is known to occur. It displaces each such score from the location of its associated matched pattern by the known distance between that pattern and the beginning of the score's word. Then all the evidence scores for a word located in a given portion of the speech are combined to produce a score which indicates the probability of that word starting in that portion of the speech. This score is combined with a score produced by comparing a histogram from a portion of the speech against a histogram of each word. The resulting combined score determines whether a given word should undergo a more detailed comparison against the speech to be recognized.
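
    The combined-displaced-evidence step can be sketched as a voting scheme over candidate word start positions. The sketch assumes positions are frame indices, that evidence scores are combined by summation within fixed-width bins, and that the pattern-to-word offsets are known in advance. The names `combine_displaced_evidence`, `occurrences` and `bin_width` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def combine_displaced_evidence(
    matches: List[Tuple[int, str, float]],
    occurrences: Dict[str, List[Tuple[str, int]]],
    bin_width: int = 1,
) -> Dict[Tuple[str, int], float]:
    """Accumulate evidence for (word, start bin) pairs.

    matches:     (position, pattern id, match score) for each detected pattern
    occurrences: pattern id -> [(word, offset of the pattern from the word's start)]
    """
    evidence: Dict[Tuple[str, int], float] = defaultdict(float)
    for position, pattern, score in matches:
        for word, offset in occurrences.get(pattern, []):
            # Displace the evidence back to where the word would have to start.
            start_bin = (position - offset) // bin_width
            evidence[(word, start_bin)] += score
    return dict(evidence)
```

    Words whose patterns all point back to the same start region collect the most evidence, which is what would mark them for the more detailed comparison.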

    Robust pattern recognition system and method using socratic agents
    10.
    Granted invention patent
    Legal status: In force

    Publication number: US08014591B2

    Publication date: 2011-09-06

    Application number: US11898636

    Filing date: 2007-09-13

    Applicant: James K. Baker

    Inventor: James K. Baker

    IPC classification: G06K9/62 G06F17/00 G06N5/00

    CPC classification: G06K9/6256 G06K9/6262

    Abstract: A computer-implemented pattern recognition method, system and program product, the method comprising in one embodiment: creating electronically a linkage between a plurality of models within a classifier module within a pattern recognition system such that any one of said plurality of models may be selected as an active model in a recognition process; creating electronically a null hypothesis between at least one model of said plurality of linked models and at least a second model among said plurality of linked models; accumulating electronically evidence to accept or reject said null hypothesis until sufficient evidence is accumulated to reject said null hypothesis in favor of one of said plurality of linked models or until a stopping criterion is met; and transmitting at least a portion of the electronically accumulated evidence or a summary thereof to accept or reject said null hypothesis to a pattern classifier module.
