Apparatus and methods for training speech recognition systems and their users and otherwise improving speech recognition performance
    1.
    Granted invention patent (expired)

    Publication No.: US5428707A

    Publication date: 1995-06-27

    Application No.: US976413

    Filing date: 1992-11-13

    Abstract: A tutorial instructs how to use a word recognition system, such as one for speech recognition. It specifies a set of allowed response words for each of a plurality of states. It sends messages on how to use the recognizer in certain states, and, in others, presents exercises in which the user is to enter signals representing expected words. It scores each such signal against word models to select which response word corresponds to it, and then advances to a state associated with that selected response. This scoring is performed against a large vocabulary even though only a small number of responses are allowed, and the signal is rejected if too many non-allowed words score better than any allowed word. The system comes with multiple sets of standard signal models; it scores each against a given user's signals, selects the set which scores best, and then performs adaptive and batch training upon that set. Preferably, the tutorial prompts users to enter the words used for training in an environment similar to that of the actual recognizer the tutorial is training them to use. The system will normally simulate the recognition of the prompted word, but will sometimes simulate an error. When it does, it notifies the user if he fails to correct the error. The recognizer associated with the tutorial allows users to perform adaptive training either on all words, or only on those whose recognition has been corrected or confirmed. The recognizer also uses a context language model which indicates the probability that a given word will be used in the context of other words which precede it in a grouping of text.
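The rejection rule in the abstract above (score against the full vocabulary, then reject if too many non-allowed words beat every allowed word) can be sketched roughly as follows. This is only an illustration: the function name, the `max_intruders` threshold, and the convention that lower scores are better are all assumptions, not details taken from the patent.

```python
def select_response(scores, allowed, max_intruders=2):
    """Return the best allowed word, or None if the signal should be rejected.

    scores: dict mapping every vocabulary word to a match score
            (assumption: lower means a closer match).
    allowed: set of response words permitted in the current tutorial state.
    max_intruders: hypothetical limit on how many non-allowed words may
                   outscore the best allowed word before rejection.
    """
    allowed_scores = {w: s for w, s in scores.items() if w in allowed}
    if not allowed_scores:
        return None
    best_word = min(allowed_scores, key=allowed_scores.get)
    best_score = allowed_scores[best_word]
    # Count words outside the allowed set that score better than the best allowed word.
    intruders = sum(
        1 for w, s in scores.items() if w not in allowed and s < best_score
    )
    if intruders > max_intruders:
        return None  # probably not one of the expected responses
    return best_word


scores = {"yes": 1.0, "no": 3.0, "maybe": 0.5, "yet": 0.8, "guess": 0.9}
print(select_response(scores, {"yes", "no"}, max_intruders=2))
print(select_response(scores, {"yes", "no"}, max_intruders=3))
```

Scoring against the large vocabulary is what makes the rejection possible: with only the allowed words as candidates, any noise or off-script utterance would always map to some allowed response.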

Apparatuses and methods for training and operating speech recognition systems
    2.
    Granted invention patent (expired)

    Publication No.: US5850627A

    Publication date: 1998-12-15

    Application No.: US883297

    Filing date: 1997-06-26

    Abstract: A word recognition system can: respond to the input of a character string from a user by limiting the words it will recognize to words having a related, but not necessarily the same, string; score signals generated after a user has been prompted to generate a given word against words other than the prompted word to determine if the signal should be used to train the prompted word; vary the number of signals a user is prompted to generate to train a given word as a function of how well the training signals score against each other or prior models for the prompted word; create a new acoustic model of a phrase by concatenating prior acoustic models of the words in the phrase; obtain information from another program running on the same computer, such as its commands or the context of text being entered into it, and use that information to vary which words it can recognize; determine which program unit, such as an application program or dialog box, currently has input focus on its computer and create a vocabulary state associated with that program unit into which vocabulary words which will be made active when that program group has the focus can be put; detect the available computational resources and alter the instructions it executes in response; test if its ability to respond to voice input has been shut off without user confirmation, and, if so, turn that ability back on and prompt the user to confirm if that ability is to be turned off; store both a first and a second set of models for individual vocabulary words and enable a user to selectively cause the recognizer to disregard the second set of models for a selected word; and/or score a signal representing a given word against models for that word from different word model sets to select which model should be used for future recognition.
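The first capability in the abstract above, limiting recognition to words whose spelling is related to a typed string, might look something like the sketch below. Treating "related" as a case-insensitive prefix match is an assumption made here for illustration; the abstract does not define the relation, and `limit_vocabulary` is a hypothetical name.

```python
def limit_vocabulary(vocabulary, typed):
    """Return the words the recognizer should stay willing to recognize.

    Assumption: a word is 'related' to the typed string if it starts with
    that string, ignoring case, so the strings need not be identical.
    """
    key = typed.casefold()
    return [word for word in vocabulary if word.casefold().startswith(key)]


vocabulary = ["Speech", "special", "spin", "train"]
active = limit_vocabulary(vocabulary, "SPE")
print(active)
```

The recognizer would then score the spoken signal only against the models of the words in `active`, which shrinks the search and reduces confusions.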

Apparatuses and methods for training and operating speech recognition systems
    3.
    Granted invention patent (expired)

    Publication No.: US6101468A

    Publication date: 2000-08-08

    Application No.: US882914

    Filing date: 1997-06-26

    Abstract: A word recognition system can: respond to the input of a character string from a user by limiting the words it will recognize to words having a related, but not necessarily the same, string; score signals generated after a user has been prompted to generate a given word against words other than the prompted word to determine if the signal should be used to train the prompted word; vary the number of signals a user is prompted to generate to train a given word as a function of how well the training signals score against each other or prior models for the prompted word; create a new acoustic model of a phrase by concatenating prior acoustic models of the words in the phrase; obtain information from another program running on the same computer, such as its commands or the context of text being entered into it, and use that information to vary which words it can recognize; determine which program unit, such as an application program or dialog box, currently has input focus on its computer and create a vocabulary state associated with that program unit into which vocabulary words which will be made active when that program group has the focus can be put; detect the available computational resources and alter the instructions it executes in response; test if its ability to respond to voice input has been shut off without user confirmation, and, if so, turn that ability back on and prompt the user to confirm if that ability is to be turned off; store both a first and a second set of models for individual vocabulary words and enable a user to selectively cause the recognizer to disregard the second set of models for a selected word; and/or score a signal representing a given word against models for that word from different word model sets to select which model should be used for future recognition.
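One clause of the abstract above varies how many training utterances the user is asked for, depending on how well the utterances score against each other. A minimal sketch of that idea follows, assuming each utterance is reduced to a fixed-length feature vector and that agreement is measured by mean absolute difference; `collect_training_tokens`, `tolerance`, `min_tokens`, and `max_tokens` are hypothetical names and values.

```python
from itertools import combinations


def distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def collect_training_tokens(get_token, min_tokens=2, max_tokens=6, tolerance=0.5):
    """Prompt for tokens until they agree with each other or a cap is hit.

    get_token: callable that prompts the user once and returns a feature vector.
    """
    tokens = []
    while len(tokens) < max_tokens:
        tokens.append(get_token())
        if len(tokens) >= min_tokens:
            # Worst pairwise mismatch among everything collected so far.
            worst = max(distance(a, b) for a, b in combinations(tokens, 2))
            if worst <= tolerance:
                break  # the utterances are consistent; stop prompting
    return tokens


consistent = iter([[1.0, 1.0], [1.1, 1.0]])
print(len(collect_training_tokens(consistent.__next__)))
```

A consistent speaker is released after the minimum number of prompts, while noisy or variable input leads to more prompts, up to the cap.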

Speech recognition system which selects one of a plurality of vocabulary models
    4.
    Granted invention patent (expired)

    Publication No.: US6073097A

    Publication date: 2000-06-06

    Application No.: US882811

    Filing date: 1997-06-26

    Abstract: A word recognition system can: respond to the input of a character string from a user by limiting the words it will recognize to words having a related, but not necessarily the same, string; score signals generated after a user has been prompted to generate a given word against words other than the prompted word to determine if the signal should be used to train the prompted word; vary the number of signals a user is prompted to generate to train a given word as a function of how well the training signals score against each other or prior models for the prompted word; create a new acoustic model of a phrase by concatenating prior acoustic models of the words in the phrase; obtain information from another program running on the same computer, such as its commands or the context of text being entered into it, and use that information to vary which words it can recognize; determine which program unit, such as an application program or dialog box, currently has input focus on its computer and create a vocabulary state associated with that program unit into which vocabulary words which will be made active when that program group has the focus can be put; detect the available computational resources and alter the instructions it executes in response; test if its ability to respond to voice input has been shut off without user confirmation, and, if so, turn that ability back on and prompt the user to confirm if that ability is to be turned off; store both a first and a second set of models for individual vocabulary words and enable a user to selectively cause the recognizer to disregard the second set of models for a selected word; and/or score a signal representing a given word against models for that word from different word model sets to select which model should be used for future recognition.
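The last clause of the abstract above, which matches this patent's title, scores a signal for a known word against that word's model in each of several model sets and keeps the best one. The sketch below assumes each model is a single feature vector compared by summed absolute difference, which is far simpler than a real acoustic model; all names are hypothetical.

```python
def frame_distance(a, b):
    """Summed absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_model_set(token, word, model_sets):
    """Pick the model set whose model for `word` best matches `token`.

    model_sets: dict mapping a set name to a dict of word -> model vector.
    Returns the winning set name, or None if no set has a model for the word.
    """
    candidates = {
        name: models[word] for name, models in model_sets.items() if word in models
    }
    if not candidates:
        return None
    return min(candidates, key=lambda name: frame_distance(token, candidates[name]))


model_sets = {
    "standard": {"hello": [1.0, 2.0]},
    "user": {"hello": [1.4, 2.6]},
}
print(select_model_set([1.5, 2.5], "hello", model_sets))
```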

Robust pattern recognition system and method using Socratic agents
    5.
    Granted invention patent (in force)

    Publication No.: US08014591B2

    Publication date: 2011-09-06

    Application No.: US11898636

    Filing date: 2007-09-13

    Applicant: James K. Baker

    Inventor: James K. Baker

    IPC classification: G06K9/62 G06F17/00 G06N5/00

    CPC classification: G06K9/6256 G06K9/6262

    Abstract: A computer-implemented pattern recognition method, system and program product, the method comprising in one embodiment: creating electronically a linkage between a plurality of models within a classifier module within a pattern recognition system such that any one of said plurality of models may be selected as an active model in a recognition process; creating electronically a null hypothesis between at least one model of said plurality of linked models and at least a second model among said plurality of linked models; accumulating electronically evidence to accept or reject said null hypothesis until sufficient evidence is accumulated to reject said null hypothesis in favor of one of said plurality of linked models or until a stopping criterion is met; and transmitting at least a portion of the electronically accumulated evidence or a summary thereof to accept or reject said null hypothesis to a pattern classifier module.
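The accumulate-until-rejected loop described in the abstract above resembles a sequential hypothesis test. The sketch below uses Wald's sequential probability ratio test on items where two linked models disagree; that choice of test is an assumption swapped in for illustration, since the abstract does not say which statistic is used. The parameters `p1`, `alpha`, `beta`, and `max_trials` are hypothetical.

```python
import math


def compare_models(outcomes, p1=0.75, alpha=0.05, beta=0.05, max_trials=1000):
    """Sequentially test the null hypothesis that models A and B are equally good.

    outcomes: iterable of "A" or "B", naming which model was right on an
              item where the two models disagreed.
    Returns (winner, trials_used, evidence); winner is None if the stopping
    criterion (max_trials or end of data) is met before the null is rejected.
    """
    threshold = math.log((1 - beta) / alpha)
    step_win = math.log(p1 / 0.5)         # log-likelihood gain for a win
    step_loss = math.log((1 - p1) / 0.5)  # log-likelihood change for a loss
    llr_a = llr_b = 0.0
    trials = 0
    for winner in outcomes:
        if trials >= max_trials:
            break
        trials += 1
        if winner == "A":
            llr_a += step_win
            llr_b += step_loss
        else:
            llr_a += step_loss
            llr_b += step_win
        if llr_a >= threshold:
            return ("A", trials, llr_a)
        if llr_b >= threshold:
            return ("B", trials, llr_b)
    return (None, trials, max(llr_a, llr_b))


winner, trials, evidence = compare_models(["A"] * 20)
print(winner, trials)
```

The returned tuple plays the role of the "summary" of accumulated evidence that the abstract says is transmitted to the pattern classifier module.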

Assisted speech recognition by dual search acceleration technique
    6.
    Granted invention patent (in force)

    Publication No.: US07031915B2

    Publication date: 2006-04-18

    Application No.: US10348966

    Filing date: 2003-01-23

    Applicant: James K. Baker

    Inventor: James K. Baker

    IPC classification: G10L15/00 G10L15/12 G10L15/28

    CPC classification: G10L15/08

    Abstract: A speech recognition method, system and program product, the method in one embodiment comprising: obtaining input speech data; initiating a first speech recognition search process with at least one hypothesis; initiating a second speech recognition search process with a plurality of hypotheses; obtaining partial results from the second speech recognition search process, where the partial results include an evaluation of at least one hypothesis that the first speech recognition search process has not evaluated at this point in time; and utilizing the partial results to alter the first speech recognition search process.
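A very reduced illustration of the two-search idea in the abstract above: a detailed first search starts from one hypothesis, and partial results from a rougher second search add hypotheses that the first search has not yet evaluated. How the real system alters the first search is not stated in the abstract, so extending the first search's candidate queue is an assumption, as are the function name and the `keep` parameter.

```python
def assisted_search(initial, fast_results, detailed_score, keep=2):
    """Run the detailed search over its own hypothesis plus imported ones.

    initial: the hypothesis the first (detailed) search starts with.
    fast_results: (hypothesis, rough_score) pairs produced so far by the
                  second search (assumption: lower score is better).
    detailed_score: callable giving the expensive score for a hypothesis.
    keep: hypothetical number of partial results to import.
    """
    queue = [initial]
    # Import the most promising partial results the first search has not seen.
    for hypothesis, _ in sorted(fast_results, key=lambda r: r[1])[:keep]:
        if hypothesis not in queue:
            queue.append(hypothesis)
    return min(queue, key=detailed_score)


detailed = {
    "wreck a nice beach": 4.0,
    "recognize speech": 1.0,
    "recognise peach": 3.0,
}
fast_results = [
    ("recognize speech", 1.2),
    ("wreck a nice beach", 2.0),
    ("recognise peach", 3.5),
]
print(assisted_search("wreck a nice beach", fast_results, detailed.get))
```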

Speech recognition training method
    7.
    Granted invention patent (expired)

    Publication No.: US4718088A

    Publication date: 1988-01-05

    Application No.: US593891

    Filing date: 1984-03-27

    Abstract: A speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuitries are connected to a system bus, along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech, by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus thereby increasing the speech recognition capacity of the apparatus. Template pattern generation is advantageously aided by using a "joker" word to specify the time boundaries of utterances spoken in isolation, by finding the beginning and ending of an utterance surrounded by silence.
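Comparing per-frame acoustic parameters with stored templates and accumulating a cost is commonly done with dynamic time warping (DTW). The abstract above does not name the matching algorithm and describes dedicated circuitry rather than software, so the following is only a software sketch of the general technique under that assumption; the function names and toy one-dimensional frames are hypothetical.

```python
def dtw_cost(frames, template):
    """Dynamic-time-warping cost between two sequences of parameter vectors."""
    inf = float("inf")
    n, m = len(frames), len(template)
    # cost[i][j] = best cost of aligning the first i frames with the first j template frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum(abs(a - b) for a, b in zip(frames[i - 1], template[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j],      # input frame repeats a template frame
                cost[i][j - 1],      # template frame is skipped over
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def recognize(frames, templates):
    """Return the template name with the lowest alignment cost."""
    return min(templates, key=lambda word: dtw_cost(frames, templates[word]))


templates = {"yes": [[0], [1], [2]], "no": [[5], [5], [5]]}
spoken = [[0], [0], [1], [2]]  # "yes" spoken with a lengthened first frame
print(recognize(spoken, templates))
```

Because each template is scored independently, the work splits naturally across several matching units, which fits the abstract's point about adding circuitry to the bus to increase capacity.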

    Robust pattern recognition system and method using Socratic agents

    Publication No.: US08331656B2

    Publication date: 2012-12-11

    Application No.: US13446942

    Filing date: 2012-04-13

    Applicant: James K. Baker

    Inventor: James K. Baker

    IPC classification: G06K9/62 G06F17/00 G06N5/00

    CPC classification: G06K9/6256 G06K9/6262

    Abstract: A computer-implemented pattern recognition method, system and program product, the method comprising in one embodiment: creating electronically a linkage between a plurality of models within a classifier module within a pattern recognition system such that any one of said plurality of models may be selected as an active model in a recognition process; creating electronically a null hypothesis between at least one model of said plurality of linked models and at least a second model among said plurality of linked models; accumulating electronically evidence to accept or reject said null hypothesis until sufficient evidence is accumulated to reject said null hypothesis in favor of one of said plurality of linked models or until a stopping criterion is met; and transmitting at least a portion of the electronically accumulated evidence or a summary thereof to accept or reject said null hypothesis to a pattern classifier module.

Systems and methods for word recognition
    9.
    Granted invention patent (expired)

    Publication No.: US5680511A

    Publication date: 1997-10-21

    Application No.: US477287

    Filing date: 1995-06-07

    IPC classification: G10L15/18 G10L9/00

    CPC classification: G10L15/1815

    Abstract: In one aspect, the invention provides word recognition systems that operate to recognize an unrecognized or ambiguous word that occurs within a passage of words. The system can offer several words as choice words for inserting into the passage to replace the unrecognized word. The system can select the best choice word by using the choice word to extract from a reference source, sample passages of text that relate to the choice word. For example, the system can select the dictionary passage that defines the choice word. The system then compares the selected passage to the current passage, and generates a score that indicates the likelihood that the choice word would occur within that passage of text. The system can select the choice word with the best score to substitute into the passage. The passage of words being analyzed can be any word sequence including an utterance, a portion of handwritten text, a portion of typewritten text or other such sequence of words, numbers and characters. Alternative embodiments of the present invention are disclosed which function to retrieve documents from a library as a function of context.
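The selection step in the abstract above (fetch a reference passage for each choice word, compare it with the current passage, keep the best-scoring word) can be illustrated with a simple word-overlap score. The overlap measure, the tiny hand-written definitions, and the function names are assumptions for illustration; the abstract does not specify how the comparison score is computed.

```python
def tokens(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def best_choice(passage, definitions):
    """Pick the choice word whose reference passage best overlaps the context.

    definitions: dict mapping each choice word to a reference passage,
                 for example its dictionary definition.
    """
    context = tokens(passage)

    def score(word):
        reference = tokens(definitions[word])
        # Fraction of the reference passage's words that also occur in the context.
        return len(context & reference) / len(reference) if reference else 0.0

    return max(definitions, key=score)


passage = "he rowed the boat to the river ___ and tied it to a tree"
definitions = {
    "bank": "sloping land beside a river or lake",
    "tank": "a large container for liquid or gas",
}
print(best_choice(passage, definitions))
```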

    Speech recognition method
    10.
    Granted invention patent

    Publication No.: US4803729A

    Publication date: 1989-02-07

    Application No.: US34843

    Filing date: 1987-04-03

    Applicant: James K. Baker

    Inventor: James K. Baker

    IPC classification: G10L15/04 G10L5/00

    CPC classification: G10L15/04

    Abstract: Smoothed frame labeling associates phonetic frame labels with a given speech frame as a function of (a) the closeness with which the given frame compares to each of a plurality of acoustic models, (b) which frame labels correspond with a neighboring frame, and (c) transition probabilities which indicate, for the frame labels associated with the neighboring frame, which frame labels are probably associated with the given frame. The smoothed frame labeling is used to divide the speech into segments of frames having the same class of labels. The invention represents words as a collection of known diphone models, each of which models the sound before and after a boundary between segments derived by the smoothed frame labeling. At recognition time, the speech is divided into segments by smoothed frame labeling; diphone models are derived for each boundary between the resulting segments; and the resulting diphone models are compared against the known diphone models to determine which of the known diphone models match the segment boundaries in the speech. Then a combined-displaced-evidence method is used to determine which words occur in the speech. This method detects which acoustic patterns, in the form of the known diphone models, match various portions of the speech. In response to each such match, it associates with the speech an evidence score for each vocabulary word in which that pattern is known to occur. It displaces each such score from the location of its associated matched pattern by the known distance between that pattern and the beginning of the score's word. Then all the evidence scores for a word located in a given portion of the speech are combined to produce a score which indicates the probability of that word starting in that portion of the speech. This score is combined with a score produced by comparing a histogram from a portion of the speech against a histogram of each word. The resulting combined score determines whether a given word should undergo a more detailed comparison against the speech to be recognized.
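The combined-displaced-evidence step described in the abstract above can be sketched as a voting scheme: every matched pattern votes for each word that contains it, and the vote is shifted back by the pattern's known offset from the start of that word. In this simplified sketch votes are combined only when they land on exactly the same frame, whereas the abstract combines scores within a portion of the speech; the diphone labels, offsets, and function name are hypothetical.

```python
from collections import defaultdict


def displaced_evidence(matches, pattern_offsets):
    """Accumulate evidence for (word, start_frame) pairs.

    matches: list of (frame_position, pattern, score) for patterns that
             matched the speech.
    pattern_offsets: dict mapping a pattern to a list of
                     (word, offset_from_word_start) pairs.
    """
    evidence = defaultdict(float)
    for position, pattern, score in matches:
        for word, offset in pattern_offsets.get(pattern, []):
            # Displace the score back to where the word would have to begin.
            evidence[(word, position - offset)] += score
    return dict(evidence)


pattern_offsets = {
    "k-ae": [("cat", 0), ("cap", 0)],
    "ae-t": [("cat", 10)],
    "ae-p": [("cap", 10)],
}
matches = [(20, "k-ae", 1.0), (30, "ae-t", 1.0)]
print(displaced_evidence(matches, pattern_offsets))
```

Words whose combined evidence is high at some start frame would then be candidates for the more detailed comparison mentioned at the end of the abstract.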