Method and apparatus for voice-interactive language instruction
    1.
    Type: Granted invention patent
    Status: No longer in force

    Publication number: US5634086A

    Publication date: 1997-05-27

    Application number: US529376

    Filing date: 1995-09-18

    Abstract: A spoken-language instruction method and apparatus employ context-based speech recognition for instruction and evaluation, particularly language instruction and language fluency evaluation. A system can administer a lesson, in particular a language lesson, and evaluate performance in a natural interactive manner while tolerating strong foreign accents, producing a reading quality score as output. A finite state grammar set corresponding to the range of word sequence patterns in the lesson constrains the hidden Markov model (HMM) search apparatus in an HMM speech recognizer, which includes a set of hidden Markov models of target-language narrations produced by native speakers of the target language. The invention is preferably based on a linguistic context-sensitive speech recognizer. It includes a system with an interactive decision mechanism that employs at least three levels of error tolerance to simulate a natural level of patience in human-based interactive instruction. A system for a reading phase is implemented through a finite state machine having at least four states, which recognizes reading errors at any position in a script and employs a first set of actions. A related system for an interactive question phase is also implemented through a finite state machine, but one that recognizes incorrect answers as well as reading errors and invokes a second set of actions. A linguistically sensitive utterance endpoint detector judges the termination of a spoken utterance to simulate human turn-taking in conversational speech.
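
    The escalating error tolerance described in this abstract can be illustrated with a small sketch. It is not the patented state machine: the action names, the escalation order, and the reduction of the recognizer's verdict to a single boolean are all assumptions made for illustration.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical tutor actions; the patent's actual action sets are not given here."""
    ADVANCE = auto()     # reading accepted, move to the next line
    ASK_REPEAT = auto()  # tolerance level 1: ask the student to try again
    MODEL_LINE = auto()  # tolerance level 2: play the line, then ask again
    SKIP_LINE = auto()   # tolerance level 3: give up on this line and move on


# Assumed escalation order for repeated errors on the same line.
ESCALATION = [Action.ASK_REPEAT, Action.MODEL_LINE, Action.SKIP_LINE]


class ReadingTutor:
    """Tracks the position in a script and escalates on consecutive errors."""

    def __init__(self, script):
        self.script = list(script)
        self.position = 0
        self.errors_on_line = 0

    @property
    def done(self):
        return self.position >= len(self.script)

    def _next_line(self):
        self.position += 1
        self.errors_on_line = 0

    def step(self, read_correctly):
        """Consume one recognizer verdict and return the tutor's action."""
        if self.done:
            raise ValueError("the script is already finished")
        if read_correctly:
            self._next_line()
            return Action.ADVANCE
        level = min(self.errors_on_line, len(ESCALATION) - 1)
        action = ESCALATION[level]
        self.errors_on_line += 1
        if action is Action.SKIP_LINE:
            self._next_line()
        return action
```

    In this sketch, three consecutive errors on one line yield ASK_REPEAT, MODEL_LINE, and SKIP_LINE in turn, after which the tutor moves on, which mimics a teacher whose patience runs out gradually.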

    Method of dynamically altering grammars in a memory efficient speech recognition system
    2.
    Type: Granted invention patent
    Status: In force

    Publication number: US07324945B2

    Publication date: 2008-01-29

    Application number: US09894898

    Filing date: 2001-06-28

    IPC classification: G10L15/18

    CPC classification: G10L15/19 G10L15/285

    Abstract: A method of speech recognition that uses hierarchical data structures that include a top level grammar and various related subgrammars, such as word, phone, and state subgrammars. A speech signal is acquired, and a probabilistic search is performed using the speech signal as an input, and using the (unexpanded) grammars and subgrammars as possible inputs. Memory is allocated to a subgrammar when a transition to that subgrammar is made during the probabilistic search. The subgrammar may then be expanded and evaluated, and the probability of a match between the speech signal and an element of the subgrammar for which memory has been allocated may be computed. Because unexpanded grammars and subgrammars take up very little memory, this method enables systems to recognize and process a larger vocabulary than would otherwise be possible. This method also permits grammars and subgrammars to be added, deleted, or selected by a remote computer while the speech recognition system is operating, allowing speech recognition systems to have a nearly unlimited vocabulary.
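
    The allocate-on-transition idea in this abstract can be sketched as lazy expansion. The class names below are hypothetical, the letter-by-letter expansion is a stand-in for real phone subgrammars, and the token lookup replaces the probabilistic search, which is not modeled here.

```python
class Subgrammar:
    """A subgrammar whose children are built only when it is first entered."""

    def __init__(self, name, build):
        self.name = name
        self._build = build     # callable that produces the expanded children
        self._children = None   # nothing allocated until a transition occurs

    @property
    def is_expanded(self):
        return self._children is not None

    def enter(self):
        """Expand on first transition, then reuse the allocated children."""
        if self._children is None:
            self._children = self._build()
        return self._children


def make_word(word):
    # Assumed expansion: one unit per letter, standing in for phones.
    return Subgrammar(word, lambda: list(word))


class Grammar:
    """Top-level grammar holding unexpanded word subgrammars."""

    def __init__(self, words):
        self.words = {w: make_word(w) for w in words}

    def add(self, word):
        """Add a word while the recognizer is running."""
        self.words[word] = make_word(word)

    def remove(self, word):
        self.words.pop(word, None)

    def recognize(self, tokens):
        """Toy search: enter only the subgrammars that the input reaches."""
        results = []
        for token in tokens:
            sub = self.words.get(token)
            results.append(sub.enter() if sub is not None else None)
        return results

    def expanded_words(self):
        return sorted(w for w, s in self.words.items() if s.is_expanded)
```

    Only the words actually reached by the input are ever expanded, so the memory cost in this sketch follows what is spoken rather than the size of the vocabulary.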

    Speaker adaptation based on lateral tying for large-vocabulary continuous speech recognition
    3.
    Type: Granted invention patent
    Status: No longer in force

    Publication number: US5737487A

    Publication date: 1998-04-07

    Application number: US600859

    Filing date: 1996-02-13

    IPC classification: G10L15/06 G10L5/06

    CPC classification: G10L15/065

    Abstract: A system and method for performing speaker adaptation in a speech recognition system which includes a set of reference models corresponding to speech data from a plurality of speakers. The speech data is represented by a plurality of acoustic models and corresponding sub-events, and each sub-event includes one or more observations of speech data. A degree of lateral tying is computed between each pair of sub-events, wherein the degree of tying indicates the degree to which a first observation in a first sub-event contributes to the remaining sub-events. When adaptation data from a new speaker becomes available, a new observation from the adaptation data is assigned to one of the sub-events. Each of the sub-events is then populated with the observations contained in the assigned sub-event based on the degree of lateral tying that was computed between each pair of sub-events. The reference models corresponding to the populated sub-events are then adapted to account for speech pattern idiosyncrasies of the new speaker, thereby reducing the error rate of the speech recognition system.
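
    The sharing of observations across tied sub-events can be sketched as follows. This is an illustration under stated assumptions: the tying degrees are taken as a given matrix (the abstract does not say how they are computed), observations are scalars, and the mean update with a `prior_weight` is an assumed interpolation rather than the patent's adaptation formula.

```python
def populate(tying, assignments):
    """Spread each observation over all sub-events by its tying degree.

    tying[i][j] is the assumed weight with which an observation assigned
    to sub-event i counts toward sub-event j.
    assignments is a list of (sub_event_index, observation_value) pairs.
    Returns per-sub-event weighted sums and total weights.
    """
    n = len(tying)
    weighted_sum = [0.0] * n
    weight = [0.0] * n
    for i, value in assignments:
        for j in range(n):
            weighted_sum[j] += tying[i][j] * value
            weight[j] += tying[i][j]
    return weighted_sum, weight


def adapt_means(reference_means, tying, assignments, prior_weight=1.0):
    """Interpolate each reference mean with the observations it received."""
    weighted_sum, weight = populate(tying, assignments)
    return [
        (prior_weight * mean + s) / (prior_weight + w)
        for mean, s, w in zip(reference_means, weighted_sum, weight)
    ]
```

    With a tying matrix in which sub-events 0 and 1 are tied at 0.5 and sub-event 2 is untied, a single observation assigned to sub-event 0 also shifts the mean of sub-event 1, by a smaller amount, and leaves sub-event 2 unchanged.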