1. Apparatus and method for correcting the difference in frequency characteristics between microphones for analyzing speech and for creating a recognition dictionary
    Granted invention patent
    Status: lapsed

    Publication No.: US6032115A

    Publication date: 2000-02-29

    Application No.: US935082

    Filing date: 1997-09-26

    CPC classification: G10L15/065 G10L15/20

    Abstract: In the sound recognition apparatus of the present invention, a user's utterance, or a sound output by an output section from previously stored sound waveforms, is input simultaneously through a basic microphone with known frequency characteristics and an input microphone with unknown frequency characteristics. An analysis section separately performs frequency analysis of the speech input through the basic microphone and through the input microphone. A frequency characteristics calculation section calculates first difference data between the frequency analyses of the speech input through the basic microphone and through the input microphone, and calculates the frequency characteristics of the input microphone from the first difference data and the frequency characteristics of the basic microphone. A frequency characteristics correction section calculates second difference data between the frequency characteristics of the input microphone and the known frequency characteristics of a dictionary data microphone. According to the second difference data, it corrects speech that is input through the input microphone for recognition into speech data having the frequency characteristics of the dictionary data microphone. A recognition section recognizes the corrected speech data by referring to a recognition dictionary that stores data previously created through the dictionary data microphone.
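    The two difference steps can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes every spectrum and microphone response is a list of per-band levels in dB on one shared frequency grid, so that a microphone's filtering becomes addition. The function names are hypothetical.

```python
from typing import List, Sequence

# Assumption: all spectra and responses are per-band levels in dB on one shared
# frequency grid, so a microphone's effect on a sound is additive.

def _same_length(*vectors: Sequence[float]) -> None:
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all spectra must use the same frequency bands")

def estimate_input_mic_response(basic_spec: Sequence[float], input_spec: Sequence[float],
                                basic_mic_response: Sequence[float]) -> List[float]:
    """First difference data = input-mic spectrum minus basic-mic spectrum of the same sound.
    Adding it to the known basic-mic response estimates the unknown input-mic response."""
    _same_length(basic_spec, input_spec, basic_mic_response)
    return [resp + (inp - bas) for bas, inp, resp in zip(basic_spec, input_spec, basic_mic_response)]

def correct_to_dictionary_mic(speech_spec: Sequence[float], input_mic_response: Sequence[float],
                              dict_mic_response: Sequence[float]) -> List[float]:
    """Second difference data = dictionary-mic response minus input-mic response.
    Adding it maps speech captured by the input mic onto the dictionary mic's characteristics."""
    _same_length(speech_spec, input_mic_response, dict_mic_response)
    second_diff = [d - i for d, i in zip(dict_mic_response, input_mic_response)]
    return [s + c for s, c in zip(speech_spec, second_diff)]
```

    Consider a flat basic microphone (`[0.0, 0.0, 0.0]`) and a calibration sound measured at `[60, 55, 50]` dB through the basic microphone and `[62, 52, 50]` dB through the input microphone. The input microphone's estimated response is then `[2, -3, 0]` dB. Speech from that microphone is shifted band by band toward the dictionary microphone before recognition.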

2. Method of and apparatus for deriving a plurality of sequences of words from a speech signal
    Granted invention patent
    Status: lapsed

    Publication No.: US5987409A

    Publication date: 1999-11-16

    Application No.: US938922

    Filing date: 1997-09-26

    IPC classification: G10L15/08 G10L15/18 G10L7/08

    CPC classification: G10L15/18 G10L15/08

    Abstract: When a plurality of word sequences with decreasing probability of correspondence is determined from a speech signal, the best word sequence is used as the basis. The only further word sequences determined are those that contain a part of the best word sequence, that is to say, the remainder of these word sequences. To this end, recognition first forms a word graph, and the best word sequence is stored separately as a tree that initially has only one branch. The word boundaries of this word sequence form the nodes of this tree. Only the nodes of this tree have to be considered for the next-best word sequences. The calculation is therefore substantially simpler than it would be if the complete word graph were first fully expanded into a tree and then searched completely again for each new word sequence.
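    The idea of reusing the stored best sequence can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented procedure. It only ranks alternatives that leave the best path once and rejoin it at one of its word-boundary nodes, keeping the remainder of the best sequence from that node. Growing the tree to find further N-best sequences is not shown. Several details are assumptions: edges are `(from_node, to_node, word, cost)` with cost as a negative log probability, node ids increase with time, and `end` is reachable from `start`.

```python
from typing import Dict, List, Tuple

# Assumptions: cost = -log probability (lower is better); node ids increase with time,
# so sorted ids form a topological order of the acyclic word graph.
Edge = Tuple[int, int, str, float]

def forward_best(edges: List[Edge], start: int):
    """Lowest cost into every node of the word graph, with back-pointers (Viterbi)."""
    incoming: Dict[int, List[Edge]] = {}
    for e in edges:
        incoming.setdefault(e[1], []).append(e)
    cost, back = {start: 0.0}, {}
    for node in sorted(incoming):
        for e in incoming[node]:
            u, _, _, c = e
            if u in cost and cost[u] + c < cost.get(node, float("inf")):
                cost[node], back[node] = cost[u] + c, e
    return cost, back

def backtrack(back, start: int, node: int) -> List[Edge]:
    path = []
    while node != start:
        path.append(back[node])
        node = back[node][0]
    return path[::-1]

def next_best_sequences(edges: List[Edge], start: int, end: int):
    """Best sequence plus alternatives that rejoin it at a word-boundary node,
    sorted by total cost."""
    cost, back = forward_best(edges, start)
    best = backtrack(back, start, end)
    words = [e[2] for e in best]
    boundaries = [start] + [e[1] for e in best]   # nodes of the one-branch tree
    suffix = {end: 0.0}                            # cost of the best remainder from each node
    for e in reversed(best):
        suffix[e[0]] = suffix[e[1]] + e[3]
    found = {tuple(words): cost[end]}
    on_best = set(best)
    for i, node in enumerate(boundaries):
        for e in edges:
            u, v, w, c = e
            if v != node or e in on_best or u not in cost:
                continue
            prefix = [x[2] for x in backtrack(back, start, u)]
            seq = tuple(prefix + [w] + words[i:])  # keep the remainder of the best sequence
            total = cost[u] + c + suffix[node]
            if total < found.get(seq, float("inf")):
                found[seq] = total
    return sorted(((total, list(seq)) for seq, total in found.items()), key=lambda r: r[0])
```

    Only the tree's boundary nodes are examined, so the work grows with the length of the best sequence. A full re-expansion of the word graph is never needed.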

3. Speech recognition system which turns its voice response on for confirmation when it has been turned off without confirmation
    Granted invention patent
    Status: lapsed

    Publication No.: US5983179A

    Publication date: 1999-11-09

    Application No.: US882918

    Filing date: 1997-06-26

    Applicant: Joel M. Gould

    Inventor: Joel M. Gould

    Abstract: A speech recognition system includes a speech-response capability: it responds to sounds that appear to match models of spoken words by performing functions associated with those words. This speech response can be turned on or off. If the system detects both that speech response is off and that no stored indication shows the user has confirmed the off state, it performs a confirmation process. This process prompts the user to utter a phrase confirming whether speech response should be off. It turns speech response on so that speech response can respond to the user's confirmation utterance. It then decides whether to leave speech response on or off based on the word that speech response selects as matching the user's confirmation utterance. If it decides that speech response is to be off, it turns speech response off and stores an indication that the user has confirmed the off state. In some embodiments, the system can execute user-written programs that may include instructions for turning speech response on or off. In this case, the system can delay the start of a confirmation process until such a program has finished executing. This allows such programs to turn off speech response temporarily without requiring user confirmation. In some embodiments, the user can choose to deactivate the feature that triggers a confirmation process whenever speech response has been turned off without confirmation.
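    The confirmation logic described above can be sketched as a small state machine. This is a minimal illustration, not the patented system. The confirmation phrases `"stay off"` and `"turn on"`, the `recognize_reply` and `prompt` callbacks, and all method names are hypothetical stand-ins for the recognizer and prompt output.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpeechResponseController:
    # Hypothetical callbacks standing in for the recognizer and the prompt output.
    recognize_reply: Callable[[], str]
    prompt: Callable[[str], None] = print
    speech_response_on: bool = True
    off_confirmed: bool = False                # stored indication that the user confirmed "off"
    confirmation_feature_enabled: bool = True  # user may deactivate the whole feature
    user_program_running: bool = False

    def turn_off(self, confirmed: bool = False) -> None:
        self.speech_response_on = False
        self.off_confirmed = confirmed

    def turn_on(self) -> None:
        self.speech_response_on = True
        self.off_confirmed = False

    def check(self) -> None:
        """Run the confirmation process if speech response is off without confirmation."""
        if (not self.confirmation_feature_enabled or self.user_program_running
                or self.speech_response_on or self.off_confirmed):
            return
        self.prompt('Say "stay off" to keep speech response off, or "turn on".')
        self.speech_response_on = True         # so the confirmation reply can be recognized
        if self.recognize_reply() == "stay off":
            self.turn_off(confirmed=True)      # turn off and store the confirmation
        # any other reply leaves speech response on

    def run_user_program(self, program: Callable[["SpeechResponseController"], None]) -> None:
        """Defer confirmation until a user-written program has finished."""
        self.user_program_running = True
        try:
            program(self)
        finally:
            self.user_program_running = False
        self.check()
```

    A user program that turns speech response off and back on never triggers a prompt. One that leaves it off is followed by a confirmation once it ends.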

4. Phoneme dividing method using multilevel neural network
    Granted invention patent
    Status: lapsed

    Publication No.: US5963904A

    Publication date: 1999-10-05

    Application No.: US746981

    Filing date: 1996-11-19

    CPC classification: G10L15/04 G10L25/30

    Abstract: A phoneme dividing method uses a multilevel neural network. It is applied to a phoneme dividing apparatus having a voice input portion, a preprocessor, a multi-layer perceptron (MLP) phoneme dividing portion, and a phoneme border outputting portion. The method includes the following steps. (a) Sequentially segment the digitized voice samples into frames, extract characteristic vectors frame by frame, and extract, frame by frame, inter-frame characteristic vectors representing the differences between the characteristic vectors of neighboring frames, thereby normalizing the characteristics by their maximum and minimum values. (b) Store information on the weights obtained through learning and on the standard of the MLP. (c) Read the weights stored in step (b), receive the characteristic vectors, perform phoneme border discrimination to generate an output value, and discriminate the phoneme border according to the output value. When the currently analyzed frame reaches the frame two frames before the final frame of the incoming voice, output the frame number indicating each phoneme border as the final result.
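    Steps (a) and (c) can be sketched as follows. This is a hedged illustration, not the patented preprocessor or network. The abstract does not specify several choices, all of which are assumptions: the per-frame features (log energy and zero-crossing rate), the single sigmoid hidden layer, the 0.5 decision threshold, and the frame sizes. The `weights` tuple stands in for the learned weights stored in step (b).

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical choices: 2 features per frame (log energy, zero-crossing rate),
# one sigmoid hidden layer, and a 0.5 border threshold.
Weights = Tuple[List[List[float]], List[float], List[float], float]

def frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_features(frame: Sequence[float]) -> List[float]:
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]

def with_inter_frame_deltas(feats: List[List[float]]) -> List[List[float]]:
    # Append the difference from the previous frame's vector (zeros for the first frame).
    return [f + [a - b for a, b in zip(f, feats[t - 1] if t else f)] for t, f in enumerate(feats)]

def min_max_normalize(vectors: List[List[float]]) -> List[List[float]]:
    cols = list(zip(*vectors))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(vec, lo, hi)]
            for vec in vectors]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mlp_output(x: Sequence[float], weights: Weights) -> float:
    w_hidden, b_hidden, w_out, b_out = weights
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

def phoneme_borders(samples: Sequence[float], weights: Weights,
                    frame_len: int = 160, hop: int = 80, threshold: float = 0.5) -> List[int]:
    vecs = min_max_normalize(with_inter_frame_deltas(
        [frame_features(f) for f in frames(samples, frame_len, hop)]))
    # Scan until two frames before the final frame, then report border frame numbers.
    return [t for t in range(len(vecs) - 2) if mlp_output(vecs[t], weights) > threshold]
```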

5. Automatic speech recognition
    Granted invention patent
    Status: lapsed

    Publication No.: US5905971A

    Publication date: 1999-05-18

    Application No.: US709685

    Filing date: 1996-09-10

    Abstract: Speech recognition is carried out by matching parameterized speech against a dynamically extended network of paths made up of model linguistic elements (12b, 12c). The elements are context related, e.g. triphones. Some elements cannot be converted to models when they must be incorporated into the paths, because their context is not yet defined at that time. To allow the extension to proceed, such an element is transferred as a place marker (21, 22, 23, 24). The marker is converted when a later extension completes the definition of the triphone. The place markers (12a) can also be used to identify the locations for subsequent extensions.
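    The place-marker idea can be sketched for a single path as follows. This is a simplified illustration, not the patented network, which extends many paths at once. The newest phone cannot become a triphone model yet because its right context is unknown, so it is held as a marker until the next extension supplies that context. The class name, the `"sil"` boundary context, and the `left-centre+right` naming are assumptions.

```python
from typing import List, Optional

class PathWithPlaceMarker:
    """Hypothetical single-path sketch of deferred triphone conversion."""

    def __init__(self, start_context: str = "sil") -> None:
        self.models: List[str] = []          # converted triphone models, "left-centre+right"
        self.left = start_context            # left context of the pending element
        self.marker: Optional[str] = None    # place marker: centre phone awaiting right context

    def extend(self, phone: str) -> None:
        if self.marker is not None:
            # The new phone completes the definition of the pending triphone.
            self.models.append(f"{self.left}-{self.marker}+{phone}")
            self.left = self.marker
        self.marker = phone                  # the new phone becomes the place marker

    def finish(self, end_context: str = "sil") -> List[str]:
        if self.marker is not None:
            self.models.append(f"{self.left}-{self.marker}+{end_context}")
            self.marker = None
        return self.models
```

    After extending with `k`, `ae`, `t`, the path holds the models `sil-k+ae` and `k-ae+t` plus a marker for `t`. The marker also shows where the next extension attaches.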

7. Telecommunications instrument employing variable criteria speech recognition
    Granted invention patent
    Status: lapsed

    Publication No.: US5842161A

    Publication date: 1998-11-24

    Application No.: US668660

    Filing date: 1996-06-25

    IPC classification: G10L15/10 G10L15/22 G10L7/08

    CPC classification: G10L15/10 G10L2015/0631

    Abstract: A recognition criterion, or a set of recognition criteria, is updated automatically over time according to the speech input of the user(s). Each input utterance is compared with one or more models of speech to determine a similarity metric for each comparison. The model of speech that most closely matches the utterance is determined from these similarity metrics. The similarity metric of the closest-matching model is then analyzed to determine whether it satisfies the selected set of recognition criteria. The recognition criteria are altered automatically during use, or "on the fly", so that more appropriate criteria (and associated thresholds) can be used either to increase the probability of recognition or to decrease the incidence of false positive results. For example, if a voice sample narrowly misses matching a template, a more liberal criterion is employed thereafter to increase the probability of recognizing subsequent input. Parametric histories are kept of recognitions and of near misses followed by recognition, and the criteria values are periodically altered to correspond to these histories. In addition, parametric histories of false alarms are maintained and are combined with the recognition histories to update the criteria values.
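    A single adaptive threshold can illustrate the behavior described above: loosen after a near miss, tighten after a false alarm. This is a minimal sketch, not the patented criteria scheme. Several details are hypothetical: the similarity scale (0 to 1, higher is closer), the near-miss margin, the step size, and the floor and ceiling bounds. The counters only hint at the parametric histories the abstract mentions.

```python
from dataclasses import dataclass

@dataclass
class AdaptiveCriterion:
    """Hypothetical sketch: similarity in [0, 1], higher means a closer match."""
    threshold: float = 0.80          # accept when the best similarity reaches this
    near_miss_margin: float = 0.05   # scores this close below the threshold are near misses
    step: float = 0.01               # size of each adjustment
    floor: float = 0.60              # never loosen beyond this
    ceiling: float = 0.95            # never tighten beyond this
    near_misses: int = 0             # simple history counters
    false_alarms: int = 0

    def decide(self, best_similarity: float) -> bool:
        if best_similarity >= self.threshold:
            return True
        if best_similarity >= self.threshold - self.near_miss_margin:
            # Near miss: use a more liberal criterion for subsequent input.
            self.near_misses += 1
            self.threshold = max(self.floor, self.threshold - self.step)
        return False

    def report_false_alarm(self) -> None:
        # An accepted utterance turned out to be wrong: tighten the criterion.
        self.false_alarms += 1
        self.threshold = min(self.ceiling, self.threshold + self.step)
```

    The floor and ceiling keep repeated near misses or false alarms from pushing the criterion to an unusable extreme.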

8. Devices and methods for speech recognition of vocabulary words with simultaneous detection and verification
    Granted invention patent
    Status: lapsed

    Publication No.: US5832430A

    Publication date: 1998-11-03

    Application No.: US569471

    Filing date: 1995-12-08

    IPC classification: G10L15/06 G10L15/14 G10L7/08

    CPC classification: G10L15/142 G10L15/07

    Abstract: Devices and methods for speech recognition enable simultaneous word hypothesis detection and verification in a one-pass procedure that allows for different segmentations of the speech input. In a likelihood ratio decoder, a confidence measure of a target hypothesis for a known word is determined by a recursion formula. The formula operates on the parameters of target models and alternate models of known words, a language model, a lexicon, and the feature vectors of the speech input. The confidence measure is processed to determine an accept/reject signal for the target hypothesis, which is output together with a target hypothesis signal. The recursion formula is based on hidden Markov models with a single optimum state sequence and may take the form of a modified Viterbi algorithm.
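    The core of likelihood-ratio verification can be sketched with discrete HMMs. This is a simplified two-model illustration under stated assumptions, not the patent's one-pass recursion over a language model and lexicon. It scores one observation sequence under a target model and an alternate model with the standard Viterbi recursion (single best state sequence). It then accepts the hypothesis when the per-frame log-likelihood ratio exceeds a threshold. The discrete observations, the models, and the threshold of 0 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

LogHMM = Tuple[List[float], List[List[float]], List[List[float]]]  # (init, trans, emit) in log space

def log_model(init: Sequence[float], trans: Sequence[Sequence[float]],
              emit: Sequence[Sequence[float]]) -> LogHMM:
    """Convert a discrete HMM given as probabilities into log-probabilities."""
    return ([math.log(p) for p in init],
            [[math.log(p) for p in row] for row in trans],
            [[math.log(p) for p in row] for row in emit])

def viterbi_log_score(obs: Sequence[int], model: LogHMM) -> float:
    """Log-probability of the single best state sequence for the observations."""
    log_init, log_trans, log_emit = model
    n = len(log_init)
    delta = [log_init[s] + log_emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        delta = [max(delta[p] + log_trans[p][s] for p in range(n)) + log_emit[s][o]
                 for s in range(n)]
    return max(delta)

def verify(obs: Sequence[int], target: LogHMM, alternate: LogHMM,
           threshold: float = 0.0) -> Tuple[float, bool]:
    """Per-frame log-likelihood ratio (confidence) and the accept/reject decision."""
    llr = (viterbi_log_score(obs, target) - viterbi_log_score(obs, alternate)) / len(obs)
    return llr, llr > threshold
```

    Normalizing by the number of frames keeps the confidence comparable across hypotheses of different lengths. This matters when different segmentations are considered.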

9. Speech recognition method with error reset commands
    Granted invention patent
    Status: lapsed

    Publication No.: US5781887A

    Publication date: 1998-07-14

    Application No.: US728012

    Filing date: 1996-10-09

    Applicant: Biing-Hwang Juang

    Inventor: Biing-Hwang Juang

    IPC classification: G10L15/22 G10L7/08 G10L9/00

    CPC classification: G10L15/22

    Abstract: A method for revising at least a portion of a sequence of speech data segments recognized by an automated speech recognition system. The user is prompted to vocalize the speech data segments sequentially, one segment at a time. When each speech data segment is recognized, it is stored as a data element, and a confirmation of the recognition is issued to the user. If the confirmation indicates that a recognition error has occurred, the user may issue a verbal command to delete the last recognized data element and then repeat the last speech data segment for a second recognition attempt. The user may also issue another verbal command to delete all data elements recognized so far in the sequence and restart the recognition process from the beginning. If the user issues neither command, the user may continue by vocalizing the next speech data segment in the sequence.
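    The command handling can be sketched as a loop over recognized segments. This is a minimal illustration, not the patented method. The recognizer is stubbed out by passing already-recognized words. The command words `"delete"` and `"start over"` and the function name are hypothetical, since the abstract does not name the actual phrases.

```python
from typing import Callable, Iterable, List

# Hypothetical command words; the abstract does not name the actual phrases.
DELETE_LAST = "delete"
RESTART = "start over"

def collect_sequence(recognized: Iterable[str], length: int,
                     confirm: Callable[[str], None] = print) -> List[str]:
    """Build a sequence of `length` data elements from segments recognized one at a time."""
    elements: List[str] = []
    for word in recognized:
        if word == DELETE_LAST:
            if elements:
                elements.pop()          # drop the last (misrecognized) element
        elif word == RESTART:
            elements.clear()            # discard everything recognized so far
        else:
            elements.append(word)       # store the recognized segment as a data element
            confirm(word)               # echo it back so the user can spot errors
        if len(elements) == length:
            break
    return elements
```

    For a four-digit entry, the inputs `4 2 9 delete 7 start over 1 2 3 4` leave `["1", "2", "3", "4"]`. The `delete` command removes `9`, and `start over` discards `4 2 7`.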
