Method for representing word models for use in speech recognition
    1.
    Type: granted invention patent
    Status: expired

    Publication number: US4903305A

    Publication date: 1990-02-20

    Application number: US328738

    Filing date: 1989-03-23

    IPC classification: G10L15/06 G10L15/14

    Abstract: A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Each word is then represented by a cluster spelling that lists the clusters into which its acoustic sub-models were placed. Speech recognition is performed by comparing sequences of frames from the speech to be recognized against sequences of acoustic models associated with the clusters in the cluster spellings of individual word models. The invention also provides a method for deriving a word representation, which involves receiving a first set of frame sequences for a word; using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word, independently of any previously derived acoustic model particular to the word; using dynamic programming to time-align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models; and using these new sub-sequences to calculate new probabilistic sub-models.
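
    The clustering step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: it assumes each sub-model is a diagonal Gaussian given as (means, variances), uses a symmetrized Kullback-Leibler divergence, and substitutes a simple greedy one-pass clustering. The names kl_gauss, sym_kl, cluster_spellings and the threshold value are hypothetical.

```python
import math

def kl_gauss(p, q):
    """KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def sym_kl(p, q):
    """Symmetrized divergence (an assumption; the abstract only names K-L)."""
    return kl_gauss(p, q) + kl_gauss(q, p)

def cluster_spellings(word_models, threshold):
    """Greedy clustering: a sub-model joins the first cluster whose
    representative is closer than `threshold`, else it starts a new one."""
    representatives = []          # one representative sub-model per cluster
    spellings = {}                # word -> list of cluster ids
    for word, sub_models in word_models.items():
        ids = []
        for sub_model in sub_models:
            for cluster_id, rep in enumerate(representatives):
                if sym_kl(sub_model, rep) < threshold:
                    ids.append(cluster_id)
                    break
            else:
                representatives.append(sub_model)
                ids.append(len(representatives) - 1)
        spellings[word] = ids
    return spellings, representatives

# Toy one-dimensional sub-models for two words (made-up numbers).
word_models = {
    "cat": [((0.0,), (1.0,)), ((5.0,), (1.0,))],
    "cap": [((0.1,), (1.0,)), ((9.0,), (1.0,))],
}
spellings, representatives = cluster_spellings(word_models, threshold=1.0)
print(spellings)  # the first sub-models of both words share cluster 0
```

    In this toy run the two words share their first cluster, so a recognizer would need to store only one acoustic model for that sound.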


    System for processing a succession of utterances spoken in continuous or discrete form
    2.
    Type: granted invention patent
    Status: expired

    Publication number: US5526463A

    Publication date: 1996-06-11

    Application number: US45991

    Filing date: 1993-04-09

    CPC classification: G10L15/10

    Abstract: The invention relates to continuous speech pre-filtering systems for use in discrete and continuous speech recognition computer systems. The speech to be recognized is converted from utterances to frame data sets, which are smoothed to generate a smooth frame model over a predetermined number of frames. A resident vocabulary is stored within the computer as clusters of word models which are acoustically similar over a succession of frame periods. A cluster score is generated by the system; this score includes the likelihood of the smooth frames evaluated using a probability model for the cluster against which the smooth frame model is being compared. Cluster sets having cluster scores below a predetermined acoustic threshold are removed from further consideration. The remaining cluster sets are unpacked to determine a word score for each unpacked word. These word scores are used to identify the words that are above a second predetermined threshold, defining a word list which is sent to a recognizer for a lengthier word match. Control means enable the system to initialize times corresponding to the frame start time for each frame data set, defining a sliding window.
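
    The two-threshold prefilter described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented system: smoothing is a plain moving average, the "likelihood" is a unit-variance Gaussian log score with constants dropped, and the function names, data layout and threshold values are all hypothetical.

```python
def smooth(frames, width):
    """Average each run of `width` consecutive frames (a sliding window)."""
    smoothed = []
    for start in range(len(frames) - width + 1):
        window = frames[start:start + width]
        smoothed.append([sum(column) / width for column in zip(*window)])
    return smoothed

def log_score(smooth_frames, model):
    """Stand-in log-likelihood: unit-variance Gaussian, constants dropped."""
    return -0.5 * sum(
        (value - mean) ** 2
        for frame, model_frame in zip(smooth_frames, model)
        for value, mean in zip(frame, model_frame)
    )

def prefilter(frames, clusters, width, cluster_threshold, word_threshold):
    """Return the words that survive both the cluster and the word threshold."""
    smooth_frames = smooth(frames, width)
    word_list = []
    for cluster_model, words in clusters:
        if log_score(smooth_frames, cluster_model) < cluster_threshold:
            continue  # the whole cluster is dropped without scoring its words
        for word, word_model in words:  # "unpack" the surviving cluster
            if log_score(smooth_frames, word_model) >= word_threshold:
                word_list.append(word)
    return word_list

# Toy data: four one-dimensional frames and two clusters (made-up numbers).
frames = [[0.0], [2.0], [4.0], [6.0]]
clusters = [
    ([[1.0], [3.0], [5.0]], [("alpha", [[1.0], [3.0], [5.0]]),
                             ("alto", [[1.0], [3.0], [8.0]])]),
    ([[10.0], [10.0], [10.0]], [("beta", [[10.0], [10.0], [10.0]])]),
]
word_list = prefilter(frames, clusters, width=2,
                      cluster_threshold=-10.0, word_threshold=-2.0)
print(word_list)
```

    The point of the design is that a poorly matching cluster is rejected with a single score, so its member words never need to be evaluated individually.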


    Method for speech analysis and speech recognition
    3.
    Type: granted invention patent
    Status: expired

    Publication number: US4805218A

    Publication date: 1989-02-14

    Application number: US34842

    Filing date: 1987-04-03

    IPC classification: G10L15/00 G10L1/00

    CPC classification: G10L15/00

    Abstract: A method of speech analysis calculates one or more difference parameters for each of a sequence of acoustic frames, where each difference parameter is a function of the difference between an acoustic parameter in one frame and an acoustic parameter in a nearby frame. The method is used in speech recognition which compares the difference parameters of each frame against acoustic models representing speech units, where each speech-unit model has a model of the difference parameters associated with the frames of its speech unit. The difference parameters can be slope parameters or energy difference parameters. Slope parameters are derived by finding the difference between the energy of a given spectral parameter of a given frame and the energy, in a nearby frame, of a spectral parameter associated with a different frequency band. The resulting parameter indicates the extent to which the frequency of energy in the part of the spectrum represented by the given parameter is going up or going down. Energy difference parameters are calculated as a function of the difference between a given spectral parameter in one frame and a spectral parameter in a nearby frame representing the same frequency band. In one embodiment of the invention, dynamic programming compares the difference parameters of a sequence of frames to be recognized against a sequence of dynamic programming elements associated with each of a plurality of speech-unit models. In another embodiment of the invention, each speech-unit model represents one phoneme, and the speech-unit models for a plurality of phonemes are compared against individual frames, to associate with each such frame the one or more phonemes whose models compare most closely with it.
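
    The two kinds of difference parameter named in this abstract can be sketched in a few lines of Python. The exact formulas of the patent are not given here, so the sketch assumes the simplest reading: a frame is a list of per-band energies, the "nearby frame" is the one `lag` frames later, and the "different frequency band" is the band `band_shift` positions higher. All names and defaults are hypothetical.

```python
def energy_differences(frames, lag=1):
    """Same band, nearby frame: later[k] - earlier[k] for every band k."""
    return [
        [later[k] - earlier[k] for k in range(len(earlier))]
        for earlier, later in zip(frames, frames[lag:])
    ]

def slope_parameters(frames, lag=1, band_shift=1):
    """Different band, nearby frame: later[k + band_shift] - earlier[k]."""
    return [
        [later[k + band_shift] - earlier[k]
         for k in range(len(earlier) - band_shift)]
        for earlier, later in zip(frames, frames[lag:])
    ]

# Toy spectrum with three bands: an energy peak moves up one band per frame.
frames = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
energy_diff = energy_differences(frames)
slopes = slope_parameters(frames)
print(energy_diff)
print(slopes)
```

    In this toy case the energy differences show each band rising and falling, while the slope values are all zero because every band's energy reappears exactly one band higher in the next frame, which is the kind of frequency movement the slope parameters are meant to capture.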


    Speech recognition apparatus and method
    4.
    Type: granted invention patent
    Status: expired

    Publication number: US4783803A

    Publication date: 1988-11-08

    Application number: US797249

    Filing date: 1985-11-12

    IPC classification: G10L15/00 G10L1/00

    CPC classification: G10L15/00

    Abstract: A system is disclosed for recognizing a pattern in a collection of data given a context of one or more other patterns previously identified. Preferably the system is a speech recognition system, the patterns are words and the collection of data is a sequence of acoustic frames. During the processing of each of a plurality of frames, for each word in an active vocabulary, the system updates a likelihood score representing a probability of a match between the word and the frame, combines a language model score based on one or more previously recognized words with that likelihood score, and prunes the word from the active vocabulary if the combined score is below a threshold. A rapid match is made between the frames and each word of an initial vocabulary to determine which words should originally be placed in the active vocabulary. Preferably the system enables an operator to confirm the system's best guess as to the spoken word merely by speaking another word, to indicate that an alternate guess by the system is correct by typing a key associated with that guess, and to indicate that neither the best guess nor the alternate guesses was correct by typing yet another key. The system includes other features, including ones for determining where among the frames to look for the start of speech, and a special hardware processor for computing likelihood scores.
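
    The frame-by-frame loop described in this abstract (update the likelihood, add the language model score, prune below a threshold) can be sketched as follows. This is a simplified illustration, not the patented system: frames are single numbers, each word model is a hypothetical template of expected values, the likelihood is a negative squared error, and the pruning threshold is an absolute value chosen for the example.

```python
def frame_log_likelihood(observed, expected):
    """Stand-in acoustic score: negative squared error for one frame."""
    return -(observed - expected) ** 2

def recognize(frames, templates, lm_log_prob, prune_below):
    """Frame-synchronous scoring with language-model-aware pruning."""
    acoustic = {word: 0.0 for word in templates}   # the active vocabulary
    for t, frame in enumerate(frames):
        for word in list(acoustic):
            acoustic[word] += frame_log_likelihood(frame, templates[word][t])
            # Combine with the language model score before deciding to prune.
            if acoustic[word] + lm_log_prob[word] < prune_below:
                del acoustic[word]
    combined = {w: s + lm_log_prob[w] for w, s in acoustic.items()}
    best = max(combined, key=combined.get) if combined else None
    return best, combined

# Toy vocabulary and language model scores (made-up numbers).
templates = {
    "yes": [1.0, 2.0, 3.0],
    "no": [1.0, 0.0, 0.0],
    "maybe": [9.0, 9.0, 9.0],
}
lm_log_prob = {"yes": -1.0, "no": -0.5, "maybe": -0.5}
best, combined = recognize([1.0, 2.0, 3.0], templates, lm_log_prob,
                           prune_below=-20.0)
print(best, combined)
```

    Here "maybe" is pruned after the first frame, so no further work is spent on it, while "no" stays active but ends with a worse combined score than "yes".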


    Process of synthesizing mixed BaO-TiO2 based powders for ceramic applications
    5.
    Type: granted invention patent
    Status: expired

    Publication number: US4606906A

    Publication date: 1986-08-19

    Application number: US671539

    Filing date: 1984-11-15

    Abstract: A process is disclosed for producing any desired Ba/Ti mixture, formulated as an amorphous solid which crystallizes at very low temperatures to yield a desired phase or phases. The process yields products free of undesirable impurities and allows macroscopic production of certain phases in the baria-titania system, having exceptional high-frequency dielectric properties, that were previously unattainable through solid-state high-temperature production techniques.


    Method for creating and using multiple-word sound models in speech recognition
    6.
    Type: granted invention patent
    Status: expired

    Publication number: US4837831A

    Publication date: 1989-06-06

    Application number: US919885

    Filing date: 1986-10-15

    IPC classification: G10L15/06

    CPC classification: G10L15/063

    Abstract: A first speech recognition method receives an acoustic description of an utterance to be recognized and scores a portion of that description against each of a plurality of cluster models representing similar sounds from different words. The resulting score for each cluster is used to calculate a word score for each word represented by that cluster. Preferably these word scores are used to prefilter vocabulary words, and the description of the utterance includes a succession of acoustic descriptions which are compared by linear time alignment against a succession of acoustic models. A second speech recognition method is also provided which matches an acoustic model with each of a succession of acoustic descriptions of an utterance to be recognized. Each of these models has a probability score for each vocabulary word. The probability scores for each word associated with the matching acoustic models are combined to form a total score for that word. The preferred speech recognition method calculates two separate word scores for each currently active vocabulary word from a common succession of sounds. Preferably the first score is calculated by a time alignment method, while the second score is calculated by a time-independent method. Preferably this calculation of two separate word scores is used in one of multiple word-selecting phases of a recognition process, such as in the prefiltering phase.
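
    The second, time-independent method in this abstract can be sketched as follows. The sketch makes several assumptions that are not in the source: acoustic descriptions are single numbers, each acoustic model is a hypothetical (centroid, per-word log-probability table) pair, matching means picking the nearest centroid, and scores are combined by summing log probabilities.

```python
def word_scores(descriptions, models):
    """Sum, per word, the log probabilities attached to the matched models.

    The order of the descriptions does not affect the totals, which is what
    makes this score time-independent.
    """
    totals = {}
    for description in descriptions:
        # Match: pick the acoustic model whose centroid is nearest.
        _, word_log_probs = min(models, key=lambda m: abs(m[0] - description))
        for word, log_prob in word_log_probs.items():
            totals[word] = totals.get(word, 0.0) + log_prob
    return totals

# Two toy acoustic models, each carrying a score for every vocabulary word.
models = [
    (0.0, {"on": -1.0, "off": -2.0}),
    (10.0, {"on": -3.0, "off": -0.5}),
]
totals = word_scores([0.5, 9.0, 11.0], models)
print(totals)
```

    Because only the set of matched models matters, this score is cheap to compute and can complement a time-aligned score for the same word during prefiltering.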


    Large-vocabulary continuous speech prefiltering and processing system
    7.
    Type: granted invention patent
    Status: expired

    Publication number: US5202952A

    Publication date: 1993-04-13

    Application number: US542520

    Filing date: 1990-06-22

    IPC classification: G10L15/08 G10L15/10 G10L15/28

    CPC classification: G10L15/10

    Abstract: A continuous speech prefiltering system for use in continuous speech recognition computer systems. The speech to be recognized is converted from utterances to frame data sets, which are smoothed to generate a smooth frame model over a predetermined number of frames. A resident vocabulary is stored within the computer as clusters of word models which are acoustically similar over a succession of frame periods. A cluster score is generated by the system; this score includes the likelihood of the smooth frames evaluated using a probability model for the cluster against which the smooth frame model is being compared. Cluster sets having cluster scores below a predetermined acoustic threshold are removed from further consideration. The remaining cluster sets are unpacked to determine a word score for each unpacked word. These word scores are used to identify the words that are above a second predetermined threshold, defining a word list which is sent to a recognizer for a lengthier word match. A controller enables the system to initialize times corresponding to the frame start time for each frame data set, defining a sliding window.
