Design and construction of a binary-tree system for language modelling
    1.
    Granted invention patent
    Design and construction of a binary-tree system for language modelling (expired)

    Publication number: US4852173A

    Publication date: 1989-07-25

    Application number: US114892

    Filing date: 1987-10-29

    Abstract: In order to determine a next event based upon available data, a binary decision tree is constructed having true or false questions at each node and a probability distribution of the unknown next event based upon available data at each leaf. Starting at the root of the tree, the construction process proceeds from node-to-node towards a leaf by answering the question at each node encountered and following either the true or false path depending upon the answer. The questions are phrased in terms of the available data and are designed to provide as much information as possible about the next unknown event. The process is particularly useful in speech recognition when the next word to be spoken is determined on the basis of the previously spoken words.
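The root-to-leaf walk described in this abstract can be illustrated with a minimal Python sketch. The `Node` class, the function name `next_word_distribution`, and the toy questions and probabilities are all invented for illustration; the abstract does not say how the questions are chosen or how the leaf distributions are estimated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class Node:
    # Internal nodes carry a true/false question about the available data;
    # leaves carry a probability distribution over the unknown next event.
    question: Optional[Callable[[Sequence[str]], bool]] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    distribution: Optional[Dict[str, float]] = None


def next_word_distribution(root: Node, history: Sequence[str]) -> Dict[str, float]:
    """Walk from the root to a leaf, answering one question per node."""
    node = root
    while node.distribution is None:
        # Follow the true or false branch depending on the answer.
        node = node.if_true if node.question(history) else node.if_false
    return node.distribution


# Toy tree: questions and probabilities are made up for this example.
tree = Node(
    question=lambda h: len(h) > 0 and h[-1] == "the",
    if_true=Node(distribution={"cat": 0.6, "dog": 0.4}),
    if_false=Node(
        question=lambda h: len(h) > 0 and h[-1] in {"a", "an"},
        if_true=Node(distribution={"apple": 0.5, "dog": 0.5}),
        if_false=Node(distribution={"the": 0.7, "a": 0.3}),
    ),
)

# Most probable next word after the history "feed the".
best = max(next_word_distribution(tree, ["feed", "the"]).items(), key=lambda kv: kv[1])
```

Each question here inspects only the previous word; a real system would presumably ask about more of the history.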

    Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system
    2.
    Granted invention patent
    Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system (expired)

    Publication number: US5033087A

    Publication date: 1991-07-16

    Application number: US323479

    Filing date: 1989-03-14

    IPC classification: G10L11/00 G10L15/06 G10L15/14

    CPC classification: G10L15/14

    Abstract: A continuous speech recognition system includes an automatic phonological rules generator which determines variations in the pronunciation of phonemes based on the context in which they occur. This phonological rules generator associates sequences of labels derived from vocalizations of a training text with respective phonemes inferred from the training text. These sequences are then annotated with their phoneme context from the training text and clustered into groups representing similar pronunciations of each phoneme. A decision tree is generated using the context information of the sequences to predict the clusters to which the sequences belong. The training data is processed by the decision tree to divide the sequences into leaf-groups representing similar pronunciations of each phoneme. The sequences in each leaf-group are clustered into sub-groups representing respectively different pronunciations of their corresponding phoneme in a given context. A Markov model is generated for each sub-group. The various Markov models of a leaf-group are combined into a single compound model by assigning common initial and final states to each model. The compound Markov models are used by a speech recognition system to analyze an unknown sequence of labels given its context.

    Training of markov models used in a speech recognition system
    3.
    Granted invention patent
    Training of markov models used in a speech recognition system (expired)

    Publication number: US4827521A

    Publication date: 1989-05-02

    Application number: US845201

    Filing date: 1986-03-27

    IPC classification: G10L11/00 G10L15/14 G10L5/00

    CPC classification: G10L15/14

    Abstract: In a word, or speech, recognition system for decoding a vocabulary word from outputs selected from an alphabet of outputs in response to a communicated word input wherein each word in the vocabulary is represented by a baseform of at least one probabilistic finite state model and wherein each probabilistic model has transition probability items and output probability items and wherein a value is stored for each of at least some probability items, the present invention relates to apparatus and method for determining probability values for probability items by biassing at least some of the stored values to enhance the likelihood that outputs generated in response to communication of a known word input are produced by the baseform for the known word relative to the respective likelihood of the generated outputs being produced by the baseform for at least one other word. Specifically, the current values of counts --from which probability items are derived--are adjusted by uttering a known word and determining how often probability events occur relative to (a) the model corresponding to the known uttered "correct" word and (b) the model of at least one other "incorrect" word. The current count values are increased based on the event occurrences relating to the correct word and are reduced based on the event occurrences relating to the incorrect word or words.
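The count adjustment described at the end of this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed `step` size, and the positive `floor` that keeps counts from going negative are all hypothetical, since the abstract does not give the size of the adjustments.

```python
from collections import Counter
from typing import Dict, Iterable


def adjust_counts(
    counts: Dict[str, float],
    correct_events: Iterable[str],
    incorrect_events: Iterable[str],
    step: float = 1.0,    # assumed adjustment per event occurrence
    floor: float = 0.01,  # assumed lower bound so counts stay positive
) -> Dict[str, float]:
    """Raise counts for events seen with the correct word's model and
    lower counts for events seen with an incorrect word's model."""
    updated = dict(counts)
    for event, n in Counter(correct_events).items():
        updated[event] = updated.get(event, 0.0) + step * n
    for event, n in Counter(incorrect_events).items():
        updated[event] = max(floor, updated.get(event, 0.0) - step * n)
    return updated


def to_probabilities(counts: Dict[str, float]) -> Dict[str, float]:
    """Derive probability items from counts by normalising them."""
    total = sum(counts.values())
    return {event: c / total for event, c in counts.items()}


# Toy example with two competing transitions out of the same state.
counts = {"t1": 4.0, "t2": 4.0}
new_counts = adjust_counts(counts, correct_events=["t1", "t1"], incorrect_events=["t2"])
probs = to_probabilities(new_counts)
```

After the update, `t1` (seen twice with the correct word) has a larger share of the probability mass than `t2` (seen once with the incorrect word).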

    Feneme-based Markov models for words
    4.
    Granted invention patent
    Feneme-based Markov models for words (expired)

    Publication number: US5165007A

    Publication date: 1992-11-17

    Application number: US366231

    Filing date: 1989-06-12

    IPC classification: G10L15/02 G10L15/06 G10L15/14

    CPC classification: G10L15/142 G10L2015/0631

    Abstract: In a speech recognition system, apparatus and method for modelling words with label-based Markov models is disclosed. The modelling includes: entering a first speech input, corresponding to words in a vocabulary, into an acoustic processor which converts each spoken word into a sequence of standard labels, where each standard label corresponds to a sound type assignable to an interval of time; representing each standard label as a probabilistic model which has a plurality of states, at least one transition from a state to a state, and at least one settable output probability at some transitions; entering selected acoustic inputs into an acoustic processor which converts the selected acoustic inputs into personalized labels, each personalized label corresponding to a sound type assigned to an interval of time; and setting each output probability as the probability of the standard label represented by a given model producing a particular personalized label at a given transition in the given model. The present invention addresses the problem of generating models of words simply and automatically in a speech recognition system.

    Speech recognition employing a set of Markov models that includes Markov models representing transitions to and from silence
    5.
    Granted invention patent
    Speech recognition employing a set of Markov models that includes Markov models representing transitions to and from silence (expired)

    Publication number: US4977599A

    Publication date: 1990-12-11

    Application number: US289447

    Filing date: 1988-12-15

    IPC classification: G10L15/02 G10L15/06 G10L15/14

    Abstract: Apparatus and method for constructing word baseforms which can be matched against a string of generated acoustic labels. A set of phonetic phone machines are formed, wherein each phone machine has (i) a plurality of states, (ii) a plurality of transitions each of which extends from a state to a state, (iii) a stored probability for each transition, and (iv) stored label output probabilities, each label output probability corresponding to the probability of each phone machine producing a corresponding label. The set of phonetic machines is formed to include a subset of onset phone machines. The stored probabilities of each onset phone machine correspond to at least one phonetic element being uttered at the beginning of a speech segment. The set of phonetic machines is formed to include a subset of trailing phone machines. The stored probabilities of each trailing phone machine correspond to at least one single phonetic element being uttered at the end of a speech segment. Word baseforms are constructed by concatenating phone machines selected from the set.

    Synthesizing word baseforms used in speech recognition
    6.
    Granted invention patent
    Synthesizing word baseforms used in speech recognition (expired)

    Publication number: US4882759A

    Publication date: 1989-11-21

    Application number: US853525

    Filing date: 1986-04-18

    IPC classification: G10L11/00 G10L15/06 G10L15/14

    CPC classification: G10L15/14

    Abstract: Apparatus and method for synthesizing word baseforms for words not spoken during a training session, wherein each synthesized baseform represents a series of models from a first set of models, which include: (a) uttering speech during a training session and representing the uttered speech as a sequence of models from a second set of models; (b) for each of at least some of the second set models spoken in a given phonetic model context during the training session, storing a respective string of first set models; and (c) constructing a word baseform of first set models for a word not spoken during the training session, including the step of representing each piece of a word that corresponds to a second set model in a given context by the stored respective string, if any, corresponding thereto.

    Automatic generation of simple Markov model stunted baseforms for words in a vocabulary
    7.
    Granted invention patent
    Automatic generation of simple Markov model stunted baseforms for words in a vocabulary (expired)

    Publication number: US4833712A

    Publication date: 1989-05-23

    Application number: US738934

    Filing date: 1985-05-29

    IPC classification: G10L15/06 G10L15/14

    CPC classification: G10L15/14 G10L15/063

    Abstract: In a system that (i) defines each word in a vocabulary by a fenemic baseform of fenemic phones, (ii) defines an alphabet of composite phones each of which corresponds to at least one fenemic phone, and (iii) generates a string of fenemes in response to speech input, the method provides for converting a word baseform comprised of fenemic phones into a stunted word baseform of composite phones by (a) replacing each fenemic phone in the fenemic phone word baseform by the composite phone corresponding thereto; and (b) merging together at least one pair of adjacent composite phones by a single composite phone where the adverse effect of the merging is below a predefined threshold.
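Steps (a) and (b) of this abstract can be sketched in Python as below. The sketch rests on several assumptions: the name `stunt_baseform`, the greedy left-to-right merge order, and the `merge` callback that returns a replacement phone together with a numeric cost are all hypothetical, because the abstract does not define how the adverse effect of a merge is measured.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical callback: for two adjacent composite phones, return the single
# phone that would replace them and the cost of doing so, or None if the
# pair cannot be merged.
MergeFn = Callable[[str, str], Optional[Tuple[str, float]]]


def stunt_baseform(
    fenemic_baseform: Sequence[str],
    composite_of: Dict[str, str],
    merge: MergeFn,
    threshold: float,
) -> List[str]:
    # Step (a): replace each fenemic phone by its composite phone.
    composite = [composite_of[phone] for phone in fenemic_baseform]

    # Step (b): greedily merge adjacent pairs whose cost is below threshold.
    result: List[str] = []
    for phone in composite:
        if result:
            candidate = merge(result[-1], phone)
            if candidate is not None and candidate[1] < threshold:
                result[-1] = candidate[0]
                continue
        result.append(phone)
    return result


def toy_merge(a: str, b: str) -> Optional[Tuple[str, float]]:
    # Toy rule for illustration: identical neighbours merge at zero cost.
    return (a, 0.0) if a == b else None


stunted = stunt_baseform(
    ["f1", "f2", "f3", "f4"],
    {"f1": "A", "f2": "A", "f3": "B", "f4": "A"},
    toy_merge,
    threshold=0.5,
)
```

With the toy rule, the composite sequence `A A B A` shrinks to `A B A`, because only the first pair is identical.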

    Determination of phone weights for markov models in a speech recognition system
    9.
    Granted invention patent
    Determination of phone weights for markov models in a speech recognition system (expired)

    Publication number: US4741036A

    Publication date: 1988-04-26

    Application number: US696976

    Filing date: 1985-01-31

    CPC classification: G10L15/144

    Abstract: In a speech recognition system, discrimination between similar-sounding uttered words is improved by weighting the probability vector data stored for the Markov model representing the reference word sequence of phones. The weighting vector is derived for each reference word by comparing similar sounding utterances using Viterbi alignment and multivariate analysis which maximizes the differences between correct and incorrect recognition multivariate distributions.

    Speech recognition system
    10.
    Granted invention patent
    Speech recognition system (expired)

    Publication number: US4718094A

    Publication date: 1988-01-05

    Application number: US845155

    Filing date: 1986-03-27

    Abstract: Speech words are recognized by first recognizing each spectral vector identified by a label (feneme), then identifying the word by matching the string of labels against phones using simplified phone machines based on label and transition probabilities and Markov chains. In one embodiment, a detailed acoustic match word score is combined with an approximate acoustic match word score to provide a total word score for a subject word. In another embodiment, a polling word score is combined with an acoustic match word score to provide a total word score for a subject word. The acoustic models employed in the acoustic matching may correspond, alternatively, to phonetic elements or to fenemes. Fenemes represent labels generated by an acoustic processor in response to a spoken input. Apparatus and method for determining word scores according to approximate acoustic matching and for determining word scores according to a polling methodology are disclosed.
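The first embodiment, in which two word scores are combined into a total word score, might look like the sketch below. The abstract does not say how the scores are combined, so the weighted sum, the `weight` parameter, the function names, and the log-probability-style toy scores are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def total_word_score(detailed: float, approximate: float, weight: float = 0.5) -> float:
    """Combine a detailed and an approximate match score (assumed weighted sum)."""
    return weight * detailed + (1.0 - weight) * approximate


def best_word(scores: Dict[str, Tuple[float, float]], weight: float = 0.5) -> str:
    """Pick the candidate word with the highest total score."""
    return max(
        scores,
        key=lambda w: total_word_score(scores[w][0], scores[w][1], weight),
    )


# Toy scores as (detailed, approximate); higher means a better match.
candidates = {"cat": (-10.0, -12.0), "cap": (-11.0, -10.5)}
winner = best_word(candidates)
```

With equal weights, "cap" wins (total -10.75 against -11.0), even though "cat" has the better detailed score on its own.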
