Specific task composite acoustic models
    1.
    Granted invention patent
    Status: in force

    Publication No.: US06260014B1

    Publication date: 2001-07-10

    Application No.: US09153222

    Filing date: 1998-09-14

    IPC classification: G10L1504

    Abstract: A method for recognizing speech includes the steps of providing a generic model having a baseform representation of a vocabulary of words, identifying a subset of words relating to an application, constructing a task specific model for the subset of words, constructing a composite model by combining the generic and task specific models and modifying the baseform representation of the subset of words such that the subset of words are recognized by the task specific model. A system for recognizing speech includes a composite model having a generic model having a generic baseform representation of a vocabulary of words and a task specific model for recognizing a subset of words relating to an application wherein the subset of words are recognized using a modified baseform representation. A recognizer compares words input thereto with the generic model for words other than the subset of words and with the task specific model for the subset of words.


    State-dependent speaker clustering for speaker adaptation
    2.
    Granted invention patent
    Status: lapsed

    Publication No.: US5787394A

    Publication date: 1998-07-28

    Application No.: US572223

    Filing date: 1995-12-13

    IPC classification: G10L15/06 G10L5/06

    CPC classification: G10L15/07 G10L2015/0631

    Abstract: A system and method for adaptation of a speaker independent speech recognition system for use by a particular user. The system and method gather acoustic characterization data from a test speaker and compare the data with acoustic characterization data generated for a plurality of training speakers. A match score is computed between the test speaker's acoustic characterization for a particular acoustic subspace and each training speaker's acoustic characterization for the same acoustic subspace. The training speakers are ranked for the subspace according to their scores and a new acoustic model is generated for the test speaker based upon the test speaker's acoustic characterization data and the acoustic characterization data of the closest matching training speakers. The process is repeated for each acoustic subspace.
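
The per-subspace ranking described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the match score is computed, so the squared Euclidean distance between mean vectors used here is an assumption, as are the function name `closest_speakers`, the dictionary layout, and the parameter `k`.

```python
def closest_speakers(test_stats, training_stats, k):
    """Rank training speakers separately for each acoustic subspace.

    test_stats:     {subspace: mean_vector} for the test speaker (hypothetical layout)
    training_stats: {speaker: {subspace: mean_vector}} for the training speakers
    k:              number of closest training speakers to keep per subspace
    """
    result = {}
    for subspace, test_mean in test_stats.items():
        scored = []
        for speaker, stats in training_stats.items():
            if subspace not in stats:
                continue
            # Assumed match score: squared Euclidean distance (smaller is closer).
            dist = sum((a - b) ** 2 for a, b in zip(test_mean, stats[subspace]))
            scored.append((dist, speaker))
        scored.sort()
        result[subspace] = [speaker for _, speaker in scored[:k]]
    return result


test_stats = {"s1": [0.0, 0.0], "s2": [1.0, 1.0]}
training_stats = {
    "A": {"s1": [0.1, 0.0], "s2": [5.0, 5.0]},
    "B": {"s1": [2.0, 2.0], "s2": [1.0, 1.2]},
    "C": {"s1": [0.5, 0.5], "s2": [0.0, 0.0]},
}
ranking = closest_speakers(test_stats, training_stats, k=2)
```

In this toy data the closest speakers differ between subspaces, which is the point of ranking per subspace rather than once per speaker. Building the adapted model from the selected speakers' data is outside the sketch.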


    System and method for partitioning the feature space of a classifier in a pattern classification system
    3.

    Publication No.: US6058205A

    Publication date: 2000-05-02

    Application No.: US781574

    Filing date: 1997-01-09

    IPC classification: G06K9/62 G06F17/20

    CPC classification: G06K9/6282

    Abstract: A system and method are provided which partition the feature space of a classifier by using hyperplanes to construct a binary decision tree or hierarchical data structure for obtaining the class probabilities for a particular feature vector. One objective in the construction of the decision tree is to minimize the average entropy of the empirical class distributions at each successive node or subset, such that the average entropy of the class distributions at the terminal nodes is minimized. First, a linear discriminant vector is computed that maximally separates the classes at any particular node. A threshold is then chosen that can be applied on the value of the projection onto the hyperplane such that all feature vectors that have a projection onto the hyperplane that is less than the threshold are assigned to a child node (say, left child node) and the feature vectors that have a projection greater than or equal to the threshold are assigned to a right child node. The above two steps are then repeated for each child node until the data at a node falls below a predetermined threshold and the node is classified as a terminal node (leaf of the decision tree). After all non-terminal nodes have been processed, the final step is to store a class distribution associated with each terminal node. The class probabilities for a particular feature vector can then be obtained by traversing the decision tree in a top-down fashion until a terminal node is identified which corresponds to the particular feature vector. The information provided by the decision tree is that, in computing the class probabilities for the particular feature vector, only the small number of classes associated with that particular terminal node need be considered. Alternatively, the required class probabilities can be obtained simply by taking the stored distribution of the terminal node associated with the particular feature vector.
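
The top-down traversal described in this abstract can be sketched as follows. This covers only the lookup step on an already built tree; computing the linear discriminant vectors and thresholds is not shown. The `Node` class, its field names, and the toy tree are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Node:
    # Internal nodes carry a hyperplane (direction and threshold); terminal
    # nodes carry a stored class distribution. Field names are hypothetical.
    direction: Optional[Sequence[float]] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    distribution: Optional[Dict[str, float]] = None


def class_probabilities(root: Node, x: Sequence[float]) -> Dict[str, float]:
    """Walk from the root to a terminal node and return its stored distribution."""
    node = root
    while node.distribution is None:
        projection = sum(w * xi for w, xi in zip(node.direction, x))
        # Projection below the threshold goes left; otherwise right.
        node = node.left if projection < node.threshold else node.right
    return node.distribution


tree = Node(
    direction=[1.0, 0.0], threshold=0.5,
    left=Node(distribution={"a": 0.9, "b": 0.1}),
    right=Node(
        direction=[0.0, 1.0], threshold=2.0,
        left=Node(distribution={"b": 0.7, "c": 0.3}),
        right=Node(distribution={"c": 1.0}),
    ),
)
probs = class_probabilities(tree, [1.0, 1.0])
```

Each lookup costs one dot product per level of the tree, and only the classes stored at the reached terminal node need further consideration.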

    Apparatus and method for performing model estimation utilizing a discriminant measure
    4.
    Granted invention patent
    Status: lapsed

    Publication No.: US5970239A

    Publication date: 1999-10-19

    Application No.: US908120

    Filing date: 1997-08-11

    IPC classification: G06F9/455

    Abstract: Method for performing acoustic model estimation to optimize classification accuracy on speaker derived feature vectors with respect to a plurality of classes corresponding to phones to which a plurality of acoustic models respectively correspond comprises: (a) initializing an acoustic model for each phone; (b) evaluating the merit of the acoustic model initialized for each phone utilizing an objective function having a two component discriminant measure capable of characterizing each phone whereby a first component is defined as a probability that the model for the phone assigns to feature vectors from the phone and a second component is defined as a probability that the model for the phone assigns to feature vectors from other phones; (c) adapting the model for selected phones so as to increase the first component for the phone or decrease the second component for the phone, the adapting step yielding a new model for each selected phone; (d) evaluating the merit of the new models for each phone adapted in step (c) utilizing the two component measure; (e) comparing results of the evaluation of step (b) with results of the evaluation of step (d) for each phone, and if the first component has increased or the second component has decreased, the new model is kept for that phone, else the model originally initialized is kept; (f) estimating parameters associated with each model kept for each phone in order to optimize the function; and (g) evaluating termination criterion to determine if the parameters of the models are optimized.


    Apparatus for compression coding using cross-array correlation between two-dimensional matrices derived from two-valued digital images
    5.
    Granted invention patent
    Status: lapsed

    Publication No.: US4028731A

    Publication date: 1977-06-07

    Application No.: US617906

    Filing date: 1975-09-29

    CPC classification: G06T9/005 G06T9/004 H04N1/417

    Abstract: An apparatus is disclosed for compressing a p × q image array of two-valued (black/white) sample points. The image array points are serially applied to the apparatus in consecutive raster scan lines. In response, the apparatus simultaneously forms two matrices respectively representing a high order p × q predictive error array and a p × q array of location events (such as the raster leading edges of all objects in the image). Improved compression is achieved by selecting the more compression-efficient of two methods for encoding the position of errors in the prediction error array. These alternative methods are conventional run-length coding and a novel form of reference encoding, used selectively but to significant advantage. Thus, a run-length compression codeword is formed from the count C of non-errors between consecutive errors (in response to the occurrence of each error in the jth bit position of the ith scan line of the predictive error array) upon either C ≤ T, where T is a threshold, or C > T and there being no occurrence of a line difference encoding for the error (where i, j, C and T are positive integers). A line difference codeword with difference value v is generated upon the joint event of C > T and either the single or multiple occurrence of location events in the (i-1)th scan line of the location event array within the bit position range of B ≤ r ≤ (j+n), where positive integer B is the greater of function D(T,v) and (j-n), and the number of intervening location events, s, within the bit position range of D(T,v) ≤ q
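
The run-length side of this scheme, and the rule for choosing between the two codeword types, can be sketched as follows. The abstract is cut off before the line-difference conditions are fully stated, so the sketch reduces them to a boolean flag `line_difference_available`; that flag and both function names are assumptions for illustration.

```python
def run_lengths(error_bits):
    """Return the count C of non-errors (0s) preceding each error (1) in a scan line."""
    counts, run = [], 0
    for bit in error_bits:
        if bit:
            counts.append(run)
            run = 0
        else:
            run += 1
    return counts


def choose_codeword(count, threshold, line_difference_available):
    """Pick the codeword type for one error.

    Run-length coding is used when C <= T, or when C > T but no line-difference
    encoding exists for the error. Otherwise the line-difference form is used.
    """
    if count <= threshold or not line_difference_available:
        return "run_length"
    return "line_difference"


counts = run_lengths([0, 0, 1, 0, 1, 1, 0, 0, 0, 1])
modes = [choose_codeword(c, threshold=2, line_difference_available=True) for c in counts]
```

Short runs stay with run-length coding, and only long runs (C > T) are candidates for the reference-based codeword.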


    Method for modeling and recognizing speech including word liaisons
    6.
    Granted invention patent
    Status: in force

    Publication No.: US5995931A

    Publication date: 1999-11-30

    Application No.: US253987

    Filing date: 1999-02-22

    IPC classification: G10L15/08 G10L15/18 G10L7/08

    Abstract: A system and method for recognizing spoken liaisoned words. The method and system identify each word in the vocabulary as a liaison generator and/or liaison receptor. If the word is a liaison receptor, and if the word is preceded by a liaison generator, the most probable recognition result for the word will be the liaison generated by the preceding word plus the word. Liaisons are identified on an immediately preceding word in accordance with rules in a language. A word that ends with an unpronounced consonant phoneme, when followed by a word beginning with a consonant phoneme, and ends with a pronounced phoneme, when followed by a word with a vowel-like phoneme, causes a match list for the current word to be amended with words having liaisons added at their beginnings.


    Transcription of speech data with segments from acoustically dissimilar environments
    7.
    Granted invention patent
    Status: lapsed

    Publication No.: US6067517A

    Publication date: 2000-05-23

    Application No.: US595722

    Filing date: 1996-02-02

    IPC classification: G10L15/20

    CPC classification: G10L15/20

    Abstract: A technique to improve the recognition accuracy when transcribing speech data that contains data from a wide range of environments. Input data in many situations contains data from a variety of sources in different environments. Such classes include: clean speech, speech corrupted by noise (e.g., music), non-speech (e.g., pure music with no speech), telephone speech, and the identity of a speaker. A technique is described whereby the different classes of data are first automatically identified, and then each class is transcribed by a system that is made specifically for it. The invention also describes a segmentation algorithm that is based on making up an acoustic model that characterizes the data in each class, and then using a dynamic programming algorithm (the Viterbi algorithm) to automatically identify segments that belong to each class. The acoustic models are made in a certain feature space, and the invention also describes different feature spaces for use with different classes.


    Method and apparatus for a time-synchronous tree-based search strategy
    8.
    Granted invention patent
    Status: lapsed

    Publication No.: US5884259A

    Publication date: 1999-03-16

    Application No.: US798011

    Filing date: 1997-02-12

    IPC classification: G10L15/08 G10L9/06

    CPC classification: G10L15/08

    Abstract: A method and apparatus for using a tree structure to constrain a time-synchronous, fast search for candidate words in an acoustic stream is described. A minimum stay of three frames in each graph node visited is imposed by allowing transitions only every third frame. This constraint enables the simplest possible Markov model for each phoneme while enforcing the desired minimum duration. The fast, time-synchronous search for likely words is done for an entire sentence/utterance. The list of hypotheses beginning at each time frame is stored for providing, on-demand, lists of contender/candidate words to the asynchronous, detailed match phase of decoding.
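
The effect of allowing transitions only every third frame can be shown with a small dynamic-programming sketch. For simplicity it searches a left-to-right chain of nodes rather than a tree, and it uses additive log-style scores; both choices, along with the function name `best_scores`, are assumptions made for illustration.

```python
NEG_INF = float("-inf")


def best_scores(scores):
    """Time-synchronous best-path scores over a left-to-right chain of nodes.

    scores[t][n] is the score of frame t in node n. A move from node n-1 to
    node n is allowed only at frames where t % 3 == 0, so every node that is
    entered and later left is occupied for at least three frames.
    """
    n_nodes = len(scores[0])
    best = [NEG_INF] * n_nodes
    best[0] = scores[0][0]  # the path must start in the first node
    for t in range(1, len(scores)):
        new = [NEG_INF] * n_nodes
        for n in range(n_nodes):
            stay = best[n]
            move = best[n - 1] if n > 0 and t % 3 == 0 else NEG_INF
            new[n] = max(stay, move) + scores[t][n]
        best = new
    return best


# Node 0 fits frames 0-1 best, node 1 fits frames 2-5 best.
frames = [[0, -1], [0, -1], [-1, 0], [-1, 0], [-1, 0], [-1, 0]]
final = best_scores(frames)
```

An unconstrained search would switch nodes at frame 2. Here the switch is delayed to frame 3, which costs one mismatched frame but guarantees the minimum duration without adding extra states to each phoneme model.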


    Recognizing speech having word liaisons by adding a phoneme to reference word models
    9.
    Granted invention patent
    Status: lapsed

    Publication No.: US5875426A

    Publication date: 1999-02-23

    Application No.: US662407

    Filing date: 1996-06-12

    IPC classification: G10L15/08 G10L15/18 G10L5/06

    Abstract: A method and system of recognizing speech. The method and system perform a fast match on a word in the string of speech to be recognized which generates a fast match list representing words in a system vocabulary that most likely match a current word to be recognized. Next, the method and system perform a detailed match on the words in the fast match list and generate a detailed match list representing words that most likely match the current word to be recognized. Then for each word in the detailed match list that can accept a liaison phoneme from a preceding word, where each word is a liaison receptor, adding to the detailed match list a form of the liaison receptor, where the form represents an addition of a liaison phoneme to the liaison receptor, creating a modified detailed match list which is inclusive of the forms of the liaison receptors added to the detailed match list. Finally, the method and system output a word in the modified detailed match list that has the highest probability of matching the word to be recognized.
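
The match-list amendment step in this abstract can be sketched as follows. The representation of a match-list entry as a `(word, phonemes)` pair, the function name `amend_match_list`, and the French example words are assumptions; scoring of the added forms is not shown.

```python
def amend_match_list(match_list, receptors, liaison_phoneme):
    """Add a liaison form for every liaison receptor in the detailed match list.

    match_list:      list of (word, phonemes) pairs from the detailed match
    receptors:       set of words that can accept a liaison phoneme
    liaison_phoneme: phoneme generated by the preceding word, or None
    """
    amended = list(match_list)
    if liaison_phoneme is None:
        return amended
    for word, phonemes in match_list:
        if word in receptors:
            # The added form is the receptor with the liaison phoneme prepended.
            amended.append((word, (liaison_phoneme,) + tuple(phonemes)))
    return amended


# Hypothetical case: "les amis", where "les" generates the liaison phoneme "z".
detailed = [("amis", ("a", "m", "i")), ("tamis", ("t", "a", "m", "i"))]
amended = amend_match_list(detailed, receptors={"amis"}, liaison_phoneme="z")
```

The original entries are kept, so the recognizer can still choose the form without a liaison when it scores higher.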
