Speaker adaptation system and method based on class-specific pre-clustering training speakers
    1.
    Granted patent (status: lapsed)

    Publication No.: US06073096A

    Publication date: 2000-06-06

    Application No.: US18350

    Filing date: 1998-02-04

    IPC classification: G10L15/07 G10L15/06

    CPC classification: G10L15/07

    Abstract: A method of speech recognition in accordance with the present invention includes the steps of: grouping acoustics to form classes based on acoustic features; clustering training speakers by the classes to provide class-specific cluster systems; selecting, from the cluster systems, a subset of cluster systems closest to adaptation data from a test speaker; transforming the subset of cluster systems, based on the adaptation data, to bring them closer to the test speaker and so form adapted cluster systems; and combining the adapted cluster systems to create a speaker-adapted system for decoding speech from the test speaker. Systems and methods for building speech recognition systems, as well as for adapting speaker systems for class-specific speaker clusters, are included.


    Method and apparatus for estimating phone class probabilities a-posteriori using a decision tree
    2.
    Granted patent (status: lapsed)

    Publication No.: US5680509A

    Publication date: 1997-10-21

    Application No.: US312584

    Filing date: 1994-09-27

    IPC classification: G10L15/06 G10L15/08 G10L5/06

    CPC classification: G10L15/063 G10L15/08

    Abstract: A method and apparatus for estimating the probability of phones, a-posteriori, in the context of not only the acoustic feature at the current time but also the acoustic features in the vicinity of the current time, and the use of that estimate in cutting down the search space in a speech recognition system. The method constructs and uses a decision tree whose predictors are the vector-quantized acoustic feature vectors at the current time and in the vicinity of the current time. The process starts with an enumeration of all (predictor, class) events in the training data at the root node, and successively partitions the data at each node according to the most informative split at that node. An iterative algorithm is used to design the binary partitioning. After the construction of the tree is completed, the probability distribution of the predicted class is stored at each of its terminal leaves. During decoding, the decision tree is used by tracing a path down to one of its leaves, based on the answers to binary questions about the vector-quantized acoustic feature vectors at the current time and in its vicinity.


    System and method for partitioning the feature space of a classifier in a pattern classification system

    Publication No.: US6058205A

    Publication date: 2000-05-02

    Application No.: US781574

    Filing date: 1997-01-09

    IPC classification: G06K9/62 G06F17/20

    CPC classification: G06K9/6282

    Abstract: A system and method are provided which partition the feature space of a classifier by using hyperplanes to construct a binary decision tree or hierarchical data structure for obtaining the class probabilities for a particular feature vector. One objective in the construction of the decision tree is to minimize the average entropy of the empirical class distributions at each successive node or subset, such that the average entropy of the class distributions at the terminal nodes is minimized. First, a linear discriminant vector is computed that maximally separates the classes at any particular node. A threshold is then chosen that can be applied to the value of the projection onto the hyperplane, such that all feature vectors whose projection is less than the threshold are assigned to one child node (say, the left child node) and the feature vectors whose projection is greater than or equal to the threshold are assigned to the right child node. These two steps are then repeated for each child node until the data at a node falls below a predetermined threshold and the node is classified as a terminal node (a leaf of the decision tree). After all non-terminal nodes have been processed, the final step is to store a class distribution associated with each terminal node. The class probabilities for a particular feature vector can then be obtained by traversing the decision tree in a top-down fashion until the terminal node corresponding to that feature vector is identified. The information provided by the decision tree is that, in computing the class probabilities for the particular feature vector, only the small number of classes associated with that terminal node need be considered. Alternatively, the required class probabilities can be obtained simply by taking the stored distribution of the terminal node associated with the particular feature vector.
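The top-down traversal described in this abstract can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Node` class, its field names, and the function `class_probabilities` are hypothetical, and the tree is assumed to have been built already (the discriminant vectors and thresholds are supplied by hand).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Node:
    """One node of a binary tree that splits the feature space with hyperplanes.

    Hypothetical structure: internal nodes carry a discriminant direction and
    a threshold; terminal nodes carry only a stored class distribution.
    """
    direction: Optional[Sequence[float]] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    class_probs: Optional[Dict[str, float]] = None  # set only on terminal nodes


def class_probabilities(root: Node, x: Sequence[float]) -> Dict[str, float]:
    """Walk from the root to a terminal node and return its class distribution."""
    node = root
    while node.class_probs is None:
        # Project the feature vector onto the node's discriminant direction.
        projection = sum(w * xi for w, xi in zip(node.direction, x))
        # Below the threshold goes left; greater than or equal goes right.
        node = node.left if projection < node.threshold else node.right
    return node.class_probs


# Hand-built example tree with two internal nodes and three leaves.
example_tree = Node(
    direction=[1.0, 0.0],
    threshold=0.0,
    left=Node(class_probs={"a": 0.9, "b": 0.1}),
    right=Node(
        direction=[0.0, 1.0],
        threshold=2.0,
        left=Node(class_probs={"b": 0.7, "c": 0.3}),
        right=Node(class_probs={"c": 1.0}),
    ),
)
```

Calling `class_probabilities(example_tree, [3.0, 1.0])` follows the right branch at the root and the left branch at the second node, so only classes "b" and "c" need to be considered for that vector.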

    Minimum Bayes error feature selection in speech recognition
    4.
    Granted patent (status: lapsed)

    Publication No.: US07529666B1

    Publication date: 2009-05-05

    Application No.: US09699894

    Filing date: 2000-10-30

    IPC classification: G10L15/08

    Abstract: In connection with speech recognition, a linear transformation $\theta \in \mathbb{R}^{p \times n}$, of rank $p \le n$, is designed that projects the features of a classifier $x \in \mathbb{R}^n$ onto $y = \theta x \in \mathbb{R}^p$ so as to achieve minimum Bayes error (or probability of misclassification). Two avenues are explored: the first is to maximize the $\theta$-average divergence between the class densities, and the second is to minimize the union Bhattacharyya bound in the range of $\theta$. While both approaches yield similar performance in practice, they outperform standard linear discriminant analysis features and show a 10% relative improvement in the word error rate over known cepstral features on a large vocabulary telephony speech recognition task.


    Identifying mismatches between assumed and actual pronunciations of words
    5.
    Granted patent (status: lapsed)

    Publication No.: US06377921B1

    Publication date: 2002-04-23

    Application No.: US09105763

    Filing date: 1998-06-26

    IPC classification: G10L15/06

    CPC classification: G10L15/063 G10L2015/0631

    Abstract: A method of identifying mismatches between acoustic data and a corresponding transcription, the transcription being expressed in terms of basic units, comprises the steps of: aligning the acoustic data with the corresponding transcription; computing a probability score for each instance of a basic unit in the acoustic data with respect to the transcription; generating a distribution for each basic unit; tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and correcting the mismatches.

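The tagging step in this abstract can be sketched as below. The abstract does not say how the per-unit distribution is summarized or which score range counts as a mismatch, so this sketch makes two labeled assumptions: each unit's scores are summarized by their mean and standard deviation, and an instance is tagged when its score falls more than `num_std` standard deviations below the mean. The function name `tag_mismatches` and its parameters are hypothetical, and the alignment and scoring steps are taken as already done.

```python
from collections import defaultdict
from statistics import mean, pstdev


def tag_mismatches(scored_instances, num_std=2.0):
    """Return indices of instances whose score is unusually low for their unit.

    scored_instances: list of (unit, score) pairs, one per aligned instance.
    num_std: assumed threshold, in standard deviations below the unit's mean.
    """
    # Build the score distribution for each basic unit.
    scores_by_unit = defaultdict(list)
    for unit, score in scored_instances:
        scores_by_unit[unit].append(score)

    # Assumption: the "particular range" is the low tail of each distribution.
    cutoffs = {
        unit: mean(scores) - num_std * pstdev(scores)
        for unit, scores in scores_by_unit.items()
    }

    return [
        index
        for index, (unit, score) in enumerate(scored_instances)
        if score < cutoffs[unit]
    ]


# Toy log-probability scores for two units; the last "AA" instance is an outlier.
example_instances = [
    ("AA", -1.0), ("B", -2.0), ("AA", -1.2), ("AA", -0.8), ("B", -2.1),
    ("AA", -1.1), ("AA", -0.9), ("B", -1.9), ("AA", -9.0),
]
```

With `num_std=1.5`, only the final "AA" instance in `example_instances` is tagged; the correction step (for example, re-transcribing the tagged region) is outside this sketch.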

    State-dependent speaker clustering for speaker adaptation
    6.
    Granted patent (status: lapsed)

    Publication No.: US5787394A

    Publication date: 1998-07-28

    Application No.: US572223

    Filing date: 1995-12-13

    IPC classification: G10L15/06 G10L5/06

    CPC classification: G10L15/07 G10L2015/0631

    Abstract: A system and method for adaptation of a speaker-independent speech recognition system for use by a particular user. The system and method gather acoustic characterization data from a test speaker and compare the data with acoustic characterization data generated for a plurality of training speakers. A match score is computed between the test speaker's acoustic characterization for a particular acoustic subspace and each training speaker's acoustic characterization for the same acoustic subspace. The training speakers are ranked for the subspace according to their scores, and a new acoustic model is generated for the test speaker based upon the test speaker's acoustic characterization data and the acoustic characterization data of the closest matching training speakers. The process is repeated for each acoustic subspace.


    Methods and apparatus for processing information signals based on content
    7.
    Patent application (status: under examination, published)

    Publication No.: US20060271365A1

    Publication date: 2006-11-30

    Application No.: US11494247

    Filing date: 2006-07-27

    IPC classification: G10L15/04 G10L15/00

    Abstract: Methods and apparatus are provided for processing an information signal containing content presented in accordance with at least one modality. In one aspect of the present invention, a method of processing such an information signal comprises the steps of: (i) obtaining the information signal; (ii) performing content detection on the information signal to detect whether the information signal includes particular content presented in accordance with the at least one modality; and (iii) generating a control signal, when the particular content is detected, for use in controlling a rendering property of the particular content and/or implementation of a specific action relating to the particular content. Various illustrative embodiments in the context of speech signal processing for use in voicemail and/or cellular phone applications are provided, as well as illustrative embodiments associated with the processing of multi-modal or multimedia information signals. Also, the present invention provides for storing selectively marked information, even in the absence of content detection, such that the information may be rendered and/or used at a later time. The invention also extends to processing of text-based and markup language-based signals, e.g., XML documents.


    Method and apparatus for rapid adapt via cumulative distribution function matching for continuous speech
    8.
    Granted patent (status: lapsed)

    Publication No.: US06470314B1

    Publication date: 2002-10-22

    Application No.: US09543794

    Filing date: 2000-04-06

    IPC classification: G10L15/02

    CPC classification: G10L15/07

    Abstract: A method of adapting a speech recognition system to one or more acoustic conditions comprises the steps of: (i) computing cumulative distribution functions based on dimensions of speech vectors associated with training speech data provided to the speech recognition system; (ii) computing cumulative distribution functions based on dimensions of speech vectors associated with test speech data provided to the speech recognition system; (iii) computing a nonlinear transformation mapping based on the cumulative distribution functions associated with the training speech data and the cumulative distribution functions associated with the test speech data; and (iv) applying the nonlinear transformation mapping to speech vectors associated with the test speech data prior to recognition, wherein the speech vectors transformed in accordance with the nonlinear transformation mapping are substantially similar to speech vectors associated with the training speech data.

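Steps (i) through (iv) can be illustrated with a small sketch of per-dimension CDF matching. It assumes empirical (sample-based) cumulative distribution functions and maps each test value to the training value at the same CDF level; the abstract does not specify how the CDFs are estimated or inverted, so these choices, along with the names `build_mapping` and `adapt`, are assumptions made for illustration.

```python
from bisect import bisect_right


def build_mapping(train_values, test_values):
    """Return a function mapping a test value to the training value that
    sits at the same empirical CDF level (one feature dimension)."""
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)
    n_train, n_test = len(train_sorted), len(test_sorted)

    def transform(x):
        # Empirical test CDF at x is rank / n_test.
        rank = bisect_right(test_sorted, x)
        # Smallest training order statistic whose empirical CDF reaches that
        # level: ceil(rank * n_train / n_test) - 1, in integer arithmetic.
        index = -(-rank * n_train // n_test) - 1
        return train_sorted[min(n_train - 1, max(0, index))]

    return transform


def adapt(train_vectors, test_vectors):
    """Apply an independent CDF-matching transform to each dimension."""
    num_dims = len(train_vectors[0])
    mappings = [
        build_mapping(
            [v[d] for v in train_vectors],
            [v[d] for v in test_vectors],
        )
        for d in range(num_dims)
    ]
    return [[mappings[d](v[d]) for d in range(num_dims)] for v in test_vectors]
```

Because each dimension is mapped through its own pair of CDFs, the transformed test values take on the training distribution dimension by dimension, which is the sense in which they become "substantially similar" to the training vectors in this sketch.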

    Methods and apparatus for forming compound words for use in a continuous speech recognition system
    9.
    Granted patent (status: lapsed)

    Publication No.: US06385579B1

    Publication date: 2002-05-07

    Application No.: US09302032

    Filing date: 1999-04-29

    IPC classification: G10L15/06

    CPC classification: G10L15/063 G10L15/197

    Abstract: A method of forming an augmented textual training corpus with compound words for use with an associated speech recognition system includes computing a measure for a consecutive word pair in the training corpus. The measure is then compared to a threshold value. The consecutive word pair is replaced in the training corpus with a corresponding compound word depending on the result of the comparison between the measure and the threshold value. One or more measures may be employed. A first measure is an average of a direct bigram probability value and a reverse bigram probability value. A second measure is based on mutual information between the words in the pair. A third measure is based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair. A fourth measure is based on a difference between an average phone recognition score for a particular compound word and a sum of respective average phone recognition scores of the words of the pair.

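The first measure and the replacement step can be sketched as follows. Several details are assumptions, since the abstract leaves them open: bigram probabilities are estimated from raw counts in the corpus itself, a pair is merged when the measure exceeds the threshold, and the compound word is written by joining the two words with an underscore. The function names `bigram_measure` and `merge_pair` are hypothetical.

```python
from collections import Counter


def bigram_measure(corpus, w1, w2):
    """Average of the direct bigram probability P(w2 follows | w1) and the
    reverse bigram probability P(w1 precedes | w2), from raw counts."""
    pair_count = 0
    first_count = Counter()   # how often a word starts a consecutive pair
    second_count = Counter()  # how often a word ends a consecutive pair
    for sentence in corpus:
        for a, b in zip(sentence, sentence[1:]):
            first_count[a] += 1
            second_count[b] += 1
            if (a, b) == (w1, w2):
                pair_count += 1
    if pair_count == 0:
        return 0.0
    direct = pair_count / first_count[w1]
    reverse = pair_count / second_count[w2]
    return (direct + reverse) / 2


def merge_pair(corpus, w1, w2, threshold):
    """Return a copy of the corpus in which the pair is replaced by a
    compound word, provided the measure exceeds the threshold (assumed rule)."""
    if bigram_measure(corpus, w1, w2) <= threshold:
        return [list(sentence) for sentence in corpus]
    merged = []
    for sentence in corpus:
        out, i = [], 0
        while i < len(sentence):
            if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) == (w1, w2):
                out.append(w1 + "_" + w2)  # assumed compound-word spelling
                i += 2
            else:
                out.append(sentence[i])
                i += 1
        merged.append(out)
    return merged


example_corpus = [
    ["i", "live", "in", "new", "york"],
    ["new", "york", "is", "big"],
    ["a", "new", "car"],
]
```

In `example_corpus`, "new" is followed by "york" in two of its three occurrences and "york" is always preceded by "new", so the measure is (2/3 + 1) / 2 and the pair is merged for any threshold below that value.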

    Specific task composite acoustic models
    10.
    Granted patent (status: in force)

    Publication No.: US06260014B1

    Publication date: 2001-07-10

    Application No.: US09153222

    Filing date: 1998-09-14

    IPC classification: G10L15/04

    Abstract: A method for recognizing speech includes the steps of: providing a generic model having a baseform representation of a vocabulary of words; identifying a subset of words relating to an application; constructing a task-specific model for the subset of words; constructing a composite model by combining the generic and task-specific models; and modifying the baseform representation of the subset of words such that the subset of words is recognized by the task-specific model. A system for recognizing speech includes a composite model comprising a generic model, which has a generic baseform representation of a vocabulary of words, and a task-specific model for recognizing a subset of words relating to an application, wherein the subset of words is recognized using a modified baseform representation. A recognizer compares words input thereto with the generic model for words other than the subset of words and with the task-specific model for the subset of words.
