Minimum Bayes error feature selection in speech recognition
    1.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US07529666B1

    Publication date: 2009-05-05

    Application No.: US09699894

    Filing date: 2000-10-30

    IPC classification: G10L15/08

    Abstract: In connection with speech recognition, the design of a linear transformation $\theta \in \mathbb{R}^{p \times n}$, of rank $p \le n$, which projects the features of a classifier $x \in \mathbb{R}^n$ onto $y = \theta x \in \mathbb{R}^p$ such as to achieve minimum Bayes error (or probability of misclassification). Two avenues are explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard linear discriminant analysis features and show a 10% relative improvement in the word error rate over known cepstral features on a large vocabulary telephony speech recognition task.
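
As a rough illustration of the second avenue, the sketch below evaluates the two-class Bhattacharyya bound after projecting Gaussian class densities onto a single direction (p = 1). The Gaussian assumption, the function names, and the toy classes are illustrative assumptions and are not taken from the patent. The code only scores candidate projections; it does not perform the optimization over θ.

```python
import math


def project(theta, mean, cov):
    """Project a Gaussian N(mean, cov) onto the scalar y = theta . x."""
    n = len(theta)
    m = sum(theta[i] * mean[i] for i in range(n))
    v = sum(theta[i] * cov[i][j] * theta[j] for i in range(n) for j in range(n))
    return m, v


def bhattacharyya_bound(theta, prior1, gauss1, prior2, gauss2):
    """Upper bound on the two-class Bayes error in the projected space."""
    m1, v1 = project(theta, *gauss1)
    m2, v2 = project(theta, *gauss2)
    # Bhattacharyya distance between two one-dimensional Gaussians.
    dist = (m1 - m2) ** 2 / (4.0 * (v1 + v2)) \
        + 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return math.sqrt(prior1 * prior2) * math.exp(-dist)


# Two hypothetical classes whose means differ only along the first axis.
class_a = ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
class_b = ([3.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

bound_x = bhattacharyya_bound([1.0, 0.0], 0.5, class_a, 0.5, class_b)
bound_y = bhattacharyya_bound([0.0, 1.0], 0.5, class_a, 0.5, class_b)
print(bound_x, bound_y)
```

Projecting onto the first axis keeps the class separation and gives the smaller bound, while projecting onto the second axis discards it and the bound rises to the trivial value of 0.5.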

    Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation
    2.
    Granted invention patent
    Legal status: In force

    Publication No.: US07216077B1

    Publication date: 2007-05-08

    Application No.: US09670251

    Filing date: 2000-09-26

    IPC classification: G10L15/06 G10L15/14

    CPC classification: G10L15/065

    Abstract: Methods and arrangements using lattice-based information for unsupervised speaker adaptation. By performing adaptation against a word lattice, correct models are more likely to be used in estimating a transform. Further, a particular type of lattice proposed herein enables the use of a natural confidence measure given by the posterior occupancy probability of a state, that is, the statistics of a particular state will be updated with the current frame only if the a posteriori probability of the state at that particular time is greater than a predetermined threshold.
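
The confidence gating described above can be sketched as follows. The data layout (one dictionary of state posteriors per frame), the threshold value, and the choice to weight each update by the posterior are assumptions made for illustration; the abstract only states that a state's statistics are updated when its posterior exceeds the threshold.

```python
def accumulate_statistics(frames, posteriors, threshold=0.9):
    """Accumulate per-state occupancy and weighted feature sums.

    A state's statistics are updated with a frame only when the state's
    posterior probability at that frame is greater than the threshold.
    """
    stats = {}
    for frame, frame_posteriors in zip(frames, posteriors):
        for state, gamma in frame_posteriors.items():
            if gamma <= threshold:
                continue  # not confident enough: skip this frame for this state
            count, total = stats.get(state, (0.0, [0.0] * len(frame)))
            total = [t + gamma * x for t, x in zip(total, frame)]
            stats[state] = (count + gamma, total)
    return stats


# Hypothetical two-frame utterance with two candidate states per frame.
frames = [[1.0, 2.0], [3.0, 4.0]]
posteriors = [{"s1": 0.95, "s2": 0.05}, {"s1": 0.5, "s2": 0.5}]
stats = accumulate_statistics(frames, posteriors)
print(stats)
```

Only the first frame contributes, and only to state `s1`; the second frame is ambiguous between the two states and is left out of the transform estimate.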

    System and method for partitioning the feature space of a classifier in a pattern classification system
    3.

    Publication No.: US6058205A

    Publication date: 2000-05-02

    Application No.: US781574

    Filing date: 1997-01-09

    IPC classification: G06K9/62 G06F17/20

    CPC classification: G06K9/6282

    Abstract: A system and method are provided which partition the feature space of a classifier by using hyperplanes to construct a binary decision tree or hierarchical data structure for obtaining the class probabilities for a particular feature vector. One objective in the construction of the decision tree is to minimize the average entropy of the empirical class distributions at each successive node or subset, such that the average entropy of the class distributions at the terminal nodes is minimized. First, a linear discriminant vector is computed that maximally separates the classes at any particular node. A threshold is then chosen that can be applied on the value of the projection onto the hyperplane such that all feature vectors that have a projection onto the hyperplane that is less than the threshold are assigned to a child node (say, left child node) and the feature vectors that have a projection greater than or equal to the threshold are assigned to a right child node. The above two steps are then repeated for each child node until the data at a node falls below a predetermined threshold and the node is classified as a terminal node (leaf of the decision tree). After all non-terminal nodes have been processed, the final step is to store a class distribution associated with each terminal node. The class probabilities for a particular feature vector can then be obtained by traversing the decision tree in a top-down fashion until a terminal node is identified which corresponds to the particular feature vector. The information provided by the decision tree is that, in computing the class probabilities for the particular feature vector, only the small number of classes associated with that particular terminal node need be considered. Alternatively, the required class probabilities can be obtained simply by taking the stored distribution of the terminal node associated with the particular feature vector.
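
The threshold step at a single node can be sketched as below. The discriminant direction is assumed to be given (computing it is not shown), and picking the threshold that minimizes the size-weighted entropy of the two children is an assumption consistent with the stated objective, not a detail confirmed by the abstract. All names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of the empirical class distribution of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_threshold(vectors, labels, direction):
    """Return (average child entropy, threshold) for the best split."""
    projections = [sum(w * x for w, x in zip(direction, v)) for v in vectors]
    best = None
    for t in sorted(set(projections)):
        # Projection < t goes to the left child, >= t to the right child.
        left = [l for p, l in zip(projections, labels) if p < t]
        right = [l for p, l in zip(projections, labels) if p >= t]
        if not left or not right:
            continue
        average = (len(left) * entropy(left)
                   + len(right) * entropy(right)) / len(labels)
        if best is None or average < best[0]:
            best = (average, t)
    return best


vectors = [[0.0, 1.0], [1.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
labels = ["a", "a", "b", "b"]
best = best_threshold(vectors, labels, direction=[1.0, 0.0])
print(best)
```

Building the full tree would repeat this step on each child until a node holds too little data, then store the class distribution at each leaf.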

    Identifying mismatches between assumed and actual pronunciations of words
    4.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06377921B1

    Publication date: 2002-04-23

    Application No.: US09105763

    Filing date: 1998-06-26

    IPC classification: G10L15/06

    CPC classification: G10L15/063 G10L2015/0631

    Abstract: A method of identifying mismatches between acoustic data and a corresponding transcription, the transcription being expressed in terms of basic units, comprises the steps of: aligning the acoustic data with the corresponding transcription; computing a probability score for each instance of a basic unit in the acoustic data with respect to the transcription; generating a distribution for each basic unit; tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and correcting the mismatches.
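
A minimal sketch of the tagging step, assuming alignment and scoring have already produced a list of (unit, score) pairs. Modelling each unit's distribution by its mean and standard deviation, and flagging scores more than `num_std` deviations below the mean, are illustrative assumptions; the abstract does not specify the form of the distribution or the threshold.

```python
import statistics


def tag_mismatches(scored_units, num_std=2.0):
    """Flag instances whose score falls in the low tail of their unit's scores."""
    by_unit = {}
    for unit, score in scored_units:
        by_unit.setdefault(unit, []).append(score)

    # One cutoff per basic unit, derived from that unit's own distribution.
    cutoffs = {
        unit: statistics.mean(scores) - num_std * statistics.pstdev(scores)
        for unit, scores in by_unit.items()
    }
    return [score < cutoffs[unit] for unit, score in scored_units]


# Hypothetical log-probability scores for aligned phone instances.
scored = [("AH", -1.0), ("AH", -1.2), ("AH", -0.9), ("AH", -1.1),
          ("AH", -1.0), ("AH", -8.0), ("K", -0.5)]
flags = tag_mismatches(scored)
print(flags)
```

The one `AH` instance with a score far below the others is tagged; a unit seen only once is never tagged because its cutoff equals its own score.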

    State-dependent speaker clustering for speaker adaptation
    5.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US5787394A

    Publication date: 1998-07-28

    Application No.: US572223

    Filing date: 1995-12-13

    IPC classification: G10L15/06 G10L5/06

    CPC classification: G10L15/07 G10L2015/0631

    Abstract: A system and method for adaptation of a speaker independent speech recognition system for use by a particular user. The system and method gather acoustic characterization data from a test speaker and compare the data with acoustic characterization data generated for a plurality of training speakers. A match score is computed between the test speaker's acoustic characterization for a particular acoustic subspace and each training speaker's acoustic characterization for the same acoustic subspace. The training speakers are ranked for the subspace according to their scores and a new acoustic model is generated for the test speaker based upon the test speaker's acoustic characterization data and the acoustic characterization data of the closest matching training speakers. The process is repeated for each acoustic subspace.
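
The per-subspace ranking can be sketched as below. Representing each acoustic characterization as a mean vector and using squared Euclidean distance as the (inverse) match score are assumptions for illustration only, as are the speaker and subspace names.

```python
def closest_speakers(test_profile, training_profiles, top_n=2):
    """For each acoustic subspace, list the closest training speakers."""
    ranking = {}
    for subspace, test_vector in test_profile.items():
        scored = []
        for speaker, profile in training_profiles.items():
            # Smaller distance means a better match in this subspace.
            distance = sum((a - b) ** 2
                           for a, b in zip(test_vector, profile[subspace]))
            scored.append((distance, speaker))
        ranking[subspace] = [speaker for _, speaker in sorted(scored)[:top_n]]
    return ranking


test_profile = {"vowels": [1.0, 1.0], "fricatives": [0.0, 0.0]}
training_profiles = {
    "spk1": {"vowels": [1.1, 0.9], "fricatives": [2.0, 2.0]},
    "spk2": {"vowels": [3.0, 3.0], "fricatives": [0.1, 0.0]},
    "spk3": {"vowels": [1.5, 1.5], "fricatives": [1.0, 1.0]},
}
ranking = closest_speakers(test_profile, training_profiles)
print(ranking)
```

Because the ranking is done separately per subspace, a training speaker can be a close match for one subspace and a poor match for another, which is the point of the state-dependent clustering.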

    Methods and apparatus for processing information signals based on content
    6.
    Invention application
    Legal status: Under examination (published)

    Publication No.: US20060271365A1

    Publication date: 2006-11-30

    Application No.: US11494247

    Filing date: 2006-07-27

    IPC classification: G10L15/04 G10L15/00

    Abstract: Methods and apparatus are provided for processing an information signal containing content presented in accordance with at least one modality. In one aspect of the present invention, a method of processing an information signal containing content presented in accordance with at least one modality, comprises the steps of: (i) obtaining the information signal; (ii) performing content detection on the information signal to detect whether the information signal includes particular content presented in accordance with the at least one modality; and (iii) generating a control signal, when the particular content is detected, for use in controlling a rendering property of the particular content and/or implementation of a specific action relating to the particular content. Various illustrative embodiments in the context of speech signal processing for use in voicemail and/or cellular phone applications are provided, as well as illustrative embodiments associated with the processing of multi-modal or multimedia information signals. Also, the present invention provides for storing selectively marked information, even in the absence of content detection, such that the information may be rendered and/or used at a later time. The invention also extends to processing of text-based and markup language-based signals, e.g., XML documents.

    Method and apparatus for rapid adapt via cumulative distribution function matching for continuous speech
    7.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06470314B1

    Publication date: 2002-10-22

    Application No.: US09543794

    Filing date: 2000-04-06

    IPC classification: G10L15/02

    CPC classification: G10L15/07

    Abstract: A method of adapting a speech recognition system to one or more acoustic conditions comprises the steps of: (i) computing cumulative distribution functions based on dimensions of speech vectors associated with training speech data provided to the speech recognition system; (ii) computing cumulative distribution functions based on dimensions of speech vectors associated with test speech data provided to the speech recognition system; (iii) computing a nonlinear transformation mapping based on the cumulative distribution functions associated with the training speech data and the cumulative distribution functions associated with the test speech data; and (iv) applying the nonlinear transformation mapping to speech vectors associated with the test speech data prior to recognition, wherein the speech vectors transformed in accordance with the nonlinear transformation mapping are substantially similar to speech vectors associated with the training speech data.
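
Steps (i) to (iv) can be sketched for a single feature dimension as below. Using empirical CDFs built from sorted samples, and the function name `make_cdf_mapping`, are assumptions for illustration; in practice the same mapping would be built separately for each dimension of the speech vectors.

```python
import bisect
import math


def make_cdf_mapping(train_values, test_values):
    """Map a test value to the training value at the same CDF level."""
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)

    def mapping(x):
        # Empirical CDF of x under the test data.
        level = bisect.bisect_right(test_sorted, x) / len(test_sorted)
        # Inverse empirical CDF of the training data at that level.
        index = math.ceil(level * len(train_sorted)) - 1
        index = min(len(train_sorted) - 1, max(0, index))
        return train_sorted[index]

    return mapping


# Hypothetical values of one feature dimension under two acoustic conditions.
train = [0.0, 1.0, 2.0, 3.0]
test = [10.0, 12.0, 14.0, 16.0]
mapping = make_cdf_mapping(train, test)
mapped = [mapping(x) for x in test]
print(mapped)
```

After the mapping, the transformed test values follow the training distribution, which is what makes the adapted vectors resemble the training data before recognition.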

    Methods and apparatus for forming compound words for use in a continuous speech recognition system
    8.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06385579B1

    Publication date: 2002-05-07

    Application No.: US09302032

    Filing date: 1999-04-29

    IPC classification: G10L15/06

    CPC classification: G10L15/063 G10L15/197

    Abstract: A method of forming an augmented textual training corpus with compound words for use with an associated speech recognition system includes computing a measure for a consecutive word pair in the training corpus. The measure is then compared to a threshold value. The consecutive word pair is replaced in the training corpus with a corresponding compound word depending on the result of the comparison between the measure and the threshold value. One or more measures may be employed. A first measure is an average of a direct bigram probability value and a reverse bigram probability value. A second measure is based on mutual information between the words in the pair. A third measure is based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair. A fourth measure is based on a difference between an average phone recognition score for a particular compound word and a sum of respective average phone recognition scores of the words of the pair.
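
The first measure and the replacement step can be sketched as follows. The abstract does not say which kind of average is used, so the arithmetic mean here is an assumption, as are the maximum-likelihood bigram estimates, the `min_count` filter for rare pairs, the underscore joiner, and the toy corpus.

```python
from collections import Counter


def average_bigram_measure(tokens):
    """Return {(w1, w2): (average of direct and reverse bigram prob, count)}."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    measures = {}
    for (first, second), pair_count in bigrams.items():
        direct = pair_count / unigrams[first]    # estimate of P(second | first)
        reverse = pair_count / unigrams[second]  # estimate of P(first | second)
        measures[(first, second)] = (0.5 * (direct + reverse), pair_count)
    return measures


def merge_compounds(tokens, threshold=0.75, min_count=2):
    """Replace qualifying consecutive word pairs with a compound word."""
    measures = average_bigram_measure(tokens)
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            score, count = measures[(tokens[i], tokens[i + 1])]
            if score >= threshold and count >= min_count:
                merged.append(tokens[i] + "_" + tokens[i + 1])
                i += 2
                continue
        merged.append(tokens[i])
        i += 1
    return merged


corpus = ("i love new york and she loves new york but he left new york "
          "for a new job and a new life").split()
merged = merge_compounds(corpus)
print(merged)
```

In this toy corpus only the pair "new york" is both frequent enough and strongly bound in both directions, so it is the only pair replaced by a compound word.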

    Specific task composite acoustic models
    9.
    Granted invention patent
    Legal status: In force

    Publication No.: US06260014B1

    Publication date: 2001-07-10

    Application No.: US09153222

    Filing date: 1998-09-14

    IPC classification: G10L15/04

    Abstract: A method for recognizing speech includes the steps of providing a generic model having a baseform representation of a vocabulary of words, identifying a subset of words relating to an application, constructing a task specific model for the subset of words, constructing a composite model by combining the generic and task specific models, and modifying the baseform representation of the subset of words such that the subset of words is recognized by the task specific model. A system for recognizing speech includes a composite model having a generic model having a generic baseform representation of a vocabulary of words and a task specific model for recognizing a subset of words relating to an application, wherein the subset of words is recognized using a modified baseform representation. A recognizer compares words input thereto with the generic model for words other than the subset of words and with the task specific model for the subset of words.

    Telephone messaging and editing system
    10.
    Granted invention patent
    Legal status: In force

    Publication No.: US06219638B1

    Publication date: 2001-04-17

    Application No.: US09185332

    Filing date: 1998-11-03

    IPC classification: G10L15/08

    Abstract: A messaging system for receiving speech over a telephone and converting the speech to text includes a first server for receiving speech input by a user, a speech recognition system for converting the speech to text, a speech synthesizer for converting the text to speech for playing back the synthesized speech for correction by the user, and a correction mechanism for enabling the user to correct the speech such that the corrected speech is provided as text for transmittal over a communication system.
