Generic framework for large-margin MCE training in speech recognition
    21.
    Granted invention patent
    Legal status: in force

    Publication number: US08423364B2

    Publication date: 2013-04-16

    Application number: US11708440

    Filing date: 2007-02-20

    IPC classification: G10L15/14 G10L15/00 G10L15/06

    CPC classification: G10L15/063 G10L2015/0631

    Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. For each token, scores are calculated for the correct class and for the competitive classes, given the initial acoustic model. A sample-adaptive window bandwidth is also calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be fixed or vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence criterion is met.

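The role of the margin in the abstract above can be illustrated with a minimal sketch. It assumes the conventional sigmoid-smoothed MCE loss, which the abstract does not spell out; the function names, the `alpha` slope parameter (standing in for a per-token window bandwidth that is supplied rather than computed here), and the linear margin schedule are all hypothetical choices for illustration, not the patented formulation.

```python
import math

def misclassification_measure(correct_score, competitor_scores):
    """d = log-mean-exp(competitor scores) - correct score (assumed form).

    d < 0 means the correct class currently wins.
    """
    m = max(competitor_scores)
    lse = m + math.log(
        sum(math.exp(s - m) for s in competitor_scores) / len(competitor_scores)
    )
    return lse - correct_score

def margin_loss(d, margin=0.0, alpha=1.0):
    """Sigmoid loss shifted by a margin (assumed form).

    A positive margin moves the loss midpoint from d = 0 to d = -margin, so
    correct tokens that sit close to the boundary still incur loss.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (d + margin)))

def margin_schedule(iteration, start=0.0, step=0.1):
    """Hypothetical monotonically increasing margin over iterations."""
    return start + step * iteration

# A token that is correctly classified, but only narrowly.
d = misclassification_measure(2.0, [1.5, 1.0])
for it in range(3):
    print(it, round(margin_loss(d, margin_schedule(it), alpha=2.0), 4))
```

With a zero margin the narrowly correct token contributes little loss; as the margin grows, the same token contributes more, which pushes the update to move the decision boundary away from it.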

    High performance HMM adaptation with joint compensation of additive and convolutive distortions
    22.
    Granted invention patent
    Legal status: in force

    Publication number: US08180637B2

    Publication date: 2012-05-15

    Application number: US11949044

    Filing date: 2007-12-03

    IPC classification: G10L15/00 G10L15/20 G10L17/00

    CPC classification: G10L15/20 G10L15/142

    Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. A Gaussian-dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian-dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.


    Time synchronous decoding for long-span hidden trajectory model
    23.
    Granted invention patent
    Legal status: in force

    Publication number: US07877256B2

    Publication date: 2011-01-25

    Application number: US11356905

    Filing date: 2006-02-17

    IPC classification: G10L15/14

    CPC classification: G10L15/08

    Abstract: A time-synchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, hypotheses are represented as traces that include an indication of a current frame, previous frames, and future frames. Each frame can include an associated linguistic unit such as a phone or units that are derived from a phone. Additionally, pruning strategies can be applied to speed up the search. Further, word-ending recombination methods are developed to speed up the computation. These methods can effectively deal with an exponentially increased search space.


    ADAPTING A COMPRESSED MODEL FOR USE IN SPEECH RECOGNITION
    24.
    Invention patent application
    Legal status: in force

    Publication number: US20100076757A1

    Publication date: 2010-03-25

    Application number: US12235748

    Filing date: 2008-09-23

    IPC classification: G10L15/20

    CPC classification: G10L15/20 G10L15/065

    Abstract: A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.


    Acoustic models with structured hidden dynamics with integration over many possible hidden trajectories
    25.
    Granted invention patent
    Legal status: in force

    Publication number: US07565284B2

    Publication date: 2009-07-21

    Application number: US11071904

    Filing date: 2005-03-01

    IPC classification: G10L15/14

    CPC classification: G10L15/02 G10L2015/025

    Abstract: A method of producing at least one possible sequence of vocal tract resonance (VTR) for a fixed sequence of phonetic units, and producing the acoustic observation probability by integrating over such distributions, is provided. The method includes identifying a sequence of target distributions for a VTR sequence corresponding to a phone sequence with a given segmentation. The sequence of target distributions is applied to a finite impulse response filter to produce distributions for possible VTR trajectories. These distributions are then applied to a linearized nonlinear function to produce the acoustic observation probability for the given sequence of phonetic units. This acoustic observation probability is used for phonetic recognition.


    Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction
    26.
    Invention patent application
    Legal status: in force

    Publication number: US20060200351A1

    Publication date: 2006-09-07

    Application number: US11069474

    Filing date: 2005-03-01

    IPC classification: G10L15/04

    Abstract: A structured generative model of speech coarticulation and reduction is described with a novel two-stage implementation. At the first stage, the dynamics of formants or vocal tract resonance (VTR) are generated using prior information of resonance targets in the phone sequence. Bi-directional temporal filtering with finite impulse response (FIR) is applied to the segmental target sequence as the FIR filter's input. At the second stage, the dynamics of speech cepstra are predicted analytically based on the FIR-filtered VTR targets. The combined system of these two stages thus generates correlated and causally related VTR and cepstral dynamics, where phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. The combined system also gives the acoustic observation probability given a phone sequence. Using this probability, different phone sequences can be compared and ranked in terms of their respective probability values. This then permits the use of the model for phonetic recognition.


    Quantitative model for formant dynamics and contextually assimilated reduction in fluent speech

    Publication number: US20060074676A1

    Publication date: 2006-04-06

    Application number: US10944262

    Filing date: 2004-09-17

    IPC classification: G10L13/04

    CPC classification: G10L13/02 G10L25/15

    Abstract: A method of identifying a sequence of formant trajectory values is provided in which a sequence of target values is identified for a formant as step functions. The target values and the duration for each segment target for the formant are applied to a finite impulse response filter to form a sequence of formant trajectory values. The parameters of this filter, as well as the duration of the targets for each phone, can be modified to produce many kinds of target undershooting effects in a contextually assimilated manner. The procedure for producing the formant trajectory values does not require any acoustic data from speech.
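
The target-filtering idea in this abstract can be sketched in a few lines. The sketch below assumes a symmetric, exponentially decaying FIR kernel normalized to unit gain and clamps the signal at the edges; the kernel shape, the `gamma` and `half_width` parameters, and all function names are hypothetical, since the abstract does not specify the filter.

```python
def step_targets(targets_hz, durations):
    """Expand per-segment targets into a frame-level step function."""
    frames = []
    for target, n_frames in zip(targets_hz, durations):
        frames.extend([target] * n_frames)
    return frames

def fir_kernel(gamma, half_width):
    """Symmetric kernel gamma**|k|, normalized so the weights sum to 1 (assumed shape)."""
    weights = [gamma ** abs(k) for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def formant_trajectory(targets_hz, durations, gamma=0.6, half_width=3):
    """Filter the step-function targets to obtain a smooth formant trajectory."""
    frames = step_targets(targets_hz, durations)
    kernel = fir_kernel(gamma, half_width)
    last = len(frames) - 1
    trajectory = []
    for t in range(len(frames)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(t + j - half_width, 0), last)  # clamp at the edges
            acc += w * frames[idx]
        trajectory.append(acc)
    return trajectory

# A short 1500 Hz segment between two long 500 Hz segments never reaches its target.
traj = formant_trajectory([500.0, 1500.0, 500.0], [10, 2, 10])
print(round(max(traj), 1))
```

Shortening the middle segment or widening the kernel lowers the peak further, which is the undershoot effect the abstract attributes to changes in duration and filter parameters. No acoustic data enters the computation.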

    Phase sensitive model adaptation for noisy speech recognition
    28.
    Granted invention patent
    Legal status: in force

    Publication number: US08214215B2

    Publication date: 2012-07-03

    Application number: US12236530

    Filing date: 2008-09-24

    IPC classification: G10L15/14

    CPC classification: G10L15/065 G10L15/20

    Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates of distortions based on a phase-sensitive model in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.


    Piecewise-based variable-parameter Hidden Markov Models and the training thereof
    29.
    Granted invention patent
    Legal status: in force

    Publication number: US08160878B2

    Publication date: 2012-04-17

    Application number: US12211114

    Filing date: 2008-09-16

    IPC classification: G10L15/14 G10L15/20

    CPC classification: G10L15/144

    Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.

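A Gaussian parameter that varies piecewise with SNR, as described in the abstract above, might be represented as follows. This is a sketch under stated assumptions: the class name, the per-segment coefficient layout `(a, b, c, d)`, the clamping outside the knot range, and the scalar (one-dimensional) Gaussian are all hypothetical, and the coefficients are made up for illustration rather than fitted as a true cubic spline.

```python
import math
from bisect import bisect_right

class PiecewiseCubic:
    """Piecewise cubic in SNR: a + b*dx + c*dx**2 + d*dx**3 on each segment."""

    def __init__(self, knots, coeffs):
        if len(coeffs) != len(knots) - 1:
            raise ValueError("need one coefficient tuple per segment")
        self.knots = list(knots)    # increasing SNR breakpoints, in dB
        self.coeffs = list(coeffs)  # one (a, b, c, d) tuple per segment

    def __call__(self, snr_db):
        # Clamp to the knot range so out-of-range SNRs use the boundary value.
        x = min(max(snr_db, self.knots[0]), self.knots[-1])
        i = min(bisect_right(self.knots, x) - 1, len(self.coeffs) - 1)
        dx = x - self.knots[i]
        a, b, c, d = self.coeffs[i]
        return a + dx * (b + dx * (c + dx * d))

def log_gaussian(x, mean, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical mean and variance functions for one mixture component.
mean_of_snr = PiecewiseCubic([0.0, 10.0, 20.0],
                             [(1.0, 0.1, 0.0, 0.0), (2.0, 0.1, 0.0, 0.0)])
var_of_snr = PiecewiseCubic([0.0, 20.0], [(1.0, -0.025, 0.0, 0.0)])

snr = 5.0
print(log_gaussian(1.2, mean_of_snr(snr), var_of_snr(snr)))
```

At recognition time the instantaneous SNR selects the segment, and the resulting mean and variance are plugged into the usual Gaussian likelihood.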

    Parameter clustering and sharing for variable-parameter hidden markov models
    30.
    Granted invention patent
    Legal status: in force

    Publication number: US08145488B2

    Publication date: 2012-03-27

    Application number: US12211115

    Filing date: 2008-09-16

    IPC classification: G10L15/14

    CPC classification: G10L15/142

    Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions that are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.

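The clustering-and-sharing step in the abstract above can be sketched with a deliberately simple stand-in. The abstract only says that clustering uses "some distance measure"; the greedy threshold clustering, the Euclidean distance on raw parameter vectors, and every name below are assumptions made for illustration.

```python
import math

def euclidean(p, q):
    """Distance between two spline parameter vectors (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_and_share(spline_params, threshold):
    """Greedy clustering: returns (shared_table, assignment).

    shared_table[k] is the running-mean parameter vector of class k;
    assignment[i] is the class index that spline i refers to.
    """
    shared_table, counts, assignment = [], [], []
    for p in spline_params:
        best, best_dist = None, threshold
        for k, centroid in enumerate(shared_table):
            dist = euclidean(p, centroid)
            if dist <= best_dist:
                best, best_dist = k, dist
        if best is None:
            # No existing class is close enough: open a new one.
            shared_table.append(list(p))
            counts.append(1)
            assignment.append(len(shared_table) - 1)
        else:
            # Fold the new member into the class mean.
            n = counts[best]
            shared_table[best] = [(c * n + a) / (n + 1)
                                  for c, a in zip(shared_table[best], p)]
            counts[best] = n + 1
            assignment.append(best)
    return shared_table, assignment

params = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1)]
table, refs = cluster_and_share(params, threshold=1.0)
print(len(table), refs)
```

Each spline instance then stores only its class index, so four parameter vectors collapse to two shared sets in this toy input.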