Word-dependent transition models in HMM based word alignment for statistical machine translation
    21.
    Type: Patent application
    Status: In force

    Publication No.: US20090112573A1

    Publication date: 2009-04-30

    Application No.: US11980257

    Filing date: 2007-10-30

    Applicant: Xiaodong He

    Inventor: Xiaodong He

    IPC classification: G06F17/28

    CPC classification: G06F17/2827

    Abstract: A word alignment modeler uses probabilistic learning techniques to train “word-dependent transition models” for use in constructing phrase level Hidden Markov Model (HMM) based word alignment models. As defined herein, “word-dependent transition models” provide a probabilistic model wherein, for each source word in training data, a self-transition probability is modeled in combination with a probability of jumping from that particular word to a different word, thereby providing a full transition model for each word in a source phrase. HMM based word alignment models are then used for various word alignment and machine translation tasks. In additional embodiments, sparse data problems (i.e., rarely used words) are addressed by using probabilistic learning techniques to estimate word-dependent transition model parameters by maximum a posteriori (MAP) training.
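
The MAP idea in this abstract can be illustrated with a minimal sketch. It assumes a Dirichlet-style prior centered on a shared, word-independent jump distribution; the abstract does not state the prior, so the function name `map_transition_probs`, the prior strength `tau`, and the toy counts are all hypothetical. Jump width 0 stands for a self-transition.

```python
def map_transition_probs(word_jump_counts, global_jump_probs, tau=10.0):
    """Smooth each word's jump counts toward a shared jump distribution.

    word_jump_counts: {word: {jump_width: expected count}} (hypothetical input)
    global_jump_probs: {jump_width: probability}, the word-independent prior
    tau: prior strength (assumption); larger values trust the prior more
    """
    smoothed = {}
    for word, counts in word_jump_counts.items():
        # Only jump widths known to the prior are considered.
        total = sum(counts.get(jump, 0.0) for jump in global_jump_probs)
        smoothed[word] = {
            jump: (counts.get(jump, 0.0) + tau * prior) / (total + tau)
            for jump, prior in global_jump_probs.items()
        }
    return smoothed


# Toy data: a frequent word with many observations and a rare word with one.
global_probs = {-1: 0.2, 0: 0.3, 1: 0.5}
counts = {"the": {0: 5.0, 1: 95.0}, "rare": {0: 1.0}}
probs = map_transition_probs(counts, global_probs, tau=10.0)
```

With these toy numbers, the frequent word's distribution is dominated by its own counts, while the rare word stays close to the shared prior, which is the behavior the abstract attributes to MAP training for sparse data.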

    Validation of the consistency of automatic terminology translation
    22.
    Type: Patent application
    Status: In force

    Publication No.: US20090063126A1

    Publication date: 2009-03-05

    Application No.: US11897197

    Filing date: 2007-08-29

    IPC classification: G06F17/28

    CPC classification: G06F17/2818

    Abstract: A method of determining the consistency of training data for a machine translation system is disclosed. The method includes receiving a signal indicative of a source language corpus and a target language corpus. A textual string is extracted from the source language corpus. The textual string is aligned with the target language corpus to identify a translation for the textual string from the target language corpus. A consistency index is calculated based on a relationship between the textual string from the source language corpus and the translation. An indication of the consistency index is stored on a tangible medium.
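
The abstract does not define how the consistency index is computed. As one possible illustration only, the sketch below assumes the index is the share of a source term's aligned occurrences that use its most frequent translation; `consistency_index` and the sample data are hypothetical.

```python
from collections import Counter


def consistency_index(aligned_translations):
    """Return the fraction of occurrences that use the dominant translation.

    aligned_translations: list of target-language strings aligned to one
    source term, one entry per occurrence (assumed to come from a prior
    alignment step that is not shown here).
    """
    if not aligned_translations:
        return 0.0
    counts = Counter(aligned_translations)
    dominant_count = counts.most_common(1)[0][1]
    return dominant_count / len(aligned_translations)


# Hypothetical example: "file" translated four times in a parallel corpus.
index = consistency_index(["Datei", "Datei", "Akte", "Datei"])
```

Under this assumed definition, a value of 1.0 means the term is always translated the same way, and lower values flag terms whose translations vary across the training data.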

    Segment-discriminating minimum classification error pattern recognition
    23.
    Type: Patent application
    Status: In force

    Publication No.: US20080181489A1

    Publication date: 2008-07-31

    Application No.: US11700664

    Filing date: 2007-01-31

    IPC classification: G06K9/62

    CPC classification: G06K9/6217, G10L15/142

    Abstract: Pattern model parameters are updated using update equations based on competing patterns that are identical to a reference pattern except for one segment at a time that is replaced with a competing segment. This allows pattern recognition parameters to be tuned one segment at a time, rather than having to model distinguishing features of the correct pattern model as a whole, according to an illustrative embodiment. A reference pattern and competing patterns are divided into pattern segments. A set of training patterns is generated by replacing one of the pattern segments in the reference pattern with a corresponding competing pattern segment. For each of the training patterns, a pattern recognition model is applied to evaluate a relative degree of correspondence of the reference pattern with the pattern signal compared to a degree of correspondence of the training patterns with the pattern signal.

    Confidence threshold tuning
    25.
    Type: Patent application

    Publication No.: US20060293886A1

    Publication date: 2006-12-28

    Application No.: US11168278

    Filing date: 2005-06-28

    IPC classification: G10L15/00

    CPC classification: G10L15/08

    Abstract: An expected dialog-turn (ED) value is estimated for evaluating a speech application. Parameters such as a confidence threshold setting can be adjusted based on the expected dialog-turn value. In a particular example, recognition results and corresponding confidence scores are used to estimate the expected dialog-turn value. The recognition results can be associated with a possible outcome for the speech application and a cost for the possible outcome can be used to estimate the expected dialog-turn value.
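
A minimal sketch of how a threshold could be tuned against an expected dialog-turn estimate follows. The three outcomes, their turn costs, and the sample recognition results are assumptions made for illustration; the abstract only says that outcomes carry costs and that the threshold can be adjusted based on the ED value.

```python
def expected_dialog_turns(results, threshold, costs):
    """Average turn cost over recognition results at a given threshold.

    results: list of (confidence, is_correct) pairs (hypothetical data)
    costs: turn cost per outcome; the outcome names are assumptions
    """
    total = 0.0
    for confidence, is_correct in results:
        if confidence < threshold:
            outcome = "reject"          # system re-prompts the user
        elif is_correct:
            outcome = "accept_correct"  # dialog proceeds normally
        else:
            outcome = "accept_wrong"    # error must be repaired later
        total += costs[outcome]
    return total / len(results)


def tune_threshold(results, candidates, costs):
    """Pick the candidate threshold with the lowest estimated ED value."""
    return min(candidates, key=lambda t: expected_dialog_turns(results, t, costs))


results = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
costs = {"accept_correct": 1.0, "reject": 2.0, "accept_wrong": 4.0}
best = tune_threshold(results, [0.0, 0.5, 0.7, 0.95], costs)
```

With these toy costs, accepting everything and rejecting everything are both more expensive than a middle threshold, which is the trade-off the ED estimate is meant to expose.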

    Generic framework for large-margin MCE training in speech recognition
    26.
    Type: Granted patent
    Status: In force

    Publication No.: US08423364B2

    Publication date: 2013-04-16

    Application No.: US11708440

    Filing date: 2007-02-20

    IPC classification: G10L15/14, G10L15/00, G10L15/06

    CPC classification: G10L15/063, G10L2015/0631

    Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
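
The effect of a margin on a per-token loss can be sketched as follows. This uses the conventional sigmoid-smoothed MCE loss with the best competitor as the misclassification measure, which is an assumption; the abstract does not give the patent's exact loss function, and the sample-adaptive window bandwidth is not modeled here. The names `margin_mce_loss`, `margin`, and `alpha` are hypothetical.

```python
import math


def margin_mce_loss(correct_score, competitor_scores, margin=0.0, alpha=1.0):
    """Sigmoid loss on (best competitor score - correct score + margin).

    Scores are assumed to be log-likelihood-style discriminant values.
    A positive margin shifts the decision boundary so that correct tokens
    near the boundary still incur a noticeable loss.
    """
    misclassification = max(competitor_scores) - correct_score
    return 1.0 / (1.0 + math.exp(-alpha * (misclassification + margin)))


# A token that is classified correctly, but only by one score unit.
loss_no_margin = margin_mce_loss(5.0, [4.0, 2.0], margin=0.0)
loss_with_margin = margin_mce_loss(5.0, [4.0, 2.0], margin=1.0)
```

In this sketch the same correctly classified token receives a larger loss once the margin is added, so training keeps pushing it away from the boundary. A margin that grows with the iteration count could be passed in per iteration by the caller.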

    Minimum classification error training with growth transformation optimization
    27.
    Type: Granted patent
    Status: In force

    Publication No.: US08301449B2

    Publication date: 2012-10-30

    Application No.: US11581673

    Filing date: 2006-10-16

    Applicant: Xiaodong He, Li Deng

    Inventor: Xiaodong He, Li Deng

    IPC classification: G10L15/00

    CPC classification: G10L15/063, G10L15/144

    Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.

    SPEECH MODELS GENERATED USING COMPETITIVE TRAINING, ASYMMETRIC TRAINING, AND DATA BOOSTING
    28.
    Type: Patent application
    Status: In force

    Publication No.: US20100161330A1

    Publication date: 2010-06-24

    Application No.: US12720968

    Filing date: 2010-03-10

    Applicant: Xiaodong He, Jian Wu

    Inventor: Xiaodong He, Jian Wu

    IPC classification: G10L15/06, G10L15/00

    CPC classification: G10L15/063

    Abstract: Speech models are trained using one or more of three different training systems. They include competitive training, which reduces a distance between a recognized result and a true result; data boosting, which divides and weights training data; and asymmetric training, which trains different model components differently.

    Validation of the consistency of automatic terminology translation

    Publication No.: US08548791B2

    Publication date: 2013-10-01

    Application No.: US11897197

    Filing date: 2007-08-29

    IPC classification: G06F17/28, G06F17/20

    CPC classification: G06F17/2818

    Abstract: A method of determining the consistency of training data for a machine translation system is disclosed. The method includes receiving a signal indicative of a source language corpus and a target language corpus. A textual string is extracted from the source language corpus. The textual string is aligned with the target language corpus to identify a translation for the textual string from the target language corpus. A consistency index is calculated based on a relationship between the textual string from the source language corpus and the translation. An indication of the consistency index is stored on a tangible medium.

    DISCRIMINATIVE LEARNING OF FEATURE FUNCTIONS OF GENERATIVE TYPE IN SPEECH TRANSLATION
    30.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20130110491A1

    Publication date: 2013-05-02

    Application No.: US13283633

    Filing date: 2011-10-28

    Applicant: Xiaodong He, Li Deng

    Inventor: Xiaodong He, Li Deng

    IPC classification: G06F17/28

    CPC classification: G06F17/2818, G10L15/18

    Abstract: Architecture that formulates speech translation as a unified log-linear model with a plurality of feature functions, some of which are derived from generative models. The architecture employs discriminative training for the generative features based on an optimization technique referred to as growth transformation. A discriminative training objective function is formulated for speech translation as well as a growth transformation-based model training method that includes an iterative training formula. This architecture is used to design and perform the global end-to-end optimization of speech translation, which, when compared with conventional methods for speech translation, not only provides a learning method with faster convergence but also improves speech translation accuracy.
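
To make the "unified log-linear model" concrete, the sketch below scores candidate translations as a weighted sum of log-domain feature values and normalizes with a softmax. The feature names, weights, and candidate values are hypothetical, and the growth transformation training of the generative features is not shown.

```python
import math


def log_linear_posteriors(candidates, weights):
    """Posterior over candidates under a log-linear model.

    candidates: {candidate: {feature_name: log-domain feature value}}
    weights: {feature_name: weight}; names here are illustrative only
    """
    scores = {
        name: sum(weights[feat] * value for feat, value in feats.items())
        for name, feats in candidates.items()
    }
    # Subtract the maximum score before exponentiating, for stability.
    top = max(scores.values())
    exps = {name: math.exp(score - top) for name, score in scores.items()}
    normalizer = sum(exps.values())
    return {name: value / normalizer for name, value in exps.items()}


weights = {"log_acoustic": 1.0, "log_translation": 1.0, "log_lm": 1.0}
candidates = {
    "hypothesis_a": {"log_acoustic": -2.0, "log_translation": -1.0, "log_lm": -3.0},
    "hypothesis_b": {"log_acoustic": -1.0, "log_translation": -4.0, "log_lm": -3.0},
}
posteriors = log_linear_posteriors(candidates, weights)
best_hypothesis = max(posteriors, key=posteriors.get)
```

Because recognition and translation features enter one score, a hypothesis with a weaker acoustic score can still win overall, which is the point of optimizing the pipeline end to end rather than stage by stage.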
