Selection of domain-adapted translation subcorpora
    11.
    Granted patent
    Status: in force

    Publication No.: US08838433B2

    Publication date: 2014-09-16

    Application No.: US13022633

    Filing date: 2011-02-08

    IPC classification: G06F17/28

    CPC classification: G06F17/2809

    Abstract: An architecture is discussed that provides the capability to subselect the most relevant data from an out-domain corpus to use either in isolation or in combination with in-domain data. The architecture performs domain adaptation for machine translation by selecting the most relevant sentences from a larger general-domain corpus of parallel translated sentences. The methods for selecting the data include monolingual cross-entropy, monolingual cross-entropy difference, bilingual cross-entropy, and bilingual cross-entropy difference. A translation model is trained on both the in-domain data and an out-domain subset, and the models can be interpolated together to boost performance on in-domain translation tasks.

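The monolingual cross-entropy difference criterion named in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and add-one-smoothed unigram language models, which are stand-ins chosen for brevity and not something the abstract specifies; the function names are hypothetical. Each general-domain sentence is scored by its per-word cross-entropy under an in-domain model minus its cross-entropy under a general-domain model, and the lowest-scoring sentences are kept.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (token counts, total tokens) for a whitespace-tokenized corpus."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    toks = sentence.split()
    logp = sum(math.log2((counts[t] + 1) / (total + vocab_size)) for t in toks)
    return -logp / max(len(toks), 1)


def select_by_ce_difference(in_domain, general, k):
    """Keep the k general-domain sentences with the lowest H_in(s) - H_gen(s)."""
    vocab_size = len({t for s in in_domain + general for t in s.split()})
    lm_in = train_unigram(in_domain)
    lm_gen = train_unigram(general)

    def score(s):
        return cross_entropy(s, lm_in, vocab_size) - cross_entropy(s, lm_gen, vocab_size)

    return sorted(general, key=score)[:k]


# Toy data (invented for illustration only).
in_domain = ["the patient received a dose", "the dose was reduced"]
general = [
    "the patient was given a dose",
    "stock markets fell sharply",
    "the court ruled today",
]
selected = select_by_ce_difference(in_domain, general, k=1)
```

A bilingual variant would add the same difference computed on the target-language side of each sentence pair; a real system would use stronger n-gram or neural language models in place of the unigram models assumed here.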

    HMM ALIGNMENT FOR COMBINING TRANSLATION SYSTEMS
    12.
    Patent application
    Status: in force

    Publication No.: US20090240486A1

    Publication date: 2009-09-24

    Application No.: US12147807

    Filing date: 2008-06-27

    IPC classification: G06F17/28

    CPC classification: G06F17/2827, G06F17/2818

    Abstract: A computing system is configured to produce an optimized translation hypothesis of text input into the computing system. The computing system includes a plurality of translation machines. Each of the translation machines is configured to produce its own translation hypothesis from the same text. An optimization machine is connected to the plurality of translation machines. The optimization machine is configured to receive the translation hypotheses from the translation machines. The optimization machine is further configured to align, word-to-word, the hypotheses in the plurality of hypotheses by using a hidden Markov model.

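As a rough illustration of the word-to-word HMM alignment the abstract describes, the sketch below aligns one hypothesis to another (treated as a "backbone") with Viterbi decoding. The abstract gives no model details, so the emission and transition scores here are hypothetical placeholders: exact word match for emission and a distance penalty that favors moving one position forward for transition. The hypothesis is assumed to be non-empty.

```python
import math


def emission(backbone_word, hyp_word):
    """Hypothetical emission probability: high for identical words."""
    return 0.9 if backbone_word == hyp_word else 0.1


def transition(prev_state, state, kappa=2.0):
    """Hypothetical (unnormalized) distortion score favoring a jump of +1."""
    return (1.0 + abs(state - prev_state - 1)) ** (-kappa)


def viterbi_align(backbone, hyp):
    """For each word in hyp, return the index of the backbone word it aligns to."""
    n = len(backbone)
    # Uniform start: only the emission term matters for the first word.
    score = [math.log(emission(b, hyp[0])) for b in backbone]
    backpointers = []
    for word in hyp[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # Best previous state for landing in state j.
            best_i = max(range(n), key=lambda i: score[i] + math.log(transition(i, j)))
            new_score.append(
                score[best_i]
                + math.log(transition(best_i, j))
                + math.log(emission(backbone[j], word))
            )
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


backbone = "he bought a car".split()
hypothesis = "he bought car".split()
alignment = viterbi_align(backbone, hypothesis)
```

In a system-combination setting, alignments like this one would be computed between every hypothesis and the backbone before the aligned words are merged; that merging step is outside the scope of this sketch.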

    Integrative and discriminative technique for spoken utterance translation
    14.
    Granted patent
    Status: in force

    Publication No.: US08407041B2

    Publication date: 2013-03-26

    Application No.: US12957394

    Filing date: 2010-12-01

    IPC classification: G06F17/28

    Abstract: Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal), as well as the associated BLEU score in the translation, as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using the minimum classification error (MCE) criterion. The measurable BLEU scores are used to facilitate the implementation of the MCE training procedure in a step that defines the class-specific discriminant function.

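The Bayesian decision rule mentioned in the abstract can be written out under some labeled assumptions. The symbols are not taken from the patent: here E denotes the target-language sentence, F the source-language transcript, and X the acoustic signal. The second line assumes E is conditionally independent of X given F; the third replaces the sum over transcripts with a maximum and applies Bayes' rule, dropping the constant P(X).

```latex
\begin{align*}
\hat{E} &= \arg\max_{E} P(E \mid X) \\
        &= \arg\max_{E} \sum_{F} P(E \mid F)\, P(F \mid X) \\
        &\approx \arg\max_{E} \max_{F} P(E \mid F)\, P(X \mid F)\, P(F)
\end{align*}
```

Each factor corresponds to one component model (translation, acoustic, and source language model). The abstract states only that these quantities are modified in practice and trained discriminatively; the exact form of that modification is not given there and is not assumed here.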

    INTEGRATIVE AND DISCRIMINATIVE TECHNIQUE FOR SPOKEN UTTERANCE TRANSLATION
    15.
    Patent application
    Status: in force

    Publication No.: US20120143591A1

    Publication date: 2012-06-07

    Application No.: US12957394

    Filing date: 2010-12-01

    IPC classification: G06F17/28

    Abstract: Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal), as well as the associated BLEU score in the translation, as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using the minimum classification error (MCE) criterion. The measurable BLEU scores are used to facilitate the implementation of the MCE training procedure in a step that defines the class-specific discriminant function.

    Incrementally regulated discriminative margins in MCE training for speech recognition
    16.
    Granted patent
    Status: in force

    Publication No.: US07617103B2

    Publication date: 2009-11-10

    Application No.: US11509980

    Filing date: 2006-08-25

    IPC classification: G10L15/14

    CPC classification: G10L15/063, G10L15/144

    Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the acoustic model. From these scores a misclassification measure is calculated, and then a loss function is calculated from the misclassification measure. The loss function also includes a margin value that varies over each iteration in the training. Based on the calculated loss function, the acoustic model is updated such that the loss function with the margin value is minimized. This process repeats until an empirical convergence criterion is met.

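The chain from class scores to misclassification measure to margin-augmented loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's formulation: the misclassification measure uses the common log-mean-exp form over competitor scores, the loss is a sigmoid, the margin is added inside the sigmoid, and the margin grows linearly with the iteration index. The names `mce_loss`, `margin_schedule`, `alpha`, `start`, and `step` are hypothetical.

```python
import math


def misclassification_measure(correct_score, competitor_scores):
    """d = -g_correct + log-mean-exp(competitors); d < 0 means correctly classified."""
    m = max(competitor_scores)
    mean_exp = sum(math.exp(s - m) for s in competitor_scores) / len(competitor_scores)
    return -correct_score + m + math.log(mean_exp)


def mce_loss(d, margin, alpha=1.0):
    """Sigmoid loss; a positive margin penalizes tokens that are only barely correct."""
    return 1.0 / (1.0 + math.exp(-alpha * (d + margin)))


def margin_schedule(iteration, start=0.0, step=0.5):
    """Assumed linear schedule: the margin changes at every training iteration."""
    return start + step * iteration


# One token with log-likelihood-style scores (invented for illustration).
d = misclassification_measure(-10.0, [-12.0, -13.0])
losses = [mce_loss(d, margin_schedule(t)) for t in range(3)]
```

The model update itself (for example, gradient descent on the summed loss over all tokens) is omitted; the sketch only shows how the same token incurs a larger loss as the margin increases across iterations.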

    Speech models generated using competitive training, asymmetric training, and data boosting
    17.
    Granted patent
    Status: in force

    Publication No.: US08532991B2

    Publication date: 2013-09-10

    Application No.: US12720968

    Filing date: 2010-03-10

    Applicants: Xiaodong He, Jian Wu

    Inventors: Xiaodong He, Jian Wu

    IPC classification: G10L15/06, G10L15/00

    CPC classification: G10L15/063

    Abstract: Speech models are trained using one or more of three different training systems. They include competitive training, which reduces a distance between a recognized result and a true result; data boosting, which divides and weights training data; and asymmetric training, which trains different model components differently.

    Confidence threshold tuning
    18.
    Granted patent
    Status: in force

    Publication No.: US08396715B2

    Publication date: 2013-03-12

    Application No.: US11168278

    Filing date: 2005-06-28

    IPC classification: G10L21/00, G10L15/00

    CPC classification: G10L15/08

    Abstract: An expected dialog-turn (ED) value is estimated for evaluating a speech application. Parameters such as a confidence threshold setting can be adjusted based on the expected dialog-turn value. In a particular example, recognition results and corresponding confidence scores are used to estimate the expected dialog-turn value. The recognition results can be associated with a possible outcome for the speech application, and a cost for the possible outcome can be used to estimate the expected dialog-turn value.

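One way to picture the threshold tuning described in the abstract is the sketch below. It assumes three outcomes per recognition result (accepted and correct, accepted but wrong, rejected) and assigns each a cost in dialog turns; the outcome names and cost values are invented for illustration and are not taken from the patent. The expected dialog-turn value for a threshold is the mean cost over the logged results, and the tuned threshold is the candidate that minimizes it.

```python
def expected_dialog_turns(results, threshold, costs):
    """Mean cost in dialog turns over (confidence, is_correct) results."""
    total = 0.0
    for confidence, is_correct in results:
        if confidence < threshold:
            outcome = "reject"          # system re-prompts the user
        elif is_correct:
            outcome = "accept_correct"  # dialog proceeds
        else:
            outcome = "accept_wrong"    # error must be detected and repaired
        total += costs[outcome]
    return total / len(results)


def tune_threshold(results, candidates, costs):
    """Pick the candidate threshold with the lowest expected dialog-turn value."""
    return min(candidates, key=lambda t: expected_dialog_turns(results, t, costs))


# Hypothetical logged results and per-outcome costs.
results = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.30, False)]
costs = {"accept_correct": 1, "accept_wrong": 4, "reject": 2}
best = tune_threshold(results, [0.0, 0.5, 0.7, 0.9], costs)
```

With these toy numbers, a low threshold accepts costly misrecognitions and a high threshold rejects too many correct results, so the minimum falls in between.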

    Word-dependent transition models in HMM based word alignment for statistical machine translation
    19.
    Granted patent
    Status: in force

    Publication No.: US08060360B2

    Publication date: 2011-11-15

    Application No.: US11980257

    Filing date: 2007-10-30

    Applicant: Xiaodong He

    Inventor: Xiaodong He

    IPC classification: G06F17/27

    CPC classification: G06F17/2827

    Abstract: A word alignment modeler uses probabilistic learning techniques to train “word-dependent transition models” for use in constructing phrase-level Hidden Markov Model (HMM) based word alignment models. As defined herein, “word-dependent transition models” provide a probabilistic model wherein, for each source word in training data, a self-transition probability is modeled in combination with a probability of jumping from that particular word to a different word, thereby providing a full transition model for each word in a source phrase. HMM-based word alignment models are then used for various word alignment and machine translation tasks. In additional embodiments, sparse data problems (i.e., rarely used words) are addressed by using probabilistic learning techniques to estimate word-dependent transition model parameters by maximum a posteriori (MAP) training.

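The idea of MAP-smoothed, word-dependent transitions can be illustrated with a short sketch. It assumes a Dirichlet-style prior centered on the word-independent jump distribution, so that the estimate for a word is (count(jump, word) + tau * prior(jump)) / (count(word) + tau). Both this specific form and the prior weight `tau` are assumptions made for illustration, not details from the abstract. A jump of 0 stands for a self-transition.

```python
from collections import Counter, defaultdict


def estimate_transitions(jumps, tau=5.0):
    """Build p(jump | word) from a list of (source_word, jump_distance) pairs."""
    # Word-independent jump distribution, used as the prior mean.
    global_counts = Counter(d for _, d in jumps)
    total = sum(global_counts.values())
    prior = {d: c / total for d, c in global_counts.items()}

    # Per-word jump counts.
    word_counts = defaultdict(Counter)
    for word, d in jumps:
        word_counts[word][d] += 1

    def prob(word, d):
        counts = word_counts.get(word, Counter())
        # Rare or unseen words fall back toward the word-independent prior.
        return (counts[d] + tau * prior.get(d, 0.0)) / (sum(counts.values()) + tau)

    return prob


# Toy alignment statistics (invented): "the" is frequent, "rare" is seen once.
jumps = [("the", 1)] * 8 + [("the", 0)] * 2 + [("rare", 0)]
p = estimate_transitions(jumps)
```

For a frequent word the estimate stays close to its own relative frequencies, while a word seen once is pulled strongly toward the shared distribution, which is the sparse-data behavior the abstract attributes to MAP training.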

    USING COMBINED ANSWERS IN MACHINE-BASED EDUCATION
    20.
    Patent application
    Status: under examination (published)

    Publication No.: US20100311030A1

    Publication date: 2010-12-09

    Application No.: US12477138

    Filing date: 2009-06-03

    IPC classification: G09B3/00

    CPC classification: G09B7/02

    Abstract: Described is a technology for learning a foreign language or other subject. Answers (e.g., translations) to questions (e.g., sentences to translate) received from learners are combined into a combined answer that serves as a representative model answer for those learners. The questions may also be provided to machine subsystems (e.g., machine translators) to generate machine answers, which are then used in the combined answer. The combined answer is used to evaluate each learner's individual answer. The evaluation may be used to compute profile information that is then fed back for use in selecting further questions, e.g., more difficult sentences as the learners progress. Also described is integrating the platform/technology into a web service.
