Integrative and discriminative technique for spoken utterance translation
    11.
    Granted patent
    Legal status: in force

    Publication No.: US08407041B2

    Publication date: 2013-03-26

    Application No.: US12957394

    Filing date: 2010-12-01

    IPC classification: G06F17/28

    Abstract: Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal), as well as the associated BLEU score in the translation, as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using a minimum classification error (MCE) criterion. The measurable BLEU scores are used to facilitate the implementation of the MCE training procedure in a step that defines the class-specific discriminant function.
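
As a hedged illustration of the decision rule this abstract refers to, write X for the source-language acoustic signal, F for a source-language transcription hypothesis, and E for a target sentence. These symbols are chosen here and are not taken from the patent. The second line assumes E is conditionally independent of X given F, and the exponents \lambda_1, \lambda_2, \lambda_3 stand in for the "modified" variables that discriminative training would tune; both are assumptions of this sketch rather than the patent's stated formulation.

```latex
\begin{align}
\hat{E} &= \arg\max_{E} P(E \mid X)
         = \arg\max_{E} \sum_{F} P(E \mid F, X)\, P(F \mid X) \\
        &\approx \arg\max_{E} \sum_{F} P(E \mid F)\, P(F \mid X) \\
        &\approx \arg\max_{E} \max_{F}\;
           P(E \mid F)^{\lambda_{1}}\, P(X \mid F)^{\lambda_{2}}\, P(F)^{\lambda_{3}}
\end{align}
```

The last line replaces the sum with its largest term and uses P(F \mid X) \propto P(X \mid F)\, P(F); the exponents equal 1 in the theoretically correct case.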

    INTEGRATIVE AND DISCRIMINATIVE TECHNIQUE FOR SPOKEN UTTERANCE TRANSLATION
    12.
    Patent application
    Legal status: in force

    Publication No.: US20120143591A1

    Publication date: 2012-06-07

    Application No.: US12957394

    Filing date: 2010-12-01

    IPC classification: G06F17/28

    Abstract: Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal), as well as the associated BLEU score in the translation, as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using a minimum classification error (MCE) criterion. The measurable BLEU scores are used to facilitate the implementation of the MCE training procedure in a step that defines the class-specific discriminant function.

    Incrementally regulated discriminative margins in MCE training for speech recognition
    13.
    Granted patent
    Legal status: in force

    Publication No.: US07617103B2

    Publication date: 2009-11-10

    Application No.: US11509980

    Filing date: 2006-08-25

    IPC classification: G10L15/14

    CPC classification: G10L15/063, G10L15/144

    Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the acoustic model. From these scores a misclassification measure is calculated, and then a loss function is calculated from the misclassification measure. The loss function also includes a margin value that varies over each iteration in the training. Based on the calculated loss function, the acoustic model is updated so that the loss function with the margin value is minimized. This process repeats until an empirical convergence criterion is met.
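
The loss computation described in this abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the log-mean-exp form of the misclassification measure, the sigmoid loss, the linear margin schedule, and all names and default values (`eta`, `alpha`, `step`, `cap`) are assumptions. The model update itself (for example, a gradient step on the loss) is omitted.

```python
import math


def misclassification_measure(correct_score, competitor_scores, eta=1.0):
    """d = -g_correct + (1/eta) * log(mean(exp(eta * g_j))) over competitors.

    d < 0 means the correct class outscores the (soft) best competitor.
    """
    m = max(competitor_scores)
    # Shift by the maximum before exponentiating for numerical stability.
    mean_exp = sum(math.exp(eta * (s - m)) for s in competitor_scores) / len(
        competitor_scores
    )
    return -correct_score + m + math.log(mean_exp) / eta


def margin_loss(d, margin, alpha=1.0):
    """Sigmoid loss shifted by a margin: a token needs d < -margin for low loss."""
    return 1.0 / (1.0 + math.exp(-alpha * (d + margin)))


def margin_schedule(iteration, start=0.0, step=0.1, cap=1.0):
    """Hypothetical schedule: the margin grows linearly per iteration up to a cap."""
    return min(cap, start + step * iteration)


if __name__ == "__main__":
    d = misclassification_measure(2.0, [1.0, 0.5])
    for it in range(3):
        m = margin_schedule(it)
        print(f"iteration {it}: margin={m:.1f} loss={margin_loss(d, m):.4f}")
```

With a growing margin, a token that is classified correctly but only narrowly contributes more loss in later iterations, which pushes training toward larger separation between the correct and competing classes.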

    Speech models generated using competitive training, asymmetric training, and data boosting
    14.
    Granted patent
    Legal status: in force

    Publication No.: US08532991B2

    Publication date: 2013-09-10

    Application No.: US12720968

    Filing date: 2010-03-10

    Applicants: Xiaodong He, Jian Wu

    Inventors: Xiaodong He, Jian Wu

    IPC classification: G10L15/06, G10L15/00

    CPC classification: G10L15/063

    Abstract: Speech models are trained using one or more of three different training systems. They include competitive training, which reduces a distance between a recognized result and a true result; data boosting, which divides and weights training data; and asymmetric training, which trains different model components differently.

    Dependency-based query expansion alteration candidate scoring
    15.
    Granted patent
    Legal status: in force

    Publication No.: US08521672B2

    Publication date: 2013-08-27

    Application No.: US12951068

    Filing date: 2010-11-22

    CPC classification: G06F17/30967, G06F17/30672

    Abstract: An alteration candidate for a query can be scored. The scoring may include computing one or more query-dependent feature scores and/or one or more intra-candidate dependent feature scores. The computation of the query-dependent feature score(s) can be based on dependencies to multiple query terms from each of one or more alteration terms (i.e., for each of the one or more alteration terms, there can be dependencies to multiple query terms that form at least a portion of the basis for the query-dependent feature score(s)). The computation of the intra-candidate dependent feature score(s) can be based on dependencies between different terms in the alteration candidate. A candidate score can be computed using the query-dependent feature score(s) and/or the intra-candidate dependent feature score(s). Additionally, the candidate score can be used in determining whether to select the candidate to expand the query. If selected, the candidate can be used to expand the query.

    Training parsers to approximately optimize NDCG
    16.
    Granted patent
    Legal status: in force

    Publication No.: US08473486B2

    Publication date: 2013-06-25

    Application No.: US12962751

    Filing date: 2010-12-08

    IPC classification: G06F7/00, G06F17/30

    CPC classification: G06F17/30864

    Abstract: A supervised technique uses relevance judgments to train a dependency parser such that it approximately optimizes Normalized Discounted Cumulative Gain (NDCG) in information retrieval. A weighted tree edit distance between the parse tree for a query and the parse tree for a document is added to a ranking function, where the edit distance weights are parameters from the parser. Using parser parameters in the ranking function enables approximate optimization of the parser's parameters for NDCG by adding some constraints to the objective function.
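
For readers unfamiliar with the metric named in this abstract, NDCG can be computed as in the sketch below. It uses one common formulation (gain 2^r - 1 with a log2 position discount); the patent may use a different variant, and the function names are chosen here for illustration. The parser training and tree edit distance are not shown.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    rels = relevances[:k] if k is not None else relevances
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance labels of documents in the order a ranker returned them.
    print(ndcg([3, 2, 3, 0, 1, 2], k=5))
```

Because NDCG depends on sort order, it is not differentiable in the ranking parameters, which is why approaches such as the one in this abstract optimize it only approximately.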

    Confidence threshold tuning
    17.
    Granted patent
    Legal status: in force

    Publication No.: US08396715B2

    Publication date: 2013-03-12

    Application No.: US11168278

    Filing date: 2005-06-28

    IPC classification: G10L21/00, G10L15/00

    CPC classification: G10L15/08

    Abstract: An expected dialog-turn (ED) value is estimated for evaluating a speech application. Parameters such as a confidence threshold setting can be adjusted based on the expected dialog-turn value. In a particular example, recognition results and corresponding confidence scores are used to estimate the expected dialog-turn value. The recognition results can be associated with a possible outcome for the speech application and a cost for the possible outcome can be used to estimate the expected dialog-turn value.
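
One way to read this abstract is as a threshold sweep that minimizes an estimated cost in dialog turns. The sketch below is only that reading: the three outcome categories, the per-outcome turn costs, and the function names are all hypothetical and are not taken from the patent.

```python
def expected_turns(results, threshold, costs):
    """Average dialog-turn cost over (confidence, is_correct) recognition results."""
    total = 0.0
    for confidence, is_correct in results:
        if confidence < threshold:
            total += costs["reject"]  # system re-prompts the user
        elif is_correct:
            total += costs["accept_correct"]  # dialog proceeds
        else:
            total += costs["accept_incorrect"]  # user must notice and repair
    return total / len(results)


def tune_threshold(results, candidates, costs):
    """Return the candidate threshold with the lowest estimated expected turns."""
    return min(candidates, key=lambda t: expected_turns(results, t, costs))


if __name__ == "__main__":
    # Hypothetical logged results and turn costs, for illustration only.
    results = [(0.9, True), (0.8, True), (0.4, False), (0.3, False)]
    costs = {"accept_correct": 1, "accept_incorrect": 4, "reject": 2}
    print(tune_threshold(results, [0.0, 0.5, 1.0], costs))
```

In this toy data a threshold of 0.5 rejects both misrecognitions while accepting both correct results, so it gives the lowest estimate of the three candidates.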

    Word-dependent transition models in HMM based word alignment for statistical machine translation
    18.
    Granted patent
    Legal status: in force

    Publication No.: US08060360B2

    Publication date: 2011-11-15

    Application No.: US11980257

    Filing date: 2007-10-30

    Applicant: Xiaodong He

    Inventor: Xiaodong He

    IPC classification: G06F17/27

    CPC classification: G06F17/2827

    Abstract: A word alignment modeler uses probabilistic learning techniques to train “word-dependent transition models” for use in constructing phrase level Hidden Markov Model (HMM) based word alignment models. As defined herein, “word-dependent transition models” provide a probabilistic model wherein for each source word in training data, a self-transition probability is modeled in combination with a probability of jumping from that particular word to a different word, thereby providing a full transition model for each word in a source phrase. HMM based word alignment models are then used for various word alignment and machine translation tasks. In additional embodiments, sparse data problems (i.e., rarely used words) are addressed by using probabilistic learning techniques to estimate word-dependent transition model parameters by maximum a posteriori (MAP) training.
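
The MAP idea in the last sentence of this abstract can be illustrated with a small sketch. It assumes that transitions are summarized as jump distances (0 being the self-transition), that the prior is a word-independent jump distribution, and that `tau` controls the prior's weight; these modeling choices and all names are assumptions for illustration, not the patent's specification.

```python
from collections import Counter


def map_transition_probs(jumps_by_word, prior, tau=10.0):
    """MAP estimate of per-word jump probabilities.

    jumps_by_word: {source word: list of observed jump distances}
    prior:         {jump distance: word-independent probability}, summing to 1
    tau:           prior strength; larger values pull estimates toward the prior

    Assumes every observed jump distance is also a key of `prior`.
    """
    probs = {}
    for word, jumps in jumps_by_word.items():
        counts = Counter(jumps)
        total = len(jumps)
        probs[word] = {
            d: (counts[d] + tau * p) / (total + tau) for d, p in prior.items()
        }
    return probs


if __name__ == "__main__":
    prior = {-1: 0.25, 0: 0.25, 1: 0.5}
    observed = {"the": [1, 1, 1, 0], "rarely": []}
    for word, dist in map_transition_probs(observed, prior, tau=4.0).items():
        print(word, dist)
```

A frequent word ends up with a distribution dominated by its own counts, while a word with no observations falls back to the prior, which is how this kind of smoothing addresses sparse data.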

    USING COMBINED ANSWERS IN MACHINE-BASED EDUCATION
    19.
    Patent application
    Legal status: under examination (published)

    Publication No.: US20100311030A1

    Publication date: 2010-12-09

    Application No.: US12477138

    Filing date: 2009-06-03

    IPC classification: G09B3/00

    CPC classification: G09B7/02

    Abstract: Described is a technology for learning a foreign language or other subject. Answers (e.g., translations) to questions (e.g., sentences to translate) received from learners are combined into a combined answer that serves as a representative model answer for those learners. The questions also may be provided to machine subsystems, e.g., machine translators, to generate machine answers, with those machine answers used in the combined answer. The combined answer is used to evaluate each learner's individual answer. The evaluation may be used to compute profile information that is then fed back for use in selecting further questions, e.g., more difficult sentences as the learners progress. Also described is integrating the platform/technology into a web service.

    Word-dependent transition models in HMM based word alignment for statistical machine translation
    20.
    Patent application
    Legal status: in force

    Publication No.: US20090112573A1

    Publication date: 2009-04-30

    Application No.: US11980257

    Filing date: 2007-10-30

    Applicant: Xiaodong He

    Inventor: Xiaodong He

    IPC classification: G06F17/28

    CPC classification: G06F17/2827

    Abstract: A word alignment modeler uses probabilistic learning techniques to train “word-dependent transition models” for use in constructing phrase level Hidden Markov Model (HMM) based word alignment models. As defined herein, “word-dependent transition models” provide a probabilistic model wherein for each source word in training data, a self-transition probability is modeled in combination with a probability of jumping from that particular word to a different word, thereby providing a full transition model for each word in a source phrase. HMM based word alignment models are then used for various word alignment and machine translation tasks. In additional embodiments, sparse data problems (i.e., rarely used words) are addressed by using probabilistic learning techniques to estimate word-dependent transition model parameters by maximum a posteriori (MAP) training.
