SELECTION OF DOMAIN-ADAPTED TRANSLATION SUBCORPORA
    1.
    Invention application
    Legal status: in force

    Publication number: US20120203539A1

    Publication date: 2012-08-09

    Application number: US13022633

    Filing date: 2011-02-08

    IPC classification: G06F17/28

    CPC classification: G06F17/2809

    Abstract: An architecture is provided that can subselect the most relevant data from an out-domain corpus for use either in isolation or in conjunction with in-domain data. The architecture performs domain adaptation for machine translation by selecting the most relevant sentences from a larger general-domain corpus of parallel translated sentences. The methods for selecting the data include a monolingual cross-entropy measure, monolingual cross-entropy difference, bilingual cross-entropy, and bilingual cross-entropy difference. Translation models are trained on the in-domain data and on an out-domain subset, and these models can be interpolated together to boost performance on in-domain translation tasks.
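The monolingual cross-entropy difference criterion named in the abstract can be illustrated with a small sketch. This is not the patented implementation: it assumes add-one smoothed unigram language models (a real system would use higher-order n-gram LMs), whitespace tokenization, and the hypothetical names `UnigramLM` and `select_by_ce_difference`. Each general-domain sentence s is scored by H_in(s) - H_out(s), where H is per-word cross-entropy under an LM trained on the in-domain or the general-domain data; a lower score means the sentence looks more like the in-domain data and less like the general corpus.

```python
import math
from collections import Counter
from typing import List, Set


class UnigramLM:
    """Add-one smoothed unigram LM: an illustrative stand-in (assumption)
    for the n-gram language models a real system would use."""

    def __init__(self, sentences: List[str], vocab: Set[str]) -> None:
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unknown words

    def logprob(self, token: str) -> float:
        return math.log2((self.counts[token] + 1) / (self.total + self.vocab_size))

    def cross_entropy(self, sentence: str) -> float:
        """Per-word cross-entropy: H(s) = -(1/|s|) * sum_w log2 P(w)."""
        tokens = sentence.split()
        return -sum(self.logprob(t) for t in tokens) / len(tokens)


def select_by_ce_difference(in_domain: List[str], general: List[str], k: int) -> List[str]:
    """Return the k general-domain sentences with the lowest H_in(s) - H_out(s)."""
    vocab = {t for s in in_domain + general for t in s.split()}
    lm_in = UnigramLM(in_domain, vocab)
    lm_out = UnigramLM(general, vocab)  # simplification: trained on the whole general corpus
    candidates = [s for s in general if s.split()]  # skip empty lines (no per-word score)
    candidates.sort(key=lambda s: lm_in.cross_entropy(s) - lm_out.cross_entropy(s))
    return candidates[:k]


in_domain = ["the patient received a dose", "the dose was increased"]
general = ["the stock market fell", "the patient was given a dose", "football match tonight"]
print(select_by_ce_difference(in_domain, general, k=1))  # → ['the patient was given a dose']
```

Ranking by `lm_in.cross_entropy(s)` alone would correspond to the plain monolingual cross-entropy measure; subtracting the general-domain cross-entropy additionally penalizes sentences that are simply common everywhere.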

    Selection of domain-adapted translation subcorpora
    2.
    Invention grant
    Legal status: in force

    Publication number: US08838433B2

    Publication date: 2014-09-16

    Application number: US13022633

    Filing date: 2011-02-08

    IPC classification: G06F17/28

    CPC classification: G06F17/2809

    Abstract: An architecture is discussed that can subselect the most relevant data from an out-domain corpus for use either in isolation or in conjunction with in-domain data. The architecture performs domain adaptation for machine translation by selecting the most relevant sentences from a larger general-domain corpus of parallel translated sentences. The methods for selecting the data include a monolingual cross-entropy measure, monolingual cross-entropy difference, bilingual cross-entropy, and bilingual cross-entropy difference. Translation models are trained on the in-domain data and on an out-domain subset, and these models can be interpolated together to boost performance on in-domain translation tasks.
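The bilingual cross-entropy difference and the model interpolation step can be sketched in the same spirit. This minimal sketch rests on several assumptions. It uses add-one unigram LMs for each language side. It scores a sentence pair by summing the source-side and target-side cross-entropy differences. It also assumes *linear* interpolation of phrase-translation probabilities with a hypothetical weight `lam`, because the abstract does not say which interpolation scheme is used. All function names (`unigram_ce`, `bilingual_ce_difference`, `interpolate`) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source, target): a sentence pair or a phrase pair


def unigram_ce(train: List[str], vocab_size: int) -> Callable[[str], float]:
    """Return a per-word cross-entropy function for an add-one unigram LM (assumption)."""
    counts = Counter(t for s in train for t in s.split())
    total = sum(counts.values())

    def h(sentence: str) -> float:
        toks = sentence.split()  # assumes non-empty sentences
        return -sum(math.log2((counts[t] + 1) / (total + vocab_size)) for t in toks) / len(toks)

    return h


def bilingual_ce_difference(in_domain: List[Pair], general: List[Pair]) -> List[Tuple[float, Pair]]:
    """Score each general-domain pair by [H_in_src - H_out_src] + [H_in_tgt - H_out_tgt].
    Returns (score, pair) tuples sorted so the most in-domain-like pairs come first."""
    scorers = []
    for side in (0, 1):  # 0 = source language, 1 = target language
        vocab = {t for p in in_domain + general for t in p[side].split()}
        v = len(vocab) + 1  # +1 for unknown words
        h_in = unigram_ce([p[side] for p in in_domain], v)
        h_out = unigram_ce([p[side] for p in general], v)
        scorers.append((h_in, h_out))
    scored = [
        (sum(h_in(pair[side]) - h_out(pair[side]) for side, (h_in, h_out) in enumerate(scorers)), pair)
        for pair in general
    ]
    return sorted(scored)


def interpolate(p_in: Dict[Pair, float], p_out: Dict[Pair, float], lam: float) -> Dict[Pair, float]:
    """Assumed linear interpolation: p(e|f) = lam * p_in(e|f) + (1 - lam) * p_out(e|f).
    A phrase pair missing from one table contributes probability 0 from that table."""
    keys = p_in.keys() | p_out.keys()
    return {k: lam * p_in.get(k, 0.0) + (1 - lam) * p_out.get(k, 0.0) for k in keys}


pairs_in = [("the patient received a dose", "le patient a reçu une dose")]
pairs_gen = [("the stock market fell", "la bourse a chuté"),
             ("the patient got a dose", "le patient a eu une dose")]
print(bilingual_ce_difference(pairs_in, pairs_gen)[0][1])  # → ('the patient got a dose', 'le patient a eu une dose')

p = interpolate({("dose", "dose"): 0.8}, {("dose", "dose"): 0.4, ("bourse", "stock"): 1.0}, lam=0.5)
print(round(p[("dose", "dose")], 2))  # → 0.6
```

Scoring both sides means a pair is favored only if its source sentence and its translation both resemble the in-domain data. In practice, the weight `lam` would typically be tuned on held-out in-domain data rather than fixed at 0.5.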
