Technique for document editorial quality assessment
    4.
    Type: Invention application
    Legal status: In force

    Publication No.: US20060100852A1

    Publication date: 2006-05-11

    Application No.: US10969119

    Filing date: 2004-10-20

    IPC classification: G06F17/27

    CPC classification: G06F17/271 G06F17/2785

    Abstract: A computer-implemented system and method for assessing the editorial quality of a textual unit (document, paragraph or sentence) is provided. The method includes generating a plurality of training-time feature vectors by automatically extracting features from first and last versions of training documents. The method also includes training a machine-learned classifier based on the plurality of training-time feature vectors. A run-time feature vector is generated for the textual unit to be assessed by automatically extracting features from the textual unit. The run-time feature vector is evaluated using the machine-learned classifier to provide an assessment of the editorial quality of the textual unit.

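The pipeline in this abstract (features from first and last versions, a trained classifier, then a run-time score) can be sketched as follows. This is a minimal illustration, not the patented method: the three features, the logistic-regression learner, and the names `extract_features`, `train`, and `assess` are all assumptions chosen for brevity.

```python
import math
import re


def extract_features(text):
    """Hypothetical features: scaled sentence length, scaled word length, type-token ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        return [0.0, 0.0, 0.0]
    return [
        len(words) / len(sentences) / 10.0,
        sum(len(w) for w in words) / len(words) / 10.0,
        len(set(words)) / len(words),
    ]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(version_pairs, epochs=300, learning_rate=0.5):
    """Fit logistic regression: label 0 = first version, label 1 = last version."""
    examples = []
    for first, last in version_pairs:
        examples.append((extract_features(first), 0))
        examples.append((extract_features(last), 1))
    weights = [0.0, 0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss w.r.t. the logit
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def assess(text, model):
    """Return a score in (0, 1); higher means closer to the 'last version' class."""
    weights, bias = model
    x = extract_features(text)
    return _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy training data (invented for illustration only).
pairs = [
    ("This is is a draft and it goes on and on and it never really stops "
     "because the writer did not not edit it at all",
     "This is a draft. The writer edited it. It now reads well."),
    ("We we tested the system and then we wrote it up and then we added more "
     "and more text and no one one trimmed it",
     "We tested the system. We wrote up the results. An editor trimmed the text."),
]
model = train(pairs)
rough = ("the report was was long and it had many many words and nobody "
         "nobody checked it and it kept going and going")
polished = "The report is short. Someone checked it. It reads well."
```

On this toy data the polished text scores higher than the rough one. A real system would need far more training pairs and richer linguistic features than these three.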

    MACHINE TRANSLATION DETECTION IN WEB-SCRAPED PARALLEL CORPORA
    5.
    Type: Invention application
    Legal status: Under examination (published)

    Publication No.: US20130103695A1

    Publication date: 2013-04-25

    Application No.: US13278194

    Filing date: 2011-10-21

    IPC classification: G06F17/30

    CPC classification: G06F17/2854

    Abstract: Various technologies described herein pertain to detecting machine translated content. Documents in a document pair are mutual lingual translations of each other. Further, document level features of the documents in the document pair can be identified. The document level features can correlate with translation quality between the documents in the document pair. Moreover, statistical classification can be used to detect whether the document pair is generated through machine translation based at least in part upon the document level features. Further, a first document can be a machine translation of a second document in the document pair or a disparate document when generated through machine translation.


    Identifying language translations for source documents using links
    6.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US08271869B2

    Publication date: 2012-09-18

    Application No.: US12900490

    Filing date: 2010-10-08

    Applicant: Anthony Aue

    Inventor: Anthony Aue

    IPC classification: G06F17/28 G06F17/30

    CPC classification: G06F17/2827 G06F17/275

    Abstract: Technology is described for identifying language translations for source documents. The method includes finding source documents containing links to target documents, where the link anchors of the links have language-indicating text. A first tuple set can be generated for paired source documents and target documents with an expected target language for a target document. The first tuple set can be annotated with primary languages for the source documents and target documents to form a second tuple set where primary languages of the source documents and target documents are different. Further, a third tuple set can be generated using the second tuple set using a count of the number of times source documents and target documents occur in the first tuple set. Tuples can be removed from the third tuple set where a count ratio between source document count and target document count is less than a reference ratio.

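The three tuple sets described in this abstract can be sketched roughly as below. The function name `filter_translation_pairs`, the `primary_language` lookup (standing in for a language identifier), the example URLs, and the reading of "count ratio" as source count divided by target count are all assumptions made for illustration; the abstract does not fix these details.

```python
from collections import Counter


def filter_translation_pairs(first_tuples, primary_language, reference_ratio=0.5):
    """Filter (source, target, expected_language) tuples down to likely translation pairs.

    first_tuples: list of (source_url, target_url, expected_target_language)
    primary_language: dict mapping each URL to its detected primary language
    """
    # Second tuple set: keep pairs whose primary languages differ.
    second = [
        (src, tgt, lang)
        for src, tgt, lang in first_tuples
        if primary_language[src] != primary_language[tgt]
    ]

    # Counts are taken over the FIRST tuple set, as the abstract states.
    source_counts = Counter(src for src, _, _ in first_tuples)
    target_counts = Counter(tgt for _, tgt, _ in first_tuples)

    # Third tuple set: second-set tuples annotated with both counts.
    third = [
        (src, tgt, lang, source_counts[src], target_counts[tgt])
        for src, tgt, lang in second
    ]

    # Drop tuples whose source/target count ratio falls below the reference ratio.
    return [
        (src, tgt, lang)
        for src, tgt, lang, s_count, t_count in third
        if s_count / t_count >= reference_ratio
    ]


# Invented example data: three pages all link to one French home page.
first = [
    ("a.example/en", "a.example/fr", "fr"),
    ("b.example/en", "home.example/fr", "fr"),
    ("c.example/en", "home.example/fr", "fr"),
    ("d.example/en", "home.example/fr", "fr"),
    ("e.example/en", "e.example/about", "fr"),
]
languages = {
    "a.example/en": "en", "a.example/fr": "fr",
    "b.example/en": "en", "c.example/en": "en", "d.example/en": "en",
    "home.example/fr": "fr",
    "e.example/en": "en", "e.example/about": "en",
}
kept = filter_translation_pairs(first, languages)
```

Under this reading, a target that many sources link to (such as a shared home page) gets a low ratio and is dropped, while a one-to-one pair survives.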

    LANGUAGE SEGMENTATION OF MULTILINGUAL TEXTS
    8.
    Type: Invention application
    Legal status: In force

    Publication No.: US20140067365A1

    Publication date: 2014-03-06

    Application No.: US14073036

    Filing date: 2013-11-06

    Applicant: Anthony Aue

    Inventor: Anthony Aue

    IPC classification: G06F17/28

    CPC classification: G06F17/289 G06F17/275

    Abstract: The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model. Further, web documents are annotated at a sentence-by-sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.

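Finding the highest-probability language sequence from per-sentence distributions and transition probabilities is commonly done with the Viterbi algorithm; the abstract does not name a specific algorithm, so the sketch below is one plausible reading. The function name `viterbi_languages` and the fixed `stay_prob` are assumptions: the abstract describes learning the transition probabilities, which this sketch replaces with a constant.

```python
import math


def viterbi_languages(sentence_probs, stay_prob=0.9):
    """Return the most probable language label for each sentence.

    sentence_probs: list of dicts, one per sentence, mapping language -> probability
                    (the initial per-sentence distribution from some language identifier).
    stay_prob:      assumed probability that consecutive sentences share a language.
    """
    languages = sorted(sentence_probs[0])
    n_langs = len(languages)
    switch_prob = (1.0 - stay_prob) / (n_langs - 1) if n_langs > 1 else 1.0
    eps = 1e-12  # avoids log(0) for zero-probability entries

    def log_transition(prev, cur):
        return math.log(stay_prob if prev == cur else switch_prob)

    # Best log score of any path ending in each language at the first sentence.
    score = {lang: math.log(sentence_probs[0][lang] + eps) for lang in languages}
    back_pointers = []

    for probs in sentence_probs[1:]:
        new_score, pointers = {}, {}
        for cur in languages:
            best_prev = max(languages, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (
                score[best_prev] + log_transition(best_prev, cur) + math.log(probs[cur] + eps)
            )
            pointers[cur] = best_prev
        score = new_score
        back_pointers.append(pointers)

    # Trace back from the best final language.
    path = [max(languages, key=lambda lang: score[lang])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Invented per-sentence distributions for a five-sentence page.
probs = [
    {"en": 0.9, "fr": 0.1},
    {"en": 0.45, "fr": 0.55},  # ambiguous sentence
    {"en": 0.8, "fr": 0.2},
    {"en": 0.1, "fr": 0.9},
    {"en": 0.2, "fr": 0.8},
]
labels = viterbi_languages(probs)  # ['en', 'en', 'en', 'fr', 'fr']
```

Labeling each sentence independently would mark the second sentence as French; the transition term smooths it to English because its neighbors are confidently English.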

    LANGUAGE SEGMENTATION OF MULTILINGUAL TEXTS
    9.
    Type: Invention application
    Legal status: In force

    Publication No.: US20120203540A1

    Publication date: 2012-08-09

    Application No.: US13022630

    Filing date: 2011-02-08

    Applicant: Anthony Aue

    Inventor: Anthony Aue

    IPC classification: G06F17/20

    CPC classification: G06F17/289 G06F17/275

    Abstract: The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model.


    Identifying Language Translations For Source Documents using Links
    10.
    Type: Invention application
    Legal status: In force

    Publication No.: US20120089898A1

    Publication date: 2012-04-12

    Application No.: US12900490

    Filing date: 2010-10-08

    Applicant: Anthony Aue

    Inventor: Anthony Aue

    IPC classification: G06F17/00

    CPC classification: G06F17/2827 G06F17/275

    Abstract: Technology is described for identifying language translations for source documents. The method includes finding source documents containing links to target documents, where the link anchors of the links have language-indicating text. A first tuple set can be generated for paired source documents and target documents with an expected target language for a target document. The first tuple set can be annotated with primary languages for the source documents and target documents to form a second tuple set where primary languages of the source documents and target documents are different. Further, a third tuple set can be generated using the second tuple set using a count of the number of times source documents and target documents occur in the first tuple set. Tuples can be removed from the third tuple set where a count ratio between source document count and target document count is less than a reference ratio.
