Language model using reverse translations

    Publication number: US10460040B2

    Publication date: 2019-10-29

    Application number: US15194249

    Application date: 2016-06-27

    Applicant: Facebook, Inc.

    Abstract: Exemplary embodiments relate to techniques for improving machine translation systems. The machine translation system may apply one or more models for translating material from a source language into a destination language. The models are initially trained using training data. According to exemplary embodiments, supplemental training data is used to train the models, where the supplemental training data uses in-domain material to improve the quality of output translations. In-domain data may include data that relates to the same or similar topics as those expected to be encountered in a translation of material from the source language into the destination language. In-domain data may include material previously translated from the source language into the destination language, material similar to previous translations, and destination language material that has previously been the subject of a request for translation into the source language.

    Identifying risky translations
    Granted patent

    Publication number: US10318640B2

    Publication date: 2019-06-11

    Application number: US15192076

    Application date: 2016-06-24

    Applicant: Facebook, Inc.

    Abstract: Exemplary embodiments provide techniques for evaluating when words or phrases of a translation were generated with a low degree of confidence, and conveying this information when the translation is presented. For example, if a source language word is encountered in source material for translation, but the source language word was only encountered a few times (or not at all) in the training data used to train the translation system, then the resulting translation may be flagged as being of low confidence. Other situations, such as the generation of two equally-likely translations, or translation system model disagreement, may also indicate a questionable translation. When the translation is displayed, questionable words and phrases may be flagged, and possible alternative translations may be presented. If one of the alternatives is selected, this information may be used to update the translation system's models in order to improve translation quality in the future.
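
    Illustrative sketch (not taken from the patent): one of the signals described above, a source word that appears rarely or never in the training data, could be approximated with a simple frequency count. The function name `flag_rare_source_words`, the `min_count` threshold, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter


def flag_rare_source_words(source_sentence, training_sentences, min_count=3):
    """Return source words seen fewer than min_count times in the training data.

    Hypothetical helper: whitespace tokenization and the threshold are
    simplifying assumptions, not details from the patent.
    """
    counts = Counter(
        word
        for sentence in training_sentences
        for word in sentence.lower().split()
    )
    return [
        word
        for word in source_sentence.lower().split()
        if counts[word] < min_count
    ]


training = ["the cat sat", "the cat ran", "the dog sat", "a cat slept"]
flagged = flag_rare_source_words("the cat yodeled", training, min_count=2)
print(flagged)
```

    A real system would presumably combine this signal with the others the abstract mentions, such as near-tied candidate translations or disagreement between models.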

    MACHINE-TRANSLATION BASED CORRECTIONS
    Patent application

    Publication number: US20190018837A1

    Publication date: 2019-01-17

    Application number: US15868970

    Application date: 2018-01-11

    Applicant: Facebook, Inc.

    CPC classification number: G06F17/2775 G06F17/273

    Abstract: Technology is disclosed for building correction models that correct natural language snippets. Correction models can include rules comprising pairs of word sequences identified from viable correction snippet pairs, where a first sequence of words in the pair should be replaced with a second sequence of words in the pair. Viable correction snippet pairs can be identified from among pairs of language snippets, such as a post to a social media website and a subsequent update to that post. Viable corrections can be the snippet pairs that both have no more unaligned words than a word alignment threshold and have no aligned word pair with a character edit difference above an edit distance threshold. In some implementations, word alignments can be found by aligning all the characters between a pair of language snippets, and identifying aligned words as those that have at least one aligned letter in common.
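
    Illustrative sketch (not taken from the patent): the viability test described above can be approximated as follows. Python's `difflib.SequenceMatcher` stands in for the character alignment step, and the function names and the thresholds `max_unaligned` and `max_edit` are assumptions chosen for this example.

```python
from difflib import SequenceMatcher


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def word_index_per_char(text):
    """Map each character position to its word index (None for spaces)."""
    index, word = [], 0
    for ch in text:
        if ch == " ":
            index.append(None)
            word += 1
        else:
            index.append(word)
    return index


def aligned_word_pairs(first, second):
    """Align characters, then pair words that share at least one aligned letter."""
    a_words, b_words = first.split(), second.split()
    a, b = " ".join(a_words), " ".join(b_words)  # normalize whitespace
    a_idx, b_idx = word_index_per_char(a), word_index_per_char(b)
    pairs = set()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            wa, wb = a_idx[block.a + k], b_idx[block.b + k]
            if wa is not None and wb is not None:
                pairs.add((wa, wb))
    return a_words, b_words, pairs


def is_viable_correction(first, second, max_unaligned=1, max_edit=2):
    """Apply both thresholds from the abstract (threshold values are assumed)."""
    a_words, b_words, pairs = aligned_word_pairs(first, second)
    aligned_a = {i for i, _ in pairs}
    aligned_b = {j for _, j in pairs}
    unaligned = (len(a_words) - len(aligned_a)) + (len(b_words) - len(aligned_b))
    if unaligned > max_unaligned:
        return False
    return all(edit_distance(a_words[i], b_words[j]) <= max_edit for i, j in pairs)


print(is_viable_correction("I definately agree", "I definitely agree"))
print(is_viable_correction("I love cats", "I love dogs"))
```

    The first pair differs by a single character in one aligned word, so it passes both thresholds. The second pair aligns "cats" with "dogs" through the shared final letter, and that pair exceeds the assumed edit distance limit, so it is rejected as more likely a change of meaning than a correction.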

    Machine-translation based corrections

    Publication number: US09904672B2

    Publication date: 2018-02-27

    Application number: US14788679

    Application date: 2015-06-30

    Applicant: Facebook, Inc.

    CPC classification number: G06F17/2775 G06F17/273

    Abstract: Technology is disclosed for building correction models that correct natural language snippets. Correction models can include rules comprising pairs of word sequences identified from viable correction snippet pairs, where a first sequence of words in the pair should be replaced with a second sequence of words in the pair. Viable correction snippet pairs can be identified from among pairs of language snippets, such as a post to a social media website and a subsequent update to that post. Viable corrections can be the snippet pairs that both have no more unaligned words than a word alignment threshold and have no aligned word pair with a character edit difference above an edit distance threshold. In some implementations, word alignments can be found by aligning all the characters between a pair of language snippets, and identifying aligned words as those that have at least one aligned letter in common.

    CORRECTIONS FOR NATURAL LANGUAGE PROCESSING
    Patent application (status: under examination, published)

    Publication number: US20170004120A1

    Publication date: 2017-01-05

    Application number: US14788578

    Application date: 2015-06-30

    Applicant: Facebook, Inc.

    CPC classification number: G06F17/2775 G06F17/273

    Abstract: Technology is disclosed for correcting items containing natural language words that match qualified corrections. Qualified corrections can be identified from language snippet sets, which can include, for example, a post to a social media website and one or more updates to that post. Qualified corrections can be word pairs identified in one of these language snippet sets by aligning words between the language snippets according to a minimum word edit distance and computing that the word edit distance is below a first threshold. Based on this word alignment, word pairs can be selected and analyzed to identify qualified corrections as the word pairs that have a minimum character edit distance below a second threshold. In some cases, such as where both words in the qualified correction word pair are known words, a context can be associated with the qualified correction to control when the qualified correction should be applied.


    Machine-translation based corrections

    Publication number: US10474751B2

    Publication date: 2019-11-12

    Application number: US15868970

    Application date: 2018-01-11

    Applicant: Facebook, Inc.

    Abstract: Technology is disclosed for building correction models that correct natural language snippets. Correction models can include rules comprising pairs of word sequences identified from viable correction snippet pairs, where a first sequence of words in the pair should be replaced with a second sequence of words in the pair. Viable correction snippet pairs can be identified from among pairs of language snippets, such as a post to a social media website and a subsequent update to that post. Viable corrections can be the snippet pairs that both have no more unaligned words than a word alignment threshold and have no aligned word pair with a character edit difference above an edit distance threshold. In some implementations, word alignments can be found by aligning all the characters between a pair of language snippets, and identifying aligned words as those that have at least one aligned letter in common.

    Machine translation system employing classifier

    Publication number: US10268686B2

    Publication date: 2019-04-23

    Application number: US15192170

    Application date: 2016-06-24

    Applicant: Facebook, Inc.

    Abstract: Exemplary embodiments relate to detecting, removing, and/or replacing objectionable words and phrases in a machine-generated translation. A classifier identifies translations containing target words or phrases. The classifier may be applied to the output translation to remove target words and phrases from the translation, or to prevent target words and phrases from being automatically presented. Further, the classifier may be applied to a translation model to prevent the target words and phrases from appearing in the output translation. Still further, the classifier may be applied to training data so that the translation model is not trained using the target words or phrases. The classifier may remove target words or phrases only when the target words or phrases appear in the output translation but not the source language input data. The classifier may be provided as a standalone service, or may be employed in the context of a machine translation system.
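
    Illustrative sketch (not taken from the patent): the rule that a target word is removed only when it appears in the output but has no counterpart in the source could look like the following. A plain lookup table stands in for the classifier, and the function name `filter_translation`, the `targets` mapping, and the `***` placeholder are assumptions made for this example.

```python
def filter_translation(source, translation, targets, placeholder="***"):
    """Mask target words in the translation that have no counterpart in the source.

    `targets` maps each destination-language target word to the set of
    source-language words that legitimately translate to it (hypothetical
    structure; a trained classifier would replace this lookup).
    """
    source_words = set(source.lower().split())
    kept = []
    for word in translation.split():
        equivalents = targets.get(word.lower())
        if equivalents is not None and not (equivalents & source_words):
            # Target word was introduced by the translation system: mask it.
            kept.append(placeholder)
        else:
            kept.append(word)
    return " ".join(kept)


targets = {"idiot": {"idiota"}}
introduced = filter_translation("eres muy listo", "you are an idiot", targets)
faithful = filter_translation("eres un idiota", "you are an idiot", targets)
print(introduced)
print(faithful)
```

    In the first call the objectionable word has no source counterpart, so it is masked. In the second call the source itself contains the equivalent word, so the translation is left unchanged. Punctuation handling and multi-word phrases are omitted for brevity.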

    MINING MULTI-LINGUAL DATA
    Patent application

    Publication number: US20180089178A1

    Publication date: 2018-03-29

    Application number: US15823492

    Application date: 2017-11-27

    Applicant: Facebook, Inc.

    CPC classification number: G06F17/289 G06F16/951 G06F17/2818 G06F17/2827

    Abstract: Technology is disclosed for mining training data to create machine translation engines. Training data can be mined as translation pairs from single content items that contain multiple languages; multiple content items in different languages that are related to the same or similar target; or multiple content items that are generated by the same author in different languages. Locating content items can include identifying potential sources of translation pairs that fall into these categories and applying filtering techniques to quickly gather those that are good candidates for being actual translation pairs. When actual translation pairs are located, they can be used to retrain a machine translation engine as in-domain for social media content items.

    IDENTIFYING RISKY TRANSLATIONS
    Patent application

    Publication number: US20170371867A1

    Publication date: 2017-12-28

    Application number: US15192076

    Application date: 2016-06-24

    Applicant: Facebook, Inc.

    CPC classification number: G06F17/2854 G06F17/2818

    Abstract: Exemplary embodiments provide techniques for evaluating when words or phrases of a translation were generated with a low degree of confidence, and conveying this information when the translation is presented. For example, if a source language word is encountered in source material for translation, but the source language word was only encountered a few times (or not at all) in the training data used to train the translation system, then the resulting translation may be flagged as being of low confidence. Other situations, such as the generation of two equally-likely translations, or translation system model disagreement, may also indicate a questionable translation. When the translation is displayed, questionable words and phrases may be flagged, and possible alternative translations may be presented. If one of the alternatives is selected, this information may be used to update the translation system's models in order to improve translation quality in the future.
