WORD BREAKER FROM CROSS-LINGUAL PHRASE TABLE
    3.
    Invention application
    Status: under examination (published)

    Publication number: WO2014168899A3

    Publication date: 2015-04-09

    Application number: PCT/US2014033242

    Filing date: 2014-04-08

    Applicant: MICROSOFT CORP

    CPC classification number: G06F17/2755 G06F17/2818 G06F17/2827

    Abstract: Automatically creating word breakers which segment words into morphemes is described, for example, to improve information retrieval, machine translation or speech systems. In embodiments, a cross-lingual phrase table is available, comprising source language (such as Turkish) phrases and potential translations in a target language (such as English) with associated probabilities. In various examples, blocks of source language phrases which have similar target language translations are created from the phrase table. In various examples, inference using the target language translations in a block enables stem and affix combinations to be found for source language words without the need for input from human judges or prior knowledge of source language linguistic rules or a source language lexicon.

    LANGUAGE MODEL TRAINED USING PREDICTED QUERIES FROM STATISTICAL MACHINE TRANSLATION
    4.
    Invention application
    Status: under examination (published)

    Publication number: WO2014190220A2

    Publication date: 2014-11-27

    Application number: PCT/US2014/039258

    Filing date: 2014-05-23

    Abstract: A Statistical Machine Translation (SMT) model is trained using pairs of sentences that include content obtained from one or more content sources (e.g. feed(s)) with corresponding queries that have been used to access the content. A query click graph may be used to assist in determining candidate pairs for the SMT training data. All or a portion of the candidate pairs may be used to train the SMT model. After training the SMT model using the SMT training data, the SMT model is applied to content to determine predicted queries that may be used to search for the content. The predicted queries are used to train a language model, such as a query language model. The query language model may be interpolated with other language models, such as a background language model, as well as a feed language model trained using the content used in determining the predicted queries.
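
    The interpolation step mentioned at the end of this abstract can be illustrated with a minimal sketch. The abstract does not say how the models are combined; linear interpolation over unigram probabilities is assumed here, and the function name `interpolate`, the toy vocabularies and the weights are all hypothetical.

```python
import math


def interpolate(models, weights):
    """Return a function giving the linearly interpolated probability of a word.

    models  -- list of dicts mapping word -> probability (toy unigram models)
    weights -- list of mixture weights, one per model, summing to 1
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("interpolation weights must sum to 1")

    def prob(word):
        # Weighted sum of each model's probability; unseen words count as 0.
        return sum(w * m.get(word, 0.0) for m, w in zip(models, weights))

    return prob


# Hypothetical toy models standing in for the query, feed and background LMs.
query_lm = {"weather": 0.5, "news": 0.5}
feed_lm = {"weather": 0.2, "storm": 0.8}
background_lm = {"weather": 0.1, "news": 0.1, "storm": 0.1, "the": 0.7}

p = interpolate([query_lm, feed_lm, background_lm], [0.5, 0.3, 0.2])
print(p("weather"), p("storm"))
```

    In practice the weights would typically be tuned on held-out data; the fixed values above are for illustration only.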

    SYSTEM AND METHOD OF MACHINE TRANSLATION
    5.
    Invention application
    Status: under examination (published)

    Publication number: WO2014108208A1

    Publication date: 2014-07-17

    Application number: PCT/EP2013/050521

    Filing date: 2013-01-11

    CPC classification number: G06F17/289 G06F17/2818

    Abstract: A machine translation system (1) comprises a language analysis module (3) which receives an unknown text (4) and analyses portions of the unknown text (4). The language analysis module (3) identifies language features in the unknown text (4) and provides the linguistic fingerprint to a translation configuration selection module (5). The translation configuration selection module (5) selects translation configurations (T-T9) from a memory (6) which correspond with the identified linguistic fingerprints and communicates the selected language configurations (T-T9) to a machine translation module (7). The machine translation module (7) translates the unknown text (4) into a different language using the selected translation configurations (T-T9).

    MINING PHRASE PAIRS FROM AN UNSTRUCTURED RESOURCE
    7.
    Invention application
    Status: under examination (published)

    Publication number: WO2010135204A2

    Publication date: 2010-11-25

    Application number: PCT/US2010/035033

    Filing date: 2010-05-14

    CPC classification number: G06F17/2818 G06F17/2845

    Abstract: A mining system applies queries to retrieve result items from an unstructured resource. The unstructured resource may correspond to a repository of network-accessible resource items. The result items that are retrieved may correspond to text segments (e.g., sentence fragments) associated with resource items. The mining system produces a structured training set by filtering the result items and establishing respective pairs of result items. A training system can use the training set to produce a statistical translation model. The translation model can be used in a monolingual context to translate between semantically-related phrases in a single language. The translation model can also be used in a bilingual context to translate between phrases expressed in two respective languages. Various applications of the translation model are also described.

    OPTIMIZING PARAMETERS FOR MACHINE TRANSLATION
    8.
    Invention application
    Status: under examination (published)

    Publication number: WO2010003117A2

    Publication date: 2010-01-07

    Application number: PCT/US2009/049613

    Filing date: 2009-07-02

    CPC classification number: G06F17/2818

    Abstract: Methods, systems, and apparatus, including computer program products, for language translation are disclosed. In one implementation, a method is provided. The method includes accessing a hypothesis space; performing decoding on the hypothesis space to obtain a translation hypothesis that minimizes an expected error in classification calculated relative to an evidence space; and providing the obtained translation hypothesis for use by a user as a suggested translation in a target translation.
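
    The decoding step described in this abstract resembles minimum Bayes risk selection, which can be sketched as follows. This is a simplified illustration, not the patented method: the hypothesis and evidence spaces are assumed to be small explicit lists rather than lattices, the error measure is assumed to be token-level edit distance, and the names `word_error` and `mbr_decode` are hypothetical.

```python
def word_error(hyp, ref):
    """Token-level Levenshtein distance between two whitespace-split strings."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, 1):
        cur = [i]
        for j, rw in enumerate(r, 1):
            cur.append(min(prev[j] + 1,                # delete a hypothesis token
                           cur[j - 1] + 1,             # insert a reference token
                           prev[j - 1] + (hw != rw)))  # match or substitute
        prev = cur
    return prev[-1]


def mbr_decode(hypotheses, evidence, loss=word_error):
    """Pick the hypothesis with the lowest expected loss under the evidence.

    hypotheses -- list of candidate translations (strings)
    evidence   -- list of (translation, probability) pairs
    """
    def expected_loss(hyp):
        return sum(p * loss(hyp, ev) for ev, p in evidence)

    return min(hypotheses, key=expected_loss)


# Toy evidence space: the single most probable entry is "x y z", but the two
# similar entries "a b c" and "a b d" together carry more probability mass.
evidence = [("x y z", 0.40), ("a b c", 0.35), ("a b d", 0.25)]
hypotheses = [text for text, _ in evidence]
best = mbr_decode(hypotheses, evidence)
print(best)
```

    The toy data is chosen so that the selected hypothesis differs from the single most probable one, which is the usual motivation for decoding by expected error.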

    LARGE LANGUAGE MODELS IN MACHINE TRANSLATION
    9.
    Invention application
    Status: under examination (published)

    Publication number: WO2008118905A3

    Publication date: 2009-02-12

    Application number: PCT/US2008058116

    Filing date: 2008-03-25

    CPC classification number: G06F17/2818 G06F17/2827 G06F17/2845

    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations, a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1; and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
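
    One simple way to realize a backoff score of the kind this abstract describes is sketched below. It assumes a scheme where a seen n-gram is scored by its relative frequency and an unseen one by a constant backoff factor times the score of its (n-1)-gram suffix. The factor 0.4, the toy corpus and the function names are illustrative assumptions, not values taken from the abstract.

```python
from collections import Counter


def ngram_counts(tokens, max_order):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(ngram, counts, total_tokens, backoff_factor=0.4):
    """Relative frequency if the n-gram was seen, else a scaled lower-order score."""
    ngram = tuple(ngram)
    if len(ngram) == 1:
        # Base case: unigram relative frequency in the corpus.
        return counts[ngram] / total_tokens
    if counts[ngram] > 0:
        # Seen n-gram: frequency relative to its (n-1)-token context.
        return counts[ngram] / counts[ngram[:-1]]
    # Unseen n-gram: drop the first token and scale by the backoff factor.
    return backoff_factor * backoff_score(ngram[1:], counts, total_tokens,
                                          backoff_factor)


tokens = "the cat sat on the mat".split()
counts = ngram_counts(tokens, max_order=3)

seen = backoff_score(("the", "cat"), counts, len(tokens))
unseen = backoff_score(("sat", "the", "mat"), counts, len(tokens))
print(seen, unseen)
```

    Scores produced this way are not normalized probabilities, which is a common trade-off for very large models: skipping normalization keeps training and lookup cheap.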

    TRAINING TREE TRANSDUCERS
    10.
    Invention application
    Status: under examination (published)

    Publication number: WO2005089340A3

    Publication date: 2008-01-10

    Application number: PCT/US2005008648

    Filing date: 2005-03-15

    CPC classification number: G06F17/2775 G06F17/2818

    Abstract: Training using tree transducers is described. Given sample input/output pairs as training (100, 110), and given a set of tree transducer rules (120), the information is combined to yield locally optimal weights for those rules (140). This combination is carried out by building a weighted derivation forest for each input/output pair and applying counting methods to those forests (130).
