MACHINE TRANSLATION DETECTION IN WEB-SCRAPED PARALLEL CORPORA
    1.
    Patent application
    Status: Pending (published)

    Publication No.: US20130103695A1

    Publication date: 2013-04-25

    Application No.: US13278194

    Filing date: 2011-10-21

    IPC classification: G06F17/30

    CPC classification: G06F17/2854

    Abstract: Various technologies described herein pertain to detecting machine translated content. Documents in a document pair are mutual lingual translations of each other. Further, document level features of the documents in the document pair can be identified. The document level features can correlate with translation quality between the documents in the document pair. Moreover, statistical classification can be used to detect whether the document pair is generated through machine translation based at least in part upon the document level features. Further, a first document can be a machine translation of a second document in the document pair or a disparate document when generated through machine translation.
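
    As a rough illustration of the statistical classification step, the Python sketch below scores a document pair with logistic regression over document-level features. The abstract does not say which features or classifier the patent uses: the two features here (target/source length ratio and the share of target tokens copied verbatim from the source, as untranslated words often are) and the names `document_features`, `train_logistic`, and `predict_mt` are hypothetical.

```python
import math
from typing import List, Sequence


def document_features(src: str, tgt: str) -> List[float]:
    """Hypothetical document-level features for a (source, target) document pair."""
    s, t = src.lower().split(), tgt.lower().split()
    length_ratio = len(t) / max(len(s), 1)
    # Share of target tokens copied verbatim from the source (e.g. untranslated words).
    src_vocab = set(s)
    copied = sum(1 for w in t if w in src_vocab) / max(len(t), 1)
    return [1.0, length_ratio, copied]  # the leading 1.0 acts as a bias term


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs: Sequence[List[float]], ys: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights by plain stochastic gradient ascent."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w


def predict_mt(w: List[float], src: str, tgt: str) -> float:
    """Probability, under this toy model, that tgt was machine-translated from src."""
    x = document_features(src, tgt)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
```

    In practice the labels would come from pairs known to be human- or machine-translated, and the feature set would be far richer than these two toy signals.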

    Faster minimum error rate training for weighted linear models
    2.
    Granted patent
    Status: In force

    Publication No.: US09098812B2

    Publication date: 2015-08-04

    Application No.: US12423187

    Filing date: 2009-04-14

    CPC classification: G06N99/005 G06N5/003 G06N5/04

    Abstract: The claimed subject matter provides systems and/or methods for training feature weights in a statistical machine translation model. The system can include components that obtain lists of translation hypotheses and associated feature values, set a current point in the multidimensional feature weight space to an initial value, choose a line in the feature weight space that passes through the current point, and reset the current point to optimize the feature weights with respect to the line. The system can further include components that set the current point to be a best point attained, reduce the list of translation hypotheses based on a determination that a particular hypothesis has never been touched in optimizing the feature weights from at least one of an initial starting point or a randomly selected restarting point, and output the point ascertained to be the best point in the feature weight space.
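
    The line optimization in this abstract builds on the exact line search of standard minimum error rate training: along point + gamma * direction every hypothesis score is a linear function of gamma, so a sentence's top hypothesis (and hence the total error) can change only where two of its lines cross. The sketch below is that textbook search, not the patent's faster variant. The `(features, error)` hypothesis representation is an assumption, and the returned `touched` set only illustrates which hypotheses ever win along the line, the kind of information the abstract uses to prune hypothesis lists.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

# Hypothetical representation: a hypothesis is (feature vector, error count vs. reference).
Hypothesis = Tuple[Sequence[float], float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def line_search(nbest: List[List[Hypothesis]], point: Sequence[float],
                direction: Sequence[float]) -> Tuple[float, float, Set[Tuple[int, int]]]:
    """Exact line search over gamma for weights point + gamma * direction.

    Returns (best_gamma, total_error, touched), where touched holds the
    (sentence index, hypothesis index) pairs that are top-scoring somewhere
    along the line.
    """
    # Along the line each hypothesis score is intercept + gamma * slope.
    lines = [[(dot(point, f), dot(direction, f)) for f, _ in hyps] for hyps in nbest]
    # A sentence's top hypothesis can only change where two of its lines cross.
    crossings = set()
    for sent in lines:
        for (b1, m1), (b2, m2) in combinations(sent, 2):
            if m1 != m2:
                crossings.add((b2 - b1) / (m1 - m2))
    pts = sorted(crossings)
    # One probe gamma inside every interval between consecutive crossings.
    if pts:
        probes = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    else:
        probes = [0.0]
    best_gamma, best_err = 0.0, float("inf")
    touched: Set[Tuple[int, int]] = set()
    for g in probes:
        err = 0.0
        for s, (sent, hyps) in enumerate(zip(lines, nbest)):
            k = max(range(len(sent)), key=lambda i: sent[i][0] + g * sent[i][1])
            touched.add((s, k))
            err += hyps[k][1]
        # Prefer lower error; among ties, the smallest move from the current point.
        if err < best_err or (err == best_err and abs(g) < abs(best_gamma)):
            best_gamma, best_err = g, err
    return best_gamma, best_err, touched
```

    A hypothesis that never appears in `touched` is never the top-scoring choice along this line, which is why such hypotheses are natural candidates for removal from the lists.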

    Statistical Machine Translation Based Search Query Spelling Correction
    3.
    Patent application
    Status: Pending (published)

    Publication No.: US20130124492A1

    Publication date: 2013-05-16

    Application No.: US13296640

    Filing date: 2011-11-15

    IPC classification: G06F17/30

    Abstract: Statistical Machine Translation (SMT) based search query spelling correction techniques are described herein. In one or more implementations, search data regarding searches performed by clients may be logged. The logged data includes query correction pairs that may be used to ascertain error patterns indicating how misspelled substrings may be translated to corrected substrings. The error patterns may be used to determine suggestions for an input query and to develop query correction models used to translate the input query to a corrected query. In one or more implementations, probabilistic features from multiple query correction models are combined to score different correction candidates. One or more top-scoring correction candidates may then be exposed as suggestions for selection by a user and/or provided to a search engine to conduct a corresponding search using the corrected query version(s).
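
    A toy version of this pipeline: learn word-level correction patterns from logged (typed, corrected) query pairs, then rank whole-query candidates by a weighted sum of two log-probability features, an error model and an add-one-smoothed unigram language model. The abstract speaks of substring patterns and does not specify the models or how they are combined, so the word-level alignment, the 1e-3 probability for keeping an unseen word as typed, the equal default weights, and all function names are assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple

Patterns = Dict[str, Dict[str, float]]


def learn_error_patterns(pairs: List[Tuple[str, str]]) -> Patterns:
    """Estimate P(corrected word | typed word) from logged query correction pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for typed, corrected in pairs:
        t, c = typed.split(), corrected.split()
        if len(t) != len(c):
            continue  # this toy aligner only handles pairs with equal word counts
        for a, b in zip(t, c):
            counts[a][b] += 1
    return {a: {b: n / sum(cs.values()) for b, n in cs.items()} for a, cs in counts.items()}


def unigram_lm(queries: List[str]) -> Callable[[str], float]:
    """Add-one-smoothed unigram probabilities; unseen words share one extra slot."""
    counts = Counter(w for q in queries for w in q.split())
    total, vocab = sum(counts.values()), len(counts)
    return lambda w: (counts[w] + 1) / (total + vocab + 1)


def suggest(query: str, patterns: Patterns, lm: Callable[[str], float],
            weights: Tuple[float, float] = (1.0, 1.0), top_k: int = 3) -> List[str]:
    """Rank whole-query corrections by a weighted sum of log-probability features."""
    options = []
    for w in query.split():
        opts = dict(patterns.get(w, {}))
        opts.setdefault(w, 1e-3)  # assumed probability of leaving a word as typed
        options.append(list(opts.items()))
    scored = []
    for combo in product(*options):
        words = [b for b, _ in combo]
        error_model = sum(math.log(p) for _, p in combo)
        language_model = sum(math.log(lm(b)) for b in words)
        scored.append((weights[0] * error_model + weights[1] * language_model, " ".join(words)))
    scored.sort(reverse=True)
    return [q for _, q in scored[:top_k]]
```

    Enumerating every combination grows exponentially with query length; a real system would prune candidates, but exhaustive search keeps the scoring easy to follow here.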

    LOCATING PARALLEL WORD SEQUENCES IN ELECTRONIC DOCUMENTS
    5.
    Patent application
    Status: In force

    Publication No.: US20110301935A1

    Publication date: 2011-12-08

    Application No.: US12794778

    Filing date: 2010-06-07

    IPC classification: G06F17/28 G06F17/27

    CPC classification: G06F17/2827 G06F17/278

    Abstract: Systems and methods for automatically extracting parallel word sequences from comparable corpora are described. Electronic documents, such as web pages belonging to a collaborative online encyclopedia, are analyzed to locate parallel word sequences between electronic documents written in different languages. These parallel word sequences are then used to train a machine translation system that can translate text from one language to another.
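
    The abstract does not describe the extraction model itself. As a much simpler stand-in, the sketch below pairs each source sentence with the target sentence that best covers its dictionary translations and keeps the pair only above a threshold. The bilingual `lexicon`, the 0.5 threshold, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Lexicon = Dict[str, Set[str]]  # hypothetical: source word -> possible target translations


def coverage(src: str, tgt: str, lexicon: Lexicon) -> float:
    """Fraction of source words with at least one dictionary translation in tgt."""
    s, t = src.lower().split(), set(tgt.lower().split())
    if not s:
        return 0.0
    hits = sum(1 for w in s if lexicon.get(w, set()) & t)
    return hits / len(s)


def extract_parallel(src_sents: List[str], tgt_sents: List[str], lexicon: Lexicon,
                     threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Pair each source sentence with its best-covered target sentence, if good enough."""
    pairs = []
    for s in src_sents:
        best = max(tgt_sents, key=lambda t: coverage(s, t, lexicon), default=None)
        if best is not None and coverage(s, best, lexicon) >= threshold:
            pairs.append((s, best))
    return pairs
```

    Sentences with no counterpart in the other-language article (common in comparable corpora) fall below the threshold and are dropped rather than force-aligned.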

    Universal text input
    6.
    Granted patent
    Status: In force

    Publication No.: US08738356B2

    Publication date: 2014-05-27

    Application No.: US13110484

    Filing date: 2011-05-18

    IPC classification: G06F17/28

    CPC classification: G06F17/27

    Abstract: The universal text input technique described herein addresses the difficulties of typing text in various languages and scripts, and offers a unified solution that combines character conversion, next-word prediction, spelling correction and automatic script switching to make it extremely simple to type any language from any device. The technique provides a rich and seamless input experience in any language through a universal IME (input method editor). It allows a user to type in any script for any language using a regular QWERTY keyboard via phonetic input, and at the same time allows for auto-completion and spelling correction of words and phrases while typing. The technique also provides a modeless input that automatically turns on and off an input mode that changes between different types of script.
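
    One piece of the described input method, phonetic input converted to native script with auto-completion, can be sketched as a prefix lookup in a frequency-ranked lexicon; when nothing matches, the input stays in Latin script, a crude form of modeless switching. The lexicon format, the frequencies, and the `convert` function are hypothetical, and the sketch does not attempt spelling correction or next-word prediction.

```python
from typing import List, Tuple

# Hypothetical lexicon entries: (romanized spelling, native-script word, frequency).
Lexicon = List[Tuple[str, str, int]]


def convert(prefix: str, lexicon: Lexicon, top_k: int = 3) -> List[str]:
    """Rank native-script completions for a partially typed phonetic prefix."""
    p = prefix.lower()
    matches = [(freq, word) for roman, word, freq in lexicon if roman.startswith(p)]
    if not matches:
        return [prefix]  # no conversion known: leave the input in Latin script
    matches.sort(key=lambda m: -m[0])  # most frequent completion first
    return [word for _, word in matches[:top_k]]
```

    For example, with a small Hindi lexicon the prefix "na" would offer नाम, नमस्ते and नया ranked by their assumed frequencies, while an unknown string such as "xyz" passes through unchanged.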

    FASTER MINIMUM ERROR RATE TRAINING FOR WEIGHTED LINEAR MODELS
    8.
    Patent application
    Status: In force

    Publication No.: US20100262575A1

    Publication date: 2010-10-14

    Application No.: US12423187

    Filing date: 2009-04-14

    IPC classification: G06N7/02 G06N5/02 G06F15/18

    CPC classification: G06N99/005 G06N5/003 G06N5/04

    Abstract: The claimed subject matter provides systems and/or methods for training feature weights in a statistical machine translation model. The system can include components that obtain lists of translation hypotheses and associated feature values, set a current point in the multidimensional feature weight space to an initial value, choose a line in the feature weight space that passes through the current point, and reset the current point to optimize the feature weights with respect to the line. The system can further include components that set the current point to be a best point attained, reduce the list of translation hypotheses based on a determination that a particular hypothesis has never been touched in optimizing the feature weights from at least one of an initial starting point or a randomly selected restarting point, and output the point ascertained to be the best point in the feature weight space.

    RANDOM WALK RESTARTS IN MINIMUM ERROR RATE TRAINING
    9.
    Patent application
    Status: Pending (published)

    Publication No.: US20100023315A1

    Publication date: 2010-01-28

    Application No.: US12179784

    Filing date: 2008-07-25

    IPC classification: G06F17/28

    CPC classification: G06F17/2818

    Abstract: The claimed subject matter provides systems and/or methods that minimize error rate training for statistical machine translation. The systems can include devices that optimize a statistical machine translation model for translating between a first natural language and a second natural language by generating lists of n-best translation hypotheses and associated feature weights, optimizing the associated feature weights with respect to the lists of n-best translation hypotheses, and thereafter determining a translation quality measurement for the training sets from which the lists of n-best translation hypotheses were derived.
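
    The abstract does not say how restart points are chosen. Going by the title, one reading is that each restart begins a short random walk away from the best weights found so far, rather than at a uniformly random point. The sketch below assumes that reading and uses a simple greedy coordinate search as the local optimizer in place of MERT's exact line search. `error` can be any callable over a weight vector; in MERT it would be the corpus error of the n-best top-scoring translations. All names and default step sizes are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

ErrorFn = Callable[[Sequence[float]], float]


def coordinate_search(error: ErrorFn, point: List[float],
                      grid: Sequence[float] = (-1.0, -0.1, 0.1, 1.0),
                      max_rounds: int = 50) -> Tuple[List[float], float]:
    """Greedy local search: try a few step sizes along each coordinate axis."""
    point = list(point)
    best = error(point)
    for _ in range(max_rounds):
        improved = False
        for d in range(len(point)):
            for step in grid:
                cand = list(point)
                cand[d] += step
                e = error(cand)
                if e < best:
                    point, best, improved = cand, e, True
        if not improved:
            break
    return point, best


def random_walk_restarts(error: ErrorFn, start: Sequence[float], restarts: int = 10,
                         walk_steps: int = 5, step_size: float = 0.5,
                         seed: int = 0) -> Tuple[List[float], float]:
    """Local search from start, then restarts seeded by random walks from the best point."""
    rng = random.Random(seed)
    best_point, best_err = coordinate_search(error, list(start))
    for _ in range(restarts):
        # Restart a short random walk away from the best point found so far.
        p = list(best_point)
        for _ in range(walk_steps):
            d = rng.randrange(len(p))
            p[d] += rng.uniform(-step_size, step_size)
        p, e = coordinate_search(error, p)
        if e < best_err:  # keep only restarts that improve on the best point
            best_point, best_err = p, e
    return best_point, best_err
```

    Because a restart is accepted only when it lowers the error, the returned point is never worse than the result of the first local search.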

    Statistical machine translation based search query spelling correction

    Publication No.: US10176168B2

    Publication date: 2019-01-08

    Application No.: US13296640

    Filing date: 2011-11-15

    IPC classification: G06F17/30 G06F17/28 G06F17/27

    Abstract: Statistical Machine Translation (SMT) based search query spelling correction techniques are described herein. In one or more implementations, search data regarding searches performed by clients may be logged. The logged data includes query correction pairs that may be used to ascertain error patterns indicating how misspelled substrings may be translated to corrected substrings. The error patterns may be used to determine suggestions for an input query and to develop query correction models used to translate the input query to a corrected query. In one or more implementations, probabilistic features from multiple query correction models are combined to score different correction candidates. One or more top-scoring correction candidates may then be exposed as suggestions for selection by a user and/or provided to a search engine to conduct a corresponding search using the corrected query version(s).