Collocation translation from monolingual and available bilingual corpora
    71.
    Invention application (status: under examination, published)

    Publication No.: US20060282255A1

    Publication date: 2006-12-14

    Application No.: US11152540

    Filing date: 2005-06-14

    IPC classification: G06F17/28

    CPC classification: G06F17/2827

    Abstract: A system and method of extracting collocation translations are presented. The methods include constructing a collocation translation model using monolingual source and target language corpora as well as a bilingual corpus, if available. The collocation translation model employs an expectation maximization algorithm with respect to contextual words surrounding collocations. The collocation translation model can be used later to extract a collocation translation dictionary. Optional filters based on context redundancy and/or bi-directional translation constraints can be used to ensure that only highly reliable collocation translations are included in the dictionary. The constructed collocation translation model and the extracted collocation translation dictionary can be used later for further natural language processing, such as sentence translation.
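
The abstract does not define the bi-directional translation constraint. One plausible reading, sketched here as an assumption, keeps a collocation pair only when each side is the other's highest-scoring translation. The function name `bidirectional_filter`, the score tables, and the sample collocations are hypothetical and are not taken from the patent.

```python
def bidirectional_filter(src_to_tgt, tgt_to_src):
    """Keep (source, target) pairs that are each other's top-scoring translation.

    Both arguments map a collocation to a dict of candidate translations
    and their scores. This is an assumed reading of the filter, not the
    patent's stated definition.
    """
    kept = {}
    for src, candidates in src_to_tgt.items():
        if not candidates:
            continue
        # Best target-language candidate for this source collocation.
        best_tgt = max(candidates, key=candidates.get)
        back = tgt_to_src.get(best_tgt, {})
        # Keep the pair only if translating back returns the same source.
        if back and max(back, key=back.get) == src:
            kept[src] = best_tgt
    return kept


# Hypothetical scores, for illustration only.
src_to_tgt = {
    "strong tea": {"nong cha": 0.7, "qiang cha": 0.3},
    "make decision": {"zuo jueding": 0.6},
}
tgt_to_src = {
    "nong cha": {"strong tea": 0.8, "thick tea": 0.2},
    "zuo jueding": {"take decision": 0.9, "make decision": 0.1},
}
reliable = bidirectional_filter(src_to_tgt, tgt_to_src)
```

In this toy data only "strong tea" survives, because the best back-translation of the candidate for "make decision" is a different source collocation.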

    Method and apparatus for distribution-based language model adaptation

    Publication No.: US20060009965A1

    Publication date: 2006-01-12

    Application No.: US11225543

    Filing date: 2005-09-13

    IPC classification: G06F17/27

    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e., the task-specific training data set) and the relative frequency of n-grams in a large training set (i.e., the out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
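
The abstract does not give the weighting function, so the following is only a minimal sketch under stated assumptions: each large-set count is scaled by the ratio of the two relative frequencies raised to a damping exponent, and the weighted counts are normalized into probabilities. The names `adapt_counts`, `alpha`, and `floor`, the floor for n-grams unseen in the small set, and the sample bigram counts are all hypothetical.

```python
from collections import Counter


def adapt_counts(small, large, alpha=0.5, floor=1e-6):
    """Reweight large-set n-gram counts toward a small in-domain set.

    Assumed weight: (rel_freq_small / rel_freq_large) ** alpha.
    alpha=0 leaves the large-set distribution unchanged; larger alpha
    pulls the result toward the small-set distribution.
    """
    n_small = sum(small.values())
    n_large = sum(large.values())
    weighted = {}
    for ngram, count in large.items():
        # Floor the in-domain relative frequency so unseen n-grams keep
        # a small non-zero weight (an assumption, not from the abstract).
        rel_small = max(small.get(ngram, 0) / n_small, floor)
        rel_large = count / n_large
        weighted[ngram] = count * (rel_small / rel_large) ** alpha
    total = sum(weighted.values())
    # Normalize weighted counts into n-gram probabilities.
    return {ngram: w / total for ngram, w in weighted.items()}


# Hypothetical bigram counts, for illustration only.
small = Counter({("book", "flight"): 3, ("the", "flight"): 1})
large = Counter({("book", "flight"): 1, ("the", "flight"): 5, ("the", "cat"): 4})
adapted = adapt_counts(small, large)
unchanged = adapt_counts(small, large, alpha=0.0)
```

With these counts, the in-domain bigram ("book", "flight") gains probability mass relative to its share of the large set, while ("the", "cat"), which never occurs in the small set, loses most of its mass.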

    Search query and document-related data translation
    76.
    Granted invention patent (status: in force)

    Publication No.: US09501759B2

    Publication date: 2016-11-22

    Application No.: US13328924

    Filing date: 2011-12-16

    Abstract: The subject disclosure is directed towards developing a translation model for mapping search query terms to document-related data. By processing user logs comprising search histories into word-aligned query-document pairs, the translation model may be trained using data, such as probabilities, corresponding to the word-aligned query-document pairs. After incorporating the translation model into model data for a search engine, the translation model may be used as features for producing relevance scores for current search queries and ranking documents/advertisements according to relevance.

    Universal text input
    77.
    Granted invention patent (status: in force)

    Publication No.: US08738356B2

    Publication date: 2014-05-27

    Application No.: US13110484

    Filing date: 2011-05-18

    IPC classification: G06F17/28

    CPC classification: G06F17/27

    Abstract: The universal text input technique described herein addresses the difficulties of typing text in various languages and scripts, and offers a unified solution, which combines character conversion, next word prediction, spelling correction and automatic script switching to make it extremely simple to type any language from any device. The technique provides a rich and seamless input experience in any language through a universal IME (input method editor). It allows a user to type in any script for any language using a regular QWERTY keyboard via phonetic input and at the same time allows for auto-completion and spelling correction of words and phrases while typing. The technique also provides a modeless input that automatically turns on and off an input mode that changes between different types of script.

    Search Query and Document-Related Data Translation
    78.
    Invention application (status: in force)

    Publication No.: US20130103493A1

    Publication date: 2013-04-25

    Application No.: US13328924

    Filing date: 2011-12-16

    IPC classification: G06Q30/02 G06F17/30

    Abstract: The subject disclosure is directed towards developing a translation model for mapping search query terms to document-related data. By processing user logs comprising search histories into word-aligned query-document pairs, the translation model may be trained using data, such as probabilities, corresponding to the word-aligned query-document pairs. After incorporating the translation model into model data for a search engine, the translation model may be used as features for producing relevance scores for current search queries and ranking documents/advertisements according to relevance.

    Web-based proofing and usage guidance
    79.
    Granted invention patent (status: in force)

    Publication No.: US07991609B2

    Publication date: 2011-08-02

    Application No.: US11713073

    Filing date: 2007-02-28

    IPC classification: G06F17/27

    CPC classification: G06F17/274 G06F17/273

    Abstract: A system is disclosed for checking grammar and usage using a flexible portfolio of different mechanisms, and automatically providing a variety of different examples of standard usage, selected from analogous Web content. The system can be used for checking the grammar and usage in any application that involves natural language text, such as word processing, email, and presentation applications. The grammar and usage can be evaluated using several complementary evaluation modules, which may include one based on a trained classifier, one based on regular expressions, and one based on comparative searches of the Web or a local corpus. The evaluation modules can provide a set of suggested alternative segments with corrected grammar and usage. A follow-up, screened Web search based on the alternative segments, in context, may provide several different in-context examples of proper grammar and usage that the user can consider and select from.

    Ranker selection for statistical natural language processing
    80.
    Granted invention patent (status: in force)

    Publication No.: US07844555B2

    Publication date: 2010-11-30

    Application No.: US11938811

    Filing date: 2007-11-13

    CPC classification: G06F17/2715

    Abstract: Systems and methods for selecting a ranker for statistical natural language processing are provided. One disclosed system includes a computer program configured to be executed on a computing device, the computer program comprising a data store including reference performance data for a plurality of candidate rankers, the reference performance data being calculated based on a processing of test data by each of the plurality of candidate rankers. The system may further include a ranker selector configured to receive a statistical natural language processing task and a performance target, and determine a selected ranker from the plurality of candidate rankers based on the statistical natural language processing task, the performance target, and the reference performance data.
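
The selection step described in this abstract can be sketched as a lookup over stored reference performance data. The sketch assumes that the performance target is a minimum accuracy and that ties among qualifying rankers are broken by speed. Neither assumption comes from the abstract, and the record fields, ranker names, and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencePerformance:
    """One row of reference performance data for a candidate ranker."""
    ranker: str
    task: str
    accuracy: float          # measured on held-out test data
    seconds_per_item: float  # average processing time


def select_ranker(store, task, min_accuracy):
    """Return the fastest ranker meeting the accuracy target, or None."""
    eligible = [
        row for row in store
        if row.task == task and row.accuracy >= min_accuracy
    ]
    if not eligible:
        return None
    # Assumed tie-break: prefer the cheapest ranker that meets the target.
    return min(eligible, key=lambda row: row.seconds_per_item).ranker


# Hypothetical reference data, for illustration only.
store = [
    ReferencePerformance("ranker_a", "parse reranking", 0.95, 0.40),
    ReferencePerformance("ranker_b", "parse reranking", 0.91, 0.05),
    ReferencePerformance("ranker_c", "parse reranking", 0.85, 0.01),
    ReferencePerformance("ranker_a", "spelling correction", 0.97, 0.30),
]
chosen = select_ranker(store, "parse reranking", 0.90)
unmet = select_ranker(store, "parse reranking", 0.99)
```

Here `chosen` is `"ranker_b"`, the fastest candidate that reaches the 0.90 target, and `unmet` is `None` because no candidate reaches 0.99.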
