Publication number: WO2010135204A2
Publication date: 2010-11-25
Application number: PCT/US2010/035033
Filing date: 2010-05-14
Applicant: MICROSOFT CORPORATION
Inventors: DOLAN, William B.; BROCKETT, Christopher J.; CASTILLO, Julio J.; VANDERWENDE, Lucretia H.
CPC classification numbers: G06F17/2818, G06F17/2845
Abstract: A mining system applies queries to retrieve result items from an unstructured resource. The unstructured resource may correspond to a repository of network-accessible resource items. The result items that are retrieved may correspond to text segments (e.g., sentence fragments) associated with resource items. The mining system produces a structured training set by filtering the result items and establishing respective pairs of result items. A training system can use the training set to produce a statistical translation model. The translation model can be used in a monolingual context to translate between semantically-related phrases in a single language. The translation model can also be used in a bilingual context to translate between phrases expressed in two respective languages. Various applications of the translation model are also described.
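The mining step summarized in the abstract (retrieve result items per query, filter them, then pair them into a training set) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names `filter_items` and `build_training_set`, the word-count bounds, the duplicate check, and the choice to pair every two surviving items retrieved by the same query. The abstract does not state the actual filtering or pairing criteria.

```python
from itertools import combinations


def filter_items(items, min_words=3, max_words=40):
    """Drop duplicates and segments outside an assumed word-count range.

    The thresholds are hypothetical; the abstract only says items are filtered.
    """
    seen = set()
    kept = []
    for text in items:
        # Normalize case and whitespace so near-identical copies collapse.
        normalized = " ".join(text.lower().split())
        n_words = len(normalized.split())
        if normalized in seen or not (min_words <= n_words <= max_words):
            continue
        seen.add(normalized)
        kept.append(text.strip())
    return kept


def build_training_set(results_by_query):
    """Pair the filtered result items that were retrieved by the same query.

    Assumption: items returned for one query are candidates for being
    semantically related, so every two of them form a training pair.
    """
    pairs = []
    for query, items in results_by_query.items():
        for a, b in combinations(filter_items(items), 2):
            pairs.append((a, b))
    return pairs


# Made-up result items standing in for text segments retrieved from a
# network-accessible repository.
results = {
    "shingles symptoms": [
        "Shingles causes a painful skin rash",
        "shingles causes a painful skin rash",
        "Herpes zoster produces a painful blistering rash",
        "Click here",
    ],
}
training_set = build_training_set(results)
```

In this sketch the duplicate and the two-word fragment are filtered out, leaving one pair of related segments. A training system would then consume such pairs to estimate a statistical translation model; that step is outside the scope of the sketch.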