-
Publication No.: US08825469B1
Publication date: 2014-09-02
Application No.: US13350897
Filing date: 2012-01-16
Applicants: Sarveshwar Rao Duddu, Franz Josef Och, Eahab E. Ibrahim, Joshua James Estelle, Shankar Kumar
Inventors: Sarveshwar Rao Duddu, Franz Josef Och, Eahab E. Ibrahim, Joshua James Estelle, Shankar Kumar
IPC classification: G10L15/00
CPC classification: G06F17/2854, G06F17/2827, G06F17/2836
Abstract: A computer-implemented method includes receiving a document and a request to translate the document to a different language, the document including at least one tag associated with a first portion of text within the document, receiving a manual translation of the document translated by a human translator but not including the at least one tag, generating a plurality of alignments between the document and the manual translation using a statistical alignment model, selecting one of the plurality of alignments based on a likelihood that the first portion of text in the document corresponds to an aligned second portion of text within the manual translation, mapping a location of the tag in the document to a corresponding location within the manual translation based on the selected alignment, and inserting the at least one tag into the manual translation at the corresponding location to obtain a modified manual translation of the document.
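The tag re-insertion steps named in the abstract for US08825469B1 (score several candidate alignments, pick the one most likely for the tagged text, project the tag span, insert the tag) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the alignments are hand-written lists of `(source_index, target_index, probability)` links rather than the output of a statistical alignment model, and the function names `span_score`, `project_span`, and `reinsert_tag` are hypothetical, not taken from the patent.

```python
from math import prod

def span_score(alignment, span):
    """Likelihood proxy: product of link probabilities for source tokens inside the tagged span."""
    start, end = span
    probs = [p for s, _, p in alignment if start <= s < end]
    return prod(probs) if probs else 0.0

def project_span(alignment, span):
    """Map a source token span to the smallest target span covering its aligned tokens."""
    start, end = span
    targets = [t for s, t, _ in alignment if start <= s < end]
    if not targets:
        raise ValueError("tagged span has no aligned target tokens")
    return min(targets), max(targets) + 1

def reinsert_tag(target_tokens, alignments, span, tag="b"):
    """Pick the best-scoring alignment for the tagged span and wrap the projected target span."""
    best = max(alignments, key=lambda a: span_score(a, span))
    t_start, t_end = project_span(best, span)
    return (target_tokens[:t_start] + [f"<{tag}>"]
            + target_tokens[t_start:t_end] + [f"</{tag}>"]
            + target_tokens[t_end:])

if __name__ == "__main__":
    # Source "the <b>red</b> house": the tag covers source token 1.
    target = ["la", "casa", "roja"]  # manual translation, delivered without tags
    candidate_a = [(0, 0, 0.9), (1, 2, 0.8), (2, 1, 0.9)]  # hypothetical alignment
    candidate_b = [(0, 0, 0.9), (1, 1, 0.3), (2, 2, 0.4)]  # hypothetical alignment
    print(" ".join(reinsert_tag(target, [candidate_a, candidate_b], (1, 2))))
```

Scoring only the links inside the tagged span, rather than the whole sentence, mirrors the abstract's wording that selection is based on the likelihood for the tagged portion of text; a real system could weigh other factors as well.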
-
Publication No.: US20110202330A1
Publication date: 2011-08-18
Application No.: US13026936
Filing date: 2011-02-14
CPC classification: G06F17/2755
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decompounding compound words are disclosed. In one aspect, a method includes obtaining a token that includes a sequence of characters, identifying two or more candidate sub-words that are constituents of the token, and one or more morphological operations that are required to transform the sub-words into the token, where at least one of the morphological operations involves a use of a non-dictionary word, and determining a cost associated with each sub-word and a cost associated with each morphological operation.
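As a rough illustration of the decompounding idea in the abstract above (candidate sub-words plus morphological operations, each carrying a cost), the following Python sketch searches for the cheapest split of a token. The cost tables, the German example words, and the function name `decompound` are assumptions made up for this sketch; the only morphological operation modeled is inserting a linking element such as "s", which is not itself a dictionary word.

```python
from functools import lru_cache

# Hypothetical costs, e.g. negative log-probabilities from some corpus.
WORD_COST = {"arbeit": 1.0, "amt": 1.0, "kinder": 1.0, "garten": 1.0}
# Linking elements joined between sub-words; "" means plain concatenation.
LINK_COST = {"": 0.0, "s": 0.5}

def decompound(token):
    """Return (total_cost, sub_words, links) for the cheapest split, or None if no split exists."""
    @lru_cache(maxsize=None)
    def best(start):
        candidates = []
        for end in range(start + 1, len(token) + 1):
            word = token[start:end]
            if word not in WORD_COST:
                continue
            if end == len(token):
                # The last sub-word closes the token; no link follows it.
                candidates.append((WORD_COST[word], (word,), ()))
                continue
            for link, link_cost in LINK_COST.items():
                if not token.startswith(link, end):
                    continue
                rest = best(end + len(link))
                if rest is None:
                    continue
                candidates.append((WORD_COST[word] + link_cost + rest[0],
                                   (word,) + rest[1],
                                   (link,) + rest[2]))
        return min(candidates, key=lambda c: c[0]) if candidates else None

    return best(0)

if __name__ == "__main__":
    print(decompound("arbeitsamt"))    # split uses the linking "s"
    print(decompound("kindergarten"))  # plain concatenation
```

Memoizing on the start position keeps the search polynomial in the token length, since each suffix is solved once regardless of how the prefix was split.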
-
Publication No.: US09075792B2
Publication date: 2015-07-07
Application No.: US13026936
Filing date: 2011-02-14
CPC classification: G06F17/2755
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decompounding compound words are disclosed. In one aspect, a method includes obtaining a token that includes a sequence of characters, identifying two or more candidate sub-words that are constituents of the token, and one or more morphological operations that are required to transform the sub-words into the token, where at least one of the morphological operations involves a use of a non-dictionary word, and determining a cost associated with each sub-word and a cost associated with each morphological operation.
-
Publication No.: US08812517B1
Publication date: 2014-08-19
Application No.: US13296460
Filing date: 2011-11-15
Applicants: Ashish Venugopal, Jurij Ganitkevic, Franz Josef Och, David Robert Talbot, Jakob David Uszkoreit
Inventors: Ashish Venugopal, Jurij Ganitkevic, Franz Josef Och, David Robert Talbot, Jakob David Uszkoreit
IPC classification: G06F7/00
CPC classification: G06F17/30905
Abstract: A way of detecting a watermark present in a structured result, such as a search result or a machine translation. The structured result is received and a hash is computed based upon at least part of the result. The resulting bit sequence is tested against a null hypothesis that the bit sequence was generated by a random variable with a binomial distribution with a parameter p=0.5. The result of this test is compared to a significance level, which indicates whether the structured result is watermarked.
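The detection test described in the abstract above (hash the result, treat the hash as a bit sequence, test it against a binomial null hypothesis with p=0.5, compare with a significance level) might look roughly like the Python sketch below. Several choices are assumptions not stated in the abstract: SHA-256 as the hash, a one-sided test that looks for an excess of 1 bits, the default significance level of 0.01, and all function names.

```python
import hashlib
from math import comb

def hash_bits(text):
    """Hash the text with SHA-256 (an assumed choice) and return its 256 bits as 0/1 integers."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [(byte >> i) & 1 for byte in digest for i in range(8)]

def binomial_upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5), computed exactly with integer arithmetic."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def watermark_p_value(text):
    """One-sided p-value: probability of seeing at least this many 1 bits under the null hypothesis."""
    bits = hash_bits(text)
    return binomial_upper_tail(sum(bits), len(bits))

def is_watermarked(text, alpha=0.01):
    """Reject the null hypothesis (unwatermarked) when the p-value falls below alpha."""
    return watermark_p_value(text) < alpha

if __name__ == "__main__":
    sample = "an example machine translation output"
    print(watermark_p_value(sample), is_watermarked(sample))
```

Under this sketch, ordinary text hashes to roughly balanced bits and is rarely flagged, while a producer that deliberately favors outputs whose hashes contain many 1 bits would yield small p-values across its results.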
-