Patent search ap:("Xerox Corporation") AND inv:"Shachar Mirkin" Page 1

1.

Granted invention patent
Machine translation-driven authoring system and method (Status: in force)

Publication number: US09047274B2

Publication date: 2015-06-02

Application number: US13746034

Application date: 2013-01-21

Applicant: Xerox Corporation

Inventor: Sriram Venkatapathy, Shachar Mirkin

IPC: G06F17/28

CPC classification number: G06F17/2809, G06F17/276, G06F17/2818, G06F17/2836

Abstract: An authoring method includes generating an authoring interface configured for assisting a user to author a text string in a source language for translation to a target string in a target language. Initial source text entered by the user is received through the authoring interface. Source phrases are selected that each include at least one token of the initial source text as a prefix and at least one other token as a suffix. The source phrase selection is based on a translatability score and optionally on fluency and semantic relatedness scores. A set of candidate phrases is proposed for display on the authoring interface, each of the candidate phrases being the suffix of a respective one of the selected source phrases. The user may select one of the candidate phrases, which is appended to the source text following its corresponding prefix, or may enter alternative text. The process may be repeated until the user is satisfied with the source text and the SMT model can then be used for its translation.
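
Illustrative sketch (not taken from the patent): the suffix-proposal step described in the abstract could look roughly like the following. The phrase inventory `PHRASES`, its scores, and the function name `propose_suffixes` are all made-up assumptions; only a single translatability score is used, and the optional fluency and semantic relatedness scores are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase inventory: source phrase -> translatability score.
# Both the phrases and the scores are invented illustration data.
PHRASES: Dict[Tuple[str, ...], float] = {
    ("the", "printer", "is", "offline"): 0.9,
    ("the", "printer", "does", "not", "respond"): 0.6,
    ("printer", "is", "jammed"): 0.8,
    ("is", "offline"): 0.4,
}


def propose_suffixes(source_tokens: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k (suffix, score) candidates, best score first."""
    best: Dict[str, float] = {}
    for phrase, score in PHRASES.items():
        # Try every split of the phrase into a non-empty prefix and suffix.
        for cut in range(1, len(phrase)):
            prefix, suffix = phrase[:cut], phrase[cut:]
            # The prefix must match the end of the text typed so far.
            if tuple(source_tokens[-cut:]) == prefix:
                text = " ".join(suffix)
                best[text] = max(best.get(text, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    for suffix, score in propose_suffixes(["the", "printer"]):
        print(f"{score:.1f}  {suffix}")
```

In this toy setup, a user who has typed "the printer" is offered the suffixes of the three phrases whose prefix matches the end of the typed text, ordered by the assumed translatability score.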

2.

Invention patent application
PREDICTING TRANSLATIONAL PREFERENCES (Status: in force)

Publication number: US20170046333A1

Publication date: 2017-02-16

Application number: US14825652

Application date: 2015-08-13

Applicant: Xerox Corporation

Inventor: Shachar Mirkin, Jean-Luc Meunier

IPC: G06F17/28

CPC classification number: G06F17/2818, G06F17/289

Abstract: A system and method predict an optimal machine translation system for a first of a set of users. The method includes, for each of the users, providing a respective user profile which includes rankings for at least some machine translation systems from a set of machine translation systems. The user profile of the first user is updated, based on the user profiles of at least a subset of the other users. The updating includes generating at least one missing ranking. An optimal translation system for the first user from the set of machine translation systems is predicted, based on the updated user profile computed for the first user.
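
Illustrative sketch (not taken from the patent): the abstract describes filling in a user's missing rankings from other users' profiles and then predicting the best system. The code below uses plain averaging over the other users who scored a system, which is an assumption chosen for brevity; the profile layout, the score scale, and the function names are hypothetical.

```python
from typing import Dict

# MT system name -> preference score (higher is better); an assumed layout.
Profile = Dict[str, float]


def complete_profile(target: str, profiles: Dict[str, Profile]) -> Profile:
    """Fill the target user's missing scores with the mean score of other users."""
    systems = {name for profile in profiles.values() for name in profile}
    updated = dict(profiles[target])
    for system in sorted(systems - set(updated)):
        votes = [
            profile[system]
            for user, profile in profiles.items()
            if user != target and system in profile
        ]
        if votes:
            updated[system] = sum(votes) / len(votes)
    return updated


def predict_best_system(target: str, profiles: Dict[str, Profile]) -> str:
    """Pick the highest-scoring system from the completed profile."""
    updated = complete_profile(target, profiles)
    return max(sorted(updated), key=lambda name: updated[name])


if __name__ == "__main__":
    demo = {
        "alice": {"A": 0.5, "B": 0.25},
        "bob": {"A": 0.5, "C": 1.0},
        "carol": {"B": 0.5, "C": 0.5},
    }
    print(complete_profile("alice", demo))
    print(predict_best_system("alice", demo))
```

Here "alice" never scored system C, so its score is generated from "bob" and "carol"; a real system would presumably weight users by similarity rather than average them equally.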

3.

Granted invention patent
Learning generation templates from dialog transcripts (Status: in force)

Publication number: US09473637B1

Publication date: 2016-10-18

Application number: US14810817

Application date: 2015-07-28

Applicant: Xerox Corporation

Inventor: Sriram Venkatapathy, Shachar Mirkin, Marc Dymetman

IPC: H04M3/00, H04M5/00, H04M1/64, G10L21/00, H04M3/51, H04M3/42, G06F17/28

CPC classification number: G06F17/2881, G06F17/2715, G06F17/2775, G06F17/30654, G06F17/30976, G10L15/22, H04M3/42221, H04M3/5175, H04M2203/355, H04M2203/357

Abstract: Agent utterances are generated for implementing dialog acts recommended by a dialog manager of a call center. To this end, a set of word lattices, each represented as a weighted finite state automaton (WFSA), is constructed from training dialogs between call center agents and second parties (e.g. customers). The word lattices are assigned conditional probabilities over dialog act type. For each dialog act received from the dialog manager, the word lattices are ranked by the conditional probabilities for the dialog act type. At least one word lattice is chosen from the ranking, and is instantiated to generate a recommended agent utterance for implementing the recommended dialog act. The word lattices may be constructed by clustering agent utterances of training dialogs using context features from preceding second party utterances and grammatical dependency link features between words within agent utterances. Path variations of the word lattices may define slots or paraphrases.
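
Illustrative sketch (not taken from the patent): the ranking and instantiation steps in the abstract can be approximated as below. As a strong simplification, each word lattice is flattened to a single template string with named slots instead of a weighted finite state automaton; the templates, probabilities, and function name are invented for illustration.

```python
from typing import Dict, List

# Each entry stands in for one word lattice. "p" holds an assumed
# conditional probability of the template for each dialog act type.
TEMPLATES: List[dict] = [
    {"text": "Could you tell me your {item}?", "p": {"ask": 0.7, "inform": 0.1}},
    {"text": "What is your {item}?", "p": {"ask": 0.3}},
    {"text": "Your {item} is {value}.", "p": {"inform": 0.9}},
]


def recommend_utterance(act_type: str, slots: Dict[str, str]) -> str:
    """Rank templates by probability for the act type and fill the best one."""
    ranked = sorted(
        TEMPLATES,
        key=lambda template: template["p"].get(act_type, 0.0),
        reverse=True,
    )
    # Instantiate the top-ranked template with the slot values supplied.
    return ranked[0]["text"].format(**slots)


if __name__ == "__main__":
    print(recommend_utterance("ask", {"item": "account number"}))
    print(recommend_utterance("inform", {"item": "balance", "value": "$10"}))
```

A lattice-based version would keep alternative paths (paraphrases and slot positions) inside each template instead of one fixed string.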

4.

Granted invention patent
System and method for incrementally updating a reordering model for a statistical machine translation system (Status: in force)

Publication number: US09442922B2

Publication date: 2016-09-13

Application number: US14546424

Application date: 2014-11-18

Applicant: Xerox Corporation

Inventor: Shachar Mirkin

IPC: G06F17/28

CPC classification number: G06F17/289, G06F17/2818

Abstract: A method for updating a reordering model of a statistical machine translation system includes, at a first time, receiving new training data for retraining an existing statistical machine translation system, the new training data including at least one sentence pair, each pair including a source sentence in a source language and a target sentence in a target language. Phrase pairs are extracted from the new training data and used to generate a new reordering file. A reordering model of the existing statistical machine translation system is updated, based on the new reordering file. The reordering model includes a reordering table. At a second time after the first time, new training data is received. The extracting of phrase pairs, the generating of the new reordering file, and the updating of the reordering model are reiterated, based on the new training data received at the second time.
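
Illustrative sketch (not taken from the patent): one way to picture an incrementally updated reordering table is a store of orientation counts per phrase pair that is merged with each new batch. The abstract does not say what the table contains; the three orientation classes below are the ones commonly used in lexicalized reordering models and are an assumption here, as are the class and method names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)
ORIENTATIONS = ("monotone", "swap", "discontinuous")


class ReorderingTable:
    """Orientation counts per phrase pair, updatable batch by batch."""

    def __init__(self) -> None:
        self.counts: Dict[PhrasePair, Counter] = defaultdict(Counter)

    def update(self, observations: Iterable[Tuple[PhrasePair, str]]) -> None:
        """Merge orientation observations extracted from new training data."""
        for pair, orientation in observations:
            if orientation not in ORIENTATIONS:
                raise ValueError(f"unknown orientation: {orientation}")
            self.counts[pair][orientation] += 1

    def probabilities(self, pair: PhrasePair, smoothing: float = 0.5) -> Dict[str, float]:
        """Return smoothed orientation probabilities for one phrase pair."""
        counts = self.counts.get(pair, Counter())
        total = sum(counts.values()) + smoothing * len(ORIENTATIONS)
        if total == 0:
            # No counts and no smoothing: fall back to a uniform distribution.
            return {o: 1.0 / len(ORIENTATIONS) for o in ORIENTATIONS}
        return {o: (counts[o] + smoothing) / total for o in ORIENTATIONS}


if __name__ == "__main__":
    table = ReorderingTable()
    pair = ("la maison", "the house")
    table.update([(pair, "monotone")] * 3)   # batch received at a first time
    table.update([(pair, "swap")])           # batch received at a second time
    print(table.probabilities(pair, smoothing=0.0))
```

Because only counts are stored, the second batch changes the probabilities without reprocessing the first batch, which is the point of an incremental update.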

5.

Invention patent application
Confidence-driven rewriting of source texts for improved translation (Status: under examination, published)

Publication number: US20140358519A1

Publication date: 2014-12-04

Application number: US13908157

Application date: 2013-06-03

Applicant: Xerox Corporation

Inventor: Shachar Mirkin, Sriram Venkatapathy, Marc Dymetman

IPC: G06F17/28

CPC classification number: G06F17/2854, G06F17/2818, G06F17/2836

Abstract: A method for rewriting source text includes receiving source text including a source text string in a first natural language. The source text string is translated with a machine translation system to generate a first target text string in a second natural language. A translation confidence for the source text string is computed, based on the first target text string. At least one alternative text string is generated, where possible, in the first natural language by automatically rewriting the source string. Each alternative string is translated to generate a second target text string in the second natural language. A translation confidence is computed for the alternative text string based on the second target string. Based on the computed translation confidences, one of the alternative text strings may be selected as a candidate replacement for the source text string and may be proposed to a user on a graphical user interface.
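
Illustrative sketch (not taken from the patent): the selection loop in the abstract compares the translation confidence of the original string with that of each rewritten alternative. In the code below the translator and the confidence estimator are passed in as callables; `toy_translate`, `toy_confidence`, and the tiny lexicon are hypothetical stand-ins for a real machine translation system and confidence model.

```python
from typing import Callable, Iterable, Optional


def pick_rewrite(
    source: str,
    rewrites: Iterable[str],
    translate: Callable[[str], str],
    confidence: Callable[[str, str], float],
) -> Optional[str]:
    """Return the rewrite with the highest confidence above the source's, or None."""
    best_text: Optional[str] = None
    best_conf = confidence(source, translate(source))
    for alternative in rewrites:
        conf = confidence(alternative, translate(alternative))
        if conf > best_conf:
            best_text, best_conf = alternative, conf
    return best_text


# Toy stand-ins, invented for this example only.
LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort"}


def toy_translate(text: str) -> str:
    """Word-by-word lookup; unknown words become <unk>."""
    return " ".join(LEXICON.get(word, "<unk>") for word in text.split())


def toy_confidence(source: str, target: str) -> float:
    """Fraction of target words that are not <unk>."""
    words = target.split()
    return sum(word != "<unk>" for word in words) / len(words) if words else 0.0


if __name__ == "__main__":
    choice = pick_rewrite(
        "the feline slumbers",
        ["the cat slumbers", "the cat sleeps"],
        toy_translate,
        toy_confidence,
    )
    print(choice)
```

Returning `None` when no alternative beats the original mirrors the "where possible" wording: the source string is kept unless a rewrite is expected to translate better.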

6.

Invention patent application
MACHINE TRANSLATION-DRIVEN AUTHORING SYSTEM AND METHOD (Status: in force)

Publication number: US20140207439A1

Publication date: 2014-07-24

Application number: US13746034

Application date: 2013-01-21

Applicant: XEROX CORPORATION

Inventor: Sriram Venkatapathy, Shachar Mirkin

IPC: G06F17/28

CPC classification number: G06F17/2809, G06F17/276, G06F17/2818, G06F17/2836

Abstract: An authoring method includes generating an authoring interface configured for assisting a user to author a text string in a source language for translation to a target string in a target language. Initial source text entered by the user is received through the authoring interface. Source phrases are selected that each include at least one token of the initial source text as a prefix and at least one other token as a suffix. The source phrase selection is based on a translatability score and optionally on fluency and semantic relatedness scores. A set of candidate phrases is proposed for display on the authoring interface, each of the candidate phrases being the suffix of a respective one of the selected source phrases. The user may select one of the candidate phrases, which is appended to the source text following its corresponding prefix, or may enter alternative text. The process may be repeated until the user is satisfied with the source text and the SMT model can then be used for its translation.

7.

Granted invention patent
System and method for predicting an optimal machine translation system for a user based on an updated user profile (Status: in force)

Publication number: US10025779B2

Publication date: 2018-07-17

Application number: US14825652

Application date: 2015-08-13

Applicant: Xerox Corporation

Inventor: Shachar Mirkin, Jean-Luc Meunier

IPC: G06F17/28

Abstract: A system and method predict an optimal machine translation system for a first of a set of users. The method includes, for each of the users, providing a respective user profile which includes rankings for at least some machine translation systems from a set of machine translation systems. The user profile of the first user is updated, based on the user profiles of at least a subset of the other users. The updating includes generating at least one missing ranking. An optimal translation system for the first user from the set of machine translation systems is predicted, based on the updated user profile computed for the first user.

8.

Invention patent application
METHOD AND SYSTEM FOR SUMMARIZING A DOCUMENT (Status: under examination, published)

Publication number: US20160299881A1

Publication date: 2016-10-13

Application number: US14680096

Application date: 2015-04-07

Applicant: XEROX CORPORATION, NETAJI SUBHASH INSTITUTE OF TECHNOLOGY (NSIT)

Inventor: Anand Gupta, Manpreet Kaur, Shachar Mirkin

IPC: G06F17/24, G06F17/28

CPC classification number: G06F17/24, G06F16/345, G06F17/2785, G06F17/28

Abstract: The disclosed embodiments illustrate methods and systems for summarizing an electronic document. The method includes extracting, by a natural language processor, one or more sentences from said electronic document. The method further includes creating a graph, comprising one or more nodes and one or more edges connecting said one or more nodes, each node being representative of a sentence. An edge is placed between a pair of sentences based on a threshold value and a first score. The first score corresponds to a measure of an entailment between said pair of sentences. Thereafter, the method includes identifying a set of nodes from said one or more nodes by applying a minimum vertex cover algorithm on said graph. The sentences associated with said identified set of nodes are utilizable to create a summary of said electronic document. The method is performed by one or more microprocessors.
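
Illustrative sketch (not taken from the patent): the graph construction and minimum vertex cover steps in the abstract can be tried on a handful of sentences as below. Several assumptions are made: an edge is added when the score in either direction reaches the threshold, the word-overlap function `overlap_score` is only a toy stand-in for a real entailment measure, and the cover is found by exhaustive search, which is exact but only practical for very small graphs.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def overlap_score(premise: str, hypothesis: str) -> float:
    """Toy entailment stand-in: share of hypothesis words found in the premise."""
    hyp = set(hypothesis.lower().split())
    return len(hyp & set(premise.lower().split())) / len(hyp) if hyp else 0.0


def build_edges(
    sentences: Sequence[str],
    score: Callable[[str, str], float],
    threshold: float,
) -> Set[Edge]:
    """Connect sentence pairs whose score, in either direction, meets the threshold."""
    edges: Set[Edge] = set()
    for i, j in combinations(range(len(sentences)), 2):
        if max(score(sentences[i], sentences[j]), score(sentences[j], sentences[i])) >= threshold:
            edges.add((i, j))
    return edges


def minimum_vertex_cover(n: int, edges: Set[Edge]) -> List[int]:
    """Exhaustively find a smallest node set that touches every edge."""
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if all(i in chosen or j in chosen for i, j in edges):
                return list(subset)
    return list(range(n))


def summarize(
    sentences: Sequence[str],
    score: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Return the sentences whose nodes form a minimum vertex cover."""
    cover = minimum_vertex_cover(len(sentences), build_edges(sentences, score, threshold))
    return [sentences[i] for i in cover]


if __name__ == "__main__":
    docs = [
        "the printer is jammed",
        "the printer is jammed again",
        "toner is low",
    ]
    print(summarize(docs, overlap_score, threshold=0.9))
```

Note that in this sketch a sentence with no edges never enters the cover, so how isolated sentences are handled would need a separate decision.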

9.

Invention patent application
SYSTEM AND METHOD FOR INCREMENTALLY UPDATING A REORDERING MODEL FOR A STATISTICAL MACHINE TRANSLATION SYSTEM (Status: in force)

Publication number: US20160140111A1

Publication date: 2016-05-19

Application number: US14546424

Application date: 2014-11-18

Applicant: Xerox Corporation

Inventor: Shachar Mirkin

IPC: G06F17/28

CPC classification number: G06F17/289, G06F17/2818

Abstract: A method for updating a reordering model of a statistical machine translation system includes, at a first time, receiving new training data for retraining an existing statistical machine translation system, the new training data including at least one sentence pair, each pair including a source sentence in a source language and a target sentence in a target language. Phrase pairs are extracted from the new training data and used to generate a new reordering file. A reordering model of the existing statistical machine translation system is updated, based on the new reordering file. The reordering model includes a reordering table. At a second time after the first time, new training data is received. The extracting of phrase pairs, the generating of the new reordering file, and the updating of the reordering model are reiterated, based on the new training data received at the second time.

10.

Invention patent application
REFINING INFERENCE RULES WITH TEMPORAL EVENT CLUSTERING (Status: under examination, published)

Publication number: US20150127323A1

Publication date: 2015-05-07

Application number: US14070786

Application date: 2013-11-04

Applicant: Xerox Corporation

Inventor: Guillaume Jacquet, Shachar Mirkin

IPC: G06F17/27

CPC classification number: G06F17/271, G06F16/3338, G06F16/355, G06F17/278, G06F17/2785, G06F17/2795

Abstract: A method for computing similarity between paths includes extracting corpus statistics for triples from a corpus of text documents, each triple comprising a predicate and respective first and second arguments of the predicate. Documents in the corpus are clustered to form a set of clusters based on textual similarity and temporal similarity. An event-based path similarity is computed between first and second paths, the first path comprising a first predicate and first and second argument slots, the second path comprising a second predicate and first and second argument slots, the event-based path similarity being computed as a function of a corpus statistics-based similarity score which is a function of the corpus statistics for the extracted triples which are instances of the first and second paths, and a cluster-based similarity score which is a function of occurrences of the first and second predicates in the clusters.
