Patent search ap:("XEROX CORPORATION") AND inv:"Pedersen Page Jan O."

11.

Published application
Finite-state transduction of related word forms for text indexing and retrieval (Lapsed)

Publication number: EP0583083A3

Publication date: 1994-09-07

Application number: EP93305626.9

Application date: 1993-07-19

Applicant: XEROX CORPORATION

Inventor: Cutting, Douglass R. , Halvorsen, Per-Kristian G. , Kaplan, Ronald M. , Karttunen, Lauri , Kay, Martin , Pedersen, Jan O.

IPC: G06F15/401 , G06F15/415 , G06F15/417

CPC classification number: G06F17/30616 , G06F17/30663 , G06F17/30666 , G06F17/30672 , G06F17/30684 , G06F17/30985 , G06F17/30988 , Y10S707/99931

Abstract: The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers (FSTs) to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both the systematic aspects of a language's morphological rule system and the word-by-word irregularities that also occur. The merged FST (70) may be produced by simultaneously intersecting (&) and composing (o) a lexicon transducer (65) and a number of rule transducers (61-63). Although the resulting FSTs can have many states and transitions (arcs), they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates an information retrieval system comprising the novel FST (70) as a database and a processor for responding to user queries, searching the database, and outputting proper responses, if they exist, as well as the novel database used in such a system and methods for constructing it.
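
As a rough illustration of mapping in both directions between words and stems, the following Python sketch builds a tiny letter-by-letter transducer directly from a hand-written list of (word, stem) pairs and inverts it by swapping arc labels. It is a minimal sketch under stated assumptions: the lexicon, the `FST` class, and the `align` helper are hypothetical, irregular forms such as "ran" and "geese" are simply listed, and it does not build rule transducers (61-63), perform the simultaneous intersection and composition, or apply any compression.

```python
from collections import defaultdict

EPS = ""  # epsilon: an arc side that consumes or emits nothing


class FST:
    """Toy nondeterministic transducer; arcs are (input, output, next_state)."""

    def __init__(self):
        self.arcs = defaultdict(list)
        self.finals = set()
        self.size = 1  # state 0 is the start state

    def add_path(self, pairs):
        """Add one accepting path of (input, output) pairs, reusing shared prefixes."""
        state = 0
        for pair in pairs:
            for i, o, nxt in self.arcs[state]:
                if (i, o) == pair:
                    state = nxt
                    break
            else:
                self.arcs[state].append((pair[0], pair[1], self.size))
                state = self.size
                self.size += 1
        self.finals.add(state)

    def inverse(self):
        """Swap input and output labels, turning a stemmer into a word generator."""
        inv = FST()
        inv.size, inv.finals = self.size, set(self.finals)
        for state, arcs in self.arcs.items():
            inv.arcs[state] = [(o, i, nxt) for i, o, nxt in arcs]
        return inv

    def transduce(self, text):
        """Return every output string paired with `text` (paths here are acyclic)."""
        results, stack = set(), [(0, 0, "")]
        while stack:
            state, pos, out = stack.pop()
            if pos == len(text) and state in self.finals:
                results.add(out)
            for i, o, nxt in self.arcs[state]:
                if i == EPS:
                    stack.append((nxt, pos, out + o))
                elif pos < len(text) and text[pos] == i:
                    stack.append((nxt, pos + 1, out + o))
        return results


def align(word, stem):
    """Pair letters position by position, padding the shorter side with epsilons."""
    n = max(len(word), len(stem))
    return [(word[k] if k < len(word) else EPS, stem[k] if k < len(stem) else EPS)
            for k in range(n)]


# Hypothetical lexicon: regular and irregular forms are listed explicitly.
LEXICON = [("run", "run"), ("runs", "run"), ("running", "run"), ("ran", "run"),
           ("goose", "goose"), ("geese", "goose")]

stemmer = FST()
for word, stem in LEXICON:
    stemmer.add_path(align(word, stem))
generator = stemmer.inverse()

print(stemmer.transduce("geese"))          # {'goose'}
print(sorted(generator.transduce("run")))  # ['ran', 'run', 'running', 'runs']
```

The same transducer answers both "which stem does this word have?" at indexing time and "which surface forms belong to this stem?" at query time, which is the two-way mapping the abstract emphasizes.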

12.

Published application
Finite-state transduction of related word forms for text indexing and retrieval (Lapsed)

Publication number: EP0583083A2

Publication date: 1994-02-16

Application number: EP93305626.9

Application date: 1993-07-19

Applicant: XEROX CORPORATION

Inventor: Cutting, Douglass R. , Halvorsen, Per-Kristian G. , Kaplan, Ronald M. , Karttunen, Lauri , Kay, Martin , Pedersen, Jan O.

IPC: G06F15/401 , G06F15/415 , G06F15/417

CPC classification number: G06F17/30616 , G06F17/30663 , G06F17/30666 , G06F17/30672 , G06F17/30684 , G06F17/30985 , G06F17/30988 , Y10S707/99931

Abstract: The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers (FSTs) to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both the systematic aspects of a language's morphological rule system and the word-by-word irregularities that also occur. The merged FST (70) may be produced by simultaneously intersecting (&) and composing (o) a lexicon transducer (65) and a number of rule transducers (61-63). Although the resulting FSTs can have many states and transitions (arcs), they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates an information retrieval system comprising the novel FST (70) as a database and a processor for responding to user queries, searching the database, and outputting proper responses, if they exist, as well as the novel database used in such a system and methods for constructing it.

13.

Granted patent
A cluster-based method and system for browsing large document collections (Lapsed)

Publication number: EP0542429B1

Publication date: 2000-05-31

Application number: EP92309402.3

Application date: 1992-10-15

Applicant: XEROX CORPORATION

Inventor: Pedersen, Jan O. , Tukey, John W. , Karger, David , Cutting, Douglass R.

IPC: G06F17/30

CPC classification number: G06F17/3071 , G06F17/30011 , Y10S707/99935 , Y10S707/99937

14.

Granted patent
An iterative technique for phrase query formation and an information retrieval system employing same (Lapsed)

Publication number: EP0530993B1

Publication date: 1999-05-19

Application number: EP92307372.0

Application date: 1992-08-12

Applicant: XEROX CORPORATION

Inventor: Pedersen, Jan O. , Tukey, John W. , Halvorsen, Per-Kristian , Bier, Eric A. , Cutting, Douglass R. , Bobrow, Daniel G.

IPC: G06F17/30

CPC classification number: G06F17/30646 , G06F17/30011 , Y10S707/99934

15.

Published application
Automatic method of extracting summarization using feature probabilities (Lapsed)

Publication number: EP0751469A1

Publication date: 1997-01-02

Application number: EP96304777.4

Application date: 1996-06-28

Applicant: XEROX CORPORATION

Inventor: Kupiec, Julian M. , Pedersen, Jan O. , Chen, Francine R. , Brotsky, Daniel C. , Putz, Steven B.

IPC: G06F17/30

CPC classification number: G06F17/30719

Abstract: A method of automatically generating document extracts. The method uses feature-value probabilities, generated from a statistical analysis of manually generated summaries, to extract the same set of sentences an expert might select. The method is based upon an iterative approach. First, the computer system designates a sentence of the document as the selected sentence. Second, the computer system determines, for the selected sentence, the value of each feature in a feature set. Third, the computer system increases a score for the selected sentence based upon the value of each feature for that sentence and upon the probability associated with that value. Fourth, after all of the sentences of the document have been scored, the computer system selects a subset of the highest-scoring sentences to be extracted.
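
The four steps above can be sketched in Python as follows. This is a minimal illustration that assumes a naive-Bayes-style score, summing log(P(value | sentence is in a summary) / P(value)) over features, as one plausible reading of "increases a score ... based upon the probability associated with that value". The feature names, the cue words, and the two-document training set are hypothetical placeholders, not features or data from the patent.

```python
import math
from collections import Counter

CUE_WORDS = {"summary", "conclude", "results"}  # hypothetical cue words


def features(sentence, index):
    """Hypothetical boolean features; the abstract does not name the actual feature set."""
    words = [w.lower().strip(".,;:") for w in sentence.split()]
    return {
        "long": len(words) > 5,
        "leading": index == 0,
        "cue": any(w in CUE_WORDS for w in words),
    }


def train(documents):
    """Learn a weight log P(value | in summary) - log P(value) per (feature, value).

    `documents` holds (sentences, set_of_summary_indices) pairs taken from
    manually produced extracts; add-one smoothing covers unseen values.
    """
    in_summary, overall = Counter(), Counter()
    n_in = n_all = 0
    for sentences, summary in documents:
        for idx, sentence in enumerate(sentences):
            chosen = int(idx in summary)
            n_all += 1
            n_in += chosen
            for key in features(sentence, idx).items():
                overall[key] += 1
                in_summary[key] += chosen

    def weight(key):
        p_in = (in_summary[key] + 1) / (n_in + 2)  # 2 = number of boolean values
        p_all = (overall[key] + 1) / (n_all + 2)
        return math.log(p_in) - math.log(p_all)

    return weight


def extract(sentences, weight, k):
    """Score every sentence feature by feature, then keep the k best in document order."""
    scored = []
    for idx, sentence in enumerate(sentences):          # step 1: select a sentence
        score = 0.0
        for key in features(sentence, idx).items():     # step 2: feature values
            score += weight(key)                        # step 3: raise the score
        scored.append((score, idx))
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]  # step 4: top-k subset
    return [sentences[i] for i in sorted(i for _, i in best)]


training = [
    (["In summary the new method improves retrieval results a lot.",
      "It was tested.", "Tables follow."], {0}),
    (["To conclude we show that clustering helps browsing large collections.",
      "See the appendix."], {0}),
]
weight = train(training)
doc = ["Short intro.",
       "We conclude that the proposed approach clearly works well.",
       "Another short one."]
print(extract(doc, weight, 1))
# ['We conclude that the proposed approach clearly works well.']
```

Summing log ratios keeps the per-feature updates additive, which matches the "increase a score per feature" loop described in the abstract; a real system would use far more training extracts and a richer feature set.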

16.

Published application
A cluster-based method and system for browsing large document collections (Lapsed)

Publication number: EP0542429A3

Publication date: 1994-04-27

Application number: EP92309402.3

Application date: 1992-10-15

Applicant: XEROX CORPORATION

Inventor: Pedersen, Jan O. , Tukey, John W. , Karger, David , Cutting, Douglass R.

IPC: G06F15/403 , G06F15/401

CPC classification number: G06F17/3071 , G06F17/30011 , Y10S707/99935 , Y10S707/99937

Abstract: Scatter-Gather is a computer-based document browsing method that operates in time proportional to the number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive use; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. Because the off-line preparation of the initial ordering is not time-critical, an accurate initial ordering can be prepared. The step of determining a summary includes determining a summary that can be presented to a user on a CRT without scrolling. The step of providing a further ordering includes truncated group-average agglomerative clustering, merging of disjoint document sets, center finding, assign-to-nearest, and other refinement methods.
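
The center-finding and assign-to-nearest steps named above can be illustrated with a small Python sketch: run a truncated group-average agglomerative clustering on a random sample of about sqrt(k·n) documents, use the resulting cluster centroids as centers, and assign every document to its nearest center. Whitespace tokenization, raw term counts, cosine similarity, the default sample size, and the function names are assumptions made for illustration; the code does not reproduce the patent's off-line ordering, summaries, or other refinement steps.

```python
import math
import random
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors (dicts)."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean term-count vector of a group of documents."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def truncated_group_average(vectors, k):
    """Agglomerative clustering that stops at k clusters, merging the pair of
    clusters with the highest average pairwise cosine similarity."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                if sum(sims) / len(sims) > best:
                    best, pair = sum(sims) / len(sims), (a, b)
        a, b = pair
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


def sample_and_assign(docs, k, sample_size=None, seed=0):
    """Find k centers by clustering a random sample, then assign every document
    to its nearest center. Returns k lists of document indices."""
    vectors = [Counter(d.lower().split()) for d in docs]
    size = sample_size or max(k, int(math.sqrt(k * len(docs))))
    sample = random.Random(seed).sample(range(len(docs)), size)
    seeds = truncated_group_average([vectors[i] for i in sample], k)
    centers = [centroid([vectors[sample[i]] for i in group]) for group in seeds]
    groups = [[] for _ in centers]
    for idx, v in enumerate(vectors):
        nearest = max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
        groups[nearest].append(idx)
    return groups


docs = ["stock market prices fall", "market prices rise on stock news",
        "stock prices and market trends", "football match ends in draw",
        "the football team won the match", "match report football league"]
for group in sample_and_assign(docs, k=2, sample_size=len(docs)):
    print(sorted(group))  # documents 0-2 and 3-5 land in separate groups
```

The quadratic agglomerative step only ever sees the sample, so the cost of clustering the whole collection is dominated by the linear assign-to-nearest pass.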

17.

Published application
Electronic document processing systems (Lapsed)

Publication number: EP0459792A3

Publication date: 1993-08-04

Application number: EP91304879.9

Application date: 1991-05-30

Applicant: XEROX CORPORATION

Inventor: Zdybel, Frank, Jr. , Henderson, D. Austin, Jr. , Sang, Henry W., Jr. , Hecht, David L. , Pedersen, Jan O. , Bloomberg, Dan S. , Smith, Z. Erol, III

IPC: G06F17/30 , G06F17/60

CPC classification number: H04N1/32133 , G06F17/30011 , G06Q10/10 , H04N2201/3204 , H04N2201/3205 , H04N2201/3214 , H04N2201/3226 , H04N2201/3232 , H04N2201/3233 , H04N2201/3242 , H04N2201/3269 , H04N2201/3271

Abstract: Provision is made in electronic document processing systems for printing unfiltered or filtered machine-readable digital representations of electronic documents, together with human-readable renderings of them, on the same record medium using the same printing process. The integration of machine-readable digital representations of electronic documents with their human-readable hardcopy renderings may be employed, for example, not only to enhance the precision with which the structure and content of such electronic documents can be recovered by scanning the hardcopies into electronic document processing systems, but also as a mechanism for enabling recipients of scanned-in versions of such documents to identify and process annotations that were added to the hardcopies after they were printed, and/or for alerting those recipients to alterations that may have been made to the original human-readable content of the hardcopy renderings. In addition to storage of the electronic representation of the document, provision is made for encoding information about the electronic representation itself, such as file name, creation and modification dates, access and security information, and printing histories. Provision is also made for encoding information that is computed from the content of the document and other information, for purposes of authentication and verification of document integrity. Provision is also made for encoding information that relates to operations to be performed depending on handwritten marks made upon a hardcopy rendering of the document; for example, encoding instructions for what action is to be taken when a box on the document is checked.
Provision is also made for encoding in the hardcopy another class of information: information about the rendering of the document specific to that hardcopy, which can include a copy number for that print, the identification of the machine that performed the print, the reproduction characteristics of the printer, and the screen frequency and rotation used by the printer in rendering halftones. Provision is also made for encoding information about the digital encoding mechanism itself, such as information given in standard-encoded headers about subsequently compressed or encrypted digital information.
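
One way to picture the integrity-verification idea is a keyed digest printed alongside the document's metadata and checked after the hardcopy is scanned. The sketch below is an assumption, not the patent's scheme: it swaps in SHA-256 and HMAC from the Python standard library, a hypothetical shared key `KEY`, and JSON as the encoded payload, and it assumes the scanned text has already been recovered exactly. It leaves out how the payload would be rendered as machine-readable marks on paper.

```python
import hashlib
import hmac
import json

KEY = b"hypothetical-shared-secret"  # assumption: sender and verifier share a key


def make_payload(content: str, metadata: dict, key: bytes = KEY) -> str:
    """Bundle document metadata with a digest of the content, then sign the bundle."""
    body = {"meta": metadata,
            "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest()}
    serialized = json.dumps(body, sort_keys=True)
    tag = hmac.new(key, serialized.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"body": body, "tag": tag}, sort_keys=True)


def verify(scanned_content: str, payload: str, key: bytes = KEY) -> bool:
    """Check the signed bundle first, then compare the scanned text's digest."""
    data = json.loads(payload)
    serialized = json.dumps(data["body"], sort_keys=True)
    expected = hmac.new(key, serialized.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, data["tag"]):
        return False  # the encoded metadata block itself was altered
    digest = hashlib.sha256(scanned_content.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, data["body"]["sha256"])


payload = make_payload("Pay Alice $100", {"file_name": "memo.txt", "printed_by": "printer-7"})
print(verify("Pay Alice $100", payload))  # True
print(verify("Pay Alice $900", payload))  # False
```

Signing the metadata together with the content digest means a forged file name or printing history is caught as well as an altered body text.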

18.

Granted patent
Method for clustering a large collection of documents (Lapsed)

Publication number: EP0980043B1

Publication date: 2003-05-07

Application number: EP99203801.8

Application date: 1992-10-15

Applicant: Xerox Corporation

Inventor: Pedersen, Jan O. , Karger, David , Cutting, Douglass R. , Tukey, John W.

IPC: G06F17/30

CPC classification number: G06F17/3071 , G06F17/30011 , Y10S707/99935 , Y10S707/99937

19.

Published application
A cluster-based method and system for browsing large document collections (Lapsed)

Publication number: EP0980043A2

Publication date: 2000-02-16

Application number: EP99203801.8

Application date: 1992-10-15

Applicant: Xerox Corporation

Inventor: Pedersen, Jan O. , Karger, David , Cutting, Douglass R. , Tukey, John W.

IPC: G06F17/30

CPC classification number: G06F17/3071 , G06F17/30011 , Y10S707/99935 , Y10S707/99937

Abstract: Scatter-Gather is a computer-based document browsing method that operates in time proportional to the number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive use; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. Because the off-line preparation of the initial ordering is not time-critical, an accurate initial ordering can be prepared. The step of determining a summary includes determining a summary that can be presented to a user on a CRT without scrolling. The step of providing a further ordering includes truncated group-average agglomerative clustering, merging of disjoint document sets, center finding, assign-to-nearest, and other refinement methods.

20.

Published application
Method of processing a corpus of electronically stored documents (Lapsed)

Publication number: EP0631245A3

Publication date: 1995-02-22

Application number: EP94304471.9

Application date: 1994-06-20

Applicant: XEROX CORPORATION

Inventor: Pedersen, Jan O. , Karger, David R. , Cutting, Douglass R.

IPC: G06F15/403 , G06F15/401

CPC classification number: G06F17/3071 , G06F17/30011 , Y10S707/99932

Abstract: Arbitrarily large document collections are processed by expanding a focus set having at least one initial metadocument (82) into a plurality of subsequent metadocuments (83,84,85,86). The number of subsequent metadocuments is approximately equal to a predetermined maximum number. The subsequent metadocuments are then clustered into a predetermined number of new metadocuments, which are summarized and presented to a user. The focus set is redefined to include only user-selected new metadocuments.
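
The expansion step can be sketched in Python under one explicit assumption that the abstract does not state: that metadocuments form a precomputed hierarchy in which each metadocument can be replaced by its component metadocuments. The `MetaDoc` class and `expand_focus` function are hypothetical names, the largest-fan-out-first order is an arbitrary choice, and the subsequent clustering, summarization, and user-selection steps are left out.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare metadocuments by identity, not by field values
class MetaDoc:
    """Hypothetical node in a precomputed hierarchy of metadocuments."""
    name: str
    children: list = field(default_factory=list)


def expand_focus(focus, max_metadocs):
    """Replace metadocuments by their children (largest fan-out first) until
    no further expansion keeps the total at or below max_metadocs."""
    frontier = list(focus)
    while True:
        fits = [m for m in frontier
                if m.children and len(frontier) - 1 + len(m.children) <= max_metadocs]
        if not fits:
            return frontier
        best = max(fits, key=lambda m: len(m.children))
        pos = frontier.index(best)
        frontier[pos:pos + 1] = best.children


a = MetaDoc("A", [MetaDoc(f"a{i}") for i in range(1, 5)])
b = MetaDoc("B", [MetaDoc("b1"), MetaDoc("b2")])
root = MetaDoc("root", [a, b, MetaDoc("C")])
print([m.name for m in expand_focus([root], 6)])  # ['a1', 'a2', 'a3', 'a4', 'B', 'C']
```

With a maximum of 6, the root expands to A, B, and C, and A then expands to its four children; expanding B as well would give seven metadocuments, so the loop stops. The result would then be handed to a clustering step that produces the predetermined number of new metadocuments.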
