Patent search: ap:("John W. Tukey" OR "Jan O. Pedersen") AND inv:"Jan O. Pedersen" (page 1)

1.

Granted invention patent
Method and apparatus for automatic document summarization (status: expired)

Publication No.: US5638543A

Publication date: 1997-06-10

Application No.: US71114

Filing date: 1993-06-03

Applicants: Jan O. Pedersen, John W. Tukey

Inventors: Jan O. Pedersen, John W. Tukey

IPC classification: G06F17/21, G06F17/30

CPC classification: G06F17/30719

Abstract: Regions of a document, such as sentences and blocks of sentences, are scored and classified based upon their scores. An abstract of the document can be formed from the classified sentences. Sentences are classified by the use of words classified as stop words and vanish words. Sentences are scored based on the number of stop words and the number of strings of connected stop words, called stop-word runs, contained in the sentence. Passionate sentences, which usually contain information about which the writer has strong feelings, such as joy, admiration, or sadness, are identified. The method can also select sentences that are contrapassionate, which the writer may either have to strengthen or have inserted to complete the record and provide continuity or information.
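
The two counts the abstract names (stop words and stop-word runs) can be illustrated with a minimal Python sketch. The stop-word list, the tokenizer, and the function name `stop_word_stats` are assumptions made for illustration; the abstract does not give the patent's word lists or its scoring formula, and a run is assumed here to be a maximal sequence of one or more consecutive stop words.

```python
import re

# Hypothetical stop-word list; the patent's own list is not in the abstract.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "it",
              "that", "on", "for", "with", "was"}


def stop_word_stats(sentence: str) -> tuple[int, int]:
    """Return (stop-word count, stop-word-run count) for one sentence.

    Assumption: a run is a maximal sequence of consecutive stop words,
    so a single isolated stop word also counts as one run.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    stop_count = 0
    run_count = 0
    in_run = False
    for tok in tokens:
        if tok in STOP_WORDS:
            stop_count += 1
            if not in_run:  # a new run starts here
                run_count += 1
                in_run = True
        else:
            in_run = False
    return stop_count, run_count


print(stop_word_stats("The cat sat on the mat"))  # (3, 2)
```

How the two counts are combined into a sentence score, and how "vanish words" enter the classification, is left open here because the abstract does not specify it.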

2.

Granted invention patent
Method of ordering document clusters without requiring knowledge of user interests (status: expired)

Publication No.: US5787420A

Publication date: 1998-07-28

Application No.: US572558

Filing date: 1995-12-14

Applicants: John W. Tukey, Jan O. Pedersen

Inventors: John W. Tukey, Jan O. Pedersen

IPC classification: G06F17/30

CPC classification: G06F17/3071, Y10S707/99932, Y10S707/99934, Y10S707/99935

Abstract: A computerized method of ordering document clusters for presentation after browsing a corpus of documents; the method presents the clusters in a logical fashion in the absence of any indication of the computer user's interests. The method begins by grouping the corpus into a plurality of clusters, each having a centroid and including at least one document. Next, for each cluster, a degree of similarity between that cluster and every other cluster is determined by finding a dot product between each cluster centroid and every other cluster centroid. The similarity information is then used to determine an order of presentation for the plurality of clusters in a way that maximizes the degree of similarity between adjacent clusters.
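
As a rough illustration of the ordering step, the sketch below computes centroid dot products and then builds an order greedily, always appending the unplaced cluster most similar to the last one placed. The greedy chain is a stand-in heuristic chosen for brevity, not the procedure claimed in the patent, and the names `dot` and `order_clusters` are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def order_clusters(centroids):
    """Return cluster indices in a presentation order.

    Greedy heuristic (an assumption, not the patented method): start at
    cluster 0 and repeatedly append the remaining cluster whose centroid
    has the largest dot product with the last placed centroid.
    """
    n = len(centroids)
    if n == 0:
        return []
    sim = [[dot(centroids[i], centroids[j]) for j in range(n)]
           for i in range(n)]
    order = [0]
    remaining = set(range(1, n))
    while remaining:
        last = order[-1]
        # Ties are broken in favor of the lower index.
        nxt = max(remaining, key=lambda j: (sim[last][j], -j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


centroids = [[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]]
print(order_clusters(centroids))  # [0, 2, 3, 1]
```

A greedy chain does not guarantee the maximum total adjacent similarity; an exact search over all orderings would, at much higher cost.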

3.

Granted invention patent
Iterative technique for phrase query formation and an information retrieval system employing same (status: expired)

Publication No.: US5278980A

Publication date: 1994-01-11

Application No.: US745794

Filing date: 1991-08-16

Applicants: Jan O. Pedersen, Per-Kristian Halvorsen, Douglass R. Cutting, John W. Tukey, Eric A. Bier, Daniel G. Bobrow

Inventors: Jan O. Pedersen, Per-Kristian Halvorsen, Douglass R. Cutting, John W. Tukey, Eric A. Bier, Daniel G. Bobrow

IPC classification: G06F17/30, G06F15/40, G06F15/403

CPC classification: G06F17/30646, G06F17/30011, Y10S707/99934

Abstract: An information retrieval system and method are provided in which an operator inputs one or more query words, which are used to determine a search key for searching through a corpus of documents. The system returns each match between the search key and the corpus of documents as a phrase containing the word data matching the query word(s), a non-stop (content) word next adjacent to the matching word data, and all intervening stop-words between the matching word data and the next adjacent non-stop word. The operator, after reviewing one or more of the returned phrases, can then use one or more of the next adjacent non-stop-words as new query words to reformulate the search key and perform a subsequent search through the document corpus. This process can be conducted iteratively until the appropriate documents of interest are located. The additional non-stop-words from each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the "new" content words.
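
The phrase that the abstract describes (matching word, intervening stop-words, next content word) can be sketched in Python as follows. This is an illustrative sketch under several assumptions: the stop-word list is hypothetical, tokenization is plain whitespace splitting, a single query word is matched exactly, and "next adjacent" is taken to mean the following content word.

```python
# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "of", "a", "an", "in", "to", "and", "for", "on"}


def phrases_for_query(text, query_word):
    """Return (phrase_tokens, next_content_word) for each query match.

    Each phrase holds the matching word, any stop words that follow it,
    and the first following non-stop word. A match with no following
    content word yields no phrase.
    """
    tokens = text.lower().split()
    results = []
    for i, tok in enumerate(tokens):
        if tok != query_word:
            continue
        j = i + 1
        while j < len(tokens) and tokens[j] in STOP_WORDS:
            j += 1  # skip intervening stop words
        if j < len(tokens):
            results.append((tokens[i:j + 1], tokens[j]))
    return results


text = "retrieval of documents and retrieval systems for the retrieval"
for phrase, new_word in phrases_for_query(text, "retrieval"):
    # Right-align the phrase head so the new content words line up in a column.
    print(" ".join(phrase[:-1]).rjust(15), new_word)
```

The operator would pick one of the printed content words (here `documents` or `systems`) as an additional query word and repeat the search.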

4.

Granted invention patent
Method and apparatus for information access employing overlapping clusters (status: expired)

Publication No.: US5999927A

Publication date: 1999-12-07

Application No.: US65828

Filing date: 1998-04-24

Applicants: John W. Tukey, Jan O. Pedersen

Inventors: John W. Tukey, Jan O. Pedersen

IPC classification: G06F17/30

CPC classification: G06F17/3071, G06F17/30707, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99936

Abstract: The present invention is a method and apparatus for document clustering-based browsing of a corpus of documents, and relates more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations. The present invention is further described in terms of two possible methods for expanding document clusters so as to achieve the overlap, and a method for increasing precision through the use of the overlapped clusters.
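
The abstract does not say how clusters are expanded, so the following Python sketch shows only one plausible way to obtain non-disjoint clusters: a document joins its most similar cluster and also any cluster whose similarity falls within a tolerance of that best value. The `slack` parameter, the dot-product similarity, and the function names are all assumptions for illustration, not the patent's two expansion methods.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def overlapping_clusters(docs, centroids, slack=0.1):
    """Assign each document to one or more clusters.

    docs maps a document id to its vector. A document joins every
    cluster whose centroid similarity is within `slack` of the
    document's best similarity (hypothetical expansion rule).
    """
    clusters = [[] for _ in centroids]
    for doc_id, vec in docs.items():
        sims = [dot(vec, c) for c in centroids]
        best = max(sims)
        for k, s in enumerate(sims):
            if s >= best - slack:
                clusters[k].append(doc_id)
    return clusters


docs = {"d1": [1, 0], "d2": [0, 1], "d3": [0.5, 0.55]}
print(overlapping_clusters(docs, [[1, 0], [0, 1]]))
# [['d1', 'd3'], ['d2', 'd3']]
```

With `slack=0` the assignment is disjoint except for exact ties; a borderline document such as `d3` appearing in both clusters is what allows it to be found from either one.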

5.

Granted invention patent
Method of ordering document clusters given some knowledge of user interests (status: expired)

Publication No.: US5911140A

Publication date: 1999-06-08

Application No.: US572399

Filing date: 1995-12-14

Applicants: John W. Tukey, Jan O. Pedersen

Inventors: John W. Tukey, Jan O. Pedersen

IPC classification: G06F17/30

CPC classification: G06F17/3071, Y10S707/99931, Y10S707/99933, Y10S707/99935, Y10S707/99937

Abstract: A method of automatically ordering the presentation of document clusters generated from a ranked corpus of documents. First, the corpus is ordered into a plurality of clusters. Next, a rank is determined for each cluster based upon the rank of a document within that cluster. Afterward, the clusters are presented to a computer user in the order determined by their rank.
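
A minimal Python sketch of the cluster-ranking step follows. The abstract says only that a cluster's rank is based on the rank of a document within it; using the best-ranked (lowest-numbered) document is an assumption, as are the function name and the data layout.

```python
def order_clusters_by_rank(clusters, doc_rank):
    """Return cluster names ordered by the rank of their best document.

    clusters maps a cluster name to a list of document ids; doc_rank
    maps a document id to its rank (1 = best). Assumption: a cluster
    inherits the rank of its best-ranked document.
    """
    def cluster_rank(name):
        return min(doc_rank[d] for d in clusters[name])

    return sorted(clusters, key=cluster_rank)


clusters = {"A": ["d3", "d5"], "B": ["d1", "d4"], "C": ["d2"]}
doc_rank = {"d1": 1, "d2": 2, "d3": 3, "d4": 4, "d5": 5}
print(order_clusters_by_rank(clusters, doc_rank))  # ['B', 'C', 'A']
```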

6.

Granted invention patent
Method and apparatus for information access employing overlapping clusters (status: expired)

Publication No.: US5787422A

Publication date: 1998-07-28

Application No.: US585075

Filing date: 1996-01-11

Applicants: John W. Tukey, Jan O. Pedersen

Inventors: John W. Tukey, Jan O. Pedersen

IPC classification: G06F17/30

CPC classification: G06F17/3071, G06F17/30707, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99936

Abstract: The present invention is a method and apparatus for document clustering-based browsing of a corpus of documents, and relates more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations. The present invention is further described in terms of two possible methods for expanding document clusters so as to achieve the overlap, and a method for increasing precision through the use of the overlapped clusters.

7.

Granted invention patent
Article and method of automatically determining text genre using surface features of untagged texts (status: expired)

Publication No.: US06973423B1

Publication date: 2005-12-06

Application No.: US09100189

Filing date: 1998-06-18

Applicants: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler, Gregory Grefenstette

Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler, Gregory Grefenstette

IPC classification: G06F17/20, G06F17/27, G06F17/30

CPC classification: G06F17/2745, G06F17/274

Abstract: A processor-implemented method of identifying the text genre of a machine-readable, untagged text. The method begins by generating a cue vector from the text; the cue vector represents occurrences in the text of a first set of nonstructural surface cues, which are easily computable. Afterward, the processor determines whether the text is an instance of a first text genre using the cue vector and a weighting vector associated with the first text genre.
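
One way to picture the cue-vector and weighting-vector decision is a simple linear classifier, sketched below in Python. The four surface cues, the weights, and the logistic threshold are all hypothetical choices made for illustration; the abstract does not list the patent's cues or say how the two vectors are combined.

```python
import math
import re


def cue_vector(text):
    """Compute a small vector of surface cues (hypothetical cue set)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    n = max(len(tokens), 1)
    return [
        text.count("?") / n,                                        # question marks per word
        text.count("!") / n,                                        # exclamation marks per word
        sum(t.lower() in {"i", "we", "my", "our"} for t in tokens) / n,  # first-person rate
        sum(len(t) for t in tokens) / n,                            # mean word length
    ]


def is_genre(cues, weights, bias=0.0, threshold=0.5):
    """Logistic decision on the weighted cue sum (an assumed combination)."""
    z = bias + sum(w * c for w, c in zip(weights, cues))
    return 1.0 / (1.0 + math.exp(-z)) >= threshold


# Made-up weights for an "emotive personal" genre, not trained values.
weights = [0.0, 5.0, 5.0, -1.0]
print(is_genre(cue_vector("I love this! We won!"), weights))
print(is_genre(cue_vector("The committee approved the budget."), weights))
```

In practice the weighting vector would be fitted on labeled texts, one vector per genre; here the numbers exist only to make the example run.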

8.

Granted invention patent
Article and method of automatically filtering information retrieval results using text genre (status: expired)

Publication No.: US06505150B2

Publication date: 2003-01-07

Application No.: US09100201

Filing date: 1998-06-18

Applicants: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler

Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler

IPC classification: G10L1720

CPC classification: G06F17/277, G06F17/271, G06F17/2775, G06F17/2785, G06F17/30705, G06F17/30707

Abstract: A method of filtering, according to text genre, the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.
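
The search-then-order flow in the abstract can be sketched in a few lines of Python. For brevity the sketch assumes each text already carries a `topics` set and a `genre` label; in the patent the texts are untagged and the genre would have to be inferred first. The field names and the function `search_and_order` are hypothetical.

```python
def search_and_order(corpus, topic, genre_order):
    """Return texts on `topic`, ordered by genre preference.

    corpus is a list of dicts with hypothetical keys "id", "topics",
    and "genre". Genres missing from genre_order sort last. Python's
    sort is stable, so texts of the same genre keep their search order.
    """
    hits = [d for d in corpus if topic in d["topics"]]
    priority = {g: i for i, g in enumerate(genre_order)}
    return sorted(hits, key=lambda d: priority.get(d["genre"], len(priority)))


corpus = [
    {"id": 1, "topics": {"storms"}, "genre": "editorial"},
    {"id": 2, "topics": {"storms"}, "genre": "news"},
    {"id": 3, "topics": {"taxes"}, "genre": "news"},
    {"id": 4, "topics": {"storms"}, "genre": "news"},
]
ordered = search_and_order(corpus, "storms", ["news", "editorial"])
print([d["id"] for d in ordered])  # [2, 4, 1]
```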

9.

Granted invention patent
Method of constant interaction-time clustering applied to document browsing (status: expired)

Publication No.: US5483650A

Publication date: 1996-01-09

Application No.: US79292

Filing date: 1993-06-21

Applicants: Jan O. Pedersen, David R. Karger, Douglass R. Cutting

Inventors: Jan O. Pedersen, David R. Karger, Douglass R. Cutting

IPC classification: G06F17/30

CPC classification: G06F17/3071, G06F17/30011, Y10S707/99932

Abstract: Arbitrarily large document collections are processed by expanding a focus set having at least one initial metadocument into a plurality of subsequent metadocuments. The number of subsequent metadocuments is approximately equal to a predetermined maximum number. The subsequent metadocuments are then clustered into a predetermined number of new metadocuments, which are summarized and presented to a user. The focus set is redefined to include only user-selected new metadocuments.
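
The expansion step alone can be sketched in Python as below, assuming that metadocuments form a precomputed tree and that expansion means replacing the largest metadocument with its children until the cap would be exceeded. The `MetaDoc` class, the largest-first rule, and the stopping test are assumptions for illustration; the clustering and summarizing steps are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class MetaDoc:
    """Hypothetical metadocument: a named node in a cluster tree."""
    name: str
    children: list["MetaDoc"] = field(default_factory=list)

    def leaf_count(self) -> int:
        """Number of individual documents under this node."""
        if not self.children:
            return 1
        return sum(c.leaf_count() for c in self.children)


def expand_focus_set(focus, max_count):
    """Expand metadocuments until about max_count of them are in play.

    Repeatedly replaces the expandable metadocument that covers the most
    documents with its children, stopping before max_count is exceeded.
    """
    current = list(focus)
    while True:
        expandable = [m for m in current if m.children]
        if not expandable:
            break
        biggest = max(expandable, key=lambda m: m.leaf_count())
        if len(current) - 1 + len(biggest.children) > max_count:
            break
        idx = current.index(biggest)
        current[idx:idx + 1] = biggest.children
    return current


root = MetaDoc("root", [
    MetaDoc("A", [MetaDoc("a1"), MetaDoc("a2"), MetaDoc("a3")]),
    MetaDoc("B", [MetaDoc("b1"), MetaDoc("b2")]),
])
print([m.name for m in expand_focus_set([root], 4)])  # ['a1', 'a2', 'a3', 'B']
```

Because the number of items handed to the clustering step is capped regardless of how many documents sit under the focus set, the per-interaction cost stays roughly constant, which is the property the title refers to.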

10.

Granted invention patent
Automatic method of extracting summarization using feature probabilities (status: expired)

Publication No.: US5918240A

Publication date: 1999-06-29

Application No.: US495986

Filing date: 1995-06-28

Applicants: Julian M. Kupiec, Jan O. Pedersen, Francine R. Chen, Daniel C. Brotsky, Steven B. Putz

Inventors: Julian M. Kupiec, Jan O. Pedersen, Francine R. Chen, Daniel C. Brotsky, Steven B. Putz

IPC classification: G06F17/21, G06F17/27, G06F17/30

CPC classification: G06F17/30719

Abstract: A method of automatically generating document extracts. The method makes use of feature value probabilities generated from a statistical analysis of manually generated summaries to extract the same set of sentences an expert might. The method is based upon an iterative approach. First, the computer system designates a sentence of the document as a selected sentence. Second, the computer system determines values for the selected sentence of each feature of a feature set. Third, the computer system increases a score for the selected sentence based upon the value of the feature for the selected sentence and upon the probability associated with that value. Fourth, after scoring all of the sentences of the document, the computer system selects a subset of the highest scoring sentences to be extracted.
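
The four steps in the abstract can be sketched as a small Python scorer. The three features, the cue phrases, and every probability in the table are made-up placeholders; in the method described, the probabilities would come from a statistical analysis of manually written summaries. Summing log-probabilities is an assumed way of "increasing the score".

```python
import math

# Hypothetical P(feature value | sentence is in a summary); placeholder numbers.
FEATURE_PROBS = {
    "is_first": {True: 0.6, False: 0.2},
    "is_long": {True: 0.4, False: 0.1},
    "has_cue": {True: 0.7, False: 0.2},
}
CUE_PHRASES = ("in summary", "we conclude")  # hypothetical cue phrases


def features(sentence, index):
    """Step 2: determine the value of each feature for one sentence."""
    return {
        "is_first": index == 0,
        "is_long": len(sentence.split()) > 5,
        "has_cue": any(p in sentence.lower() for p in CUE_PHRASES),
    }


def extract(sentences, k):
    """Score every sentence, then return the top k in document order."""
    scored = []
    for i, s in enumerate(sentences):            # step 1: select a sentence
        score = 0.0
        for name, value in features(s, i).items():
            score += math.log(FEATURE_PROBS[name][value])  # step 3
        scored.append((score, i))
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]  # step 4
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]


doc = [
    "This paper studies document summarization with simple features.",
    "It is short.",
    "In summary, the features work well on our data.",
    "Thanks.",
]
print(extract(doc, 2))
```

With these placeholder numbers the first and third sentences are selected, since they combine a favorable position or cue phrase with sufficient length.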
