Patent search ap:"Jan. O. Pedersen" Page 1

1.

Granted invention patent
Method of ordering document clusters without requiring knowledge of user interests (Expired)

Publication number: US5787420A

Publication date: 1998-07-28

Application number: US572558

Filing date: 1995-12-14

Applicant: John W. Tukey, Jan O. Pedersen

Inventor: John W. Tukey, Jan O. Pedersen

IPC classification: G06F17/30

CPC classification: G06F17/3071, Y10S707/99932, Y10S707/99934, Y10S707/99935

Abstract: A computerized method of ordering document clusters for presentation after browsing a corpus of documents that presents document clusters in a logical fashion in the absence of any indication of the computer user's interests. The method begins by grouping the corpus into a plurality of clusters, each having a centroid and including at least one document. Next, for each cluster a degree of similarity between that cluster and every other cluster is determined by finding a dot product between each cluster centroid and every other cluster centroid. The similarity information is then used to determine an order of presentation for the plurality of clusters in a way that maximizes the degree of similarity between adjacent clusters.
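
The ordering step in this abstract can be illustrated with a minimal Python sketch. The abstract only says that centroid dot products are used and that adjacent clusters should be as similar as possible; the greedy nearest-neighbour chaining below, the function names `dot` and `order_clusters`, and the sample centroids are all assumptions made for illustration, not the patented procedure.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def order_clusters(centroids):
    """Greedy sketch (an assumption, not the patent's method): start at
    cluster 0, then repeatedly append the unplaced cluster whose centroid
    has the largest dot product with the most recently placed one."""
    if not centroids:
        return []
    n = len(centroids)
    # Pairwise similarity between every pair of cluster centroids.
    sim = [[dot(centroids[i], centroids[j]) for j in range(n)] for i in range(n)]
    order = [0]
    remaining = set(range(1, n))
    while remaining:
        last = order[-1]
        # Ties are broken in favour of the lower cluster index.
        nxt = max(remaining, key=lambda j: (sim[last][j], -j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical centroids in a three-term vector space.
centroids = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
]
print(order_clusters(centroids))  # [0, 2, 3, 1]
```

A greedy chain does not guarantee the globally best ordering; it is used here only because it is short and easy to follow.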

2.

Granted invention patent
Article and method of automatically determining text genre using surface features of untagged texts (Expired)

Publication number: US06973423B1

Publication date: 2005-12-06

Application number: US09100189

Filing date: 1998-06-18

Applicant: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler, Gregory Grefenstette

Inventor: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler, Gregory Grefenstette

IPC classification: G06F17/20, G06F17/27, G06F17/30

CPC classification: G06F17/2745, G06F17/274

Abstract: A processor-implemented method of identifying the text genre of a machine-readable, untagged text. The processor-implemented method begins by generating a cue vector from the text, which represents occurrences in the text of a first set of nonstructural, surface cues, which are easily computable. Afterward, the processor determines whether the text is an instance of a first text genre using the cue vector and a weighting vector associated with the first text genre.
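
As a rough illustration of the cue-vector idea, the Python sketch below computes a few cheap surface cues and combines them with a weighting vector. The particular cues, the weights, the zero threshold, and the names `cue_vector` and `is_genre` are hypothetical; the abstract does not say which cues are used or how the two vectors are combined, so a plain weighted sum is assumed here.

```python
import re

PRONOUNS = {"i", "we", "you"}  # hypothetical surface cue


def cue_vector(text):
    """Build a small vector of surface cues that need no tagging or parsing."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return [
        text.count("?") / n_words,                           # question marks per word
        text.count("!") / n_words,                           # exclamation marks per word
        sum(w.lower() in PRONOUNS for w in words) / n_words, # pronoun rate
        len(words) / max(len(sentences), 1),                 # mean sentence length
    ]


def is_genre(cues, weights, bias=0.0):
    """Assumed decision rule: weighted sum of the cues compared with zero."""
    score = sum(c * w for c, w in zip(cues, weights)) + bias
    return score > 0


# Hypothetical weighting vector for an "informal" genre.
informal_weights = [5.0, 5.0, 3.0, -0.1]

print(is_genre(cue_vector("Can you believe it? We won!"), informal_weights))  # True
print(is_genre(cue_vector("The committee reviewed the annual budget report in detail."),
               informal_weights))  # False
```

In practice the weighting vector would be fitted to labelled texts rather than set by hand as it is here.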

3.

Granted invention patent
Article and method of automatically filtering information retrieval results using test genre (Expired)

Publication number: US06505150B2

Publication date: 2003-01-07

Application number: US09100201

Filing date: 1998-06-18

Applicant: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler

Inventor: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler

IPC classification: G10L1720

CPC classification: G06F17/277, G06F17/271, G06F17/2775, G06F17/2785, G06F17/30705, G06F17/30707

Abstract: A method of filtering according to text genre the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.

4.

Granted invention patent
Method of constant interaction-time clustering applied to document browsing (Expired)

Publication number: US5483650A

Publication date: 1996-01-09

Application number: US79292

Filing date: 1993-06-21

Applicant: Jan O. Pedersen, David R. Karger, Douglass R. Cutting

Inventor: Jan O. Pedersen, David R. Karger, Douglass R. Cutting

IPC classification: G06F17/30

CPC classification: G06F17/3071, G06F17/30011, Y10S707/99932

Abstract: Arbitrarily large document collections are processed by expanding a focus set having at least one initial metadocument into a plurality of subsequent metadocuments. The number of subsequent metadocuments is approximately equal to a predetermined maximum number. The subsequent metadocuments are then clustered into a predetermined number of new metadocuments, which are summarized and presented to a user. The focus set is redefined to include only user-selected new metadocuments.

5.

Granted invention patent
Scatter-gather: a cluster-based method and apparatus for browsing large document collections (Expired)

Publication number: US5442778A

Publication date: 1995-08-15

Application number: US790316

Filing date: 1991-11-12

Applicant: Jan. O. Pedersen, David Karger, Douglass R. Cutting, John W. Tukey

Inventor: Jan. O. Pedersen, David Karger, Douglass R. Cutting, John W. Tukey

IPC classification: G06F17/30

CPC classification: G06F17/3071, G06F17/30011, Y10S707/99935, Y10S707/99937

Abstract: Scatter-Gather is a computer based document browsing method which operates in time proportional to the number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive utility; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. The step of an off-line preparation of an initial ordering of a corpus is non-time-dependent, thus an accurate initial ordering is prepared. The step of determining a summary includes determining a summary for presentation to a user without scrolling on a CRT. The step of providing a further ordering includes truncated group average agglomerate clustering, merging disjointed document sets, center finding, assign-to-nearest and other refinement methods.

6.

Granted invention patent
Iterative technique for phrase query formation and an information retrieval system employing same (Expired)

Publication number: US5278980A

Publication date: 1994-01-11

Application number: US745794

Filing date: 1991-08-16

Applicant: Jan O. Pedersen, Per-Kristian Halvorsen, Douglass R. Cutting, John W. Tukey, Eric A. Bier, Daniel G. Bobrow

Inventor: Jan O. Pedersen, Per-Kristian Halvorsen, Douglass R. Cutting, John W. Tukey, Eric A. Bier, Daniel G. Bobrow

IPC classification: G06F17/30, G06F15/40, G06F15/403

CPC classification: G06F17/30646, G06F17/30011, Y10S707/99934

Abstract: An information retrieval system and method are provided in which an operator inputs one or more query words which are used to determine a search key for searching through a corpus of documents, and which returns any matches between the search key and the corpus of documents as a phrase containing the word data matching the query word(s), a non-stop (content) word next adjacent to the matching word data, and all intervening stop-words between the matching word data and the next adjacent non-stop word. The operator, after reviewing one or more of the returned phrases, can then use one or more of the next adjacent non-stop-words as new query words to reformulate the search key and perform a subsequent search through the document corpus. This process can be conducted iteratively, until the appropriate documents of interest are located. The additional non-stop-words from each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the "new" content words.
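
The phrase-returning step described in this abstract can be sketched in a few lines of Python. The stop-word list, the whitespace tokenisation, and the name `phrases` are assumptions for illustration; a real system would search an index rather than scan tokens.

```python
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def phrases(tokens, query):
    """For each token matching the query word, return the phrase made of the
    match, any intervening stop words, and the next non-stop (content) word."""
    out = []
    for i, tok in enumerate(tokens):
        if tok.lower() != query.lower():
            continue
        j = i + 1
        # Skip over the stop words that follow the match.
        while j < len(tokens) and tokens[j].lower() in STOP_WORDS:
            j += 1
        if j < len(tokens):  # a following content word exists
            out.append(tokens[i:j + 1])
    return out


tokens = "retrieval of the documents and retrieval in large collections".split()
found = phrases(tokens, "retrieval")
print(found)
# [['retrieval', 'of', 'the', 'documents'], ['retrieval', 'in', 'large']]

# The last word of each phrase is a candidate query word for the next iteration.
print([p[-1] for p in found])  # ['documents', 'large']
```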

7.

Granted invention patent
Automatic method of extracting summarization using feature probabilities (Expired)

Publication number: US5918240A

Publication date: 1999-06-29

Application number: US495986

Filing date: 1995-06-28

Applicant: Julian M. Kupiec, Jan O. Pedersen, Francine R. Chen, Daniel C. Brotsky, Steven B. Putz

Inventor: Julian M. Kupiec, Jan O. Pedersen, Francine R. Chen, Daniel C. Brotsky, Steven B. Putz

IPC classification: G06F17/21, G06F17/27, G06F17/30

CPC classification: G06F17/30719

Abstract: A method of automatically generating document extracts. The method makes use of feature value probabilities generated from a statistical analysis of manually generated summaries to extract the same set of sentences an expert might. The method is based upon an iterative approach. First, the computer system designates a sentence of the document as a selected sentence. Second, the computer system determines values for the selected sentence of each feature of a feature set. Third, the computer system increases a score for the selected sentence based upon the value of the feature for the selected sentence and upon the probability associated with that value. Fourth, after scoring all of the sentences of the document, the computer system selects a subset of the highest scoring sentences to be extracted.
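
The four steps in this abstract can be sketched as follows. Everything specific in the example is an assumption: the three features, the probability table `PROB`, the cue phrases, and the choice to add log probabilities to the score. The abstract only states that the score is increased based on each feature's value and the probability associated with that value.

```python
import math

# Hypothetical table: PROB[feature][value] is the probability associated with
# that feature value (in practice estimated from manually written summaries).
PROB = {
    "is_first_sentence": {True: 0.6, False: 0.2},
    "has_cue_phrase": {True: 0.7, False: 0.3},
    "is_long": {True: 0.5, False: 0.1},
}
CUE_PHRASES = ("in summary", "we conclude")  # hypothetical


def features(index, sentence):
    """Determine the value of each feature for one sentence."""
    return {
        "is_first_sentence": index == 0,
        "has_cue_phrase": any(p in sentence.lower() for p in CUE_PHRASES),
        "is_long": len(sentence.split()) > 5,
    }


def extract(sentences, k):
    """Score every sentence, then return the k best in document order."""
    scored = []
    for index, sentence in enumerate(sentences):
        score = 0.0
        for name, value in features(index, sentence).items():
            score += math.log(PROB[name][value])  # assumed combination rule
        scored.append((score, index))
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]


doc = [
    "This paper studies sentence extraction for short summaries.",
    "It is short.",
    "In summary, trained feature probabilities pick useful sentences.",
    "Thanks.",
]
print(extract(doc, 2))
```

With the table above, the first and third sentences score highest and are returned in their original order.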

8.

Granted invention patent
Method and apparatus for automatic document summarization (Expired)

Publication number: US5638543A

Publication date: 1997-06-10

Application number: US71114

Filing date: 1993-06-03

Applicant: Jan O. Pedersen, John W. Tukey

Inventor: Jan O. Pedersen, John W. Tukey

IPC classification: G06F17/21, G06F17/30

CPC classification: G06F17/30719

Abstract: Regions of a document such as sentences and blocks of sentences are scored and classified based upon their scores. An abstract of the document can be formed from the classified sentences. Sentences are classified by the use of words classified as stop words and vanish words. Sentences are scored based on the number of stop words and the number of strings of connected stop words, called stop-word runs, contained in the sentence. Passionate sentences, which usually contain information which the writer has strong feelings about, such as joy, admiration, or sadness, are identified. This method can also select sentences that are contrapassionate, which the writer may either have to strengthen or have inserted to complete the record and provide continuity or information.
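
The two quantities this abstract scores on, the number of stop words and the number of stop-word runs, can be counted with a short Python sketch. The stop-word list and the name `stop_word_stats` are hypothetical, and treating a single isolated stop word as a run of length one is an assumption; the abstract does not say how the counts are turned into a score or a class.

```python
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "and", "in", "of", "on", "the", "to"}


def stop_word_stats(sentence):
    """Return (number of stop words, number of stop-word runs), where a run is
    a maximal sequence of consecutive stop words (length one or more)."""
    words = [w.strip(".,;:!?\"'").lower() for w in sentence.split()]
    count = 0
    runs = 0
    in_run = False
    for w in words:
        if w in STOP_WORDS:
            count += 1
            if not in_run:  # a new run starts here
                runs += 1
            in_run = True
        else:
            in_run = False
    return count, runs


print(stop_word_stats("The cat sat on the mat in front of the door."))  # (6, 4)
```

Here the runs are "The", "on the", "in", and "of the".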

9.

Granted invention patent
Finite-state transduction of related word forms for text indexing and retrieval (Expired)

Publication number: US5625554A

Publication date: 1997-04-29

Application number: US916576

Filing date: 1992-07-20

Applicant: Douglass R. Cutting, Per-Kristian G. Halvorsen, Ronald M. Kaplan, Lauri Karttunen, Martin Kay, Jan O. Pedersen

Inventor: Douglass R. Cutting, Per-Kristian G. Halvorsen, Ronald M. Kaplan, Lauri Karttunen, Martin Kay, Jan O. Pedersen

IPC classification: G06F17/30

CPC classification: G06F17/30616, G06F17/30663, G06F17/30666, G06F17/30672, G06F17/30684, G06F17/30985, G06F17/30988, Y10S707/99931

Abstract: The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both systematic aspects of a language's morphological rule system and the word-by-word irregularities that also occur. The techniques described apply generally across the languages of the world and are not just limited to simple suffixing languages like English. Although the resulting transducers can have many states and transitions or arcs, they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates the information retrieval system comprising the novel finite state transducer as a database and a processor for responding to user queries, for searching the database, and for outputting proper responses, if they exist, as well as the novel database used in such a system and methods for constructing the novel database.

10.

Granted invention patent
Hardcopy lossless data storage and communications for electronic document processing systems (Expired)

Publication number: US5486686A

Publication date: 1996-01-23

Application number: US887563

Filing date: 1992-05-18

Applicant: Frank Zdybel, Jr.; Henry W. Sang, Jr.; Jan O. Pedersen; Z. E. Smith, III; D. A. Henderson, Jr.; David L. Hecht; Dan S. Bloomberg

Inventor: Frank Zdybel, Jr.; Henry W. Sang, Jr.; Jan O. Pedersen; Z. E. Smith, III; D. A. Henderson, Jr.; David L. Hecht; Dan S. Bloomberg

IPC classification: G06F17/21, G06F17/30, G06F21/24, G06Q10/10, H04N1/32, G06F15/20

CPC classification: H04N1/32133, G06F17/30011, G06Q10/10, H04N2201/3204, H04N2201/3205, H04N2201/3214, H04N2201/3226, H04N2201/3232, H04N2201/3233, H04N2201/3242, H04N2201/3269, H04N2201/3271

Abstract: Machine readable electronic domain definitions of part or all of the electronic domain descriptions of hardcopy documents and/or of part or all of the transforms that are performed to produce and reproduce such hardcopy documents are encoded in codes that are printed on such documents, thereby permitting the electronic domain descriptions of such documents and/or such transforms to be recovered more robustly and reliably when the information carried by such documents is transformed from the hardcopy domain to the electronic domain.
