Method of ordering document clusters without requiring knowledge of user interests
    1.
    Granted invention patent (status: expired)

    Publication number: US5787420A

    Publication date: 1998-07-28

    Application number: US572558

    Filing date: 1995-12-14

    IPC classification: G06F17/30

    Abstract: A computerized method of ordering document clusters for presentation after browsing a corpus of documents. The method presents document clusters in a logical fashion in the absence of any indication of the computer user's interests. The method begins by grouping the corpus into a plurality of clusters, each having a centroid and including at least one document. Next, for each cluster, a degree of similarity between that cluster and every other cluster is determined by finding a dot product between each cluster centroid and every other cluster centroid. The similarity information is then used to determine an order of presentation for the plurality of clusters in a way that maximizes the degree of similarity between adjacent clusters.
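The abstract above does not say how the ordering is searched for, so the following is only a minimal sketch under an assumption: a greedy heuristic that starts from the most similar pair of centroids and then repeatedly appends the unplaced cluster most similar to the last one placed. The function names `dot` and `order_clusters` are hypothetical, and the greedy strategy is an illustrative stand-in, not the patented procedure.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def order_clusters(centroids):
    """Return cluster indices ordered so that neighbours tend to be similar.

    Greedy heuristic (an assumption, not taken from the patent): seed with the
    most similar pair, then keep appending the remaining cluster whose centroid
    has the largest dot product with the last placed centroid.
    """
    n = len(centroids)
    if n < 2:
        return list(range(n))
    # Pairwise similarity between every centroid and every other centroid.
    sim = [[dot(centroids[i], centroids[j]) for j in range(n)] for i in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    first, second = max(pairs, key=lambda p: sim[p[0]][p[1]])
    order = [first, second]
    remaining = set(range(n)) - {first, second}
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda j: sim[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


example = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
ordering = order_clusters(example)
```

A greedy pass is cheap but not guaranteed to find the ordering with the highest total adjacent similarity; an exhaustive or dynamic-programming search would be needed for that.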

    Article and method of automatically filtering information retrieval results using test genre
    3.
    Granted invention patent (status: expired)

    Publication number: US06505150B2

    Publication date: 2003-01-07

    Application number: US09100201

    Filing date: 1998-06-18

    IPC classification: G10L1720

    Abstract: A method of filtering, according to text genre, the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.
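The final step of the abstract above, presenting topic-search results in an order based on genre, can be sketched as a stable sort. This is an illustration only: the genre classifier is passed in as a hypothetical `genre_of` callable, and `order_by_genre` and the genre labels are assumptions, since the abstract does not describe how genre is detected.

```python
def order_by_genre(results, genre_of, genre_order):
    """Order search results by a preferred sequence of text genres.

    results     -- texts (or ids) returned by the topic search
    genre_of    -- hypothetical classifier mapping a result to a genre label
    genre_order -- genres in the order they should be presented
    Results of an unlisted genre go last; ties keep their search order
    because Python's sort is stable.
    """
    rank = {genre: i for i, genre in enumerate(genre_order)}
    return sorted(results, key=lambda doc: rank.get(genre_of(doc), len(rank)))


genres = {"a": "editorial", "b": "news", "c": "editorial", "d": "news"}
ordered = order_by_genre(["a", "b", "c", "d"], genres.get, ["news", "editorial"])
```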

    Method of constant interaction-time clustering applied to document browsing
    4.
    Granted invention patent (status: expired)

    Publication number: US5483650A

    Publication date: 1996-01-09

    Application number: US79292

    Filing date: 1993-06-21

    IPC classification: G06F17/30

    Abstract: Arbitrarily large document collections are processed by expanding a focus set having at least one initial metadocument into a plurality of subsequent metadocuments. The number of subsequent metadocuments is approximately equal to a predetermined maximum number. The subsequent metadocuments are then clustered into a predetermined number of new metadocuments, which are summarized and presented to a user. The focus set is redefined to include only user-selected new metadocuments.
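The expansion step described above can be sketched as follows, assuming the metadocuments form a precomputed tree in which each metadocument lists its children. The name `expand_focus`, the `children` mapping, and the rule of expanding the first metadocument that still fits under the limit are all assumptions made for illustration; the abstract only states that the expanded set is approximately equal in size to a predetermined maximum.

```python
def expand_focus(focus, children, max_count):
    """Expand a focus set of metadocuments up to roughly max_count items.

    focus     -- list of metadocument ids currently in focus
    children  -- dict mapping a metadocument id to its child ids
                 (missing or empty means it cannot be expanded further)
    max_count -- predetermined maximum size of the expanded set
    """
    current = list(focus)
    progress = True
    while progress:
        progress = False
        for i, node in enumerate(current):
            kids = children.get(node, [])
            # Replace a metadocument by its children only if the set stays
            # within the limit.
            if kids and len(current) - 1 + len(kids) <= max_count:
                current[i:i + 1] = kids
                progress = True
                break
    return current


tree = {"root": ["A", "B"], "A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
expanded = expand_focus(["root"], tree, 4)
```

Because the expanded set is bounded by `max_count`, the clustering step that follows works on a fixed-size input regardless of how large the underlying collection is, which is what keeps interaction time roughly constant.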

    Scatter-gather: a cluster-based method and apparatus for browsing large document collections
    5.
    Granted invention patent (status: expired)

    Publication number: US5442778A

    Publication date: 1995-08-15

    Application number: US790316

    Filing date: 1991-11-12

    IPC classification: G06F17/30

    Abstract: Scatter-Gather is a computer-based document browsing method which operates in time proportional to the number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive utility; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. The step of an off-line preparation of an initial ordering of a corpus is non-time-dependent, thus an accurate initial ordering is prepared. The step of determining a summary includes determining a summary for presentation to a user without scrolling on a CRT. The step of providing a further ordering includes truncated group average agglomerative clustering, merging disjointed document sets, center finding, assign-to-nearest and other refinement methods.
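Of the refinement steps listed above, assign-to-nearest is the simplest to illustrate: each document goes to the cluster whose center it most resembles. The sketch below assumes documents and centers are term-weight vectors compared by dot product; the vector representation and the names `dot` and `assign_to_nearest` are assumptions, not details given in the abstract.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def assign_to_nearest(docs, centers):
    """Assign each document index to the most similar cluster center.

    Makes one pass over the documents, so the cost is proportional to the
    number of documents times the (fixed) number of centers.
    """
    clusters = [[] for _ in centers]
    for idx, doc in enumerate(docs):
        best = max(range(len(centers)), key=lambda k: dot(doc, centers[k]))
        clusters[best].append(idx)
    return clusters


docs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
centers = [[1.0, 0.0], [0.0, 1.0]]
groups = assign_to_nearest(docs, centers)
```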

    Iterative technique for phrase query formation and an information retrieval system employing same
    6.
    Granted invention patent (status: expired)

    Publication number: US5278980A

    Publication date: 1994-01-11

    Application number: US745794

    Filing date: 1991-08-16

    Abstract: An information retrieval system and method are provided in which an operator inputs one or more query words which are used to determine a search key for searching through a corpus of documents, and which returns any matches between the search key and the corpus of documents as a phrase containing the word data matching the query word(s), a non-stop (content) word next adjacent to the matching word data, and all intervening stop-words between the matching word data and the next adjacent non-stop word. The operator, after reviewing one or more of the returned phrases, can then use one or more of the next adjacent non-stop-words as new query words to reformulate the search key and perform a subsequent search through the document corpus. This process can be conducted iteratively, until the appropriate documents of interest are located. The additional non-stop-words from each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the "new" content words.
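The phrase returned for each match, as described above, consists of the matching word, any stop-words that follow it, and the next content word. A minimal sketch for a single query word, assuming whitespace tokenization and a small illustrative stop list (`STOP_WORDS` and `phrases_for` are hypothetical names, and real text would need punctuation handling):

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def phrases_for(query, text):
    """Return one phrase per occurrence of the query word in the text.

    Each phrase is the matching word, all intervening stop-words, and the
    next adjacent non-stop word. Matches with no following content word
    are skipped.
    """
    tokens = text.lower().split()
    phrases = []
    for i, tok in enumerate(tokens):
        if tok != query:
            continue
        j = i + 1
        while j < len(tokens) and tokens[j] in STOP_WORDS:
            j += 1
        if j < len(tokens):
            phrases.append(tokens[i:j + 1])
    return phrases


found = phrases_for("retrieval", "retrieval of the documents and retrieval systems")
```

The last word of each returned phrase is the candidate new query word that the operator could pick for the next iteration.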

    Automatic method of extracting summarization using feature probabilities
    7.
    Granted invention patent (status: expired)

    Publication number: US5918240A

    Publication date: 1999-06-29

    Application number: US495986

    Filing date: 1995-06-28

    IPC classification: G06F17/21 G06F17/27 G06F17/30

    CPC classification: G06F17/30719

    Abstract: A method of automatically generating document extracts. The method makes use of feature value probabilities generated from a statistical analysis of manually generated summaries to extract the same set of sentences an expert might. The method is based upon an iterative approach. First, the computer system designates a sentence of the document as a selected sentence. Second, the computer system determines values for the selected sentence of each feature of a feature set. Third, the computer system increases a score for the selected sentence based upon the value of the feature for the selected sentence and upon the probability associated with that value. Fourth, after scoring all of the sentences of the document, the computer system selects a subset of the highest scoring sentences to be extracted.
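The four steps above can be sketched as a scoring loop. The abstract does not give the scoring formula, so this sketch assumes each feature contributes the logarithm of the probability associated with its observed value; the feature set, the probability table, and the names `score_sentence` and `extract` are all hypothetical.

```python
import math


def score_sentence(features, probs):
    """Sum the log-probabilities of the observed feature values (assumed rule)."""
    return sum(math.log(probs[name][value]) for name, value in features.items())


def extract(sentences, featurize, probs, k):
    """Return the k highest-scoring sentences, kept in document order.

    featurize -- hypothetical callable (index, sentence) -> {feature: value}
    probs     -- probs[feature][value]: probability estimated from manually
                 written summaries (illustrative numbers only)
    """
    scored = [(score_sentence(featurize(i, s), probs), i)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]


def toy_features(index, sentence):
    # Two made-up features: is this the first sentence, and is it long?
    return {"first": index == 0, "long": len(sentence.split()) >= 4}


toy_probs = {"first": {True: 0.6, False: 0.2}, "long": {True: 0.5, False: 0.1}}
doc = ["Intro sentence here today.", "Short.", "In conclusion the method works well."]
summary = extract(doc, toy_features, toy_probs, 2)
```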

    Method and apparatus for automatic document summarization
    8.
    Granted invention patent (status: expired)

    Publication number: US5638543A

    Publication date: 1997-06-10

    Application number: US71114

    Filing date: 1993-06-03

    IPC classification: G06F17/21 G06F17/30

    CPC classification: G06F17/30719

    Abstract: Regions of a document such as sentences and blocks of sentences are scored and classified based upon their scores. An abstract of the document can be formed from the classified sentences. Sentences are classified by the use of words classified as stop words and vanish words. Sentences are scored based on the number of stop words and the number of strings of connected stop words, called stop-word runs, contained in the sentence. Passionate sentences, which usually contain information which the writer has strong feelings about, such as joy, admiration, or sadness, are identified. This method can also select sentences that are contrapassionate, which the writer may either have to strengthen or have inserted to complete the record and provide continuity or information.
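The two quantities the abstract scores on, the number of stop words and the number of stop-word runs, can be counted in a single pass. This sketch assumes a run means two or more consecutive stop words, which the abstract does not spell out; the stop list and the name `stop_word_stats` are likewise illustrative. How the counts are turned into a classification is not described above and is left out.

```python
def stop_word_stats(sentence, stop_words):
    """Return (stop-word count, stop-word run count) for a sentence.

    A run is counted once, when a second consecutive stop word is seen
    (assumption: a single isolated stop word is not a run).
    """
    count = 0
    runs = 0
    run_length = 0
    for token in sentence.lower().split():
        if token in stop_words:
            count += 1
            run_length += 1
            if run_length == 2:
                runs += 1
        else:
            run_length = 0
    return count, runs


stops = {"the", "on", "in", "of"}
stats = stop_word_stats("the cat sat on the mat in front of the door", stops)
```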
