Scatter-gather: a cluster-based method and apparatus for browsing large
document collections
    Granted invention patent (legal status: expired)

    Publication No.: US5442778A

    Publication date: 1995-08-15

    Application No.: US790316

    Filing date: 1991-11-12

    IPC classification: G06F17/30

    Abstract: Scatter-Gather is a computer-based document browsing method that operates in time proportional to the number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive use; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. The off-line preparation of the initial ordering is not time-dependent, so an accurate initial ordering can be prepared. The step of determining a summary includes determining a summary that can be presented to a user without scrolling on a CRT. The step of providing a further ordering includes truncated group average agglomerative clustering, merging disjoint document sets, center finding, assign-to-nearest and other refinement methods.
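
    The browsing loop named in the abstract (scatter the corpus into clusters, summarize each cluster, gather the clusters a user picks, and scatter again) can be illustrated with a minimal pure-Python sketch. It is not the patented implementation. All function names are hypothetical, the documents are toy data, plain average-linkage agglomeration stands in for the truncated group average method, and the sample size of sqrt(k*n) used for center finding is an assumption that the abstract does not state.

```python
import math
import random
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Sum of member vectors; cosine ignores scale, so no division is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def group_average(c1, c2, vectors):
    """Mean pairwise similarity between the members of two clusters."""
    pairs = len(c1) * len(c2)
    return sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2) / pairs


def agglomerate(ids, vectors, k):
    """Merge the most similar pair of clusters until only k remain."""
    clusters = [[i] for i in ids]
    while len(clusters) > k:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: group_average(clusters[p[0]], clusters[p[1]], vectors),
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


def scatter(ids, vectors, k, rng):
    """Center finding on a random sample, then assign-to-nearest for every id."""
    k = min(k, len(ids))
    # Assumed sample size: sqrt(k * n), clamped to the range [k, n].
    size = min(len(ids), max(k, math.ceil(math.sqrt(k * len(ids)))))
    sample = rng.sample(ids, size)
    centers = [centroid([vectors[i] for i in c])
               for c in agglomerate(sample, vectors, k)]
    clusters = [[] for _ in centers]
    for i in ids:
        nearest = max(range(len(centers)),
                      key=lambda c: cosine(vectors[i], centers[c]))
        clusters[nearest].append(i)
    return [c for c in clusters if c]


def summarize(cluster, vectors, n_terms=3):
    """Short cluster digest: the most frequent terms across its members."""
    top = centroid([vectors[i] for i in cluster]).most_common(n_terms)
    return [term for term, _ in top]


def gather(clusters, chosen):
    """Union of the clusters the user picked, ready to be scattered again."""
    return sorted(i for c in chosen for i in clusters[c])


# Toy corpus (illustrative data only): two topics with no shared vocabulary.
docs = [
    "stock market prices fall",
    "market prices rise as stock rally",
    "stock prices and market trends",
    "football team wins match",
    "team loses football match",
    "football match ends in draw for team",
]
vectors = [vectorize(d) for d in docs]
clusters = scatter(list(range(len(docs))), vectors, k=2, rng=random.Random(0))
for members in clusters:
    print(members, summarize(members, vectors))
```

    In an interactive session, the ids returned by `gather` would be passed back to `scatter`, so each round re-clusters only the subset the user selected. The random sample makes the center-finding step non-deterministic, which is why the sketch takes an explicit `random.Random` instance.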
