System and method for document collection, grouping and summarization
    Granted invention patent
    Status: in force

    Publication number: US08176418B2

    Publication date: 2012-05-08

    Application number: US11071968

    Filing date: 2005-03-04

    IPC classification: G06F17/00

    CPC classification: G06Q10/10

    Abstract: A system for generating a summary of a plurality of documents and presenting the summary information to a user is provided which includes a computer readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group documents into document clusters. The document clusters can also be assigned to predetermined document categories for presentation to a user. A number of multiple-document summarization engines are provided which generate summaries for specific classes of multiple-document clusters. A summarizer router is employed to determine a relationship of the documents in a cluster and select one of the document summarization engines for use in generating a summary of the cluster. A single-event engine is provided to generate summaries of documents which are closely related temporally and to a specific event. A dissimilarity engine for multiple-document summary generation is provided which generates summaries of document clusters having documents with varying degrees of relatedness. A user interface is provided to display categories, cluster titles, summaries, and related images.
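
    The routing step can be pictured with a minimal sketch. The abstract does not say how the router measures relatedness, so this assumes two stand-in signals: the mean pairwise cosine similarity of bag-of-words vectors, and the span of publication dates across the cluster. The names Doc, route_cluster and cosine, the engine labels "single_event" and "dissimilarity", and both thresholds are hypothetical, not taken from the patent.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass
class Doc:
    text: str
    published: date


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route_cluster(docs, sim_threshold=0.3, max_days=2):
    """Pick an engine name for a cluster (hypothetical heuristic, illustrative thresholds)."""
    vectors = [Counter(d.text.lower().split()) for d in docs]
    pairs = list(combinations(vectors, 2))
    mean_sim = sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
    span_days = (max(d.published for d in docs) - min(d.published for d in docs)).days
    # Tightly related, short-lived clusters look like a single event.
    if mean_sim >= sim_threshold and span_days <= max_days:
        return "single_event"
    # Everything looser goes to the engine built for varying relatedness.
    return "dissimilarity"


quake = [Doc("earthquake strikes city center", date(2005, 3, 1)),
         Doc("earthquake city rescue continues", date(2005, 3, 2))]
mixed = [Doc("stock markets rally", date(2005, 1, 3)),
         Doc("earthquake rescue continues", date(2005, 6, 1))]
print(route_cluster(quake))  # -> single_event
print(route_cluster(mixed))  # -> dissimilarity
```

    Keeping the router as a small, separate decision function mirrors the abstract's division of labor: the engines only summarize, and the choice between them can be tuned or replaced without touching either engine.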

    Multi-document summarization system and method
    Granted invention patent
    Status: in force

    Publication number: US07366711B1

    Publication date: 2008-04-29

    Application number: US09913745

    Filing date: 2000-02-18

    IPC classification: G06F17/30

    Abstract: A summary for a collection of related documents can be generated by extracting phrases from the documents which include common focus elements. Phrase intersection analysis is then performed on the extracted phrases to generate a phrase intersection table, in which identical or equivalent phrases are identified. Temporal processing is then performed on the phrases in the phrase intersection table to remove ambiguous time references and to sort the phrases into temporal sequence. Sentence generation is then used to combine the phrases in the phrase intersection table into a coherent summary.
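
    A minimal sketch of the phrase intersection table and its temporal ordering follows. It assumes phrase extraction around common focus elements has already happened upstream, and that each phrase carries an optional relative time word that is resolved against its document's publication date. Case- and whitespace-insensitive string matching stands in for the patent's test for "identical or equivalent" phrases, and the one-sentence-per-phrase generator is only a placeholder for real sentence generation. The names normalize, phrase_intersection, generate_summary and RELATIVE_DAYS are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical relative-time vocabulary, resolved against each document's date.
RELATIVE_DAYS = {"today": 0, "yesterday": -1}


def normalize(phrase: str) -> str:
    """Crude stand-in for phrase equivalence: case- and whitespace-insensitive."""
    return " ".join(phrase.lower().split())


def phrase_intersection(docs, min_docs=2):
    """Build a phrase intersection table; keep phrases shared by >= min_docs documents.

    docs: list of (publication_date, [(phrase, time_word_or_None), ...]).
    Returns the shared entries sorted into temporal order.
    """
    table = defaultdict(lambda: {"docs": set(), "when": None, "text": None})
    for doc_id, (pub_date, phrases) in enumerate(docs):
        for phrase, time_word in phrases:
            entry = table[normalize(phrase)]
            entry["docs"].add(doc_id)
            entry["text"] = entry["text"] or phrase.strip()
            # Replace a relative reference such as "yesterday" with an absolute date.
            when = pub_date + timedelta(days=RELATIVE_DAYS.get(time_word, 0))
            if entry["when"] is None or when < entry["when"]:
                entry["when"] = when
    shared = [e for e in table.values() if len(e["docs"]) >= min_docs]
    return sorted(shared, key=lambda e: e["when"])


def generate_summary(entries) -> str:
    """Trivial sentence generation: one sentence per shared phrase, in time order."""
    return " ".join(e["text"][0].upper() + e["text"][1:] + "." for e in entries)


example_docs = [
    (date(2000, 2, 18), [("the storm hit the coast", "yesterday"),
                         ("officials ordered evacuations", None)]),
    (date(2000, 2, 19), [("The storm hit the coast", None),
                         ("officials ordered  evacuations", "yesterday"),
                         ("power was restored", None)]),
]
print(generate_summary(phrase_intersection(example_docs)))
# -> The storm hit the coast. Officials ordered evacuations.
```

    "Power was restored" appears in only one document, so it drops out of the intersection; the two shared phrases are ordered by their resolved dates rather than by the order in which the articles mention them.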

    Method and system for topical segmentation, segment significance and segment function
    Granted invention patent
    Status: lapsed

    Publication number: US06473730B1

    Publication date: 2002-10-29

    Application number: US09290643

    Filing date: 1999-04-12

    IPC classification: G06F17/27

    Abstract: A “domain-general” method for topical segmentation of a document input includes the steps of: extracting one or more selected terms from a document; linking occurrences of the extracted terms based upon the proximity of similar terms; and assigning weighted scores to paragraphs of the document input corresponding to the linked occurrences. In accordance with the present invention, the values of the assigned scores depend upon the type of the selected terms (e.g., common noun, proper noun, or pronominal) and the position of the linked occurrences with respect to the paragraphs (e.g., front, during, or rear). Upon zero-sum normalization, the assigned scores represent the boundaries of the topical segments of the document input.
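
    The scoring scheme can be sketched as follows, under stated assumptions. Terms arrive already extracted and tagged as proper, common or pronominal. "Front", "during" and "rear" are read as the paragraph where a chain of linked occurrences opens, the paragraphs it continues through, and the first paragraph after it closes. Every weight and link distance is an illustrative value, not one taken from the patent, and treating positive scores after zero-sum normalization as segment starts is likewise an assumption. The names link_chains, segment, TYPE_WEIGHT and LINK_DISTANCE are hypothetical.

```python
from collections import defaultdict

# Illustrative weights only; the abstract says scores depend on term type and
# position but gives no values, so these numbers are assumptions.
TYPE_WEIGHT = {"proper": 2.0, "common": 1.0, "pronominal": 0.5}
LINK_DISTANCE = {"proper": 4, "common": 2, "pronominal": 1}  # max paragraph gap in a link
FRONT, DURING, REAR = 1.0, -1.0, 0.5


def link_chains(paragraphs):
    """paragraphs: list of lists of (term, term_type). Returns (term_type, start, end) chains."""
    positions = defaultdict(list)
    for i, terms in enumerate(paragraphs):
        for term, kind in set(terms):
            positions[(term, kind)].append(i)
    chains = []
    for (_term, kind), idxs in positions.items():
        runs, start, prev = [], idxs[0], idxs[0]
        for i in idxs[1:]:
            if i - prev > LINK_DISTANCE[kind]:  # gap too large: close the current run
                runs.append((start, prev))
                start = i
            prev = i
        runs.append((start, prev))
        # Only runs with at least two linked occurrences count as chains.
        chains.extend((kind, s, e) for s, e in runs if e > s)
    return chains


def segment(paragraphs):
    """Score paragraphs, zero-sum normalize, and return (scores, segment start indices)."""
    n = len(paragraphs)
    scores = [0.0] * n
    for kind, start, end in link_chains(paragraphs):
        w = TYPE_WEIGHT[kind]
        scores[start] += FRONT * w            # front: a chain opens here
        for i in range(start + 1, end + 1):
            scores[i] += DURING * w           # during: an ongoing chain argues against a break
        if end + 1 < n:
            scores[end + 1] += REAR * w       # rear: first paragraph after a chain closes
    mean = sum(scores) / n
    normalized = [s - mean for s in scores]   # zero-sum normalization
    boundaries = [0] + [i for i in range(1, n) if normalized[i] > 0]
    return normalized, boundaries


example = [
    [("Apollo", "proper"), ("moon", "common")],
    [("Apollo", "proper"), ("moon", "common")],
    [("Apollo", "proper")],
    [("budget", "common"), ("Congress", "proper")],
    [("budget", "common"), ("Congress", "proper")],
]
print(segment(example)[1])  # -> [0, 3]
```

    Here the "Apollo" and "moon" chains close just as the "budget" and "Congress" chains open, so paragraph 3 collects both front and rear credit and stands out as the one positive score after normalization.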
