System and method for document collection, grouping and summarization
    2.
    Granted invention patent
    Status: in force

    Publication number: US08176418B2

    Publication date: 2012-05-08

    Application number: US11071968

    Application date: 2005-03-04

    IPC classification: G06F17/00

    CPC classification: G06Q10/10

    Abstract: A system is provided for generating a summary of a plurality of documents and presenting the summary information to a user. The system includes a computer-readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group them into document clusters. The document clusters can also be assigned to predetermined document categories for presentation to a user. Several multiple-document summarization engines are provided, each generating summaries for a specific class of document cluster. A summarizer router is employed to determine the relationship among the documents in a cluster and to select one of the document summarization engines for use in generating a summary of the cluster. A single event engine is provided to generate summaries of documents that are closely related in time and pertain to a specific event. A dissimilarity engine for multiple-document summary generation is provided, which generates summaries of document clusters whose documents have varying degrees of relatedness. A user interface is provided to display categories, cluster titles, summaries, and related images.
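The routing step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `Document` fields, the word-overlap (Jaccard) relatedness measure, the two thresholds, and the engine names returned as strings.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass
class Document:
    text: str
    published: date


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for a real relatedness measure."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def route_cluster(docs, max_days=3, min_similarity=0.5):
    """Pick an engine name for a non-empty cluster (thresholds are illustrative)."""
    dates = [d.published for d in docs]
    span_days = (max(dates) - min(dates)).days
    # A one-document cluster has no pairs; treat it as fully cohesive.
    lowest = min(
        (jaccard(a.text, b.text) for a, b in combinations(docs, 2)),
        default=1.0,
    )
    # Close in time and uniformly similar: treat as coverage of one event.
    if span_days <= max_days and lowest >= min_similarity:
        return "single_event"
    # Otherwise fall back to the engine meant for loosely related documents.
    return "dissimilarity"


cluster = [
    Document("quake hits coastal city", date(2005, 3, 1)),
    Document("quake hits coastal city overnight", date(2005, 3, 2)),
]
print(route_cluster(cluster))  # prints "single_event"
```

A production router would presumably use richer signals than word overlap and would dispatch to actual engine objects; the sketch only shows the shape of the decision.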

    System and method for document collection, grouping and summarization
    3.
    Invention patent application
    Status: in force

    Publication number: US20050203970A1

    Publication date: 2005-09-15

    Application number: US11071968

    Application date: 2005-03-04

    IPC classification: G06F7/00 G06F15/00 G06Q10/00

    CPC classification: G06Q10/10

    Abstract: A system is provided for generating a summary of a plurality of documents and presenting the summary information to a user. The system includes a computer-readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group them into document clusters. The document clusters can also be assigned to predetermined document categories for presentation to a user. Several multiple-document summarization engines are provided, each generating summaries for a specific class of document cluster. A summarizer router is employed to determine the relationship among the documents in a cluster and to select one of the document summarization engines for use in generating a summary of the cluster. A single event engine is provided to generate summaries of documents that are closely related in time and pertain to a specific event. A dissimilarity engine for multiple-document summary generation is provided, which generates summaries of document clusters whose documents have varying degrees of relatedness. A user interface is provided to display categories, cluster titles, summaries, and related images.

    System and method of generating dictionary entries
    5.
    Invention patent application
    Status: in force

    Publication number: US20050234709A1

    Publication date: 2005-10-20

    Application number: US10398535

    Application date: 2002-09-26

    IPC classification: G06F17/21 G06F17/27

    CPC classification: G06F17/2735

    Abstract: A system for automatically generating a dictionary extracts <term, definition> pairs from full-text articles and stores the pairs as dictionary entries. The system includes a computer-readable corpus having a plurality of documents therein. A pattern processing module (120) and a grammar processing module (125) are provided for extracting such pairs from the corpus and storing them in a dictionary database (145). A routing processing module selectively routes sentences in the corpus to at least one of the pattern processing module or the grammar processing module. In one embodiment, the routing module is incorporated into the pattern processing module, which then selectively routes a portion of the sentences to the grammar processing module. A bootstrapping processing module (150) can be used to apply entries against the corpus to identify and extract additional <term, definition> entries.
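As a rough illustration of the embodiment in which the pattern module does the routing, the sketch below tries surface patterns first and defers unmatched sentences that contain a definitional cue to a grammar module. The regular expressions, the cue words, and the function names are all hypothetical; the abstract does not specify them. The grammar module is a stub, and the bootstrapping module is not covered.

```python
import re

# Hypothetical surface patterns; the abstract does not list the real ones.
PATTERNS = [
    re.compile(r"^(?P<term>[A-Z][\w\- ]*?) (?:is|are) (?P<definition>(?:a|an|the) .+?)\.?$"),
    re.compile(r"^(?P<term>[A-Z][\w\- ]*?) (?:refers to|is defined as) (?P<definition>.+?)\.?$"),
]

# Hypothetical cue words suggesting a definition the patterns could not capture.
CUES = ("called", "means", "known as")


def pattern_module(sentence):
    """Return (term, definition) if a surface pattern matches, else None."""
    for pattern in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            return match.group("term"), match.group("definition")
    return None


def grammar_module(sentence):
    """Placeholder for a parser-based extractor; always returns None here."""
    return None


def build_dictionary(sentences):
    """Return (entries, deferred): extracted pairs and sentences sent to the grammar module."""
    entries, deferred = {}, []
    for sentence in sentences:
        pair = pattern_module(sentence)
        # Routing step: unmatched sentences with a cue word go to the grammar module.
        if pair is None and any(cue in sentence for cue in CUES):
            deferred.append(sentence)
            pair = grammar_module(sentence)
        if pair:
            entries[pair[0]] = pair[1]
    return entries, deferred


corpus = [
    "Anemia is a condition in which the blood lacks enough red cells.",
    "Low oxygen in tissue is called hypoxia by clinicians.",
    "The patient went home.",
]
entries, deferred = build_dictionary(corpus)
print(entries)
print(deferred)
```

Here the first sentence yields an entry directly, the second is deferred because only a cue word was found, and the third is ignored.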
