SYSTEM AND METHOD FOR DOCUMENT SECTION SEGMENTATION
    1.
    Invention application
    SYSTEM AND METHOD FOR DOCUMENT SECTION SEGMENTATION (status: in force)

    Publication No.: US20080059498A1

    Publication date: 2008-03-06

    Application No.: US11851871

    Filing date: 2007-09-07

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: A system and method for facilitating the processing and the use of documents by providing a system for categorizing document section headings under a set of canonical section headings. In the method for categorizing section headings, there may be a process of training a database and matching methods to categorize different but equivalent document section headings under canonical headings and categories. Once trained, the system may match and categorize the document sections with little to no supervision of the categorization for large sets of documents.
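
    The categorization idea in the abstract above can be sketched in a few lines. This is only an illustration, not the method claimed in the patent: the class name HeadingCategorizer, the example headings, and the use of Python's difflib for approximate matching are all assumptions. "Training" here simply records known heading variants against a canonical heading; unseen headings fall back to fuzzy matching.

```python
import difflib


def normalize(heading: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in heading.lower()
    )
    return " ".join(cleaned.split())


class HeadingCategorizer:
    """Hypothetical sketch: maps heading variants to canonical headings."""

    def __init__(self, cutoff: float = 0.8):
        self.variants = {}    # normalized variant -> canonical heading
        self.cutoff = cutoff  # similarity threshold (assumed value)

    def train(self, heading: str, canonical: str) -> None:
        """Record a known heading variant under its canonical heading."""
        self.variants[normalize(heading)] = canonical

    def categorize(self, heading: str):
        """Return the canonical heading, or None if nothing is close enough."""
        key = normalize(heading)
        if key in self.variants:
            return self.variants[key]
        # Fall back to approximate string matching against trained variants.
        close = difflib.get_close_matches(
            key, list(self.variants), n=1, cutoff=self.cutoff
        )
        return self.variants[close[0]] if close else None


categorizer = HeadingCategorizer()
categorizer.train("Bibliography", "REFERENCES")
categorizer.train("Works Cited", "REFERENCES")
categorizer.train("Introduction", "INTRODUCTION")
print(categorizer.categorize("BIBLIOGRAPHY:"))
print(categorizer.categorize("Introducton"))
```

    Headings that match nothing return None, which is where the "little to no supervision" in the abstract would come in: only those unmatched headings would need a human to assign a canonical heading and retrain.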

    Natural language information retrieval system and method
    2.
    Granted invention patent
    Natural language information retrieval system and method (status: lapsed)

    Publication No.: US6081774A

    Publication date: 2000-06-27

    Application No.: US916628

    Filing date: 1997-08-22

    IPC classification: G06F17/30, G06F17/27, G06F7/00

    Abstract: An information retrieval system that represents the content of a language-based database being searched as well as the user's natural language query. In accordance with one aspect of the invention, the information retrieval system includes a non-real-time development system for automatically creating a database index having one or more content-based database keywords of the data base; and a real-time retrieval system that, in response to a user's natural language query, searches the keyword index for one or more content-based query keywords derived from the natural language query. The development system and the retrieval system morphologically, syntactically and linguistically analyze the data base and the natural language query, respectively, to generate the one or more database keywords and query keywords representing the content of the database and the natural language query, respectively. The development system includes a software development system for creating the database index utilizing a pattern dictionary that includes synonyms and skip words and a morphosyntactic dictionary that includes morphological and syntactic information for words in the natural language of the language-based database and the natural language query. In one embodiment, the retrieval system includes a natural language interface system for creating the one or more query keywords utilizing the pattern dictionary and the morphosyntactic dictionary. In one embodiment, the retrieval system also includes a query-index matcher for matching the one or more query keywords with the one or more database keywords.
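
    The two-stage design in the abstract above (an offline step that builds a keyword index, and a real-time step that reduces the query to keywords and matches them against the index) can be sketched as follows. This is an illustration under stated assumptions, not the patented system: the skip-word and synonym tables are toy stand-ins for the pattern dictionary, the stem function is a crude substitute for the morphosyntactic dictionary, and all names are hypothetical.

```python
from collections import defaultdict

# Toy stand-ins for the pattern dictionary (assumed contents).
SKIP_WORDS = {"a", "an", "the", "of", "for", "to", "how", "do", "i", "my", "is", "in"}
SYNONYMS = {"automobile": "car", "vehicle": "car", "fix": "repair", "mend": "repair"}


def stem(word: str) -> str:
    """Crude substitute for morphological analysis: strip 'ing' or a final 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keywords(text: str) -> set:
    """Reduce text to content keywords: drop skip words, stem, map synonyms."""
    result = set()
    for token in text.split():
        token = token.strip(".,;:?!\"'()").lower()
        if not token or token in SKIP_WORDS:
            continue
        base = stem(token)
        result.add(SYNONYMS.get(base, base))
    return result


def build_index(documents: dict) -> dict:
    """Offline step: map each keyword to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for kw in keywords(text):
            index[kw].add(doc_id)
    return index


def search(index: dict, query: str) -> list:
    """Real-time step: rank documents by the number of shared keywords."""
    scores = defaultdict(int)
    for kw in keywords(query):
        for doc_id in index.get(kw, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "Repairing automobile engines",
    "d2": "Cooking pasta for beginners",
}
index = build_index(docs)
print(search(index, "How do I fix my car?"))
```

    Because the documents and the query pass through the same keywords function, "fix my car" and "repairing automobile" reduce to the same keywords even though they share no surface words, which is the point of analyzing both sides with the same dictionaries.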

    System and method for document section segmentation
    4.
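    Invention application
    System and method for document section segmentation (status: under examination, published)

    Publication No.: US20050144184A1

    Publication date: 2005-06-30

    Application No.: US10953448

    Filing date: 2004-09-30

    IPC classification: G06F17/00

    CPC classification: G06F17/2745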

    Abstract: A system and method for facilitating the processing and the use of documents by providing a system for categorizing document section headings under a set of canonical section headings. In the method for categorizing section headings, there may be a process of training a database and matching methods to categorize different but equivalent document section headings under canonical headings and categories. Once trained the system may match and categorize the document sections with little to no supervision of the categorization for large sets of documents.

    System and method for document section segmentation
    5.
    Granted invention patent
    System and method for document section segmentation (status: in force)

    Publication No.: US07818308B2

    Publication date: 2010-10-19

    Application No.: US11851871

    Filing date: 2007-09-07

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: A system and method for facilitating the processing and the use of documents by providing a system for categorizing document section headings under a set of canonical section headings. In the method for categorizing section headings, there may be a process of training a database and matching methods to categorize different but equivalent document section headings under canonical headings and categories. Once trained, the system may match and categorize the document sections with little to no supervision of the categorization for large sets of documents.
