Method of searching similar document, system for performing the same and program for processing the same
    1.
    Granted patent
    Status: Expired

    Publication number: US07200587B2

    Publication date: 2007-04-03

    Application number: US10081203

    Filing date: 2002-02-25

    IPC classification: G06F17/30 G06F17/00

    Abstract: A similar-document search method includes: (1) extracting characteristic word candidates from a seeds document that contains the desired retrieval contents; (2) when an extracted candidate is a compound characteristic word made up of a plurality of characteristic words, extracting both the compound characteristic word and its constituent characteristic words as characteristic words of the seeds document; (3) calculating, from the extracted characteristic words, the similarity between the seeds document and a registration document; and (4) outputting the calculated similarity as the retrieval result.

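The steps in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: a candidate with more than one whitespace-separated token is treated as a compound characteristic word, similarity is measured as cosine similarity over raw term counts, and the names `characteristic_words`, `cosine_similarity`, and `search` are hypothetical. It is not the patented implementation.

```python
import math
from collections import Counter


def characteristic_words(candidates):
    """Turn candidates into characteristic words.

    Assumption: a candidate with several tokens is a compound word. It is
    kept whole and its constituent words are added as well.
    """
    words = Counter()
    for candidate in candidates:
        parts = candidate.lower().split()
        if not parts:
            continue
        words[" ".join(parts)] += 1      # the candidate itself
        if len(parts) > 1:
            words.update(parts)          # constituents of a compound
    return words


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (an assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(seed_candidates, registered):
    """Rank registered documents by similarity to the seeds document."""
    seed = characteristic_words(seed_candidates)
    scores = {
        doc_id: cosine_similarity(seed, characteristic_words(cands))
        for doc_id, cands in registered.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


registered = {
    "a": ["document retrieval"],
    "b": ["retrieval"],
    "c": ["patent"],
}
results = search(["document retrieval"], registered)
```

Because the constituents of "document retrieval" are kept alongside the compound, document "b" still receives a non-zero score even though it never contains the compound itself.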

    Method, program and apparatus for document retrieval system
    2.
    Granted patent
    Status: In force

    Publication number: US07620614B2

    Publication date: 2009-11-17

    Application number: US11625983

    Filing date: 2007-01-23

    IPC classification: G06F17/30 G06F12/00

    Abstract: The present invention achieves high-speed retrieval, without requiring additional memory, in a document retrieval system that refers to partial data of documents containing structured data, such as XML documents and e-mail messages. The invention includes storage means for storing the documents to be retrieved on a disk device, calculation means for calculating the memory capacity that can be allocated, and storage means for saving in memory, up to the calculated capacity, partial data of the documents stored on the disk device. The invention also includes first retrieval means for searching the partial data held in memory, determining means for deciding, based on the result of the first retrieval, whether to search the documents stored on the disk device, and second retrieval means for searching the documents on the disk device according to that decision.

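The two-stage retrieval described in this abstract can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the disk device is stood in for by a dictionary, the "partial data" is a single named field, the capacity is a byte budget passed in by the caller, and the determination rule (go to disk only for documents whose field did not fit in memory) is one plausible reading, not the rule defined by the patent. The class name `TwoStageRetriever` is hypothetical.

```python
class TwoStageRetriever:
    """Keep one field of each document in memory, within a byte budget."""

    def __init__(self, disk_docs, field, capacity_bytes):
        self.disk = disk_docs      # stand-in for the disk device
        self.field = field         # the partial data to cache (assumed)
        self.memory = {}           # doc_id -> cached field value
        self.disk_reads = 0        # counts documents read from "disk"
        used = 0
        for doc_id, doc in disk_docs.items():
            value = doc.get(field, "")
            size = len(value.encode("utf-8"))
            if used + size > capacity_bytes:
                continue           # does not fit; stays disk-only
            self.memory[doc_id] = value
            used += size

    def search(self, term):
        # First retrieval: scan the partial data held in memory.
        hits = [d for d, value in self.memory.items() if term in value]
        # Determination (assumed rule): disk is needed only for
        # documents whose partial data is not cached.
        uncached = [d for d in self.disk if d not in self.memory]
        # Second retrieval: read the remaining documents from disk.
        for doc_id in uncached:
            self.disk_reads += 1
            if term in self.disk[doc_id].get(self.field, ""):
                hits.append(doc_id)
        return sorted(hits)


docs = {
    1: {"subject": "budget report"},
    2: {"subject": "holiday plan"},
    3: {"subject": "budget review"},
}
small = TwoStageRetriever(docs, "subject", capacity_bytes=25)
large = TwoStageRetriever(docs, "subject", capacity_bytes=100)
small_hits = small.search("budget")
large_hits = large.search("budget")
```

With the 25-byte budget only the first two subjects fit in memory, so the third document is read from the stand-in disk; with the 100-byte budget the query is answered from memory alone.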

    Method, program and apparatus for document retrieval system
    3.
    Patent application
    Status: In force

    Publication number: US20070192274A1

    Publication date: 2007-08-16

    Application number: US11625983

    Filing date: 2007-01-23

    IPC classification: G06F17/30

    Abstract: The present invention achieves high-speed retrieval, without requiring additional memory, in a document retrieval system that refers to partial data of documents containing structured data, such as XML documents and e-mail messages. The invention includes storage means for storing the documents to be retrieved on a disk device, calculation means for calculating the memory capacity that can be allocated, and storage means for saving in memory, up to the calculated capacity, partial data of the documents stored on the disk device. The invention also includes first retrieval means for searching the partial data held in memory, determining means for deciding, based on the result of the first retrieval, whether to search the documents stored on the disk device, and second retrieval means for searching the documents on the disk device according to that decision.
