SYSTEM AND METHOD FOR PERFORMING ELECTRONIC INFORMATION RETRIEVAL USING KEYWORDS
    3.
    Invention application
    Legal status: in force

    Publication No.: US20050086205A1

    Publication date: 2005-04-21

    Application No.: US10605630

    Filing date: 2003-10-15

    IPC classification: G06F17/30

    Abstract: Output documents similar to an input document are identified. A query is formulated using a list of best keywords from the input document to search for a first set of output documents. The list of best keywords is defined with a maximum number of keywords less than the total number of keywords in the list of best keywords that are identified as belonging to a domain specific dictionary of words and as having no measurable linguistic frequency. Lists of keywords are identified for each output document in the first set of documents. A second set of similar documents is determined using a measure of similarity that is computed between keywords identified in the input document and each output document in the first set of documents.
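
The two-stage flow described in this abstract (query with the best keywords, then filter the hits by keyword similarity) might be sketched as follows. This is only an illustration under stated assumptions: the names `find_similar`, `search`, `keywords_of`, `threshold`, and `max_query_terms` are hypothetical, and the overlap ratio used as the similarity measure is a stand-in, not the measure defined in the patent.

```python
from typing import Callable, List, Set


def find_similar(
    input_keywords: List[str],
    search: Callable[[List[str]], List[str]],
    keywords_of: Callable[[str], Set[str]],
    threshold: float = 0.5,
    max_query_terms: int = 5,
) -> List[str]:
    """Return the second set: first-set documents similar enough to the input."""
    # Stage 1: query with the top-ranked keywords only (assumed ranked best-first).
    first_set = search(input_keywords[:max_query_terms])

    # Stage 2: score each hit by the share of input keywords it contains.
    # ASSUMPTION: a plain overlap ratio stands in for the similarity measure.
    source = set(input_keywords)
    second_set = []
    for doc_id in first_set:
        shared = source & keywords_of(doc_id)
        score = len(shared) / len(source) if source else 0.0
        if score >= threshold:
            second_set.append(doc_id)
    return second_set


# Toy corpus (hypothetical data): document id -> keywords found in it.
CORPUS = {
    "a": {"keyword", "retrieval", "query", "index"},
    "b": {"keyword", "cooking"},
    "c": {"gardening"},
}


def toy_search(terms: List[str]) -> List[str]:
    # Return every document that shares at least one term with the query.
    return sorted(d for d, kws in CORPUS.items() if kws & set(terms))


result = find_similar(
    ["keyword", "retrieval", "query"], toy_search, CORPUS.__getitem__
)
print(result)
```

In this toy run the search stage returns documents "a" and "b", and only "a" passes the similarity filter.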

    System and method for performing electronic information retrieval using keywords
    4.
    Granted invention patent
    Legal status: in force

    Publication No.: US07370034B2

    Publication date: 2008-05-06

    Application No.: US10605630

    Filing date: 2003-10-15

    IPC classification: G06F7/00 G06F17/30

    Abstract: Output documents similar to an input document are identified. A query is formulated using a list of best keywords from the input document to search for a first set of output documents. The list of best keywords is defined with a maximum number of keywords less than the total number of keywords in the list of best keywords that are identified as belonging to a domain specific dictionary of words and as having no measurable linguistic frequency. Lists of keywords are identified for each output document in the first set of documents. A second set of similar documents is determined using a measure of similarity that is computed between keywords identified in the input document and each output document in the first set of documents.
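
One possible reading of the cap on the best-keyword list is that only a limited number of its entries may be words that belong to a domain-specific dictionary and have no measurable linguistic frequency. The sketch below illustrates that reading only; `best_keywords`, `domain_dictionary`, `frequency`, `list_size`, and `max_rare` are hypothetical names, and treating a missing or zero frequency as "no measurable frequency" is an assumption.

```python
from typing import Dict, List, Set


def best_keywords(
    ranked: List[str],
    domain_dictionary: Set[str],
    frequency: Dict[str, float],
    list_size: int = 4,
    max_rare: int = 2,
) -> List[str]:
    """Pick up to `list_size` keywords, at most `max_rare` of them 'rare'."""
    if max_rare >= list_size:
        raise ValueError("max_rare must be smaller than list_size")
    best: List[str] = []
    rare_count = 0
    for word in ranked:  # assumed ordered best-first
        # ASSUMPTION: 'rare' means listed in the domain dictionary and
        # absent from (or zero in) the general-language frequency table.
        is_rare = word in domain_dictionary and frequency.get(word, 0.0) == 0.0
        if is_rare:
            if rare_count >= max_rare:
                continue  # cap reached, skip further rare words
            rare_count += 1
        best.append(word)
        if len(best) == list_size:
            break
    return best


# Hypothetical data for illustration only.
ranked = ["kinase", "phosphatase", "ligase", "protein", "cell", "binding"]
domain = {"kinase", "phosphatase", "ligase"}
freq = {"protein": 3e-5, "cell": 8e-5, "binding": 2e-5}

selected = best_keywords(ranked, domain, freq)
print(selected)
```

Here "ligase" is skipped because two domain-only words are already in the list, so the remaining slots go to words with a measurable frequency.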

    System and method for computing a measure of similarity between documents
    5.
    Granted invention patent
    Legal status: in force

    Publication No.: US07493322B2

    Publication date: 2009-02-17

    Application No.: US10605631

    Filing date: 2003-10-15

    IPC classification: G06F7/00 G06F17/30

    Abstract: A measure of similarity between two documents is computed. In computing the measure of similarity, a first list of rated keywords extracted from the first document and a second list of rated keywords extracted from the second document are received. The first and second lists of keywords are used to determine whether the first document forms part of the second document using a first computed percentage indicating what percentage of keyword ratings in the first list also exist in the second list. A second percentage is computed that indicates what percentage of keyword ratings along with a set of their neighboring keyword ratings in the first list that also exist in the second list when the first percentage indicates that the first document is included in the second document. The first percentage is used to specify the measure of similarity when the second percentage is greater than the first percentage.
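
The first percentage in this abstract could be computed roughly as below. This is a sketch under assumptions: rated keywords are modeled as `(keyword, rating)` pairs, a keyword counts as present in the second list when the same word occurs there regardless of its rating, and `inclusion_threshold` is a hypothetical cut-off that the abstract does not specify.

```python
from typing import List, Tuple

RatedKeyword = Tuple[str, float]


def first_percentage(first: List[RatedKeyword], second: List[RatedKeyword]) -> float:
    """Percentage of keywords in `first` that also occur in `second`."""
    if not first:
        return 0.0
    second_words = {word for word, _rating in second}
    hits = sum(1 for word, _rating in first if word in second_words)
    return 100.0 * hits / len(first)


def is_included(
    first: List[RatedKeyword],
    second: List[RatedKeyword],
    inclusion_threshold: float = 80.0,  # ASSUMPTION: not given in the abstract
) -> bool:
    """Guess whether the first document forms part of the second one."""
    return first_percentage(first, second) >= inclusion_threshold


# Hypothetical rated keyword lists for two documents.
doc1 = [("query", 0.9), ("keyword", 0.8), ("index", 0.5), ("ranking", 0.4)]
doc2 = [("keyword", 0.7), ("query", 0.6), ("index", 0.3),
        ("corpus", 0.2), ("parser", 0.1)]

pct = first_percentage(doc1, doc2)
print(pct, is_included(doc1, doc2), is_included(doc1, doc2, 70.0))
```

Three of the four keywords of the first list occur in the second list, so the first percentage is 75 and the inclusion decision depends on the chosen threshold.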

    SYSTEM AND METHOD FOR COMPUTING A MEASURE OF SIMILARITY BETWEEN DOCUMENTS
    6.
    Invention application
    Legal status: in force

    Publication No.: US20050086224A1

    Publication date: 2005-04-21

    Application No.: US10605631

    Filing date: 2003-10-15

    IPC classification: G06F17/30

    Abstract: A measure of similarity between two documents is computed. In computing the measure of similarity, a first list of rated keywords extracted from the first document and a second list of rated keywords extracted from the second document are received. The first and second lists of keywords are used to determine whether the first document forms part of the second document using a first computed percentage indicating what percentage of keyword ratings in the first list also exist in the second list. A second percentage is computed that indicates what percentage of keyword ratings along with a set of their neighboring keyword ratings in the first list that also exist in the second list when the first percentage indicates that the first document is included in the second document. The first percentage is used to specify the measure of similarity when the second percentage is greater than the first percentage.
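
The second percentage, which looks at keywords together with their neighbors, admits several readings. The sketch below takes one of them as an assumption: each list is a sequence of keywords in rating order, the "neighbors" of a keyword are the entries within `radius` positions of it, and a keyword counts as a hit when it occurs in the second list surrounded by at least the same neighbors. The names `second_percentage` and `radius` are hypothetical.

```python
from typing import Dict, List


def second_percentage(first: List[str], second: List[str], radius: int = 1) -> float:
    """Percentage of keywords in `first` whose neighborhood recurs in `second`."""
    if not first:
        return 0.0
    # Index every position at which each word occurs in the second list.
    positions: Dict[str, List[int]] = {}
    for j, word in enumerate(second):
        positions.setdefault(word, []).append(j)

    hits = 0
    for i, word in enumerate(first):
        # Keyword plus its neighbors in the first list (clipped at the ends).
        window = set(first[max(0, i - radius): i + radius + 1])
        for j in positions.get(word, []):
            context = set(second[max(0, j - radius): j + radius + 1])
            if window <= context:  # same neighbors found around the match
                hits += 1
                break
    return 100.0 * hits / len(first)


# Hypothetical keyword lists, ordered by rating.
list1 = ["query", "keyword", "index", "ranking"]
list2 = ["corpus", "query", "keyword", "index", "parser", "ranking"]

pct2 = second_percentage(list1, list2)
print(pct2)
```

In this example every keyword of the first list occurs in the second list, but only "query" and "keyword" keep their neighbors, so the neighborhood-aware percentage is 50.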
