Methods, apparatus and computer program products for information retrieval and document classification utilizing a multidimensional subspace
    Granted invention patent (status: in force)

    Publication No.: US06701305B1

    Publication date: 2004-03-02

    Application No.: US09693114

    Filing date: 2000-10-20

    IPC classification: G06F1700

    Abstract: Methods, apparatus and computer program products are provided for retrieving information from a text data collection and for classifying a document into none, one, or more of a plurality of predefined classes. In each aspect, a representation of at least a portion of the original matrix is projected into a lower dimensional subspace, and those portions of the subspace representation that relate to the term(s) of the query are weighted following the projection into the lower dimensional subspace. In order to retrieve the documents that are most relevant with respect to a query, the documents are then scored, with documents having better scores generally being of greater relevance. Alternatively, in order to classify a document, the relationship of the document to the classes of documents is scored, with the document then being classified into those classes, if any, that have the best scores.
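The retrieval flow summarized in the abstract (project, then weight the query-term portions, then score) can be illustrated with a small dependency-free sketch. This is one reading of the abstract, not the patented method itself. It assumes the "original matrix" is a term-by-document count matrix, that the subspace is spanned by its top-k left singular vectors (approximated here by power iteration), and that a document's score is the weighted sum of the projected rows for the query terms. The function names, the toy matrix, and the query weights are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_left_singular_vectors(A, k, iters=500):
    """Approximate the top-k left singular vectors of A (terms x docs)
    using power iteration with deflation on A * A^T (an assumed choice)."""
    At = [list(col) for col in zip(*A)]                 # docs x terms
    basis = []
    for _vec in range(k):
        v = [1.0 / (i + 1) for i in range(len(A))]      # deterministic start
        for _step in range(iters):
            doc_side = [dot(row, v) for row in At]      # A^T v
            w = [dot(row, doc_side) for row in A]       # A (A^T v)
            for b in basis:                             # remove directions already found
                c = dot(w, b)
                w = [wi - c * bi for wi, bi in zip(w, b)]
            n = math.sqrt(dot(w, w))
            if n < 1e-12:                               # k exceeds the rank of A
                return basis
            v = [wi / n for wi in w]
        basis.append(v)
    return basis

def project(A, basis):
    """Subspace representation of A: U_k U_k^T A, one row per term."""
    At = [list(col) for col in zip(*A)]
    coords = [[dot(b, col) for col in At] for b in basis]   # U_k^T A
    return [[sum(b[t] * coords[j][d] for j, b in enumerate(basis))
             for d in range(len(At))]
            for t in range(len(A))]

def score_documents(A, terms, query_weights, k):
    """Weight the projected rows of the query terms, then sum per document.
    The weights are applied after the projection, as the abstract describes."""
    A_k = project(A, top_left_singular_vectors(A, k))
    scores = [0.0] * len(A[0])
    for term, weight in query_weights.items():
        for d, value in enumerate(A_k[terms.index(term)]):
            scores[d] += weight * value
    return scores

# Hypothetical toy collection: rows are terms, columns are documents d0..d3.
TERMS = ["car", "auto", "engine", "flower", "petal"]
A = [
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # auto
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
]

scores = score_documents(A, TERMS, {"car": 1.0}, k=2)
for d, s in enumerate(scores):
    print(f"d{d}: {s:.3f}")
```

In this toy collection, document d1 never contains "car" but shares "engine" with d0, so in the two-dimensional subspace it scores about the same as d0 for the query "car". That is the practical reason for scoring in the projected representation rather than in the original matrix.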
