Generalized latent semantic analysis
    Patent application (legal status: in force)

    Publication number: US20070067281A1

    Publication date: 2007-03-22

    Application number: US11228924

    Filing date: 2005-09-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30675 G06F17/2715

    Abstract: One embodiment of the present invention provides a system that builds an association tensor (such as a matrix) to facilitate document and word-level processing operations. During operation, the system uses terms from a collection of documents to build an association tensor, which contains values representing pair-wise similarities between terms in the collection of documents. During this process, if a given value in the association tensor is calculated based on an insufficient number of samples, the system determines a corresponding value from a reference document collection, and then substitutes the corresponding value for the given value in the association tensor. After the association tensor is obtained, a dimensionality reduction method is applied to compute a low-dimensional vector space representation for the vocabulary terms. Document vectors are computed as linear combinations of term vectors.
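
    The abstract describes four steps: build a term-by-term association matrix, back-fill poorly sampled entries from a reference collection, reduce dimensionality, and combine term vectors into document vectors. The following is a minimal sketch of that pipeline under stated assumptions, not the patented implementation. It assumes pointwise mutual information (PMI) over document-level co-occurrence as the similarity, a hypothetical `min_samples` threshold for "insufficient number of samples", a hypothetical caller-supplied `reference` dict standing in for the reference collection, and truncated SVD (computed by power iteration) as the reduction method. The abstract specifies none of these choices, and all function names are illustrative.

```python
import math
import random
from itertools import combinations


def association_matrix(docs, reference, min_samples=2):
    """Symmetric term-by-term PMI matrix from document-level co-occurrence.

    reference: hypothetical dict mapping an alphabetically sorted (term, term)
    pair to a value precomputed on a reference collection.
    """
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    df = [0] * len(vocab)  # documents containing each term
    joint = {}             # documents containing both terms of a pair
    for d in docs:
        terms = sorted(set(d))
        for t in terms:
            df[index[t]] += 1
        for pair in combinations(terms, 2):
            joint[pair] = joint.get(pair, 0) + 1

    n_docs = len(docs)
    m = [[0.0] * len(vocab) for _ in vocab]
    for a, b in combinations(vocab, 2):
        n_ab = joint.get((a, b), 0)
        if n_ab >= min_samples:
            value = math.log(n_docs * n_ab / (df[index[a]] * df[index[b]]))
        else:
            # Too few co-occurrence samples: substitute the reference value.
            value = reference.get((a, b), 0.0)
        i, j = index[a], index[b]
        m[i][j] = m[j][i] = value
    return vocab, m


def top_singular_vectors(m, k, iters=200, seed=0):
    """Top-k (singular value, vector) pairs of a symmetric matrix m.

    Power iteration with deflation on m @ m, whose eigenvalues are the
    squared singular values of m.
    """
    n = len(m)
    b = [[sum(m[i][t] * m[t][j] for t in range(n)) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # fresh random start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to extract
            v = [x / norm for x in w]
        mu = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((math.sqrt(max(mu, 0.0)), v))
        for i in range(n):  # deflate: remove the direction just found
            for j in range(n):
                b[i][j] -= mu * v[i] * v[j]
    return pairs


def glsa_term_vectors(docs, reference, k=2, min_samples=2):
    vocab, m = association_matrix(docs, reference, min_samples)
    pairs = top_singular_vectors(m, k)
    # Term vector = row of U_k scaled by the singular values (U_k S_k, as in LSA).
    return {t: [s * v[i] for s, v in pairs] for i, t in enumerate(vocab)}


def document_vector(tokens, term_vectors, k):
    """Linear combination of term vectors, weighted here by raw term counts."""
    vec = [0.0] * k
    for t in tokens:
        for c, x in enumerate(term_vectors.get(t, [0.0] * k)):
            vec[c] += x
    return vec


docs = [["cat", "dog"], ["cat", "dog"], ["fish"], ["cat", "fish"]]
terms = glsa_term_vectors(docs, {("cat", "fish"): 0.5}, k=2)
print(document_vector(["cat", "dog"], terms, 2))
```

    The sketch iterates on m·m rather than on m itself because a PMI matrix can have eigenvalues of both signs. Plain power iteration on m can oscillate between two eigenvectors whose eigenvalues have equal magnitude and opposite sign. The count weighting in `document_vector` is also an assumption. A TF-IDF or other weighting would still give a linear combination of term vectors.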

    Generalized latent semantic analysis
    Granted patent (legal status: in force)

    Publication number: US08312021B2

    Publication date: 2012-11-13

    Application number: US11228924

    Filing date: 2005-09-16

    IPC classification: G06F7/00

    CPC classification: G06F17/30675 G06F17/2715

    Abstract: One embodiment of the present invention provides a system that builds an association tensor (such as a matrix) to facilitate document and word-level processing operations. During operation, the system uses terms from a collection of documents to build an association tensor, which contains values representing pair-wise similarities between terms in the collection of documents. During this process, if a given value in the association tensor is calculated based on an insufficient number of samples, the system determines a corresponding value from a reference document collection, and then substitutes the corresponding value for the given value in the association tensor. After the association tensor is obtained, a dimensionality reduction method is applied to compute a low-dimensional vector space representation for the vocabulary terms. Document vectors are computed as linear combinations of term vectors.

    Ranking results using multiple nested ranking
    Patent application (legal status: in force)

    Publication number: US20060195440A1

    Publication date: 2006-08-31

    Application number: US11294269

    Filing date: 2005-12-05

    IPC classification: G06F17/30

    Abstract: A unique system and method that facilitates improving the ranking of items is provided. The system and method involve re-ranking decreasing subsets of high ranked items in separate stages. In particular, a basic ranking component can rank a set of items. A subset of the top or high ranking items can be taken and used as a new training set to train a component for improving the ranking among these high ranked documents. This process can be repeated on an arbitrary number of successive high ranked subsets. Thus, high ranked items can be reordered in separate stages by focusing on the higher ranked items to facilitate placing the most relevant items at the top of a search results list.
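
    The abstract does not name the learner used for each ranking component. The following is a minimal sketch of the nested staging that assumes a linear scorer trained with a pairwise perceptron. This stand-in learner is not necessarily the one the patent uses. The sketch also assumes feature vectors given as tuples of floats, and hypothetical decreasing `cutoffs` that set the size of each nested subset. Every stage after the first is trained only on the top items that the previous stage returned for each query. At ranking time, each later stage re-orders only that top subset, and items below the cutoff keep their earlier order.

```python
from itertools import combinations


def score(w, x):
    """Linear relevance score of a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(queries, epochs=20, lr=0.1):
    """Learn weights so that more relevant items score higher.

    queries: list of queries, each a list of (features, relevance_label) pairs.
    """
    w = [0.0] * len(queries[0][0][0])
    for _ in range(epochs):
        for items in queries:
            for (xa, ya), (xb, yb) in combinations(items, 2):
                if ya == yb:
                    continue
                if ya < yb:
                    xa, xb = xb, xa  # make xa the more relevant item
                diff = [p - q for p, q in zip(xa, xb)]
                if score(w, diff) <= 0:  # misordered or tied: update
                    w = [wi + lr * d for wi, d in zip(w, diff)]
    return w


def rank(w, items):
    """Sort (features, payload) pairs by descending score; ties keep input order."""
    return sorted(items, key=lambda it: score(w, it[0]), reverse=True)


def train_nested(queries, cutoffs):
    """Stage 0 sees every item; stage s sees only the top cutoffs[s-1] items
    of each query as ranked by stage s-1."""
    models = [train_pairwise_perceptron(queries)]
    current = queries
    for n in cutoffs:
        current = [rank(models[-1], items)[:n] for items in current]
        models.append(train_pairwise_perceptron(current))
    return models


def rank_nested(models, cutoffs, items):
    """Rank with stage 0, then let each later stage re-rank a shrinking top subset."""
    ranked = rank(models[0], items)
    for w, n in zip(models[1:], cutoffs):
        ranked = rank(w, ranked[:n]) + ranked[n:]
    return ranked


queries = [[((1.0, 0.0), 2), ((0.5, 0.0), 1), ((0.0, 1.0), 0)]]
models = train_nested(queries, cutoffs=[2])
print(rank_nested(models, [2], queries[0]))
```

    Training each later stage only on the top subset lets it specialize in distinctions among items that are already judged relevant. This is the "focusing on the higher ranked items" idea in the abstract. Any learner that produces a scoring function could replace the perceptron without changing `train_nested` or `rank_nested`.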

    Ranking results using multiple nested ranking
    Granted patent (legal status: in force)

    Publication number: US07689615B2

    Publication date: 2010-03-30

    Application number: US11294269

    Filing date: 2005-12-05

    IPC classification: G06F7/00 G06F17/30

    Abstract: A unique system and method that facilitates improving the ranking of items is provided. The system and method involve re-ranking decreasing subsets of high ranked items in separate stages. In particular, a basic ranking component can rank a set of items. A subset of the top or high ranking items can be taken and used as a new training set to train a component for improving the ranking among these high ranked documents. This process can be repeated on an arbitrary number of successive high ranked subsets. Thus, high ranked items can be reordered in separate stages by focusing on the higher ranked items to facilitate placing the most relevant items at the top of a search results list.
