Similarity-based searching
    3.
    Granted patent
    Status: Active

    Publication No.: US08032507B1

    Publication date: 2011-10-04

    Application No.: US12059314

    Filing date: 2008-03-31

    IPC classification: G06F17/30

    CPC classification: G06F17/3069

    Abstract: Pairs of similar objects in a population of objects can be found using a process that includes identifying a comparison vector x in a set of vectors having non-zero features, determining an estimated similarity contribution of a subset of features of the comparison vector x to a similarity between the comparison vector x and each vector in the set of vectors, generating an index that includes features based on a comparison of the similarity contribution with a similarity threshold, and identifying another vector in the set that is similar to the vector x using the index.

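The indexing idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes sparse vectors stored as dicts of non-negative weights, unit-normalized so that the dot product is the similarity, and it uses a per-feature maximum weight to bound how much a subset of features can contribute. The names `find_similar_pairs`, `max_weight`, and `unindexed` are hypothetical.

```python
from collections import defaultdict


def find_similar_pairs(vectors, threshold):
    """Return (i, j, score) for every pair whose dot product meets threshold.

    vectors: list of dicts mapping feature -> non-negative weight
    (assumed unit-normalized, so the dot product is cosine similarity).
    """
    # Largest weight each feature takes anywhere in the data set.
    max_weight = defaultdict(float)
    for vec in vectors:
        for feature, weight in vec.items():
            max_weight[feature] = max(max_weight[feature], weight)

    index = defaultdict(list)  # feature -> [(vector id, weight)]
    unindexed = []             # per vector: features kept out of the index
    pairs = []

    for x_id, x in enumerate(vectors):
        # 1. Accumulate partial scores against earlier vectors via the index.
        partial = defaultdict(float)
        for feature, weight in x.items():
            for y_id, y_weight in index.get(feature, ()):
                partial[y_id] += weight * y_weight

        # 2. Complete each candidate's score with its unindexed features.
        for y_id, score in sorted(partial.items()):
            score += sum(w * x.get(f, 0.0) for f, w in unindexed[y_id].items())
            if score >= threshold:
                pairs.append((y_id, x_id, score))

        # 3. Index x. Features are skipped while their estimated contribution
        #    to any similarity stays below the threshold: a pair that shares
        #    only skipped features cannot reach the threshold.
        bound = 0.0
        skipped = {}
        for feature, weight in x.items():
            bound += max_weight[feature] * weight
            if bound >= threshold:
                index[feature].append((x_id, weight))
            else:
                skipped[feature] = weight
        unindexed.append(skipped)

    return pairs
```

For example, `find_similar_pairs([{"a": 0.6, "b": 0.8}, {"a": 0.6, "b": 0.8}, {"a": 1.0}], 0.9)` reports only the pair (0, 1): feature `a` never enters the index for the first two vectors, so the third vector is never compared against them.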

    EXPANSION RULE EVALUATION
    4.
    Patent application
    Status: Under examination (published)

    Publication No.: US20080270364A1

    Publication date: 2008-10-30

    Application No.: US12107381

    Filing date: 2008-04-22

    IPC classification: G06F7/06

    CPC classification: G06Q30/02 G06F16/951

    Abstract: One aspect of the subject matter described in this specification can be embodied in methods that include the actions of monitoring the performance of content items selected in response to an expanded query, identified by a query expansion rule; determining a baseline performance that represents the performance of any presented content item; and determining an expansion rule performance based on the performance of the content items relative to the baseline performance. Other implementations of this aspect include corresponding systems, apparatus, and computer program products.

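The abstract does not say which performance measure is used. As a hedged illustration only, the sketch below assumes click-through rate as the measure and expresses the rule's performance as a ratio to the baseline. `ItemStats` and `expansion_rule_performance` are hypothetical names, not terms from the application.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    """Aggregated counts for a group of presented content items."""
    impressions: int
    clicks: int

    def click_through_rate(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0


def expansion_rule_performance(rule_items: ItemStats, all_items: ItemStats):
    """Performance of items selected via one expansion rule, relative to baseline.

    rule_items: stats for content items selected because of the expanded query.
    all_items:  stats for any presented content item (the baseline).
    Returns a ratio; values above 1.0 mean the rule beats the baseline.
    """
    baseline = all_items.click_through_rate()
    if baseline == 0.0:
        return None  # no baseline signal, so the ratio is undefined
    return rule_items.click_through_rate() / baseline
```

With 30 clicks on 1,000 impressions for the rule and 200 clicks on 10,000 impressions overall, the ratio is about 1.5, which would suggest the rule selects better-than-average items under this assumed metric.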

    System and method for order-preserving encryption for numeric data
    5.
    Patent application
    Status: Active

    Publication No.: US20050147240A1

    Publication date: 2005-07-07

    Application No.: US10752154

    Filing date: 2004-01-05

    IPC classification: G06F21/00 H04K1/00

    Abstract: A system, method, and computer program product to automatically eliminate the distribution information available for reconstruction from a disguised dataset. The invention flattens input numerical values into a substantially uniformly distributed dataset, then maps the uniformly distributed dataset into equivalent data in a target distribution. The invention allows the incremental encryption of new values in an encrypted database while leaving existing encrypted values unchanged. The flattening comprises (1) partitioning, (2) mapping, and (3) saving auxiliary information about the data processing, which is encrypted and not updated. The partitioning is MDL based, and includes a growth phase for dividing a space into fine partitions and a prune phase for merging some partitions together.

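The two-stage idea (flatten to a uniform distribution, then map into a target distribution) can be sketched in a few lines of Python. This is a simplified stand-in, not the patented scheme: it replaces the MDL-based partitioning with a plain rank transform over the distinct values, assumes a normal target distribution, and does not support incremental encryption of new values. `build_order_preserving_map` is a hypothetical name.

```python
import bisect
from statistics import NormalDist


def build_order_preserving_map(plaintexts, target=NormalDist(0.0, 1.0)):
    """Return an encrypt(v) function for the distinct values in plaintexts.

    Stage 1 (flatten): each distinct value is replaced by its mid-rank
    position in (0, 1), which is uniformly spaced whatever the input
    distribution looked like.
    Stage 2 (transform): the uniform position is pushed through the inverse
    CDF of the target distribution, so outputs follow the target shape.
    Both stages are strictly increasing, so order is preserved.
    """
    values = sorted(set(plaintexts))
    count = len(values)

    def encrypt(value):
        position = bisect.bisect_left(values, value)
        if position == count or values[position] != value:
            raise KeyError(f"{value!r} was not in the original data set")
        uniform = (position + 0.5) / count  # strictly inside (0, 1)
        return target.inv_cdf(uniform)

    return encrypt
```

Because the mapping is monotone, comparisons such as `encrypt(3) < encrypt(5)` hold on the disguised values, while their distribution reflects the chosen target rather than the input.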

    Method and system for building a decision-tree classifier from privacy-preserving data
    6.
    Granted patent
    Status: Active

    Publication No.: US06546389B1

    Publication date: 2003-04-08

    Application No.: US09487643

    Filing date: 2000-01-19

    IPC classification: G06F17/30

    Abstract: A system and method for mining data while preserving a user's privacy includes perturbing user-related information at the user's computer and sending the perturbed data to a Web site. At the Web site, perturbed data from many users is aggregated, and from the distribution of the perturbed data, the distribution of the original data is reconstructed, although individual records cannot be reconstructed. Based on the reconstructed distribution, a decision tree classification model or a Naive Bayes classification model is developed, with the model then being provided back to the users, who can use the model on their individual data to generate classifications that are then sent back to the Web site such that the Web site can display a page appropriately configured for the user's classification. Or, the classification model need not be provided to users, but the Web site can use the model to, e.g., send search results and a ranking model to a user, with the ranking model being used at the user computer to rank the search results based on the user's individual classification data.

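The abstract does not specify the perturbation method. To illustrate the general principle (perturb on the client, reconstruct only the aggregate distribution on the server), the sketch below swaps in classic randomized response on a single binary attribute. It is an assumption-laden toy, not the patented procedure, and `perturb` and `reconstruct_proportion` are hypothetical names.

```python
import random


def perturb(value, keep_prob, rng):
    """Client side: report the true bit with probability keep_prob, else flip it."""
    return value if rng.random() < keep_prob else 1 - value


def reconstruct_proportion(reported_rate, keep_prob):
    """Server side: recover the share of true 1s from the perturbed reports.

    If pi is the true share of 1s and p = keep_prob, the expected share of
    reported 1s is  pi * p + (1 - pi) * (1 - p).  Solving for pi gives the
    estimate below. No individual report can be decoded this way.
    """
    if keep_prob == 0.5:
        raise ValueError("keep_prob of 0.5 destroys all information")
    return (reported_rate - (1 - keep_prob)) / (2 * keep_prob - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [1] * 6000 + [0] * 14000  # true share of 1s is 0.30
    reports = [perturb(v, 0.8, rng) for v in truth]
    estimate = reconstruct_proportion(sum(reports) / len(reports), 0.8)
    print(f"estimated share of 1s: {estimate:.3f}")
```

A classifier would then be trained on the reconstructed distribution rather than on any individual's true record.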

    Method and system for mining quantitative association rules in large relational tables
    7.
    Granted patent
    Status: Lapsed

    Publication No.: US5724573A

    Publication date: 1998-03-03

    Application No.: US577945

    Filing date: 1995-12-22

    IPC classification: G06F17/30

    Abstract: A method and apparatus are disclosed for mining quantitative association rules from a relational table of records. The method comprises the steps of: partitioning the values of selected quantitative attributes into intervals, combining adjacent attribute values and intervals into ranges, generating candidate itemsets, determining frequent itemsets, and outputting an association rule when the support for a frequent itemset bears a predetermined relationship to the support for a subset of the frequent itemset. Preferably, the partitioning step includes determining whether to partition and the number of partitions based on a partial incompleteness measure. The candidate generation includes discarding those itemsets not meeting a user-specified interest level and those having a subset which is not a frequent itemset. The frequent itemsets are determined using super-candidates that include information of the candidate itemsets. Preferably, each super-candidate has a data structure, such as a multi-dimensional tree or array, representing quantitative attributes common to the replaced candidate itemsets.

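A heavily simplified Python sketch of the pipeline in this abstract follows: partition quantitative attributes into intervals, count itemsets, and emit a rule when the support of a frequent itemset relative to the support of its subset (the confidence) is high enough. It assumes fixed equal-width intervals and only considers itemsets of size one and two. It omits the merging of adjacent intervals into ranges, the partial incompleteness measure, interest-based pruning, and super-candidates. All function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def to_interval(value, low, width):
    """Map a number to the half-open interval [start, start + width) containing it."""
    start = low + ((value - low) // width) * width
    return (start, start + width)


def mine_rules(records, bins, min_support, min_confidence):
    """records: list of dicts attr -> value; bins: attr -> (low, width)
    for the quantitative attributes. Returns a list of
    (antecedent, consequent, support, confidence) tuples.
    """
    # Step 1: turn every record into a set of (attribute, value-or-interval) items.
    transactions = []
    for record in records:
        items = set()
        for attr, value in record.items():
            if attr in bins:
                value = to_interval(value, *bins[attr])
            items.add((attr, value))
        transactions.append(frozenset(items))

    # Step 2: count candidate itemsets of size 1 and 2, keep the frequent ones.
    counts = Counter()
    for items in transactions:
        for size in (1, 2):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    total = len(transactions)
    frequent = {s: c for s, c in counts.items() if c / total >= min_support}

    # Step 3: compare each frequent pair's support with that of its subsets.
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) != 2:
            continue
        for antecedent in itemset:
            confidence = count / frequent[frozenset([antecedent])]
            if confidence >= min_confidence:
                (consequent,) = itemset - {antecedent}
                rules.append((antecedent, consequent, count / total, confidence))
    return rules
```

On a small table of ages and marital status with ten-year age bins, this yields rules of the form `("age", (20, 30)) -> ("married", "no")` together with their support and confidence.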

    Similarity-based searching
    8.
    Granted patent
    Status: Active

    Publication No.: US08190592B1

    Publication date: 2012-05-29

    Application No.: US13213768

    Filing date: 2011-08-19

    IPC classification: G06F17/30

    CPC classification: G06F17/3069

    Abstract: Pairs of similar objects in a population of objects can be found using a process that includes identifying a comparison vector x in a set of vectors having non-zero features, determining an estimated similarity contribution of a subset of features of the comparison vector x to a similarity between the comparison vector x and each vector in the set of vectors, generating an index that includes features based on a comparison of the similarity contribution with a similarity threshold, and identifying another vector in the set that is similar to the vector x using the index.


    Similarity-based searching
    9.
    Granted patent
    Status: Active

    Publication No.: US08041694B1

    Publication date: 2011-10-18

    Application No.: US12059302

    Filing date: 2008-03-31

    IPC classification: G06F17/30

    CPC classification: G06F17/3069

    Abstract: Pairs of similar vectors in a set of vectors are identified. A comparison vector x is identified, and a set of candidate vectors corresponding to the vector x is identified. For each candidate vector y in the set, a similarity estimate between the comparison vector x and the candidate vector y is determined; if the similarity estimate meets a similarity threshold, a similarity score of the comparison vector x and the candidate vector y is determined; and if the similarity score meets the similarity threshold, the pair of vectors (x, y) is included in a list of similar pairs of vectors.

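The estimate-then-verify loop in this abstract can be sketched as follows. The abstract does not say how the similarity estimate is computed, so this sketch assumes a cheap upper bound on the dot product of non-negative sparse vectors: the smaller of the two feature counts times the largest weight in each vector. Candidates are assumed to be earlier vectors that share at least one feature with x. `similar_pairs` and `by_feature` are hypothetical names.

```python
from collections import defaultdict


def similar_pairs(vectors, threshold):
    """Return the (i, j) pairs, i < j, whose dot product meets threshold.

    vectors: list of non-empty dicts mapping feature -> non-negative weight.
    """
    by_feature = defaultdict(set)  # feature -> ids of earlier vectors using it
    pairs = []

    for x_id, x in enumerate(vectors):
        # Candidate set: earlier vectors sharing at least one feature with x.
        candidates = set()
        for feature in x:
            candidates |= by_feature[feature]

        for y_id in sorted(candidates):
            y = vectors[y_id]
            # Cheap estimate: at most min(|x|, |y|) features overlap, and each
            # overlapping product is at most max(x) * max(y).
            estimate = min(len(x), len(y)) * max(x.values()) * max(y.values())
            if estimate < threshold:
                continue  # the exact score cannot reach the threshold

            # Exact similarity score, computed only for surviving candidates.
            score = sum(w * y[f] for f, w in x.items() if f in y)
            if score >= threshold:
                pairs.append((y_id, x_id))

        for feature in x:
            by_feature[feature].add(x_id)

    return pairs
```

Since the estimate never underestimates the true score under these assumptions, skipping a candidate on the estimate alone cannot drop a genuinely similar pair. It only avoids computing exact scores that would fail anyway.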

    System and method for order-preserving encryption for numeric data
    10.
    Granted patent
    Status: Active

    Publication No.: US07426752B2

    Publication date: 2008-09-16

    Application No.: US10752154

    Filing date: 2004-01-05

    IPC classification: G06F17/30

    Abstract: A system, method, and computer program product to automatically eliminate the distribution information available for reconstruction from a disguised dataset. The invention flattens input numerical values into a substantially uniformly distributed dataset, then maps the uniformly distributed dataset into equivalent data in a target distribution. The invention allows the incremental encryption of new values in an encrypted database while leaving existing encrypted values unchanged. The flattening comprises (1) partitioning, (2) mapping, and (3) saving auxiliary information about the data processing, which is encrypted and not updated. The partitioning is MDL based, and includes a growth phase for dividing a space into fine partitions and a prune phase for merging some partitions together.
