System and method for distributed privacy preserving data mining
    1.
    Type: Granted invention patent
    Status: No longer in force

    Publication number: US08650213B2

    Publication date: 2014-02-11

    Application number: US11752708

    Filing date: 2007-05-23

    IPC classification: G06F7/00

    Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.
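
The abstract does not say which privacy-preserving protocol is used. As an illustration only, the sketch below uses additive secret sharing (a standard secure-sum technique swapped in here, not taken from the patent) so that several sites can learn the total number of transactions containing an itemset without revealing their local counts. The names `MODULUS`, `local_support`, `make_shares`, and `global_support` are hypothetical, and all parties are simulated in one process.

```python
import random

MODULUS = 2**61 - 1  # hypothetical modulus; must exceed any possible total count


def local_support(transactions, itemset):
    """Count the local transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def make_shares(value, n_parties, rng):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def global_support(sites, itemset, seed=0):
    """Simulate every party in one process and return the global count."""
    # random.Random keeps the sketch reproducible; it is NOT cryptographically
    # secure, so a real deployment would need a secure random source.
    rng = random.Random(seed)
    n = len(sites)
    shares = [make_shares(local_support(s, itemset), n, rng) for s in sites]
    # Party j only sees the j-th share from each site, which looks random.
    partial_sums = [sum(shares[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # Only the combined partial sums reveal the total.
    return sum(partial_sums) % MODULUS
```

The same pattern would apply to the number of transactions satisfying a rule: each site computes its local count and shares it the same way.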

    Method and apparatus for analyzing community evolution in graph data streams
    2.
    Type: Granted invention patent
    Status: No longer in force

    Publication number: US07890510B2

    Publication date: 2011-02-15

    Application number: US11243727

    Filing date: 2005-10-05

    IPC classification: G06F7/00 G06N5/00

    CPC classification: G06Q10/00

    Abstract: Improved techniques are disclosed for detecting patterns of interaction among a set of entities and analyzing community evolution in a stream environment. By way of example, a technique for processing data from a data stream includes the following steps/operations. A data point of the data stream representing an interaction event is obtained. An interaction graph is updated on-line based on the data point representing the interaction event. The updated interaction graph is stored in a nonvolatile memory. An interaction evolution is determined off-line from the updated interaction graph stored in the nonvolatile memory.
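
The on-line/off-line split described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the patented method: the graph is an edge-weight counter, "nonvolatile memory" is a JSON file, and "evolution" is just the per-edge change between two stored snapshots. All function names are hypothetical.

```python
import json
from collections import Counter


def update_graph(graph, event):
    """On-line step: add one interaction event (a pair of entity names)."""
    u, v = sorted(event)          # treat interactions as undirected
    graph[(u, v)] += 1


def save_snapshot(graph, path):
    """Persist the current interaction graph to disk."""
    with open(path, "w") as f:
        json.dump([[u, v, w] for (u, v), w in graph.items()], f)


def load_snapshot(path):
    """Read a stored interaction graph back into a Counter."""
    with open(path) as f:
        return Counter({(u, v): w for u, v, w in json.load(f)})


def evolution(old, new):
    """Off-line step: edges whose interaction weight changed, with the change."""
    return {e: new[e] - old[e]
            for e in set(old) | set(new) if new[e] != old[e]}
```

A caller would invoke `update_graph` per incoming event, call `save_snapshot` periodically, and later compare any two snapshots with `evolution`.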

    Method and apparatus for processing data streams
    3.
    Type: Granted invention patent
    Status: No longer in force

    Publication number: US07739284B2

    Publication date: 2010-06-15

    Application number: US11110079

    Filing date: 2005-04-20

    IPC classification: G06F7/00

    Abstract: A technique for processing a data stream includes the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
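
A minimal sketch of projected-dimension assignment, under assumptions the abstract does not state: each cluster keeps per-dimension counts, sums, and squared sums; its projected dimensions are the `k` dimensions with the smallest variance; and the distance is the root-mean-square difference over those dimensions only. The class and function names are hypothetical.

```python
import math


class Cluster:
    """Additive per-dimension statistics for one stream cluster."""

    def __init__(self, dim):
        self.n = 0
        self.sum = [0.0] * dim
        self.sumsq = [0.0] * dim

    def add(self, x):
        self.n += 1
        for j, v in enumerate(x):
            self.sum[j] += v
            self.sumsq[j] += v * v

    def mean(self):
        return [s / self.n for s in self.sum]

    def variance(self):
        return [max(q / self.n - (s / self.n) ** 2, 0.0)
                for s, q in zip(self.sum, self.sumsq)]

    def projected_dims(self, k):
        """Assumed rule: keep the k dimensions in which the cluster is tightest."""
        var = self.variance()
        return sorted(range(len(var)), key=lambda j: var[j])[:k]

    def projected_distance(self, x, k):
        mu, dims = self.mean(), self.projected_dims(k)
        return math.sqrt(sum((x[j] - mu[j]) ** 2 for j in dims) / len(dims))


def assign(clusters, x, k):
    """Assign x to the cluster with the smallest projected distance."""
    best = min(range(len(clusters)),
               key=lambda i: clusters[i].projected_distance(x, k))
    clusters[best].add(x)
    return best
```

Because the statistics are additive, each incoming point updates its cluster in time proportional to the number of dimensions, which suits a one-pass stream setting.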

    Methods and apparatus for generating decision trees with discriminants and employing same in data classification
    4.
    Type: Granted invention patent
    Status: In force

    Publication number: US07716154B2

    Publication date: 2010-05-11

    Application number: US11841221

    Filing date: 2007-08-20

    IPC classification: G06N5/00

    Abstract: Methods and apparatus are provided for generating decision trees using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.
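
To make "splitting on a combination of variables" concrete, here is a sketch of a single tree-node split using Fisher's linear discriminant for two classes and two features. It is an assumption-laden illustration rather than the patented procedure: the two-feature restriction, the midpoint threshold, the small `ridge` term, and the function names are all hypothetical choices. A full tree would apply `fisher_split` recursively to each side.

```python
def mean(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(2)]


def scatter(points, mu):
    """2x2 within-class scatter matrix around the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_split(class0, class1, ridge=1e-9):
    """Return (w, threshold): the split is w[0]*x0 + w[1]*x1 > threshold."""
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    # Pooled within-class scatter; ridge keeps it invertible.
    a = s0[0][0] + s1[0][0] + ridge
    b = s0[0][1] + s1[0][1]
    c = s0[1][0] + s1[1][0]
    d = s0[1][1] + s1[1][1] + ridge
    det = a * d - b * c
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # w = inverse(Sw) * (mu1 - mu0), using the closed-form 2x2 inverse.
    w = [(d * diff[0] - b * diff[1]) / det,
         (-c * diff[0] + a * diff[1]) / det]
    project = lambda m: w[0] * m[0] + w[1] * m[1]
    return w, (project(mu0) + project(mu1)) / 2


def goes_right(x, w, threshold):
    """True if x falls on the class1 side of the split."""
    return w[0] * x[0] + w[1] * x[1] > threshold
```

In the test data used to check this sketch, neither feature alone separates the classes cleanly, but the learned combination (roughly "second feature minus first") does.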

    Apparatus for dynamic classification of data in evolving data stream
    5.
    Type: Granted invention patent
    Status: No longer in force

    Publication number: US07487167B2

    Publication date: 2009-02-03

    Application number: US11756227

    Filing date: 2007-05-31

    IPC classification: G06F7/00

    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
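
One way to picture class-specific clusters is the sketch below. It assumes (beyond what the abstract says) that a training point joins the nearest cluster of its own class when it lies within a fixed `radius`, otherwise it starts a new cluster, and that a test instance takes the label of the nearest cluster centroid. `ClassClusters` and `radius` are hypothetical names.

```python
import math


class ClassClusters:
    """Clusters that each hold points of exactly one class label."""

    def __init__(self, radius):
        self.radius = radius
        self.clusters = []   # each entry: [label, count, per-dimension sums]

    @staticmethod
    def _centroid(cluster):
        return [s / cluster[1] for s in cluster[2]]

    def train(self, x, label):
        """Absorb x into a nearby same-class cluster or start a new one."""
        best, best_d = None, self.radius
        for c in self.clusters:
            if c[0] != label:
                continue
            d = math.dist(x, self._centroid(c))
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            self.clusters.append([label, 1, list(x)])
        else:
            best[1] += 1
            best[2] = [s + v for s, v in zip(best[2], x)]

    def classify(self, x):
        """Label of the cluster whose centroid is closest to x."""
        nearest = min(self.clusters,
                      key=lambda c: math.dist(x, self._centroid(c)))
        return nearest[0]
```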

    Methods and Apparatus for Outlier Detection for High Dimensional Data Sets
    6.
    Type: Invention patent application
    Status: In force

    Publication number: US20080234977A1

    Publication date: 2008-09-25

    Application number: US12134371

    Filing date: 2008-06-06

    IPC classification: G06F17/18

    CPC classification: G06K9/6284

    Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.
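
The abstract gives no formula, so the following sketch makes its own assumptions: each dimension is cut into `phi` equi-depth ranges, every `k`-dimensional grid cell is scored by how far its point count falls below the count expected under independence (in standard deviations), and points in the most negative cells are treated as outlier candidates. It enumerates all projections by brute force, which is only workable for small inputs. All names are hypothetical.

```python
import math
from itertools import combinations


def equi_depth_bins(values, phi):
    """Map each value to one of phi ranges holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * phi // len(values)
    return bins


def sparse_cells(data, phi, k):
    """Return non-empty k-dim cells as (score, dims, cell, member indices),
    sorted so the sparsest (most negative score) cells come first."""
    n, n_dims = len(data), len(data[0])
    binned = [equi_depth_bins([row[j] for row in data], phi)
              for j in range(n_dims)]
    f = 1.0 / phi
    expected = n * f ** k                       # count expected if independent
    std = math.sqrt(n * f ** k * (1 - f ** k))  # binomial standard deviation
    results = []
    for dims in combinations(range(n_dims), k):
        cells = {}
        for i in range(n):
            cell = tuple(binned[j][i] for j in dims)
            cells.setdefault(cell, []).append(i)
        for cell, members in cells.items():
            results.append(((len(members) - expected) / std,
                            dims, cell, members))
    results.sort(key=lambda r: r[0])
    return results
```

In practice a search heuristic would replace the exhaustive loop over `combinations`, since the number of projections grows quickly with dimensionality.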

    Methods and apparatus for privacy preserving data mining using statistical condensing approach
    7.
    Type: Granted invention patent
    Status: In force

    Publication number: US07302420B2

    Publication date: 2007-11-27

    Application number: US10641935

    Filing date: 2003-08-14

    IPC classification: G06F17/30

    Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
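
The two steps in this abstract (build statistics, then generate a different data set that keeps the correlations) can be illustrated with a deliberately simple stand-in: compute the mean and covariance of the input, then draw synthetic Gaussian records with the same mean and covariance via a Cholesky factor. This is an assumed technique for illustration, not the patent's procedure, and it treats the whole input as one group. Function names are hypothetical.

```python
import math
import random


def mean_and_cov(rows):
    """Step 1: first- and second-order statistics of the input records."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / n
            for b in range(d)] for a in range(d)]
    return mu, cov


def cholesky(m, jitter=1e-12):
    """Lower-triangular L with L * L^T approximately equal to m."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(m[i][i] - s, 0.0) + jitter)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def condense(rows, n_out, seed=0):
    """Step 2: generate n_out synthetic records from the statistics only."""
    rng = random.Random(seed)
    mu, cov = mean_and_cov(rows)
    low = cholesky(cov)
    d = len(mu)
    out = []
    for _ in range(n_out):
        z = [rng.gauss(0, 1) for _ in range(d)]   # independent noise
        out.append([mu[a] + sum(low[a][k] * z[k] for k in range(a + 1))
                    for a in range(d)])
    return out
```

No original record is copied into the output; only the aggregate statistics carry over, which is what lets inter-dimension correlations survive.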

    Methods and apparatus for similarity text search based on conceptual indexing
    8.
    Type: Granted invention patent
    Status: In force

    Publication number: US06542889B1

    Publication date: 2003-04-01

    Application number: US09493811

    Filing date: 2000-01-28

    IPC classification: G06F17/30

    Abstract: In one aspect of the invention, a method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.

    Methods and apparatus for data stream clustering for abnormality monitoring
    9.
    Type: Granted invention patent
    Status: In force

    Publication number: US07970772B2

    Publication date: 2011-06-28

    Application number: US11753232

    Filing date: 2007-05-24

    IPC classification: G06F7/00 G06F17/30 G06F15/16

    CPC classification: G06K9/6284 Y10S707/952

    Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
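
As a rough sketch of using cluster statistics to flag abnormalities: each cluster below stores a count, linear sums, and squared sums; an object is absorbed if it lies within `factor` times the cluster's RMS radius (or within `min_radius` for very tight clusters), and otherwise it opens a new cluster and is reported. The decision rule, the parameters, and the names are assumptions, not details from the abstract.

```python
import math


class ClusterStats:
    """Count, linear sum, and squared sum for one cluster."""

    def __init__(self, x):
        self.n = 1
        self.linear = list(x)
        self.squared = [v * v for v in x]

    def add(self, x):
        self.n += 1
        for j, v in enumerate(x):
            self.linear[j] += v
            self.squared[j] += v * v

    def centroid(self):
        return [s / self.n for s in self.linear]

    def radius(self):
        """Root-mean-square distance of members from the centroid."""
        var = sum(q / self.n - (s / self.n) ** 2
                  for s, q in zip(self.linear, self.squared))
        return math.sqrt(max(var, 0.0))


def monitor(stream, factor=3.0, min_radius=1.0):
    """Return (clusters, alerts); alerts holds positions of abnormal objects."""
    clusters, alerts = [], []
    for t, x in enumerate(stream):
        nearest = min(clusters, key=lambda c: math.dist(x, c.centroid()),
                      default=None)
        limit = 0.0 if nearest is None else max(factor * nearest.radius(),
                                                min_radius)
        if nearest is not None and math.dist(x, nearest.centroid()) <= limit:
            nearest.add(x)
        else:
            clusters.append(ClusterStats(x))
            if t > 0:             # the very first object only seeds a cluster
                alerts.append(t)
    return clusters, alerts
```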

    Method and apparatus for query processing of uncertain data
    10.
    Type: Granted invention patent
    Status: In force

    Publication number: US07917517B2

    Publication date: 2011-03-29

    Application number: US12039091

    Filing date: 2008-02-28

    IPC classification: G06F17/30

    CPC classification: G06F17/30657

    Abstract: Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.
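
A small sketch of querying records stored as (mean, uncertainty) pairs. It assumes, beyond the abstract, that each attribute is an independent Gaussian whose standard deviation is the uncertainty value, and that a range query returns the records whose probability of lying in the range meets a threshold. It scans every record rather than using an index, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """P(X <= x) for a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def range_probability(record, low, high):
    """Probability that every attribute falls inside [low[j], high[j]].

    `record` is a list of (mean, std) pairs, one per attribute; attributes
    are assumed independent, so per-attribute probabilities multiply.
    """
    p = 1.0
    for (mean, std), lo, hi in zip(record, low, high):
        p *= normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std)
    return p


def range_query(records, low, high, min_prob):
    """Indices of records that lie in the range with probability >= min_prob."""
    return [i for i, r in enumerate(records)
            if range_probability(r, low, high) >= min_prob]
```

A record with a mean just inside the range but a small uncertainty can qualify, while one with the same mean and a large uncertainty may not, which is the point of carrying the uncertainty value alongside the mean.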
