System and method for distributed privacy preserving data mining
    1.
    Patent application
    Status: Granted

    Publication No.: US20060015474A1

    Publication date: 2006-01-19

    Application No.: US10892691

    Filing date: 2004-07-16

    Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.
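    Illustrative sketch: the abstract does not say which privacy-preserving data sharing protocol is used. As one possible instance, the code below assumes an additive secret-sharing secure sum, in which each site splits its local itemset count into random shares so that only the global count is learned. The names `MODULUS`, `make_shares`, and `secure_sum` are hypothetical, and all parties are simulated in one process.

```python
import secrets

# Assumed public modulus; it must exceed any possible global count.
MODULUS = 2**61 - 1

def make_shares(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(local_counts):
    """Return the total of local_counts using only shares and partial sums."""
    n = len(local_counts)
    # Party i sends share j of its own count to party j.
    all_shares = [make_shares(count, n) for count in local_counts]
    # Each party j publishes only the sum of the shares it received.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS

# Hypothetical per-site counts of transactions containing one itemset.
site_counts = [120, 45, 300]
global_support = secure_sum(site_counts)
```

    A single share or partial sum is uniformly distributed and so reveals nothing about one site's count on its own, while the published partial sums still add up to the global support used for mining.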

    Methods and apparatus for privacy preserving data mining using statistical condensing approach
    3.
    Patent application
    Status: Granted

    Publication No.: US20050049991A1

    Publication date: 2005-03-03

    Application No.: US10641935

    Filing date: 2003-08-14

    Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
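    Illustrative sketch: the abstract describes building statistics from the input data and generating a different output set that keeps the correlations between dimensions. The code below assumes two-dimensional data, uses the mean and covariance as the statistics, and draws the output from a Gaussian with those moments. These choices, and the names `summarize` and `generate`, are assumptions rather than the method claimed in the application.

```python
import math
import random

def summarize(points):
    """Return the mean and population covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)

def generate(mean, cov, count, rng):
    """Draw count synthetic points from a Gaussian with the given moments."""
    sxx, sxy, syy = cov
    # Cholesky factor of the covariance matrix [[sxx, sxy], [sxy, syy]].
    l11 = math.sqrt(sxx)
    l21 = sxy / l11 if l11 > 0 else 0.0
    l22 = math.sqrt(max(syy - l21 * l21, 0.0))
    out = []
    for _ in range(count):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append((mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2))
    return out

# Hypothetical input records; only their statistics feed the output set.
original = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
stats_mean, stats_cov = summarize(original)
synthetic = generate(stats_mean, stats_cov, 1000, random.Random(0))
```

    The synthetic points share no record with the input, but their covariance approximates that of the input, so a mining algorithm that depends on inter-dimension correlations can still run on them.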

    Method and apparatus for analyzing community evolution in graph data streams
    4.
    Patent application
    Status: Lapsed

    Publication No.: US20070288465A1

    Publication date: 2007-12-13

    Application No.: US11243727

    Filing date: 2005-10-05

    CPC classification number: G06Q10/00

    Abstract: Improved techniques are disclosed for detecting patterns of interaction among a set of entities and analyzing community evolution in a stream environment. By way of example, a technique for processing data from a data stream includes the following steps/operations. A data point of the data stream representing an interaction event is obtained. An interaction graph is updated on-line based on the data point representing the interaction event. The updated interaction graph is stored in a nonvolatile memory. An interaction evolution is determined off-line from the updated interaction graph stored in the nonvolatile memory.
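    Illustrative sketch: the steps in the abstract (update an interaction graph online, store it, analyze evolution offline) can be mimicked with edge counts and JSON snapshots. The edge-count representation, the JSON file format, and the function names below are assumptions made for illustration only.

```python
import json
from collections import Counter

def update_graph(graph, event):
    """Online step: count one interaction between the two entities in event."""
    a, b = sorted(event)  # treat interactions as undirected
    graph[(a, b)] += 1

def save_snapshot(graph, path):
    """Persist the current graph to disk as a list of [a, b, weight] rows."""
    with open(path, "w") as f:
        json.dump([[a, b, w] for (a, b), w in graph.items()], f)

def load_snapshot(path):
    """Read a stored graph back into a Counter keyed by edge."""
    with open(path) as f:
        return Counter({(a, b): w for a, b, w in json.load(f)})

def evolution(old, new):
    """Offline step: edges whose interaction count grew between two snapshots."""
    return {edge: new[edge] - old[edge] for edge in new if new[edge] > old[edge]}
```

    Keeping the online step to a single counter increment leaves the heavier comparison of snapshots to the offline phase, which matches the online/offline split in the abstract.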

Systems and methods for providing real-time classification of continuous data streams
    5.
    Patent application
    Status: Granted

    Publication No.: US20070043565A1

    Publication date: 2007-02-22

    Application No.: US11208893

    Filing date: 2005-08-22

    CPC classification number: G10L15/063 G10L17/00

    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.

    Methods and apparatus for dynamic classification of data in evolving data stream
    6.
    Patent application
    Status: Lapsed

    Publication No.: US20060004754A1

    Publication date: 2006-01-05

    Application No.: US10881036

    Filing date: 2004-06-30

    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
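    Illustrative sketch: the abstract allows one or more class-specific clusters per class. The code below makes the simplifying assumption of exactly one cluster (a running centroid) per class label and assigns each test instance to the class with the nearest centroid. The function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_class_centroids(training_stream):
    """Accumulate one centroid per class label from (point, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for point, label in training_stream:
        if label not in sums:
            sums[label] = [0.0] * len(point)
        for i, v in enumerate(point):
            sums[label][i] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, point):
    """Return the label whose class-specific centroid is closest to point."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))

# Hypothetical labelled training stream.
training = [((0.0, 0.0), "a"), ((1.0, 1.0), "a"), ((9.0, 9.0), "b"), ((10.0, 10.0), "b")]
centroids = fit_class_centroids(training)
label = classify(centroids, (8.0, 8.5))
```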

    Methods and Apparatus for Data Stream Clustering for Abnormality Monitoring
    7.
    Patent application
    Status: Granted

    Publication No.: US20070226212A1

    Publication date: 2007-09-27

    Application No.: US11753232

    Filing date: 2007-05-24

    CPC classification number: G06K9/6284 Y10S707/952

    Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.

    Methods and apparatus for clustering evolving data streams through online and offline components
    8.
    Patent application
    Status: Granted

    Publication No.: US20050038769A1

    Publication date: 2005-02-17

    Application No.: US10641951

    Filing date: 2003-08-14

    Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
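    Illustrative sketch: one way to realize the online statistics is to keep additive summaries per data group (count, per-dimension sum, per-dimension sum of squares), and to merge those summaries around seed points in the offline phase. The `Group` class, the distance threshold `max_dist`, and the way seeds are supplied are all assumptions, not details taken from the application.

```python
import math

class Group:
    """Additive summary of a data group: count, sums, and sums of squares."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = [v * v for v in point]

    @classmethod
    def empty(cls, dim):
        """Create a summary that holds no points yet."""
        g = cls([0.0] * dim)
        g.n = 0
        return g

    def add(self, point):
        self.n += 1
        for i, v in enumerate(point):
            self.ls[i] += v
            self.ss[i] += v * v

    def merge(self, other):
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def centroid(self):
        return [s / self.n for s in self.ls]

def online_update(groups, point, max_dist):
    """Absorb point into the nearest group, or open a new group if none is close."""
    best = min(groups, key=lambda g: math.dist(g.centroid(), point), default=None)
    if best is not None and math.dist(best.centroid(), point) <= max_dist:
        best.add(point)
    else:
        groups.append(Group(point))

def offline_recluster(groups, seeds):
    """Merge stored groups around the nearest seed point; return the merged groups."""
    merged = {}
    for g in groups:
        k = min(range(len(seeds)), key=lambda i: math.dist(seeds[i], g.centroid()))
        merged.setdefault(k, Group.empty(len(seeds[k]))).merge(g)
    return list(merged.values())
```

    Because the summaries are additive, the offline phase never needs the raw stream: merging two groups is just element-wise addition of their statistics.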

    Methods and Apparatus for Generating Decision Trees with Discriminants and Employing Same in Data Classification
    9.
    Patent application
    Status: Granted

    Publication No.: US20070288417A1

    Publication date: 2007-12-13

    Application No.: US11841221

    Filing date: 2007-08-20

    CPC classification number: G06K9/6282 G06F17/3061 G06F2216/03 Y10S707/99936

    Abstract: Methods and apparatus are provided for generating decision trees using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.
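    Illustrative sketch: a single tree node that splits on a linear combination of variables can be computed with Fisher's linear discriminant. The code below assumes two classes and two feature variables and builds only one split; the recursion over child nodes described in the abstract is omitted. The function names and the midpoint threshold are assumptions.

```python
def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter2(points, m):
    """Scatter matrix entries (sxx, sxy, syy) of points around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def discriminant_split(class0, class1):
    """Return (w, threshold) so that w . x > threshold routes x toward class 1."""
    m0, m1 = mean2(class0), mean2(class1)
    a0, b0, c0 = scatter2(class0, m0)
    a1, b1, c1 = scatter2(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # within-class scatter [[a, b], [b, c]]
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = S_w^{-1} (m1 - m0), using the closed-form 2x2 inverse.
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    proj0 = w[0] * m0[0] + w[1] * m0[1]
    proj1 = w[0] * m1[0] + w[1] * m1[1]
    return w, 0.5 * (proj0 + proj1)

def route(w, threshold, point):
    """Send a record down the class-1 branch (1) or the class-0 branch (0)."""
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0

# Hypothetical training records for the two classes.
class0 = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (1.0, 1.0)]
class1 = [(4.0, 4.0), (5.0, 4.5), (4.5, 5.0), (5.0, 5.0)]
w, threshold = discriminant_split(class0, class1)
```

    A full tree would apply `discriminant_split` again to the records that reach each branch until the classes are separated or a stopping rule is met.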

    Method and apparatus for variable privacy preservation in data mining
    10.
    Patent application
    Status: Under examination (published)

    Publication No.: US20070239982A1

    Publication date: 2007-10-11

    Application No.: US11249647

    Filing date: 2005-10-13

    CPC classification number: G06F21/604 G06F21/6245 G06F21/6254

    Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application. Principles of the invention are capable of handling both static and dynamic data sets.
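    Illustrative sketch: the code below assumes that a record's privacy level is the minimum size of the condensed group it may belong to, that records are one-dimensional values grouped with their sorted neighbours, and that pseudo-data is drawn from a Gaussian with each group's mean and standard deviation. None of these choices is stated in the abstract, and the function names are hypothetical.

```python
import random
import statistics

def build_groups(records):
    """Group (value, level) records so each group has at least as many members
    as the highest level inside it (best effort if there are too few records)."""
    groups, current, need = [], [], 0
    for value, level in sorted(records):
        current.append(value)
        need = max(need, level)
        if len(current) >= need:
            groups.append(current)
            current, need = [], 0
    # Leftover records absorb earlier groups until their level is satisfied.
    while current and len(current) < need and groups:
        current = groups.pop() + current
    if current:
        groups.append(current)
    return groups

def pseudo_data(groups, rng):
    """Generate one synthetic value per original record from group statistics."""
    out = []
    for g in groups:
        mu, sigma = statistics.fmean(g), statistics.pstdev(g)
        out.extend(rng.gauss(mu, sigma) for _ in g)
    return out

# Hypothetical records: (value, required minimum group size).
records = [(1.0, 2), (1.2, 2), (5.0, 3), (5.1, 1), (5.3, 1), (9.0, 2)]
groups = build_groups(records)
synthetic = pseudo_data(groups, random.Random(0))
```

    Records with a higher privacy level end up in larger groups, so their pseudo-data is drawn from statistics shared with more neighbours.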
