Method, apparatus and computer program product for preserving privacy in data mining
    1.
    Type: Granted patent
    Status: Active

    Publication number: US07904471B2

    Publication date: 2011-03-08

    Application number: US11836171

    Filing date: 2007-08-09

    Abstract: Privacy in data mining of sparse high dimensional data records is preserved by transforming the data records into anonymized data records. This transformation involves creating a sketch-based private representation of each data record, each data record containing only a small number of non-zero attribute values in relation to the high dimensionality of the data records.
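
The abstract does not spell out how the sketch is built. As a rough illustration only, the Python sketch below assumes a common random-sign projection: each sparse record is stored as `{attribute_index: value}`, and each sketch component is a signed sum of the non-zero values. The names `make_sketch` and `estimate_dot`, and the `seed` parameter, are hypothetical and are not taken from the patent.

```python
import random


def make_sketch(record, num_components, seed=0):
    """Sketch a sparse record given as {attribute_index: value}.

    Component j is the sum of value_i * r(i, j), where r(i, j) is a
    pseudo-random sign in {-1, +1} derived from (seed, i, j).
    Assumption: this random-sign construction is illustrative only.
    """
    sketch = [0.0] * num_components
    for index, value in record.items():
        for j in range(num_components):
            # Seeding with a string gives the same sign on every run.
            sign = random.Random(f"{seed}:{index}:{j}").choice((-1.0, 1.0))
            sketch[j] += value * sign
    return sketch


def estimate_dot(sketch_a, sketch_b):
    """Estimate the dot product of two records from their sketches."""
    return sum(a * b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)
```

Under these assumptions, only the non-zero attributes are touched, so the cost scales with the number of non-zero values rather than the full dimensionality. Using fewer components makes the estimate noisier, which is one plausible way to trade accuracy for privacy.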

    Methods and Apparatus for Generating Decision Trees with Discriminants and Employing Same in Data Classification
    2.
    Type: Patent application
    Status: Active

    Publication number: US20070288417A1

    Publication date: 2007-12-13

    Application number: US11841221

    Filing date: 2007-08-20

    CPC classification number: G06K9/6282 G06F17/3061 G06F2216/03 Y10S707/99936

    Abstract: Methods and apparatus are provided for generating decision trees using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.

    Method and apparatus for variable privacy preservation in data mining
    3.
    Type: Patent application
    Status: Pending (published)

    Publication number: US20070239982A1

    Publication date: 2007-10-11

    Application number: US11249647

    Filing date: 2005-10-13

    CPC classification number: G06F21/604 G06F21/6245 G06F21/6254

    Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application. Principles of the invention are capable of handling both static and dynamic data sets.
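
The three steps in this abstract (assign privacy levels, build condensed groups with summary statistics, generate pseudo-data) can be sketched in Python as follows. The abstract does not say how groups are formed or how pseudo-data is drawn, so this sketch assumes one-dimensional records, a greedy sweep over sorted values, and Gaussian sampling. The function names `condense`, `summarize`, and `pseudo_data` are hypothetical.

```python
import random


def condense(records, privacy_levels):
    """Greedily group 1-D records (by index) after sorting by value.

    A group is closed once it has at least as many members as the largest
    privacy level requested by any member. Assumption: illustrative only.
    """
    order = sorted(range(len(records)), key=lambda i: records[i])
    groups, current = [], []
    for i in order:
        current.append(i)
        if len(current) >= max(privacy_levels[j] for j in current):
            groups.append(current)
            current = []
    if current:
        # Leftovers join the last group; their level may still be unmet
        # if too few records remain.
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


def summarize(records, group):
    """Keep only the count, sum, and sum of squares for a group."""
    values = [records[i] for i in group]
    return {
        "count": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }


def pseudo_data(summary, rng):
    """Draw one pseudo-record per original record from the group statistics."""
    n = summary["count"]
    mean = summary["sum"] / n
    variance = max(summary["sum_sq"] / n - mean * mean, 0.0)
    return [rng.gauss(mean, variance ** 0.5) for _ in range(n)]
```

A record with privacy level 1 may end up alone in its group, in which case its pseudo-record equals the original value. Higher levels force larger groups and therefore coarser pseudo-data.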

    Methods and Apparatus for Dynamic Classification of Data in Evolving Data Stream
    4.
    Type: Patent application
    Status: Lapsed

    Publication number: US20070226216A1

    Publication date: 2007-09-27

    Application number: US11756227

    Filing date: 2007-05-31

    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
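
A minimal Python sketch of the flow in this abstract, under stated assumptions: each class keeps its own clusters as a count plus a per-dimension linear sum, a training point joins the nearest cluster of its class if it lies within a fixed radius, and a test instance takes the label of the nearest cluster centroid. The class name `ClassClusterClassifier` and the `radius` parameter are hypothetical; the patent abstract does not specify these details.

```python
import math
from collections import defaultdict


class ClassClusterClassifier:
    def __init__(self, radius):
        self.radius = radius
        # label -> list of [count, linear_sum] clusters for that class
        self.clusters = defaultdict(list)

    @staticmethod
    def _centroid(cluster):
        count, linear_sum = cluster
        return [s / count for s in linear_sum]

    def _nearest(self, point, clusters):
        """Return (cluster, distance) for the closest centroid, or (None, inf)."""
        best, best_dist = None, math.inf
        for cluster in clusters:
            dist = math.dist(point, self._centroid(cluster))
            if dist < best_dist:
                best, best_dist = cluster, dist
        return best, best_dist

    def learn(self, point, label):
        """Absorb a labeled training point into its class-specific clusters."""
        cluster, dist = self._nearest(point, self.clusters[label])
        if cluster is not None and dist <= self.radius:
            cluster[0] += 1
            cluster[1] = [s + x for s, x in zip(cluster[1], point)]
        else:
            self.clusters[label].append([1, list(point)])

    def classify(self, point):
        """Label a test instance by its nearest class-specific cluster."""
        best_label, best_dist = None, math.inf
        for label, clusters in self.clusters.items():
            _, dist = self._nearest(point, clusters)
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label
```

Because only counts and sums are stored, each training point can be discarded after it is absorbed, which suits a stream that cannot be replayed.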

    Methods and Apparatus for Clustering Evolving Data Streams Through Online and Offline Components
    5.
    Type: Patent application
    Status: Pending (published)

    Publication number: US20070226209A1

    Publication date: 2007-09-27

    Application number: US11755473

    Filing date: 2007-05-30

    Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.

    Method and apparatus for processing data streams
    6.
    Type: Patent application
    Status: Lapsed

    Publication number: US20060282425A1

    Publication date: 2006-12-14

    Application number: US11110079

    Filing date: 2005-04-20

    CPC classification number: G06F17/30592 G06F17/30516 G06F17/30539 G06K9/6221

    Abstract: Techniques are disclosed for clustering and classifying stream data. By way of example, a technique for processing a data stream comprises the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
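
The idea of per-cluster projected dimensions can be sketched in Python as below. This is an assumption-laden illustration rather than the patented procedure: it picks, for each cluster, the dimensions in which the cluster's points vary least, and measures an incoming point's distance to each cluster only along that cluster's own dimensions. The function names and the choice of average absolute difference as the distance are hypothetical.

```python
def projected_dimensions(points, num_dims):
    """Return the num_dims dimension indices with the smallest variance."""
    def variance(d):
        values = [p[d] for p in points]
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    dims = range(len(points[0]))
    return sorted(sorted(dims, key=variance)[:num_dims])


def centroid(points):
    """Per-dimension mean of a non-empty list of points."""
    return [sum(column) / len(points) for column in zip(*points)]


def projected_distance(point, center, dims):
    """Average absolute difference over the cluster's projected dimensions."""
    return sum(abs(point[d] - center[d]) for d in dims) / len(dims)


def assign(point, clusters, num_dims):
    """Return the index of the cluster nearest to point in its own subspace."""
    best_index, best_dist = None, float("inf")
    for index, points in enumerate(clusters):
        dims = projected_dimensions(points, num_dims)
        dist = projected_distance(point, centroid(points), dims)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index
```

In this sketch a point can be far from a cluster in a noisy dimension and still be assigned to it, because that dimension is excluded from the cluster's projected set.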

    Systems and methods of data traffic generation via density estimation
    7.
    Type: Patent application
    Status: Lapsed

    Publication number: US20060242610A1

    Publication date: 2006-10-26

    Application number: US11092495

    Filing date: 2005-03-29

    Applicant: Charu Aggarwal

    Inventor: Charu Aggarwal

    CPC classification number: G06F17/30705

    Abstract: Systems and methods for providing density-based traffic generation. Data are clustered to create partitions, and transforms of clustered data are constructed in a transformed space. Data points are generated via employing grid discretization in the transformed space, and density estimates of the generated data points are employed to generate synthetic pseudo-points.

    Methods and apparatus for data stream clustering for abnormality monitoring
    8.
    Type: Patent application
    Status: Pending (published)

    Publication number: US20050210027A1

    Publication date: 2005-09-22

    Application number: US10801420

    Filing date: 2004-03-16

    CPC classification number: G06K9/6284 Y10S707/952

    Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.

    EVENT MINING IN SOCIAL NETWORKS
    9.
    Type: Patent application
    Status: Active

    Publication number: US20130151522A1

    Publication date: 2013-06-13

    Application number: US13324513

    Filing date: 2011-12-13

    CPC classification number: G06F17/30516 G06F17/3071 H04L51/32

    Abstract: A method and system for detecting an event from a social stream. The method includes the steps of: receiving a social stream from a social network, where the social stream includes at least one object and the object includes a text, sender information of the text, and recipient information of the text; assigning said object to a cluster based on a similarity value between the object and the clusters; monitoring changes in at least one of the clusters; and triggering an alarm when the changes in at least one of the clusters exceed a first threshold value, where at least one of the steps is carried out using a computer device.
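
The steps listed in this abstract (receive objects, assign each to a cluster by similarity, monitor cluster changes, raise an alarm past a threshold) might look like the following Python sketch. Several choices here are assumptions, not claims from the patent: similarity is an equal-weight blend of Jaccard overlap on words and on participants, "change" is the growth in a cluster's object count between consecutive time windows, and the names `EventDetector`, `min_similarity`, and `alarm_threshold` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class EventDetector:
    def __init__(self, min_similarity, alarm_threshold):
        self.min_similarity = min_similarity
        self.alarm_threshold = alarm_threshold
        self.clusters = []         # each cluster: {"words": set, "people": set}
        self.current = Counter()   # cluster id -> objects seen in this window
        self.previous = Counter()  # cluster id -> objects seen in the last window

    def add(self, text, sender, recipients):
        """Assign one social-stream object to its most similar cluster."""
        words = set(text.lower().split())
        people = {sender, *recipients}
        best_id, best_sim = None, 0.0
        for cid, c in enumerate(self.clusters):
            sim = 0.5 * jaccard(words, c["words"]) + 0.5 * jaccard(people, c["people"])
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is None or best_sim < self.min_similarity:
            # Nothing similar enough: start a new cluster.
            self.clusters.append({"words": set(), "people": set()})
            best_id = len(self.clusters) - 1
        self.clusters[best_id]["words"] |= words
        self.clusters[best_id]["people"] |= people
        self.current[best_id] += 1
        return best_id

    def close_window(self):
        """End the time window; return ids of clusters whose growth triggers an alarm."""
        alarms = [cid for cid, n in self.current.items()
                  if n - self.previous[cid] > self.alarm_threshold]
        self.previous, self.current = self.current, Counter()
        return sorted(alarms)
```

A caller would feed messages through `add` as they arrive and call `close_window` at a fixed interval; a non-empty result marks clusters that grew sharply in the window just closed.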

    System and method for resource adaptive classification of data streams
    10.
    Type: Granted patent
    Status: Lapsed

    Publication number: US08051021B2

    Publication date: 2011-11-01

    Application number: US11530938

    Filing date: 2006-09-12

    CPC classification number: G06K9/6282 G06N99/005

    Abstract: A system and method for resource adaptive classification of data streams. Embodiments of the systems and methods classify data received in a computer by discretizing the received data, constructing an intermediate data structure from said received data as training instances, performing subspace sampling on said received data as test instances, and adaptively classifying said received data based on statistics of said subspace sampling.
