Methods and apparatus for data stream clustering for abnormality monitoring
    1.
    Granted patent
    Legal status: In force

    Publication No.: US07970772B2

    Publication date: 2011-06-28

    Application No.: US11753232

    Filing date: 2007-05-24

    IPC classes: G06F7/00 G06F17/30 G06F15/16

    CPC classes: G06K9/6284 Y10S707/952

    Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.

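
Illustrative sketch (not taken from the patent): the abstract describes clusters that carry statistical data and a decision, made from that data, about whether an abnormality exists. The Python below is one minimal way to read that idea, assuming one-dimensional objects, a running count/sum/sum-of-squares per cluster, and a hypothetical rule that flags any object no existing cluster can absorb. The names `ClusterStats`, `monitor`, `min_radius`, and `z` are assumptions made for this example.

```python
import math


class ClusterStats:
    """Count, sum and sum of squares for one cluster (1-D for brevity)."""

    def __init__(self, x):
        self.n, self.s, self.ss = 1, x, x * x

    def add(self, x):
        self.n += 1
        self.s += x
        self.ss += x * x

    def mean(self):
        return self.s / self.n

    def std(self):
        # Clamp at zero to absorb floating-point rounding
        return math.sqrt(max(self.ss / self.n - self.mean() ** 2, 0.0))


def monitor(stream, min_radius=1.0, z=3.0):
    """Yield (x, abnormal) per object; abnormal means no cluster could absorb x."""
    clusters = []
    for x in stream:
        nearest = min(clusters, key=lambda c: abs(c.mean() - x), default=None)
        if nearest is not None and abs(nearest.mean() - x) <= max(min_radius, z * nearest.std()):
            nearest.add(x)
            yield x, False
        else:
            # Assumed rule: an object that starts a new cluster is reported
            clusters.append(ClusterStats(x))
            yield x, True


flags = [abnormal for _, abnormal in monitor([1.0, 1.5, 0.8, 1.2, 50.0, 1.1])]
```

Under this rule the very first object is always reported, because no cluster exists yet; a real monitor would likely use a warm-up period.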

    Method and apparatus for query processing of uncertain data
    2.
    Granted patent
    Legal status: In force

    Publication No.: US07917517B2

    Publication date: 2011-03-29

    Application No.: US12039091

    Filing date: 2008-02-28

    IPC classes: G06F17/30

    CPC classes: G06F17/30657

    Abstract: Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.

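
Illustrative sketch (not taken from the patent): the abstract represents each uncertain record by a mean value and an uncertainty value and answers queries from that representation. The Python below shows one possible reading, assuming each record is a one-dimensional normal distribution given as `(mean, sigma)` with `sigma > 0`, and that a range query returns records whose probability of falling in the range meets a threshold. The normal model, the threshold semantics, and the names `prob_in_range` and `range_query` are assumptions; the patent's actual index structure is not reproduced here.

```python
import math


def prob_in_range(mean, sigma, lo, hi):
    """P(lo <= X <= hi) for X ~ Normal(mean, sigma^2), sigma > 0."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mean) / (sigma * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)


def range_query(records, lo, hi, threshold=0.5):
    """Return ids of records that lie in [lo, hi] with probability >= threshold."""
    return [rid for rid, (mean, sigma) in records.items()
            if prob_in_range(mean, sigma, lo, hi) >= threshold]


records = {
    "a": (10.0, 1.0),    # mean inside the range, low uncertainty
    "b": (10.0, 20.0),   # same mean, but far too uncertain
    "c": (30.0, 1.0),    # confidently outside the range
}
hits = range_query(records, 8.0, 12.0)
```

Records "a" and "b" share a mean, yet only "a" qualifies, which is the point of keeping the uncertainty value alongside the mean.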

    Method and Apparatus for Variable Privacy Preservation in Data Mining
    3.
    Patent application
    Legal status: Lapsed

    Publication No.: US20090319526A1

    Publication date: 2009-12-24

    Application No.: US12119766

    Filing date: 2008-05-13

    IPC classes: G06F17/30

    Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application. Principles of the invention are capable of handling both static and dynamic data sets.

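
Illustrative sketch (not taken from the patent): the three steps in the abstract are per-record privacy levels, condensed groups with summary statistics, and pseudo-data drawn from those statistics. The Python below walks through them under simplifying assumptions: records are one-dimensional, a privacy level is read as the minimum size of the group a record may belong to, groups are formed greedily over values in sorted order, and pseudo-data is drawn from a normal distribution fitted to each group. The function names and the grouping strategy are hypothetical.

```python
import random
import statistics


def condense(records):
    """records: (value, privacy_level) pairs; level = minimum allowed group size.
    Returns one (mean, stdev, size) summary per condensed group."""
    groups, current, need = [], [], 0
    for value, level in sorted(records):      # neighbours in value end up together
        current.append(value)
        need = max(need, level)
        if len(current) >= need:
            groups.append((current, need))
            current, need = [], 0
    # Leftover records absorb earlier groups until their privacy level is met
    while current and len(current) < need:
        if not groups:
            raise ValueError("too few records for the highest privacy level")
        earlier, earlier_need = groups.pop()
        current, need = earlier + current, max(need, earlier_need)
    if current:
        groups.append((current, need))
    return [(statistics.mean(g), statistics.pstdev(g), len(g)) for g, _ in groups]


def pseudo_data(summaries, rng):
    """Draw `size` synthetic values per group from a normal fit to its summary."""
    return [rng.gauss(m, s) for m, s, size in summaries for _ in range(size)]


records = [(1.0, 2), (1.2, 2), (5.0, 3), (5.5, 3), (6.0, 3), (9.0, 2)]
summaries = condense(records)
synthetic = pseudo_data(summaries, random.Random(0))
```

Only the summaries and the synthetic values would be handed to the mining application; the original values stay behind.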

    Method and Apparatus for Aggregation in Uncertain Data
    4.
    Patent application
    Legal status: In force

    Publication No.: US20090222472A1

    Publication date: 2009-09-03

    Application No.: US12039076

    Filing date: 2008-02-28

    IPC classes: G06F17/30

    CPC classes: G06F17/30489

    Abstract: Techniques are disclosed for aggregation in uncertain data in data processing systems. For example, a method of aggregation in an application that involves an uncertain data set includes the following steps. The uncertain data set along with uncertainty information is obtained. One or more clusters of data points are constructed from the data set. Aggregate statistics of the one or more clusters and uncertainty information are stored. The data set may be data from a data stream. It is realized that the use of even modest uncertainty information during an application such as a data mining process is sufficient to greatly improve the quality of the underlying results.

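
Illustrative sketch (not taken from the patent): the abstract stores aggregate statistics of each cluster together with uncertainty information. The Python below shows one way such a summary could be kept and used, assuming one-dimensional points, independent zero-mean errors with a known standard deviation per point, and assignment by smallest expected squared distance. The class `UncertainCluster`, its fields, and `assign` are hypothetical names for this example.

```python
class UncertainCluster:
    """Aggregate statistics plus accumulated error variance for one cluster."""

    def __init__(self, value, error_std):
        self.n = 1
        self.total = value              # sum of observed values
        self.err2 = error_std ** 2      # sum of per-point error variances

    def add(self, value, error_std):
        self.n += 1
        self.total += value
        self.err2 += error_std ** 2

    def centroid(self):
        return self.total / self.n

    def centroid_error_variance(self):
        # Variance of the mean of n independent errors
        return self.err2 / (self.n ** 2)

    def expected_sq_distance(self, value, error_std):
        # E[(X - Z)^2] for independent zero-mean errors on point X and centroid Z
        return ((value - self.centroid()) ** 2
                + error_std ** 2
                + self.centroid_error_variance())


def assign(clusters, value, error_std):
    """Add the point to the cluster with the smallest expected squared distance."""
    best = min(clusters, key=lambda c: c.expected_sq_distance(value, error_std))
    best.add(value, error_std)
    return best


near = UncertainCluster(1.0, 0.5)
near.add(3.0, 0.5)
far = UncertainCluster(10.0, 0.5)
d = near.expected_sq_distance(2.0, 1.0)
chosen = assign([near, far], 2.5, 0.1)
```

The summary is additive, so it can be maintained in one pass over a stream without keeping the individual points.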

    Method and Apparatus for Query Processing of Uncertain Data
    5.
    Patent application
    Legal status: In force

    Publication No.: US20090222410A1

    Publication date: 2009-09-03

    Application No.: US12039091

    Filing date: 2008-02-28

    IPC classes: G06F7/06 G06F17/30

    CPC classes: G06F17/30657

    Abstract: Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.


    Methods and apparatus for clustering evolving data streams through online and offline components
    6.
    Granted patent
    Legal status: In force

    Publication No.: US07353218B2

    Publication date: 2008-04-01

    Application No.: US10641951

    Filing date: 2003-08-14

    IPC classes: G06F7/00 G06F17/30 G06F17/00

    Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.

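
Illustrative sketch (not taken from the patent): the abstract separates an online step, which forms and updates data groups as points arrive, from an offline step, which reclusters those groups around sampled points. The Python below is a rough rendering of that split, assuming one-dimensional points, groups stored as `[count, sum]`, a fixed absorption radius, and a size-weighted k-means-style pass for the offline step. All names and parameters are assumptions for this example.

```python
import random


def online_update(groups, x, radius=1.0):
    """Absorb x into the nearest group ([count, sum]) or start a new group."""
    best = min(groups, key=lambda g: abs(g[1] / g[0] - x), default=None)
    if best is not None and abs(best[1] / best[0] - x) <= radius:
        best[0] += 1
        best[1] += x
    else:
        groups.append([1, x])


def offline_recluster(groups, k, rng, rounds=5):
    """Recluster group centroids around k sampled centroids, weighted by size."""
    points = [(s / n, n) for n, s in groups]
    centers = [c for c, _ in rng.sample(points, k)]
    for _ in range(rounds):
        buckets = [[] for _ in centers]
        for c, n in points:
            nearest = min(range(k), key=lambda j: abs(centers[j] - c))
            buckets[nearest].append((c, n))
        # Keep the old center when a bucket received no groups
        centers = [sum(c * n for c, n in b) / sum(n for _, n in b) if b else centers[i]
                   for i, b in enumerate(buckets)]
    return sorted(centers)


groups = []
for x in [0.0, 0.2, 0.4, 10.0, 10.3, 2.5, 2.6, 12.0]:
    online_update(groups, x)
centers = offline_recluster(groups, k=2, rng=random.Random(0))
```

The offline step touches only the group summaries, so it can be run on demand without replaying the stream.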

    Methods and apparatus for performing dimensionality reduction in a supervised application domain
    7.
    Granted patent
    Legal status: Lapsed

    Publication No.: US06505207B1

    Publication date: 2003-01-07

    Application No.: US09557626

    Filing date: 2000-04-25

    IPC classes: G06F17/30

    CPC classes: G06K9/6234 Y10S707/99942

    Abstract: A technique for generating a reduced representation of input data, wherein the input data has a first set of feature variables and a class variable associated therewith, comprises the following steps. A second set of feature variables is determined from the first set of feature variables, wherein the second set of feature variables corresponds to mutually orthogonal vectors. Then, one or more of the feature variables associated with the second set of feature variables are selected based on a level of discrimination with respect to the class variable. The input data is then represented using the one or more selected feature variables.

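
Illustrative sketch (not taken from the patent): the abstract derives mutually orthogonal feature vectors and then keeps those that discriminate best with respect to the class variable. The Python below shows the idea for two-dimensional data only, assuming the orthogonal vectors are the principal axes of the covariance matrix and that discrimination is measured by a ratio of between-class to within-class scatter. Both choices, and all function names, are assumptions made for the example.

```python
import math
from statistics import mean


def orthogonal_axes(points):
    """Principal axes of 2-D points, as two mutually orthogonal unit vectors."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    cxx = mean([(x - mx) ** 2 for x in xs])
    cyy = mean([(y - my) ** 2 for y in ys])
    cxy = mean([(x - mx) * (y - my) for x, y in points])
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    return [(c, s), (-s, c)]


def discrimination(values, labels):
    """Between-class scatter divided by within-class scatter along one axis."""
    overall = mean(values)
    between = within = 0.0
    for cls in set(labels):
        member = [v for v, l in zip(values, labels) if l == cls]
        m = mean(member)
        between += len(member) * (m - overall) ** 2
        within += sum((v - m) ** 2 for v in member)
    return between / within if within else math.inf


def reduce_dimensions(points, labels, keep=1):
    """Project onto the `keep` orthogonal axes that best separate the classes."""
    axes = orthogonal_axes(points)
    projected = [[x * ax + y * ay for x, y in points] for ax, ay in axes]
    ranked = sorted(range(len(axes)),
                    key=lambda i: discrimination(projected[i], labels), reverse=True)
    chosen = ranked[:keep]
    reduced = [[projected[i][j] for i in chosen] for j in range(len(points))]
    return [axes[i] for i in chosen], reduced


points = [(-10, 1.1), (0, 0.9), (10, 1.0), (-10, -1.0), (0, -0.9), (10, -1.1)]
labels = ["a", "a", "a", "b", "b", "b"]
axes, reduced = reduce_dimensions(points, labels)
```

In this toy data the direction of largest variance (roughly the x axis) says nothing about the class, so the selection step keeps the low-variance axis instead.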

    SYSTEM AND METHOD FOR RESOURCE ADAPTIVE CLASSIFICATION OF DATA STREAMS
    8.
    Patent application
    Legal status: In force

    Publication No.: US20110213740A1

    Publication date: 2011-09-01

    Application No.: US13078419

    Filing date: 2011-04-01

    IPC classes: G06F15/18

    CPC classes: G06K9/6282 G06N99/005

    Abstract: A system and method for resource adaptive classification of data streams are provided. Embodiments classify data received in a computer by discretizing the received data, constructing an intermediate data structure from said received data as training instances, performing subspace sampling on said received data as test instances, and adaptively classifying said received data based on statistics of said subspace sampling.

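
Illustrative sketch (not taken from the patent): the abstract lists discretization, an intermediate structure built from training instances, subspace sampling for each test instance, and a classification decision based on the sampling statistics. The Python below is a loose interpretation, assuming equal-width discretization, a plain list of discretized training rows as the intermediate structure, and a `budget` parameter that caps how many random subspaces are examined per test instance (standing in for the resource constraint). Every name here is hypothetical.

```python
import random
from collections import Counter


def discretize(row, width=1.0):
    """Map each numeric attribute to the index of its equal-width interval."""
    return tuple(int(v // width) for v in row)


def train(rows, labels, width=1.0):
    """Intermediate structure (assumed): list of (discretized row, label)."""
    return [(discretize(r, width), y) for r, y in zip(rows, labels)]


def classify(model, row, budget, rng, subspace_dim=2, width=1.0):
    """Vote over `budget` random subspaces; a larger budget costs more time."""
    cell = discretize(row, width)
    votes = Counter()
    for _ in range(budget):
        dims = rng.sample(range(len(cell)), subspace_dim)
        # Class counts of training rows that agree with the test row on `dims`
        match = Counter(y for c, y in model if all(c[d] == cell[d] for d in dims))
        if match:
            votes[match.most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0] if votes else None


rows = [(0.2, 0.4, 0.6), (0.7, 0.1, 0.9), (3.2, 3.4, 3.6), (3.7, 3.1, 3.9)]
model = train(rows, ["x", "x", "y", "y"])
rng = random.Random(0)
first = classify(model, (0.3, 0.8, 0.5), budget=4, rng=rng)
second = classify(model, (3.5, 3.5, 3.5), budget=4, rng=rng)
unknown = classify(model, (9.0, 9.0, 9.0), budget=4, rng=rng)
```

Lowering `budget` when the stream is fast trades accuracy for throughput, which is the sense in which this sketch is resource adaptive.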

    Methods and apparatus for outlier detection for high dimensional data sets
    9.
    Granted patent
    Legal status: In force

    Publication No.: US07865456B2

    Publication date: 2011-01-04

    Application No.: US12134371

    Filing date: 2008-06-06

    IPC classes: G06N5/00

    CPC classes: G06K9/6284

    Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.

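
Illustrative sketch (not taken from the patent): the abstract finds outliers by locating sparse low dimensional projections. The Python below assumes one common formulation of that idea: each attribute is split into `phi` equi-depth ranges, every k-dimensional grid cell gets a sparsity coefficient (observed count minus expected count, divided by the standard deviation under independence), and points in the most negative cells are reported. The coefficient, the brute-force enumeration of projections, and all names are assumptions; the abstract does not specify how sparse projections are scored or searched.

```python
import math
from collections import Counter
from itertools import combinations


def equi_depth_bins(values, phi):
    """Map each value to one of phi ranges holding roughly equal numbers of points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * phi // len(values)
    return bins


def sparse_projections(points, phi=2, k=2):
    """Return (coefficient, dims, cell) for every occupied k-dim cell, sparsest first."""
    n, d = len(points), len(points[0])
    grid = [equi_depth_bins([p[j] for p in points], phi) for j in range(d)]
    expected = n * (1.0 / phi) ** k
    spread = math.sqrt(expected * (1 - (1.0 / phi) ** k))
    results = []
    for dims in combinations(range(d), k):
        counts = Counter(tuple(grid[j][i] for j in dims) for i in range(n))
        for cell, count in counts.items():
            results.append(((count - expected) / spread, dims, cell))
    return sorted(results), grid


def outliers(points, phi=2, k=2, top=1):
    """Indices of points that fall in the `top` sparsest cells."""
    ranked, grid = sparse_projections(points, phi, k)
    found = set()
    for _, dims, cell in ranked[:top]:
        found.update(i for i in range(len(points))
                     if tuple(grid[j][i] for j in dims) == cell)
    return sorted(found)


points = [(1, 1), (2, 9.5), (3, 3), (4, 4), (5, 5),
          (6, 6), (7, 7), (8, 8), (9, 2.5), (10, 10)]
suspects = outliers(points, top=2)
```

Neither suspect is extreme on any single attribute; each stands out only in the joint projection, which is the case this style of detection targets.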

    Methods for dynamic classification of data in evolving data stream
    10.
    Granted patent
    Legal status: Lapsed

    Publication No.: US07379939B2

    Publication date: 2008-05-27

    Application No.: US10881036

    Filing date: 2004-06-30

    IPC classes: G06F7/00

    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.

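
Illustrative sketch (not taken from the patent): the abstract keeps class-specific clusters of labelled training data and uses them to classify test instances. The Python below shows a minimal version, assuming one-dimensional instances, clusters stored as `[label, count, sum]`, a fixed radius for absorbing a training instance into a cluster of its own class, and nearest-centroid classification. The class `ClassClusters` and its parameters are assumptions for this example.

```python
class ClassClusters:
    """Class-specific running clusters over a labelled training stream."""

    def __init__(self, radius=1.0):
        self.radius = radius
        self.clusters = []          # each entry: [label, count, sum]

    def learn(self, x, label):
        # Only clusters of the same class may absorb the instance
        same = [c for c in self.clusters if c[0] == label]
        best = min(same, key=lambda c: abs(c[2] / c[1] - x), default=None)
        if best is not None and abs(best[2] / best[1] - x) <= self.radius:
            best[1] += 1
            best[2] += x
        else:
            self.clusters.append([label, 1, x])

    def classify(self, x):
        """Label of the cluster whose centroid is closest to x, or None if untrained."""
        if not self.clusters:
            return None
        return min(self.clusters, key=lambda c: abs(c[2] / c[1] - x))[0]


model = ClassClusters()
for x, label in [(1.0, "normal"), (1.2, "normal"), (5.0, "spam"),
                 (5.3, "spam"), (9.0, "normal")]:
    model.learn(x, label)
predictions = [model.classify(x) for x in (1.1, 5.1, 8.5)]
```

Because one class may own several clusters, the model can represent a class that occupies separate regions, as "normal" does here.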