Systems and methods for providing real-time classification of continuous data streams
    21.
    Patent application (status: in force)

    Publication No.: US20070043565A1

    Publication date: 2007-02-22

    Application No.: US11208893

    Filing date: 2005-08-22

    CPC classification number: G10L15/063 G10L17/00

    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update the training models used for classification, and to incrementally cluster the data over contiguous segments of a continuous data stream (in real time) into a plurality of micro-clusters, from which target profiles are constructed that define/model the behavior of the data in individual segments of the data stream.
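
    As a rough illustration of the micro-clustering idea, here is a minimal Python sketch, not the patented method. It assumes each micro-cluster keeps additive cluster-feature statistics (count, linear sum, squared sum), that a point joins the nearest micro-cluster within a hypothetical `radius` and otherwise seeds a new one, and that the two closest micro-clusters are merged when a hypothetical `max_clusters` limit is exceeded. A segment is labeled by comparing its micro-cluster profile (the target profile) with per-class training models built the same way; the names `MicroCluster`, `micro_cluster` and `classify_segment` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive cluster-feature summary of the points absorbed so far."""
    n: int      # number of points
    ls: list    # per-dimension linear sum
    ss: list    # per-dimension squared sum (a radius/variance could be derived; unused below)

    @classmethod
    def from_point(cls, x):
        return cls(1, list(x), [v * v for v in x])

    def centroid(self):
        return [s / self.n for s in self.ls]

    def absorb(self, x):
        # Incremental update touches only additive statistics.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def merge(self, other):
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]


def micro_cluster(points, max_clusters=5, radius=1.0):
    """Hypothetical one-pass micro-clustering of one stream segment."""
    clusters = []
    for x in points:
        best = min(clusters, key=lambda c: math.dist(c.centroid(), x), default=None)
        if best is not None and math.dist(best.centroid(), x) <= radius:
            best.absorb(x)
            continue
        clusters.append(MicroCluster.from_point(x))
        if len(clusters) > max_clusters:
            # Bound memory by merging the two closest micro-clusters (i < j, so pop(j) is safe).
            pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
            i, j = min(pairs, key=lambda p: math.dist(clusters[p[0]].centroid(),
                                                      clusters[p[1]].centroid()))
            clusters[i].merge(clusters.pop(j))
    return clusters


def classify_segment(segment, training_models, **kw):
    """Label a non-empty segment by comparing its micro-cluster profile with per-class models."""
    profile = micro_cluster(segment, **kw)
    total = sum(c.n for c in profile)

    def score(model):
        # Count-weighted mean distance from each profile centroid to the nearest model centroid.
        return sum(c.n * min(math.dist(c.centroid(), m.centroid()) for m in model)
                   for c in profile) / total

    return min(training_models, key=lambda label: score(training_models[label]))
```

    Because the statistics are additive, absorbing a point or merging two micro-clusters costs time proportional to the number of dimensions, which is what makes this style of summary practical on a live stream.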

    Methods and apparatus for dynamic classification of data in evolving data stream
    22.
    Patent application (status: lapsed)

    Publication No.: US20060004754A1

    Publication date: 2006-01-05

    Application No.: US10881036

    Filing date: 2004-06-30

    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
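
    A minimal Python sketch of class-specific clustering, assuming (hypothetically) that each class keeps at most `k` running-mean clusters built from the labeled training stream, and that a test instance receives the label of its nearest cluster. The class name `ClassSpecificClusters` is illustrative, and the published technique may summarize and compare clusters differently.

```python
import math
from collections import defaultdict


class ClassSpecificClusters:
    """Hypothetical sketch: up to k running-mean clusters per class label."""

    def __init__(self, k=3):
        self.k = k
        self.clusters = defaultdict(list)  # label -> list of [count, centroid]

    def train(self, x, label):
        cl = self.clusters[label]
        if len(cl) < self.k:
            # Seed a new cluster for this class until k clusters exist.
            cl.append([1, list(x)])
            return
        # Otherwise update the nearest cluster of the same class with a running mean.
        entry = min(cl, key=lambda e: math.dist(e[1], x))
        entry[0] += 1
        n = entry[0]
        entry[1] = [c + (v - c) / n for c, v in zip(entry[1], x)]

    def classify(self, x):
        # The nearest class-specific cluster determines the predicted label.
        best_label, best_d = None, math.inf
        for label, cl in self.clusters.items():
            for _, centroid in cl:
                d = math.dist(centroid, x)
                if d < best_d:
                    best_label, best_d = label, d
        return best_label
```

    Keeping clusters separate per class means a test instance is compared only against summaries whose label is already known, so no extra labeling step is needed at prediction time.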

    Method and apparatus for privacy preserving data mining by restricting attribute choice
    25.
    Patent application (status: in force)

    Publication No.: US20070233711A1

    Publication date: 2007-10-04

    Application No.: US11397297

    Filing date: 2006-04-04

    Abstract: Improved techniques for privacy preserving data mining of multidimensional data records are disclosed. For example, a technique for generating at least one output data set from at least one input data set for use in association with a data mining process comprises the following steps/operations. At least one relevant attribute of the at least one input data set is selected through determination of at least one relevance coefficient. The at least one output data set is generated from the at least one input data set, wherein the at least one output data set comprises the at least one relevant attribute of the at least one input data set, as determined by use of the at least one relevance coefficient.
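
    The abstract does not define the relevance coefficient. The sketch below assumes, purely for illustration, that it is the absolute Pearson correlation between each attribute and a chosen target attribute, and that attributes scoring at least a hypothetical `threshold` are kept in the output data set. The function name `restrict_attributes` is likewise hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def restrict_attributes(records, names, target, threshold=0.5):
    """Keep only attributes whose (assumed) relevance coefficient reaches threshold."""
    ys = [r[target] for r in records]
    relevance = {a: abs(pearson([r[a] for r in records], ys)) for a in names if a != target}
    kept = [a for a in names if a != target and relevance[a] >= threshold]
    # Output data set: the same records restricted to relevant attributes plus the target.
    output = [{a: r[a] for a in kept + [target]} for r in records]
    return kept, relevance, output
```

    Under this assumption, the intuition is that attributes with little bearing on the mining task are withheld entirely, so fewer attributes are exposed for re-identification.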

    System and method of flexible data reduction for arbitrary applications
    26.
    Patent application (status: lapsed)

    Publication No.: US20060026175A1

    Publication date: 2006-02-02

    Application No.: US10901278

    Filing date: 2004-07-28

    Applicant: Charu Aggarwal

    Inventor: Charu Aggarwal

    Abstract: The present invention is directed to the use of an evolutionary algorithm to locate optimal solution subspaces. The evolutionary algorithm uses a point-based coding of the subspace determination problem and searches selectively over the space of possible coded solutions. Each feasible solution to the problem, or individual in the population of feasible solutions, is coded as a string, which facilitates use of the evolutionary algorithm to determine the optimal solution to the fitness function. The fitness of each string is determined by solving the objective function for that string. The resulting fitness value can then be converted to a rank, and all of the members of the population of solutions can be evaluated using selection, crossover, and mutation processes that are applied sequentially and iteratively to the individuals in the population of solutions. The population of solutions is updated as the individuals in the population evolve and converge, that is, become increasingly genetically similar to one another. The iterations of selection, crossover and mutation are performed until a desired level of convergence among the individuals in the population of solutions has been achieved.
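
    The following generic genetic-algorithm sketch in Python mirrors the steps named in the abstract: string-coded candidates, fitness converted to ranks, then selection, crossover, and mutation repeated until the population converges. The bit-string coding, the toy fitness function in the usage line, and parameters such as `p_mut` and `converge` are assumptions; the patent's point-based coding of subspaces is not reproduced here.

```python
import random


def evolve(fitness, length, pop_size=20, p_mut=0.05, converge=0.9, max_gens=200, seed=0):
    """Generic GA over bit strings: rank selection, one-point crossover, bit-flip mutation."""
    if pop_size % 2 or length < 2:
        raise ValueError("pop_size must be even and length at least 2")
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    best_ever = max(pop, key=fitness)
    for _ in range(max_gens):
        # Convert fitness values to ranks: worst string gets weight 1, best gets pop_size.
        ranked = sorted(pop, key=fitness)
        if fitness(ranked[-1]) > fitness(best_ever):
            best_ever = ranked[-1]
        # Converged once most individuals are identical to the current best string.
        if sum(s == ranked[-1] for s in pop) / pop_size >= converge:
            break
        parents = rng.choices(ranked, weights=list(range(1, pop_size + 1)), k=pop_size)
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = rng.randrange(1, length)  # one-point crossover
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Flip each bit independently with probability p_mut.
        pop = ["".join("10"[int(c)] if rng.random() < p_mut else c for c in s)
               for s in children]
    return max(pop + [best_ever], key=fitness)
```

    For example, `evolve(lambda s: s.count("1"), length=10)` searches for the string with the most ones. Passing the objective as a callable keeps the search loop independent of how a string is scored, which is where a subspace-specific objective would plug in.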

    Methods and apparatus for privacy preserving data mining using statistical condensing approach
    28.
    Patent application (status: in force)

    Publication No.: US20050049991A1

    Publication date: 2005-03-03

    Application No.: US10641935

    Filing date: 2003-08-14

    Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
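
    A sketch of the statistical condensing idea using NumPy, under stated assumptions: records are greedily grouped into groups of (at least) `k` nearest neighbours, only each group's mean and covariance are retained, and synthetic records are drawn uniformly along the covariance eigenvectors with matching variance. This resembles published descriptions of condensation, but the grouping rule and the `condense` function are illustrative, not the patented procedure.

```python
import numpy as np


def condense(data, k, seed=0):
    """Replace records by synthetic ones generated from per-group statistics."""
    rng = np.random.default_rng(seed)
    remaining = np.asarray(data, dtype=float)
    if remaining.ndim != 2 or len(remaining) < k:
        raise ValueError("need a 2-D array with at least k records")
    groups = []
    while len(remaining) >= k:
        # Group a random record with its k-1 nearest neighbours.
        center = remaining[rng.integers(len(remaining))]
        order = np.argsort(np.linalg.norm(remaining - center, axis=1))
        groups.append(remaining[order[:k]])
        remaining = remaining[order[k:]]
    if len(remaining):
        # Fold the fewer-than-k leftovers into the last group.
        groups[-1] = np.vstack([groups[-1], remaining])
    synthetic = []
    for g in groups:
        # Only these first- and second-order statistics are used from each group.
        mean = g.mean(axis=0)
        cov = np.atleast_2d(np.cov(g, rowvar=False, bias=True))
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Uniform on [-sqrt(3*lam), sqrt(3*lam)] has variance lam along each eigenvector.
        half = np.sqrt(3.0 * np.clip(eigvals, 0.0, None))
        coords = rng.uniform(-half, half, size=(len(g), len(half)))
        synthetic.append(mean + coords @ eigvecs.T)
    return np.vstack(synthetic)
```

    Because each synthetic group is generated from the original group's mean and covariance, correlations between dimensions survive, while no output record is copied from an input record.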

    METHOD FOR CLASSIFICATION OF OBJECTS IN A GRAPH DATA STREAM
    29.
    Patent application (status: in force)

    Publication No.: US20120054129A1

    Publication date: 2012-03-01

    Application No.: US12871168

    Filing date: 2010-08-30

    Applicant: Charu Aggarwal

    Inventor: Charu Aggarwal

    CPC classification number: G06N99/005

    Abstract: A method for classifying objects in a graph data stream, including receiving a training stream of graph data, the training stream including a plurality of objects along with class labels that are associated with each of the objects, first determining discriminating sets of edges in the training stream for the class labels, wherein a discriminating set of edges is one that is indicative of the object that contains these edges having a given class label, receiving an incoming data stream of the graph data, wherein class labels have not yet been assigned to objects in the incoming data stream, second determining, based on the discriminating sets of edges, class labels that are associated with the objects in the incoming data stream; and outputting to an information repository object class label pairs based on the second determining.
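
    A simplified Python sketch, assuming graph objects arrive as edge lists and that a discriminating set is any set of one or two edges whose support reaches a hypothetical `min_support` and whose majority class reaches a hypothetical `min_confidence`. Incoming objects are labeled by confidence-weighted voting and emitted as (object id, class label) pairs. A method for large graph streams may rely on compressed or hashed summaries rather than the exact counts used here; all function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def edge_sets(graph, max_size=2):
    """All edge sets of size 1..max_size in one graph (edges treated as undirected)."""
    edges = sorted({tuple(sorted(e)) for e in graph})
    for size in range(1, max_size + 1):
        yield from combinations(edges, size)


def train(stream, min_support=2, min_confidence=0.8):
    """Find discriminating edge sets from a labeled training stream of (graph, label)."""
    total = Counter()
    by_class = defaultdict(Counter)
    for graph, label in stream:
        for s in edge_sets(graph):
            total[s] += 1
            by_class[s][label] += 1
    rules = {}
    for s, n in total.items():
        label, hits = by_class[s].most_common(1)[0]
        if n >= min_support and hits / n >= min_confidence:
            rules[s] = (label, hits / n)
    return rules


def classify(graph, rules, default=None):
    # Each matching discriminating edge set votes for its class, weighted by confidence.
    votes = Counter()
    for s in edge_sets(graph):
        if s in rules:
            label, conf = rules[s]
            votes[label] += conf
    return votes.most_common(1)[0][0] if votes else default


def label_stream(incoming, rules):
    """Yield (object id, class label) pairs for an unlabeled stream of (id, graph)."""
    for obj_id, graph in incoming:
        yield obj_id, classify(graph, rules)
```

    The pairs yielded by `label_stream` correspond to the object/class-label pairs that the abstract writes to an information repository.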

    GRAPHICAL MODELS FOR REPRESENTING TEXT DOCUMENTS FOR COMPUTER ANALYSIS
    30.
    Patent application (status: in force)

    Publication No.: US20110302168A1

    Publication date: 2011-12-08

    Application No.: US12796266

    Filing date: 2010-06-08

    Applicant: Charu Aggarwal

    Inventor: Charu Aggarwal

    CPC classification number: G06F17/30619

    Abstract: In a method for representing a text document with a graphical model, a document including a plurality of ordered words is received and a graph data structure for the document is created. The graph data structure includes a plurality of nodes and edges, with each node representing a distinct word in the document and each edge identifying a number of times two nodes occur within a predetermined distance from each other. The graph data structure is stored in an information repository.
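
    A minimal Python sketch of the document-graph construction. It assumes edges are directed from the earlier word to the later one, that `window` counts positions (so words up to `window` apart are linked), and that a repeated word does not create a self-loop; none of these choices is stated in the abstract, and `document_graph` is a hypothetical name.

```python
from collections import Counter


def document_graph(words, window=2):
    """Nodes: distinct words. Edge weight: times two words occur within `window` positions."""
    nodes = set(words)
    edges = Counter()
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            # Directed edge from the earlier word to the later one; skip self-loops.
            if words[j] != w:
                edges[(w, words[j])] += 1
    return nodes, edges
```

    Edge weights accumulate across the whole document, so word pairs that recur close together receive heavier edges than pairs that co-occur only once.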
