Method and apparatus for aggregation in uncertain data
    21.
    Granted invention patent
    Legal status: in force

    Publication number: US08005839B2

    Publication date: 2011-08-23

    Application number: US12039076

    Application date: 2008-02-28

    CPC classification number: G06F17/30489

    Abstract: Techniques are disclosed for aggregation in uncertain data in data processing systems. For example, a method of aggregation in an application that involves an uncertain data set includes the following steps. The uncertain data set along with uncertainty information is obtained. One or more clusters of data points are constructed from the data set. Aggregate statistics of the one or more clusters and uncertainty information are stored. The data set may be data from a data stream. It is realized that the use of even modest uncertainty information during an application such as a data mining process is sufficient to greatly improve the quality of the underlying results.

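    The abstract leaves the stored "aggregate statistics" unspecified. The following is a minimal sketch, assuming each point arrives with a per-dimension standard error and that a cluster is kept as additive sums (count, linear sum, sum of squares, sum of squared errors) in the style of micro-clustering. The class `UncertainClusterFeature`, the variance formula, and the nearest-centroid assignment are illustrative assumptions, not the patented procedure.

```python
from typing import List, Sequence


class UncertainClusterFeature:
    """Additive summary of one cluster of uncertain points (hypothetical layout)."""

    def __init__(self, dims: int) -> None:
        self.n = 0
        self.linear_sum = [0.0] * dims
        self.square_sum = [0.0] * dims
        self.error_square_sum = [0.0] * dims  # the stored uncertainty information

    def add(self, point: Sequence[float], errors: Sequence[float]) -> None:
        # Every field is a sum, so a stream can be absorbed one point at a time.
        self.n += 1
        for j, (x, e) in enumerate(zip(point, errors)):
            self.linear_sum[j] += x
            self.square_sum[j] += x * x
            self.error_square_sum[j] += e * e

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def expected_variance(self) -> List[float]:
        # Observed spread of the values plus the mean squared measurement error.
        return [sq / self.n - (ls / self.n) ** 2 + es / self.n
                for ls, sq, es in zip(self.linear_sum, self.square_sum, self.error_square_sum)]


def nearest(clusters: List[UncertainClusterFeature],
            point: Sequence[float]) -> UncertainClusterFeature:
    """Pick the cluster whose centroid is closest in squared Euclidean distance."""
    return min(clusters, key=lambda c: sum((a - b) ** 2 for a, b in zip(c.centroid(), point)))
```

    Because every field is additive, two such summaries can be merged by adding their fields, which is what makes this layout convenient for data streams.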

    Mechanisms for Privately Sharing Semi-Structured Data
    22.
    Invention patent application
    Legal status: in force

    Publication number: US20110078143A1

    Publication date: 2011-03-31

    Application number: US12568976

    Application date: 2009-09-29

    CPC classification number: G06F17/30539 G06F17/30598

    Abstract: Mechanisms are provided for anonymizing data comprising a plurality of graph data sets. The mechanisms receive input data comprising a plurality of graph data sets, each of which comprises data for generating a graph separate from the graphs associated with the other graph data sets. The mechanisms perform clustering on the graph data sets to generate a plurality of clusters, at least one of which comprises a plurality of graph data sets; the other clusters each comprise one or more graph data sets. The mechanisms also determine the aggregate properties of each cluster. Finally, from those aggregate properties, the mechanisms generate, for each cluster, pseudo-synthetic data representing the cluster.

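    A minimal sketch of the three steps in the abstract (cluster, aggregate, synthesize), assuming each graph data set is a set of undirected edges written as sorted vertex pairs over a shared vertex vocabulary. The clustering is a stand-in (sort by edge count, then cut into groups of at least `k`), and the aggregate property is assumed to be per-edge frequency within the cluster. The abstract specifies neither, and all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def cluster_graphs(graphs: List[Graph], k: int) -> List[List[Graph]]:
    """Stand-in clustering: sort by edge count, then cut into groups of at least k."""
    ordered = sorted(graphs, key=len)
    clusters = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:
        tail = clusters.pop()  # fold an undersized last group into its neighbor
        clusters[-1].extend(tail)
    return clusters


def edge_frequencies(cluster: List[Graph]) -> Dict[Edge, float]:
    """Aggregate property: fraction of the cluster's graphs that contain each edge."""
    counts = Counter(edge for graph in cluster for edge in graph)
    return {edge: c / len(cluster) for edge, c in counts.items()}


def synthesize(cluster: List[Graph], rng: random.Random) -> List[Graph]:
    """Emit one pseudo-synthetic graph per original, using only the aggregates."""
    freqs = edge_frequencies(cluster)
    return [frozenset(e for e, p in freqs.items() if rng.random() < p) for _ in cluster]
```

    Only `edge_frequencies` output reaches `synthesize`, so no individual input graph is released; an edge present in every graph of a cluster always survives.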

    Query Optimization Over Graph Data Streams
    23.
    Invention patent application
    Legal status: in force

    Publication number: US20110029571A1

    Publication date: 2011-02-03

    Application number: US12511627

    Application date: 2009-07-29

    CPC classification number: G06F17/30979 G06F17/30958

    Abstract: An illustrative embodiment includes a method for executing a query on a graph data stream. The graph stream comprises data representing edges that connect vertices of a graph. The method comprises constructing a plurality of synopsis data structures based on at least a subset of the graph data stream. Each vertex connected to an edge represented within the subset of the graph data stream is assigned to a synopsis data structure such that each synopsis data structure represents a corresponding section of the graph. The method further comprises mapping each received edge represented within the graph data stream onto the synopsis data structure which corresponds to the section of the graph which includes that edge, and using the plurality of synopsis data structures to execute the query on the graph data stream.

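    A minimal sketch, assuming the per-section synopsis is a count-min sketch of edge frequencies, that vertices seen in an initial sample are assigned to sections round-robin, and that an edge belongs to the section of its source vertex. The abstract does not say how vertices are grouped, so these choices and the names `CountMin` and `PartitionedEdgeSynopsis` are illustrative assumptions.

```python
import zlib
from typing import Dict, Iterable, Iterator, List, Tuple


class CountMin:
    """Count-min sketch: fixed-size frequency synopsis that never underestimates."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _cells(self, key: str) -> Iterator[Tuple[int, int]]:
        # One deterministic hash per row, derived by salting the key with the row index.
        for row in range(len(self.table)):
            yield row, zlib.crc32(f"{row}:{key}".encode()) % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        return min(self.table[row][col] for row, col in self._cells(key))


class PartitionedEdgeSynopsis:
    """One CountMin per graph section; an edge goes to its source vertex's section."""

    def __init__(self, sample: Iterable[Tuple[str, str]], sections: int,
                 width: int = 64, depth: int = 4) -> None:
        self.sections = sections
        self.section_of: Dict[str, int] = {}
        # Hypothetical assignment rule: round-robin over vertices in first-seen order.
        for u, v in sample:
            for vertex in (u, v):
                if vertex not in self.section_of:
                    self.section_of[vertex] = len(self.section_of) % sections
        self.synopses = [CountMin(width, depth) for _ in range(sections)]

    def _section(self, vertex: str) -> int:
        # Vertices absent from the sample fall back to a hash-based section.
        return self.section_of.get(vertex, zlib.crc32(vertex.encode()) % self.sections)

    def add_edge(self, u: str, v: str) -> None:
        self.synopses[self._section(u)].add(f"{u}->{v}")

    def edge_frequency(self, u: str, v: str) -> int:
        return self.synopses[self._section(u)].estimate(f"{u}->{v}")
```

    An edge-frequency query touches only one section's synopsis, and the count-min estimate is an upper bound on the true count.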

    Method and Apparatus for Predicting Future Behavior of Data Streams
    24.
    Invention patent application
    Legal status: in force

    Publication number: US20080243742A1

    Publication date: 2008-10-02

    Application number: US12136210

    Application date: 2008-06-10

    CPC classification number: G06F17/30516 Y10S707/99931 Y10S707/99943

    Abstract: Techniques are disclosed for predicting the future behavior of data streams from the current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics is used to generate one or more representative pseudo-data records within the future time interval. The pseudo-data records are then used to forecast at least one characteristic of the data stream.

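    A minimal sketch of this pipeline, assuming the per-interval statistics are per-dimension mean and variance, that each statistic's future value is a least-squares linear extrapolation over past intervals, and that pseudo-data records are drawn from independent Gaussians. All three are assumptions, and the function names are hypothetical.

```python
import random
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple

Records = Sequence[Sequence[float]]


def interval_stats(records: Records) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population variance for one time interval."""
    columns = list(zip(*records))
    return [fmean(c) for c in columns], [pvariance(c) for c in columns]


def linear_extrapolate(history: Sequence[float], steps_ahead: int = 1) -> float:
    """Fit a least-squares line to (t, value), t = 0..n-1, and evaluate it at n-1+steps_ahead."""
    n = len(history)
    if n == 1:
        return float(history[0])
    t_mean = (n - 1) / 2
    v_mean = fmean(history)
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in enumerate(history))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return v_mean + slope * (n - 1 + steps_ahead - t_mean)


def pseudo_records(intervals: Sequence[Records], count: int, rng: random.Random,
                   steps_ahead: int = 1) -> List[List[float]]:
    """Extrapolate each statistic to the future interval, then sample Gaussian records."""
    stats = [interval_stats(r) for r in intervals]
    dims = len(stats[0][0])
    means = [linear_extrapolate([m[j] for m, _ in stats], steps_ahead) for j in range(dims)]
    # Extrapolated variances can go negative, so clip them at zero.
    variances = [max(0.0, linear_extrapolate([v[j] for _, v in stats], steps_ahead))
                 for j in range(dims)]
    return [[rng.gauss(mu, var ** 0.5) for mu, var in zip(means, variances)]
            for _ in range(count)]
```

    A forecast of a stream characteristic can then be read off the pseudo-records; for example, averaging them estimates the stream's mean in the future interval.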

    Methods and apparatus for outlier detection for high dimensional data sets
    25.
    Granted invention patent
    Legal status: in force

    Publication number: US07395250B1

    Publication date: 2008-07-01

    Application number: US09686115

    Application date: 2000-10-11

    CPC classification number: G06K9/6284

    Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low-dimensional projections. These sparse projections are used to determine which points are outliers. The methodologies of the invention provide a novel definition of exceptions, or outliers, for high-dimensional data.

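    A brute-force sketch, assuming each attribute is discretized into `phi` equi-depth ranges and that a k-dimensional grid cube D is scored with the standardized count S(D) = (n(D) - N·f^k) / sqrt(N·f^k·(1 - f^k)), where N is the number of points, n(D) the number of points in D, and f = 1/phi. The scoring rule, the threshold, and the exhaustive search over projections are assumptions for illustration; exhaustive enumeration is only practical for small dimensionality.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def equi_depth_codes(data: Sequence[Sequence[float]], phi: int) -> List[List[int]]:
    """Replace each value by the index (0..phi-1) of its equi-depth range in its dimension."""
    n, dims = len(data), len(data[0])
    codes = [[0] * dims for _ in range(n)]
    for j in range(dims):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order):
            codes[i][j] = rank * phi // n
    return codes


def sparse_projection_outliers(data: Sequence[Sequence[float]], phi: int = 3, k: int = 2,
                               threshold: float = -1.0) -> List[int]:
    """Indices of points in k-dimensional grid cubes whose sparsity score is below threshold."""
    n, dims = len(data), len(data[0])
    codes = equi_depth_codes(data, phi)
    f_k = (1.0 / phi) ** k  # expected fraction per cube if attributes were independent
    expected = n * f_k
    spread = sqrt(n * f_k * (1 - f_k))
    outliers = set()
    for subspace in combinations(range(dims), k):
        cubes: Dict[Tuple[int, ...], List[int]] = {}
        for i, row in enumerate(codes):
            cubes.setdefault(tuple(row[j] for j in subspace), []).append(i)
        for members in cubes.values():
            if (len(members) - expected) / spread < threshold:
                outliers.update(members)
    return sorted(outliers)
```

    In two correlated dimensions, a point that is high in one and low in the other lands in an almost empty cube, so its negative score flags it even though neither coordinate is unusual on its own.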

    SYSTEM AND METHOD FOR FEATURE BASED LOAD SHEDDING IN CLASSIFICATION
    26.
    Invention patent application
    Legal status: pending (published)

    Publication number: US20080133438A1

    Publication date: 2008-06-05

    Application number: US11564885

    Application date: 2006-11-30

    CPC classification number: G06N20/00

    Abstract: A system and method for feature-based load shedding in classification. The system includes a plurality of data sources configured to render independent streams of input data, which are selectively grouped together to form particular classification tasks. The system further includes a central classification server, communicatively coupled to the plurality of data sources via a network, that is configured to analyze and execute multiple tasks, each consisting of multiple input data, and to analyze the data for knowledge-based decision-making. The method includes rendering independent streams of input data that are selectively grouped together to form a particular task; analyzing and handling multiple tasks, each consisting of multiple input data; and analyzing the data for knowledge-based decision-making.


    Systems and methods for condensation-based privacy in strings
    27.
    Invention patent application
    Legal status: lapsed

    Publication number: US20080082566A1

    Publication date: 2008-04-03

    Application number: US11540406

    Application date: 2006-09-30

    CPC classification number: G06F21/6245

    Abstract: Novel methods and systems for the privacy-preserving mining of string data with the use of simple template-based models. Such template-based models are effective in practice and preserve important statistical characteristics of the strings, such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of the new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications that depend heavily on the computation of such distances, and it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.

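    A minimal sketch of condensation for strings, assuming groups are formed by sorting (by length, then lexicographically) and cutting into groups of at least `k`, and that a group's template is its rounded mean length plus a symbol histogram per position. Both the grouping and the template are illustrative assumptions; only the aggregates, never the original strings, are used to generate output.

```python
import random
from collections import Counter
from typing import List, Tuple


def condense(strings: List[str], k: int) -> List[List[str]]:
    """Form groups of at least k strings; sorting keeps similar strings near each other."""
    ordered = sorted(strings, key=lambda s: (len(s), s))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()  # fold an undersized last group into its neighbor
        groups[-1].extend(tail)
    return groups


def group_template(group: List[str]) -> Tuple[int, List[Counter]]:
    """Summary statistics: rounded mean length and a symbol histogram per position."""
    length = round(sum(map(len, group)) / len(group))
    # Every position < length is covered, because round(mean) never exceeds the max length.
    columns = [Counter(s[p] for s in group if p < len(s)) for p in range(length)]
    return length, columns


def pseudo_strings(group: List[str], rng: random.Random) -> List[str]:
    """Generate one pseudo-string per group member from the template alone."""
    _, columns = group_template(group)
    out = []
    for _ in group:
        chars = []
        for hist in columns:
            symbols = list(hist)
            chars.append(rng.choices(symbols, weights=[hist[c] for c in symbols])[0])
        out.append("".join(chars))
    return out
```

    Positions on which a whole group agrees are reproduced exactly, which is one simple way such a template can preserve composition while hiding individual records.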

    Methods and apparatus for generating decision trees with discriminants and employing same in data classification
    28.
    Granted invention patent
    Legal status: lapsed

    Publication number: US07310624B1

    Publication date: 2007-12-18

    Application number: US09562552

    Application date: 2000-05-02

    CPC classification number: G06K9/6282 G06F17/3061 G06F2216/03 Y10S707/99936

    Abstract: Methods and apparatus are provided for generating a decision tree using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records that include feature variables and class variables in a decision tree generation mode, and data records that include only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises recursively splitting a decision tree such that the greatest separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables with which to recursively split the training data and create the decision tree. The decision tree is then used to classify input test data.

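    A minimal sketch of a discriminant-split tree for two classes labeled 0 and 1, assuming Fisher's linear discriminant w = S_w^{-1}(m_1 - m_0) as the split direction and an error-minimizing threshold on the projection. This is a standard technique used as a stand-in; the abstract does not give the patented split criterion, and `build`/`classify` are hypothetical names. A small ridge term keeps S_w invertible.

```python
from typing import List, Sequence, Union

Vector = Sequence[float]
Node = Union[int, tuple]  # a leaf label, or (w, threshold, left, right)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_direction(xs: List[Vector], ys: List[int], ridge: float = 1e-6) -> List[float]:
    """w = S_w^{-1} (m_1 - m_0), with S_w the pooled within-class scatter matrix."""
    d = len(xs[0])
    groups = {c: [x for x, y in zip(xs, ys) if y == c] for c in (0, 1)}
    means = {c: [sum(x[j] for x in g) / len(g) for j in range(d)] for c, g in groups.items()}
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for c, g in groups.items():
        for x in g:
            diff = [x[j] - means[c][j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    return solve(sw, [means[1][j] - means[0][j] for j in range(d)])


def project(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def build(xs: List[Vector], ys: List[int], depth: int = 0, max_depth: int = 3) -> Node:
    """Recursively split on the discriminant projection; leaves hold the majority label."""
    majority = int(2 * sum(ys) >= len(ys))
    if depth == max_depth or len(set(ys)) == 1:
        return majority
    w = fisher_direction(xs, ys)
    proj = [project(w, x) for x in xs]
    values = sorted(set(proj))
    best = None
    # Try midpoints between consecutive projections; class 1's mean projects higher.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        errors = sum((p > t) != (y == 1) for p, y in zip(proj, ys))
        if best is None or errors < best[0]:
            best = (errors, t)
    if best is None:  # every point projects to the same value: no useful split
        return majority
    t = best[1]
    left = [i for i, p in enumerate(proj) if p <= t]
    right = [i for i, p in enumerate(proj) if p > t]
    return (w, t,
            build([xs[i] for i in left], [ys[i] for i in left], depth + 1, max_depth),
            build([xs[i] for i in right], [ys[i] for i in right], depth + 1, max_depth))


def classify(node: Node, x: Vector) -> int:
    """Traversal mode: only feature variables are needed."""
    while isinstance(node, tuple):
        w, t, left, right = node
        node = right if project(w, x) > t else left
    return node
```

    Because each split uses a combination of all variables rather than one attribute, oblique class boundaries can be captured in a single node.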

    System and method for distributed privacy preserving data mining
    29.
    Granted invention patent
    Legal status: in force

    Publication number: US07305378B2

    Publication date: 2007-12-04

    Application number: US10892691

    Application date: 2004-07-16

    Abstract: Distributed privacy-preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol, such that the privacy of the summary information is preserved; the summary information associated with an entity relates to data stored at that entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.

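    The abstract does not describe the privacy-preserving protocol itself. As a stand-in, the sketch below uses the classic ring-based secure-sum technique to combine per-site itemset counts, assuming honest-but-curious sites that do not collude and a modulus larger than any true total. The list of sites only simulates the parties in one process, and all function names are hypothetical.

```python
import random
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def local_count(transactions: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Number of this site's transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def secure_sum(local_values: List[int], modulus: int, rng: random.Random) -> int:
    """Ring-based secure sum: each site sees only a running total hidden by a random mask."""
    mask = rng.randrange(modulus)  # chosen and kept secret by the initiating site
    running = mask
    for value in local_values:
        # Each site adds its private value and forwards the result to the next site.
        running = (running + value) % modulus
    # The total returns to the initiator, who removes the mask.
    return (running - mask) % modulus


def global_support(sites: List[List[Transaction]], itemset: FrozenSet[str],
                   rng: random.Random, modulus: int = 2 ** 32) -> int:
    return secure_sum([local_count(t, itemset) for t in sites], modulus, rng)


def global_confidence(sites: List[List[Transaction]], antecedent: FrozenSet[str],
                      consequent: FrozenSet[str], rng: random.Random) -> float:
    """Fraction of transactions containing the antecedent that also satisfy the rule."""
    both = global_support(sites, antecedent | consequent, rng)
    return both / global_support(sites, antecedent, rng)
```

    Rule statistics follow from itemset counts: the number of transactions satisfying A => B is the support of A ∪ B, so only summed counts ever leave the sites.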

    Method and apparatus for classifying unmarked string substructures using Markov Models
    30.
    Granted invention patent
    Legal status: in force

    Publication number: US07139688B2

    Publication date: 2006-11-21

    Application number: US10600690

    Application date: 2003-06-20

    CPC classification number: G06F19/24 G06F19/18 G06F19/22

    Abstract: A technique for structurally classifying substructures of at least one unmarked string utilizing at least one training data set with inserted markers identifying labeled substructures. A model of class labels and substructures within strings of the training data set is first constructed. Markers are then inserted into the unmarked string, identifying substructures similar to substructures within strings of the training data set by using the model. Finally, class labels of the substructures in the unmarked string similar to substructures within strings of the training data set are predicted using the model.

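    A minimal sketch of the classification step only, assuming labeled training substructures are available as (substring, label) pairs and that each label is modeled by a first-order Markov chain with add-one smoothing. Marker insertion (locating substructures in the unmarked string) is not shown, and `MarkovClassifier` is a hypothetical name.

```python
from collections import defaultdict
from math import log
from typing import DefaultDict, List, Tuple


class MarkovClassifier:
    """One first-order Markov chain per class label, with add-one (Laplace) smoothing."""

    def __init__(self, alphabet: str) -> None:
        self.alphabet = alphabet
        self.starts: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.pairs: DefaultDict[str, DefaultDict[Tuple[str, str], int]] = defaultdict(
            lambda: defaultdict(int))

    def fit(self, examples: List[Tuple[str, str]]) -> "MarkovClassifier":
        """Count start symbols and symbol-to-symbol transitions for each labeled substring."""
        for substring, label in examples:
            self.starts[label][substring[0]] += 1
            for a, b in zip(substring, substring[1:]):
                self.pairs[label][(a, b)] += 1
        return self

    def log_likelihood(self, s: str, label: str) -> float:
        k = len(self.alphabet)
        starts, pairs = self.starts[label], self.pairs[label]
        ll = log((starts[s[0]] + 1) / (sum(starts.values()) + k))
        for a, b in zip(s, s[1:]):
            out_of_a = sum(pairs[(a, c)] for c in self.alphabet)
            ll += log((pairs[(a, b)] + 1) / (out_of_a + k))
        return ll

    def predict(self, s: str) -> str:
        """Return the label whose chain assigns the highest likelihood to s."""
        return max(list(self.starts), key=lambda label: self.log_likelihood(s, label))
```

    Smoothing matters here: without it, a single transition never seen in training would give a label zero probability regardless of the rest of the substring.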
