Methods and Apparatus for Outlier Detection for High Dimensional Data Sets
    11.
    Type: Patent application
    Status: In force

    Publication number: US20080234977A1

    Publication date: 2008-09-25

    Application number: US12134371

    Filing date: 2008-06-06

    CPC classification number: G06K9/6284

    Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.

    Method and apparatus for predicting future behavior of data streams
    12.
    Type: Patent application
    Status: In force

    Publication number: US20070294216A1

    Publication date: 2007-12-20

    Application number: US11452585

    Filing date: 2006-06-14

    CPC classification number: G06F17/30516 Y10S707/99931 Y10S707/99943

    Abstract: Techniques are disclosed for predicting the future behavior of data streams through the use of current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics are utilized to generate one or more representative pseudo-data records within the future time interval. Pseudo-data records are utilized for forecasting of at least one characteristic of the data stream.
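The four steps in this abstract (collect statistics, estimate future statistics, generate pseudo-data records, forecast from them) can be illustrated with a minimal sketch. The abstract does not say which statistics or which estimator are used, so everything specific here is an assumption: per-window mean and standard deviation as the statistics, a least-squares linear trend as the estimator, and Gaussian sampling for the pseudo-records. The function names are hypothetical.

```python
import random
from statistics import mean, pstdev

def window_stats(stream, window):
    """Summarise consecutive, non-overlapping windows as (mean, std) pairs."""
    stats = []
    for start in range(0, len(stream) - window + 1, window):
        chunk = stream[start:start + window]
        stats.append((mean(chunk), pstdev(chunk)))
    return stats

def extrapolate(series, steps_ahead=1):
    """Fit a least-squares line through the series (needs >= 2 points) and extend it."""
    n = len(series)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(series)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sxx
    return y_bar + slope * (n - 1 + steps_ahead - x_bar)

def pseudo_records(stats, count, seed=0):
    """Estimate next-window statistics, then sample pseudo-records from them.

    Assumption: records in the future window are roughly Gaussian.
    """
    rng = random.Random(seed)
    est_mean = extrapolate([m for m, _ in stats])
    est_std = max(extrapolate([s for _, s in stats]), 0.0)
    records = [rng.gauss(est_mean, est_std) for _ in range(count)]
    return est_mean, est_std, records

stream = [9, 11, 19, 21, 29, 31]          # toy stream with an upward trend
est_mean, est_std, records = pseudo_records(window_stats(stream, 2), count=500)
forecast = mean(records)                   # any downstream forecaster could use `records`
```

The pseudo-records stand in for data that has not arrived yet, so an ordinary forecasting or mining routine can run on them unchanged.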

    Methods and apparatus for privacy preserving data mining using statistical condensing approach
    13.
    Type: Granted patent
    Status: In force

    Publication number: US07302420B2

    Publication date: 2007-11-27

    Application number: US10641935

    Filing date: 2003-08-14

    Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
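As a rough illustration of the idea in this abstract, the sketch below builds statistics (mean vector and covariance matrix) from an input data set and then generates a different output data set that keeps the correlations between dimensions. The abstract does not specify how the statistics are grouped or how records are regenerated; treating the whole input as a single group and sampling from a Gaussian through a Cholesky factor are assumptions made for brevity, and the function names are hypothetical.

```python
import random
from math import sqrt

def mean_and_cov(rows):
    """Mean vector and (population) covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / n
            for b in range(d)] for a in range(d)]
    return mu, cov

def cholesky(m):
    """Lower-triangular factor L with L * L^T = m (m assumed positive semi-definite)."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = sqrt(max(s, 0.0))
            else:
                low[i][j] = s / low[j][j] if low[j][j] else 0.0
    return low

def condense(rows, count, seed=0):
    """Generate `count` synthetic rows from the statistics only, never copying input rows."""
    rng = random.Random(seed)
    mu, cov = mean_and_cov(rows)
    low = cholesky(cov)
    d = len(mu)
    out = []
    for _ in range(count):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return out

# Two strongly correlated dimensions; the synthetic rows keep that correlation.
rows = [[float(i), 2.0 * i + (i % 3)] for i in range(50)]
synthetic = condense(rows, 200)
```

A mining algorithm would then be run on `synthetic` rather than on `rows`, so individual input records are never exposed directly.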

    Method and apparatus for classifying time series data using wavelet based approach
    14.
    Type: Granted patent
    Status: No longer in force

    Publication number: US06871165B2

    Publication date: 2005-03-22

    Application number: US10601215

    Filing date: 2003-06-20

    CPC classification number: G06K9/00536

    Abstract: A technique for effective classification of time series data using a rule-based wavelet decomposition approach. This method is effective in classification of a wide variety of time series data sets. The process uses a combination of wavelet decomposition, discretization and rule generation of training time series data to classify various instances of test time series data. The wavelet decomposition can effectively explore the data at varying levels of granularity to classify instances of the test time series data.
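The combination named in this abstract (wavelet decomposition, discretization, rule generation) can be sketched as follows. The abstract does not name a wavelet family or a rule format, so the Haar wavelet, the three-symbol discretization, the confidence cut-off, and the voting scheme are all assumptions, and the function names are hypothetical. Series lengths are assumed to be powers of two.

```python
from collections import Counter, defaultdict

def haar(series):
    """Haar decomposition: [overall average, then detail coefficients coarse to fine]."""
    coeffs, current = [], list(series)
    while len(current) > 1:
        avgs = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        diffs = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        coeffs = diffs + coeffs      # coarser details end up in front
        current = avgs
    return current + coeffs

def discretize(coeffs, eps=0.25):
    """Map each coefficient to '+', '-' or '0' (assumed discretization)."""
    return tuple("+" if c > eps else "-" if c < -eps else "0" for c in coeffs)

def learn_rules(train, min_conf=0.8):
    """Keep (position, symbol) -> label rules whose training confidence is high enough."""
    counts = defaultdict(Counter)
    for series, label in train:
        for key in enumerate(discretize(haar(series))):
            counts[key][label] += 1
    rules = {}
    for key, by_label in counts.items():
        label, hits = by_label.most_common(1)[0]
        if hits / sum(by_label.values()) >= min_conf:
            rules[key] = label
    return rules

def classify_series(rules, series, default=None):
    """Let every matching rule vote; rules exist at coarse and fine levels alike."""
    votes = Counter(rules[k] for k in enumerate(discretize(haar(series))) if k in rules)
    return votes.most_common(1)[0][0] if votes else default

train = [([1, 2, 3, 4], "up"), ([0, 1, 2, 3], "up"),
         ([4, 3, 2, 1], "down"), ([3, 2, 1, 0], "down")]
rules = learn_rules(train)
label = classify_series(rules, [2, 4, 6, 8])
```

Because each coefficient position corresponds to one resolution level, a rule on an early position captures a coarse trend while a rule on a late position captures local shape.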

    System and method of determining and searching for patterns in a large database
    15.
    Type: Granted patent
    Status: No longer in force

    Publication number: US06799175B2

    Publication date: 2004-09-28

    Application number: US09840652

    Filing date: 2001-04-23

    Abstract: Techniques are provided for finding query responses from database queries using an interactive process between a user (e.g., a person entering a query to a database) and a computer system (e.g., a computing system upon which the database resides or which has access to the database). The interactive process comprises providing the user with one or more visual perspectives as feedback on the distribution of points in the database. These visual perspectives may be considered by the user in order for the user to provide feedback to the computer system. The computer system may then use the user-provided feedback to determine the best response to the query.

    Methods and apparatus for similarity text search based on conceptual indexing
    16.
    Type: Granted patent
    Status: In force

    Publication number: US06542889B1

    Publication date: 2003-04-01

    Application number: US09493811

    Filing date: 2000-01-28

    Abstract: In one aspect of the invention, a method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.

    System and method of generating associations
    17.
    Type: Granted patent
    Status: No longer in force

    Publication number: US06311179B1

    Publication date: 2001-10-30

    Application number: US09183410

    Filing date: 1998-10-30

    Abstract: A method for automatically generating associations of items included in a database. A user first specifies support criteria indicating a strength of desired associations of items contained in said database. Then, a recursive program is executed for generating a hierarchical tree structure comprising one or more levels of database itemsets, with each itemset representing item associations determined to have satisfied the specified support criteria. The recursive program includes steps of: characterizing nodes of the tree structure as being either active and enabling generation of new nodes at a new level of the tree, or inactive, at any given time; enabling traversal of the tree structure in a predetermined manner and projecting each of the transactions included in the database onto currently active nodes of the tree structure to generate projected transaction results; and, counting the projected transaction results of the projected transactions at the active nodes to determine whether the further itemsets satisfy the specified support criteria. All itemsets meeting the specified support criteria are added to the tree structure at a new level.
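The recursive tree construction in this abstract can be approximated by the sketch below: each tree node is an itemset, transactions are projected onto the node, and candidate extensions are counted only inside those projected transactions. This is a simplified reading, not the patented procedure. It traverses depth-first, omits the active/inactive bookkeeping, and takes the support criteria to be a minimum transaction count; the function name is hypothetical.

```python
def frequent_itemsets(transactions, min_support):
    """Grow a lexicographic tree of itemsets and return {itemset: support count}."""
    results = {}

    def expand(itemset, projected):
        # Count each possible one-item extension inside the projected transactions.
        counts = {}
        for t in projected:
            for item in t:
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] >= min_support:
                child = itemset + (item,)
                results[child] = counts[item]   # new node one level down the tree
                # Project onto the child: keep supporting transactions,
                # restricted to items that sort after `item`.
                child_proj = [[x for x in t if x > item] for t in projected if item in t]
                expand(child, child_proj)

    expand((), [sorted(set(t)) for t in transactions])
    return results

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
itemsets = frequent_itemsets(transactions, min_support=3)
```

Projection shrinks the data seen at each deeper node, so counting cost falls as the tree grows.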

    System and method for supervised network clustering

    Publication number: US10135723B2

    Publication date: 2018-11-20

    Application number: US13610092

    Filing date: 2012-09-11

    Abstract: A method (and system) for supervised network clustering includes receiving and reading node labels from a plurality of nodes on a network, as executed by a processor on a computer having access to the network, the network defined as a group of entities interconnected by links. The node labels are used to define densities associated with the nodes. Node components are extracted from the network, based on using thresholds on densities. Smaller components having a size below a user-defined threshold are merged.
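The pipeline in this abstract (labels define densities, density thresholds yield components, small components are merged) can be sketched as below. The abstract does not define the density measure or the merge target, so two assumptions are made: a node's density is the fraction of its neighbours sharing its label, and a small component is merged into the large component it shares the most edges with. All function and parameter names are hypothetical.

```python
from collections import Counter

def label_density(adj, labels, node):
    """Fraction of a node's neighbours that share its label (assumed definition)."""
    nbrs = adj[node]
    return sum(labels[n] == labels[node] for n in nbrs) / len(nbrs) if nbrs else 0.0

def components(adj, keep):
    """Connected components of the subgraph induced by the node set `keep`."""
    seen, comps = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for n in adj[node]:
                if n in keep and n not in seen:
                    seen.add(n)
                    stack.append(n)
        comps.append(comp)
    return comps

def cluster(adj, labels, density_threshold=0.5, min_size=2):
    """Threshold on density, extract components, then merge the small ones."""
    dense = {n for n in adj if label_density(adj, labels, n) >= density_threshold}
    comps = components(adj, dense) + [{n} for n in sorted(set(adj) - dense)]
    big = [c for c in comps if len(c) >= min_size]
    for small in (c for c in comps if len(c) < min_size):
        # Count edges from the small component into each large component.
        links = Counter(i for i, c in enumerate(big)
                        for n in small for m in adj[n] if m in c)
        if links:
            big[links.most_common(1)[0][0]] |= small
        else:
            big.append(small)      # isolated: keep as its own cluster
    return big

# Two labelled triangles joined through node 3, whose label matches neither side.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5],
       4: [3, 5, 6], 5: [3, 4, 6], 6: [4, 5]}
labels = {0: "A", 1: "A", 2: "A", 3: "C", 4: "B", 5: "B", 6: "B"}
clusters = cluster(adj, labels)
```

Here `min_size` plays the role of the user-defined size threshold mentioned in the abstract.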

    System and method for resource adaptive classification of data streams
    19.
    Type: Granted patent
    Status: In force

    Publication number: US08165979B2

    Publication date: 2012-04-24

    Application number: US13078419

    Filing date: 2011-04-01

    CPC classification number: G06K9/6282 G06N99/005

    Abstract: A system and method for resource adaptive classification of data streams. Embodiments of systems and methods provide classifying data received in a computer, including discretizing the received data, constructing an intermediate data structure from said received data as training instances, performing subspace sampling on said received data as test instances and adaptively classifying said received data based on statistics of said subspace sampling.
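One possible reading of the steps in this abstract is sketched below: training data is discretized and stored in an intermediate structure, and each test instance is classified by sampling subspaces and collecting class statistics from the training instances that match it there. The abstract gives no details of the structure or the adaptation rule, so the inverted index, the equal-width bins, the majority vote, and the `budget` parameter (how many subspaces can be afforded per instance) are all assumptions, as are the function names.

```python
import random
from collections import Counter

def discretize(row, width=1.0):
    """Equal-width binning of every dimension (assumed discretization)."""
    return tuple(int(v // width) for v in row)

def build_index(train, width=1.0):
    """Intermediate structure: (dimension, bin) -> ids of training instances in that bin."""
    index, labels = {}, []
    for i, (row, label) in enumerate(train):
        labels.append(label)
        for cell in enumerate(discretize(row, width)):
            index.setdefault(cell, set()).add(i)
    return index, labels

def classify_with_budget(index, labels, row, budget, subspace_size=2, width=1.0, seed=0):
    """Sample `budget` random subspaces; each one votes for its locally dominant class."""
    rng = random.Random(seed)
    cells = list(enumerate(discretize(row, width)))
    votes = Counter()
    for _ in range(budget):        # a smaller budget means less work per instance
        chosen = rng.sample(cells, subspace_size)
        matches = set.intersection(*(index.get(c, set()) for c in chosen))
        if matches:
            local = Counter(labels[i] for i in matches)
            votes[local.most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0] if votes else None

train = [((0.5, 0.4, 0.6), "a"), ((0.1, 0.9, 0.3), "a"),
         ((5.5, 5.1, 5.9), "b"), ((5.2, 5.7, 5.3), "b")]
index, labels = build_index(train)
label = classify_with_budget(index, labels, (0.2, 0.7, 0.1), budget=5)
```

Lowering `budget` when the stream speeds up trades accuracy for throughput, which is one way to interpret "resource adaptive".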

    System and method for classifying data streams with very large cardinality
    20.
    Type: Granted patent
    Status: In force

    Publication number: US08140448B2

    Publication date: 2012-03-20

    Application number: US12118405

    Filing date: 2008-05-09

    CPC classification number: G06N99/005 G06K9/6267

    Abstract: An object and attributes that describe that object are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table, resulting in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern, producing a discriminatory power for each attribute pattern. Attribute patterns having a discriminatory power above a given threshold are selected and added to associated sketch table values. The sketch table with the largest overall sum is identified, and the associated class is assigned to the object belonging to the attribute patterns.
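The procedure in this abstract maps closely onto a count-min style sketch kept per class, which the following minimal example illustrates. Several details are not given in the abstract and are assumed here: the hash functions (salted SHA-256), the table dimensions, and the definition of discriminatory power (the largest class's share of the summed estimates). The class and function names are hypothetical.

```python
import hashlib

class SketchTable:
    """`depth` parallel hash tables of `width` counters each (count-min style)."""

    def __init__(self, depth=4, width=64):
        self.depth, self.width = depth, width
        self.rows = [[0] * width for _ in range(depth)]

    def _slot(self, row, pattern):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{pattern!r}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, pattern):
        for r in range(self.depth):
            self.rows[r][self._slot(r, pattern)] += 1

    def estimate(self, pattern):
        # The lowest value across the parallel tables is the least collision-inflated.
        return min(self.rows[r][self._slot(r, pattern)] for r in range(self.depth))

def train(examples):
    """Build one sketch table per class from (attribute patterns, label) pairs."""
    tables = {}
    for patterns, label in examples:
        if label not in tables:
            tables[label] = SketchTable()
        for p in patterns:
            tables[label].add(p)
    return tables

def classify_object(tables, patterns, min_power=0.6):
    """Sum estimates of sufficiently discriminative patterns; the largest sum wins."""
    totals = dict.fromkeys(tables, 0)
    for p in patterns:
        values = {label: t.estimate(p) for label, t in tables.items()}
        total = sum(values.values())
        # Assumed discriminatory power: share held by the dominant class.
        if total and max(values.values()) / total > min_power:
            for label, v in values.items():
                totals[label] += v
    return max(totals, key=totals.get)

examples = [(("free", "win", "the"), "spam")] * 3 + [(("meeting", "report", "the"), "ham")] * 3
tables = train(examples)
label = classify_object(tables, ["free", "the"])
```

The pattern "the" occurs equally in both classes, so its power is 0.5 and it is ignored; only "free" contributes to the totals. Memory use depends on `depth` and `width` rather than on the number of distinct patterns, which is the point of a sketch for very large cardinality.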
