Method and apparatus for processing data streams
    41.
    Granted patent
    Method and apparatus for processing data streams (status: lapsed)

    Publication No.: US07739284B2

    Publication date: 2010-06-15

    Application No.: US11110079

    Filing date: 2005-04-20

    IPC classification: G06F7/00

    Abstract: A technique for processing a data stream includes the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
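
The assignment step described in this abstract can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each cluster is summarized by additive statistics (count, per-dimension sum and sum of squares), that the projected dimensions are the `l` dimensions with the smallest variance, and that the distance is the average absolute deviation from the centroid over those dimensions. The names `ClusterStats`, `projected_dims`, `projected_distance`, and `assign` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterStats:
    """Additive summary of one cluster (an assumed representation)."""
    n: int
    s1: list  # per-dimension sum of values
    s2: list  # per-dimension sum of squared values

    @classmethod
    def from_points(cls, points):
        d = len(points[0])
        stats = cls(0, [0.0] * d, [0.0] * d)
        for p in points:
            stats.add(p)
        return stats

    def add(self, point):
        self.n += 1
        for j, x in enumerate(point):
            self.s1[j] += x
            self.s2[j] += x * x

    def mean(self):
        return [s / self.n for s in self.s1]

    def variance(self):
        return [max(q / self.n - (s / self.n) ** 2, 0.0)
                for s, q in zip(self.s1, self.s2)]


def projected_dims(cluster, l):
    """Pick the l dimensions along which the cluster is tightest."""
    var = cluster.variance()
    return sorted(range(len(var)), key=lambda j: var[j])[:l]


def projected_distance(point, cluster, dims):
    """Average absolute deviation from the centroid over the chosen dimensions."""
    mu = cluster.mean()
    return sum(abs(point[j] - mu[j]) for j in dims) / len(dims)


def assign(point, clusters, l):
    """Assign an incoming point to the closest cluster in that cluster's own subspace."""
    best = min(
        range(len(clusters)),
        key=lambda i: projected_distance(point, clusters[i],
                                         projected_dims(clusters[i], l)),
    )
    clusters[best].add(point)  # update the summary with the new point
    return best
```

Because every cluster is measured in its own subspace, a point can be assigned to a cluster whose full-dimensional centroid is farther away than another cluster's, as long as it matches on the dimensions where that cluster is tight.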

    Methods and apparatus for generating decision trees with discriminants and employing same in data classification
    42.
    Granted patent
    Methods and apparatus for generating decision trees with discriminants and employing same in data classification (status: in force)

    Publication No.: US07716154B2

    Publication date: 2010-05-11

    Application No.: US11841221

    Filing date: 2007-08-20

    IPC classification: G06N5/00

    Abstract: Methods and apparatus are provided for generating decision trees using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.
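
To make the idea of splitting on a combination of variables concrete, here is a small sketch. It is restricted to two classes labeled 0 and 1 and to two feature variables, and it uses the textbook Fisher discriminant direction with a midpoint threshold; these restrictions and the names `fisher_split`, `build_tree`, and `predict` are assumptions for illustration, not details taken from the patent.

```python
def project(w, p):
    return w[0] * p[0] + w[1] * p[1]


def fisher_split(X, y):
    """Return (w, threshold) with w = Sw^-1 (mu1 - mu0), or None if Sw is singular."""
    groups = {c: [p for p, label in zip(X, y) if label == c] for c in (0, 1)}
    mu = {c: [sum(col) / len(pts) for col in zip(*pts)] for c, pts in groups.items()}
    a = b = d = 0.0  # entries of the 2x2 within-class scatter matrix [[a, b], [b, d]]
    for c, pts in groups.items():
        for p in pts:
            dx, dy = p[0] - mu[c][0], p[1] - mu[c][1]
            a += dx * dx
            b += dx * dy
            d += dy * dy
    det = a * d - b * b
    if det == 0:
        return None
    diff = (mu[1][0] - mu[0][0], mu[1][1] - mu[0][1])
    w = ((d * diff[0] - b * diff[1]) / det, (a * diff[1] - b * diff[0]) / det)
    threshold = (project(w, mu[0]) + project(w, mu[1])) / 2
    return w, threshold


def build_tree(X, y, max_depth=3):
    """Generation mode: recursively split the training records on a discriminant."""
    labels = set(y)
    leaf = {"leaf": max(labels, key=y.count)}
    if len(labels) == 1 or max_depth == 0:
        return leaf
    split = fisher_split(X, y)
    if split is None:
        return leaf
    w, t = split
    left = [i for i, p in enumerate(X) if project(w, p) <= t]
    right = [i for i, p in enumerate(X) if project(w, p) > t]
    if not left or not right:
        return leaf
    return {
        "w": w,
        "t": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1),
    }


def predict(tree, p):
    """Traversal mode: only feature variables are needed."""
    while "leaf" not in tree:
        tree = tree["left"] if project(tree["w"], p) <= tree["t"] else tree["right"]
    return tree["leaf"]
```

On data where the classes differ mainly in `y - x`, no single-variable threshold separates them, but one discriminant split does.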

    SYSTEMS AND METHODS FOR COMPUTATION OF OPTIMAL DISTANCE BOUNDS ON COMPRESSED TIME-SERIES DATA
    43.
    Patent application
    SYSTEMS AND METHODS FOR COMPUTATION OF OPTIMAL DISTANCE BOUNDS ON COMPRESSED TIME-SERIES DATA (status: in force)

    Publication No.: US20090204574A1

    Publication date: 2009-08-13

    Application No.: US12027294

    Filing date: 2008-02-07

    IPC classification: G06F17/30

    CPC classification: G06F17/30548 G06F2216/03

    Abstract: There are provided a method and a system for computation of optimal distance bounds on compressed time-series data. In a method for similarity search, the method includes the step of transforming sequence data into a compressed sequence represented by top-k coefficients of the sequence data and a sum of the energy of omitted coefficients of the sequence data. The method further includes the step of computing at least one of a lower bound and an upper bound on a distance range between a query sequence and the compressed sequence, given a first and a second constraint. The first constraint is that a sum of squares of the omitted coefficients is less than a sum of the energy of the omitted coefficients. The second constraint is that the energy of the omitted coefficients is less than the energy of a lowest energy one of the top-k coefficients.
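
The compressed representation (top-k coefficients plus the energy of the omitted ones) already allows simple distance bounds. The sketch below is a looser baseline based on the triangle inequality, not the optimal bounds the application claims: it uses only the stored energy and ignores the second constraint. It assumes real-valued coefficients of an orthonormal transform, so that Euclidean distance can be computed in coefficient space, and an uncompressed query. The names `compress` and `distance_bounds` are hypothetical.

```python
import math


def compress(coeffs, k):
    """Keep the k largest-magnitude coefficients and the energy of the rest."""
    top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    kept = {i: coeffs[i] for i in top}
    omitted_energy = sum(c * c for i, c in enumerate(coeffs) if i not in kept)
    return kept, omitted_energy


def distance_bounds(query, kept, omitted_energy):
    """Lower and upper bounds on the Euclidean distance to the original sequence."""
    # Contribution of the stored positions is known exactly.
    exact = sum((query[i] - v) ** 2 for i, v in kept.items())
    # On the omitted positions only the norms are known: ||q_rest|| and sqrt(energy).
    q_rest = math.sqrt(sum(q * q for i, q in enumerate(query) if i not in kept))
    e = math.sqrt(omitted_energy)
    lower = math.sqrt(exact + (q_rest - e) ** 2)
    upper = math.sqrt(exact + (q_rest + e) ** 2)
    return lower, upper
```

In a similarity search, a candidate whose lower bound exceeds the best distance found so far can be discarded without decompressing it; tighter bounds prune more candidates.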

    Apparatus for dynamic classification of data in evolving data stream
    44.
    Granted patent
    Apparatus for dynamic classification of data in evolving data stream (status: lapsed)

    Publication No.: US07487167B2

    Publication date: 2009-02-03

    Application No.: US11756227

    Filing date: 2007-05-31

    IPC classification: G06F7/00

    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
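
A minimal sketch of classification with class-specific clusters follows. It assumes a fixed-radius rule for deciding whether a training point joins an existing cluster of its own class and a nearest-centroid rule for classification; the abstract specifies neither, and the class name `ClassClusters` and its `radius` parameter are hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


class ClassClusters:
    def __init__(self, radius):
        self.radius = radius
        self.clusters = []  # each entry: [label, count, per-dimension sum]

    @staticmethod
    def centroid(cluster):
        return [s / cluster[1] for s in cluster[2]]

    def train(self, point, label):
        """Absorb the point into the nearest cluster of its class, or start a new one."""
        same = [c for c in self.clusters if c[0] == label]
        if same:
            nearest = min(same, key=lambda c: euclidean(point, self.centroid(c)))
            if euclidean(point, self.centroid(nearest)) <= self.radius:
                nearest[1] += 1
                nearest[2] = [s + x for s, x in zip(nearest[2], point)]
                return
        self.clusters.append([label, 1, list(point)])

    def classify(self, point):
        """Label a test instance with the class of its nearest stored cluster."""
        nearest = min(self.clusters, key=lambda c: euclidean(point, self.centroid(c)))
        return nearest[0]
```

Keeping clusters separate per class lets one class occupy several distant regions, which a single centroid per class could not represent.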

    Methods and Apparatus for Outlier Detection for High Dimensional Data Sets
    45.
    Patent application
    Methods and Apparatus for Outlier Detection for High Dimensional Data Sets (status: in force)

    Publication No.: US20080234977A1

    Publication date: 2008-09-25

    Application No.: US12134371

    Filing date: 2008-06-06

    IPC classification: G06F17/18

    CPC classification: G06K9/6284

    Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.
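
One way to picture sparse low-dimensional projections is sketched below. It assumes that each dimension is cut into `phi` equi-depth ranges, that a projection is a grid cell over `k` chosen dimensions, and that sparsity is scored by how far the cell count falls below the count expected under independence, in standard deviations. The abstract does not give these details, the exhaustive enumeration stands in for whatever search the patent uses, and all function names are hypothetical.

```python
import math
from itertools import combinations


def equi_depth_bins(values, phi):
    """Map each value to one of phi ranges holding roughly equal numbers of points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * phi // len(values)
    return bins


def sparsity(count, n, phi, k):
    """Standardized deviation of a cell count from its expectation n * (1/phi)^k."""
    f = (1.0 / phi) ** k
    return (count - n * f) / math.sqrt(n * f * (1 - f))


def sparsest_cells(data, phi, k, top=2):
    """Return the `top` non-empty k-dimensional cells with the most negative sparsity."""
    n, d = len(data), len(data[0])
    binned = [equi_depth_bins([row[j] for row in data], phi) for j in range(d)]
    results = []
    for dims in combinations(range(d), k):
        members = {}
        for i in range(n):
            cell = tuple(binned[j][i] for j in dims)
            members.setdefault(cell, []).append(i)
        for cell, idx in members.items():
            results.append((sparsity(len(idx), n, phi, k), dims, cell, idx))
    results.sort()
    return results[:top]
```

The points that fall in the returned cells are the outlier candidates: they occupy a combination of ranges that is much emptier than chance would predict, even if each coordinate looks ordinary on its own.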

    Methods and apparatus for privacy preserving data mining using statistical condensing approach
    46.
    Granted patent
    Methods and apparatus for privacy preserving data mining using statistical condensing approach (status: in force)

    Publication No.: US07302420B2

    Publication date: 2007-11-27

    Application No.: US10641935

    Filing date: 2003-08-14

    IPC classification: G06F17/30

    Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
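
The two steps in this abstract (construct statistics, then generate output from them) can be sketched as follows. The sketch assumes that records are split into consecutive groups of `k` (a simplification; grouping similar records would be more natural), that the statistics are each group's mean and covariance, and that synthetic records are drawn from a Gaussian with those statistics through a Cholesky factor. None of these choices is stated in the abstract, and the function names are hypothetical.

```python
import math
import random


def group_statistics(records):
    """Mean vector and covariance matrix of one group of records."""
    n, d = len(records), len(records[0])
    mean = [sum(r[j] for r in records) / n for j in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in records) / n
            for b in range(d)] for a in range(d)]
    return mean, cov


def cholesky(m, jitter=1e-9):
    """Lower-triangular factor L with L * L^T close to m; jitter keeps the diagonal positive."""
    d = len(m)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(m[i][i] - s, 0.0) + jitter)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def synthesize(mean, cov, count, rng):
    """Draw `count` records whose expected mean and covariance match the statistics."""
    L = cholesky(cov)
    d = len(mean)
    out = []
    for _ in range(count):
        z = [rng.gauss(0, 1) for _ in range(d)]
        out.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


def condense(records, k, rng):
    """Replace each group of k records with synthetic records drawn from its statistics."""
    out = []
    for start in range(0, len(records), k):
        group = records[start:start + k]
        mean, cov = group_statistics(group)
        out.extend(synthesize(mean, cov, len(group), rng))
    return out
```

Larger `k` hides more about individual records at the cost of coarser statistics. A trailing group of size one would be reproduced almost exactly, so a real implementation would need to merge such leftovers into another group.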

    Dynamic customized web tours
    47.
    Granted patent
    Dynamic customized web tours (status: lapsed)

    Publication No.: US06572662B2

    Publication date: 2003-06-03

    Application No.: US09079661

    Filing date: 1998-05-15

    IPC classification: G06F15/00

    CPC classification: G06F17/30873 G06F2216/07

    Abstract: An interactive and dynamically customizable guided tour of some portion of the World Wide Web monitors and dynamically adapts in response to like-minded users as well as provides recommendations during the traversal. The invention includes features for: electronic commerce; side trips; true visiting of Web sites; maps; pre-fetching of Web objects; insertion of interactive decision points; customized insertion of advertisements; simultaneous traversal of multiple hyperpaths; collection of and dynamic modification of a tour based on collected route information and/or touring statistics.

    Methods and apparatus for similarity text search based on conceptual indexing
    48.
    Granted patent
    Methods and apparatus for similarity text search based on conceptual indexing (status: in force)

    Publication No.: US06542889B1

    Publication date: 2003-04-01

    Application No.: US09493811

    Filing date: 2000-01-28

    IPC classification: G06F17/30

    Abstract: In one aspect of the invention, a method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.

    System and method for generating taxonomies with applications to content-based recommendations
    49.
    Granted patent
    System and method for generating taxonomies with applications to content-based recommendations (status: in force)

    Publication No.: US06360227B1

    Publication date: 2002-03-19

    Application No.: US09240231

    Filing date: 1999-01-29

    IPC classification: G06F17/00

    Abstract: A graph taxonomy of information which is represented by a plurality of vectors is generated. The graph taxonomy includes a plurality of nodes and a plurality of edges. The plurality of nodes is generated, and each node of the plurality of nodes is associated with ones of the plurality of vectors. A tree hierarchy is established based on the plurality of nodes. A plurality of distances between ones of the plurality of nodes is calculated. Ones of the plurality of nodes are connected with other ones of the plurality of nodes by ones of the plurality of edges based on the plurality of distances. The information represented by the plurality of vectors may be, for example, a plurality of documents such as Web Pages.

    Finding collective baskets and inference rules for internet mining
    50.
    Granted patent
    Finding collective baskets and inference rules for internet mining (status: lapsed)

    Publication No.: US06263327B1

    Publication date: 2001-07-17

    Application No.: US09522723

    Filing date: 2000-03-10

    IPC classification: G06F17/00

    Abstract: A computerized method of online mining of inference rules in a large database. The method is comprised of two stages, a preprocessing stage followed by an online rule generation stage. The preprocessing stage is further defined to be a two-step process that involves the generation of large itemsets. The present method defines large itemsets by how the items in the itemsets relate to each other rather than their level of presence. The measure by which itemsets are said to relate to each other is defined by a computed figure of merit, K1. The first substep of the preprocessing stage involves finding those itemsets that possess a minimum computed collective strength of K1. From those found itemsets, a second user-supplied input, K2, is used to prune those itemsets with inference strength below K2.
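
The first substep (keeping itemsets whose collective strength reaches K1) can be sketched as follows. The abstract does not define collective strength, so the sketch assumes the definition from the published literature on the measure: a transaction violates an itemset when it contains some but not all of its items, and strength compares the observed violation rate with the rate expected if items occurred independently. The restriction to pairs and the names `collective_strength` and `strongly_collective_pairs` are hypothetical; the K2 pruning step is omitted because inference strength is not defined in the abstract.

```python
from itertools import combinations


def collective_strength(itemset, transactions):
    """Assumed definition: ((1 - v) / (1 - E[v])) * (E[v] / v), v = violation rate."""
    n = len(transactions)
    items = set(itemset)
    # A violation holds some, but not all, of the itemset's items.
    violations = sum(1 for t in transactions if 0 < len(items & set(t)) < len(items))
    v = violations / n
    p_all = p_none = 1.0
    for item in items:
        p = sum(1 for t in transactions if item in t) / n
        p_all *= p
        p_none *= 1 - p
    expected_v = 1 - p_all - p_none  # violation rate under independence
    if v == 0.0:
        return float("inf")  # items always appear together or not at all
    if expected_v >= 1.0:
        return 0.0  # degenerate case: the ratio is undefined, treat as no strength
    return ((1 - v) / (1 - expected_v)) * (expected_v / v)


def strongly_collective_pairs(transactions, k1):
    """Keep the item pairs whose collective strength is at least k1."""
    universe = sorted({item for t in transactions for item in t})
    return [pair for pair in combinations(universe, 2)
            if collective_strength(pair, transactions) >= k1]
```

A strength of 1 corresponds to independence; values above 1 indicate items that occur together or are absent together more often than chance, regardless of how frequent the itemset is.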
