Patent search: ap:("Sudipto Guha" OR "Rajeev Rastogi" OR "Kyuseok Shim") AND inv:"Sudipto Guha" (page 1)

1.

Granted invention patent
Method, apparatus and programmed medium for clustering databases with categorical attributes (legal status: lapsed)

Publication No.: US6049797A

Publication date: 2000-04-11

Application No.: US55940

Filing date: 1998-04-07

Applicants: Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

Inventors: Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

IPC classification: G06F17/30, G06K9/62

CPC classification: G06F17/30598, G06K9/6218, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99936, Y10S707/99942, Y10S707/99945

Abstract: The present invention relates to a computer method, apparatus and programmed medium for clustering databases containing data with categorical attributes. The present invention assigns a pair of points to be neighbors if their similarity exceeds a certain threshold. The similarity value for pairs of points can be based on non-metric information. The present invention determines a total number of links between each cluster and every other cluster based upon the neighbors of the clusters. A goodness measure between each cluster and every other cluster based upon the total number of links between each cluster and every other cluster and the total number of points within each cluster and every other cluster is then calculated. The present invention merges the two clusters with the best goodness measure. Thus, clustering is performed accurately and efficiently by merging data based on the number of links between the data to be clustered.
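
The abstract above describes link-based agglomerative merging without giving formulas. The following is a minimal Python sketch under stated assumptions: similarity is the Jaccard coefficient, the link count between two points is their number of common neighbors, and the goodness measure borrows the size normalization of the published ROCK algorithm (exponent 1 + 2(1 - theta)/(1 + theta)). None of these specifics appear in the abstract, and all function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed similarity function)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbor_sets(points, theta):
    """Map each point index to the indices whose similarity exceeds theta."""
    nbrs = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if jaccard(points[i], points[j]) > theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def cross_links(c1, c2, nbrs):
    """Total links between two clusters: common neighbors summed over point pairs."""
    return sum(len(nbrs[p] & nbrs[q]) for p in c1 for q in c2)


def goodness(c1, c2, nbrs, theta):
    """Links normalized by cluster sizes (ROCK-style exponent, an assumption)."""
    e = 1 + 2 * (1 - theta) / (1 + theta)
    n1, n2 = len(c1), len(c2)
    return cross_links(c1, c2, nbrs) / ((n1 + n2) ** e - n1 ** e - n2 ** e)


def link_cluster(points, theta, k):
    """Merge the best-goodness pair until k clusters remain or no links are left.

    Assumes 0 <= theta < 1 so the normalizing denominator stays positive.
    """
    nbrs = neighbor_sets(points, theta)
    clusters = [frozenset([i]) for i in range(len(points))]
    while len(clusters) > k:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: goodness(pair[0], pair[1], nbrs, theta))
        if cross_links(a, b, nbrs) == 0:
            break  # the remaining clusters share no links
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Hypothetical categorical records: two groups with no items in common.
baskets = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"b", "c", "d"},
           {"x", "y", "z"}, {"x", "y", "w"}, {"x", "z", "w"}]
print(link_cluster(baskets, theta=0.4, k=2))
```

Because merging is driven by shared neighbors rather than by a distance between records, the sketch needs only a similarity test, which is consistent with the abstract's remark that similarity may be based on non-metric information.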

2.

Granted invention patent
Programmed medium for clustering large databases (legal status: lapsed)

Publication No.: US6092072A

Publication date: 2000-07-18

Application No.: US55941

Filing date: 1998-04-07

Applicants: Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

Inventors: Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

IPC classification: G06F17/30

CPC classification: G06F17/30598, G06F17/30601, G06K9/622, G06K9/6298, Y10S707/968, Y10S707/99942

Abstract: The present invention relates to a computer method, apparatus and programmed medium for clustering large databases. The present invention represents each cluster to be merged by a constant number of well scattered points that capture the shape and extent of the cluster. The chosen scattered points are shrunk towards the mean of the cluster by a shrinking fraction to form a representative set of data points that efficiently represent the cluster. The clusters with the closest pair of representative points are merged to form a new cluster. The use of an efficient representation of the clusters allows the present invention to obtain improved clustering while efficiently eliminating outliers.
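
The representative-point idea in the abstract above can be sketched as follows. This is an illustration under assumptions, not the patented procedure: the scattered points are picked with a farthest-point heuristic (the abstract does not say how they are chosen), points are tuples of numbers, and all function names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def scattered_points(points, c):
    """Pick up to c well-scattered points with a farthest-point heuristic."""
    distinct = list(dict.fromkeys(points))  # drop duplicates, keep order
    center = centroid(points)
    chosen = [max(distinct, key=lambda p: math.dist(p, center))]
    while len(chosen) < min(c, len(distinct)):
        remaining = [p for p in distinct if p not in chosen]
        chosen.append(max(remaining,
                          key=lambda p: min(math.dist(p, q) for q in chosen)))
    return chosen


def representatives(points, c, alpha):
    """Shrink each scattered point toward the cluster mean by fraction alpha."""
    center = centroid(points)
    return [tuple(x + alpha * (m - x) for x, m in zip(p, center))
            for p in scattered_points(points, c)]


def cluster_distance(reps_a, reps_b):
    """Distance between clusters: their closest pair of representative points."""
    return min(math.dist(p, q) for p in reps_a for q in reps_b)


def closest_pair(rep_sets):
    """Indices of the two clusters that would be merged next."""
    pairs = [(i, j) for i in range(len(rep_sets))
             for j in range(i + 1, len(rep_sets))]
    return min(pairs,
               key=lambda ij: cluster_distance(rep_sets[ij[0]], rep_sets[ij[1]]))
```

Shrinking pulls outlying representatives toward the mean, so a stray point far from its cluster has less influence on which clusters look close to each other.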

3.

Granted invention patent
Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer (legal status: in force)

Publication No.: US06907380B2

Publication date: 2005-06-14

Application No.: US10726254

Filing date: 2003-12-01

Applicants: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani

Inventors: Nina Mishra, Liadan O'Callaghan, Sudipto Guha, Rajeev Motwani

IPC classification: G06K9/62, G06F101/14, G06F17/18, G06F17/30

CPC classification: G06K9/6218, Y10S707/99936, Y10S707/99937

Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S_1, ..., S_P; 2) for each piece S_i, determining a set D_i of k intermediate centers; 3) assigning each data point in each piece S_i to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set D_i by the number of points in the corresponding piece S_i assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
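
The five steps listed in the abstract above can be sketched as below. The abstract leaves the clustering method A and the error metric open, so this sketch substitutes a weighted Lloyd-style k-means with farthest-first initialization and Euclidean distance, and partitions S round-robin. Those choices, and every function name, are assumptions made for illustration.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def weighted_mean(members):
    """Weighted mean of (point, weight) pairs."""
    total = sum(w for _, w in members)
    dims = len(members[0][0])
    return tuple(sum(w * p[d] for p, w in members) / total for d in range(dims))


def weighted_kmeans(points, weights, k, iters=20):
    """Stand-in for clustering method A: Lloyd iterations on weighted points."""
    centers = [points[0]]
    while len(centers) < k:  # farthest-first initialization
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p, w in zip(points, weights):
            groups[nearest(p, centers)].append((p, w))
        centers = [weighted_mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


def divide_and_conquer(S, P, k):
    """Steps 1 to 5 of the abstract; assumes P <= len(S)."""
    pieces = [S[i::P] for i in range(P)]                    # 1) partition S
    centers, weights = [], []
    for piece in pieces:
        d_i = weighted_kmeans(piece, [1] * len(piece), k)   # 2) intermediate centers
        counts = [0] * len(d_i)
        for p in piece:                                     # 3) assign to nearest
            counts[nearest(p, d_i)] += 1
        for c, n in zip(d_i, counts):                       # 4) weight by count
            if n:
                centers.append(c)
                weights.append(n)
    return weighted_kmeans(centers, weights, k)             # 5) cluster the centers
```

Each piece is processed on its own, so the per-piece loop could run in parallel or over successive chunks of a stream; only the small set of weighted intermediate centers has to be held for the final step.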

4.

Granted invention patent
Method and apparatus for using histograms to produce data summaries (legal status: in force)

Publication No.: US07965643B1

Publication date: 2011-06-21

Application No.: US12217958

Filing date: 2008-07-10

Applicants: Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Ioannis Kotidis, Shanmugavelayutham Muthukrishnan, Martin J. Strauss

Inventors: Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Ioannis Kotidis, Shanmugavelayutham Muthukrishnan, Martin J. Strauss

IPC classification: H04J1/16

CPC classification: H04L43/045, H04L63/1408

Abstract: A system and method are provided for summarizing dynamic data from distributed sources through the use of histograms. In particular, the method comprises receiving a first data signal at a first location, determining a first array sketch of the first data signal, and constructing a first output histogram from the first array sketch and a first robust histogram via a first hybrid histogram. Array sketches of a number of data signals may be calculated, and added to yield a single vector sum. The histogram is constructed from the vector sum. In that way, the vector sum may be analyzed without revealing the individual data signals that form the basis of the sum.
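
The abstract above relies on sketches that can be added together. The abstract does not define the array sketch, so the following Python sketch uses a generic random-sign linear projection as a stand-in to illustrate the additive property only; `make_sketcher` and its parameters are hypothetical.

```python
import random


def make_sketcher(n, rows, seed=0):
    """Return a function that maps a length-n signal to `rows` signed sums.

    The random +1/-1 matrix is fixed by the seed, so every site that uses the
    same seed produces compatible sketches.
    """
    rng = random.Random(seed)
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]

    def sketch(signal):
        return [sum(s * x for s, x in zip(row, signal)) for row in signs]

    return sketch


sketch = make_sketcher(n=4, rows=3)
site_a = [3, 0, 1, 2]   # hypothetical signal observed at one location
site_b = [1, 1, 0, 5]   # hypothetical signal observed at another location

# Adding the two sketches gives the sketch of the summed signal.
added = [u + v for u, v in zip(sketch(site_a), sketch(site_b))]
combined = sketch([x + y for x, y in zip(site_a, site_b)])
print(added == combined)
```

Because the projection is linear, a collector that receives only the sketches can work with the sketch of the total signal without ever seeing the individual signals.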

5.

Granted invention patent
Method and apparatus for optimizing queries under parametric aggregation constraints (legal status: lapsed)

Publication No.: US07904458B2

Publication date: 2011-03-08

Application No.: US12647489

Filing date: 2009-12-26

Applicants: Nikolaos Koudas, Divesh Srivastava, Sudipto Guha, Dimitrios Gunopulos, Michail Vlachos

Inventors: Nikolaos Koudas, Divesh Srivastava, Sudipto Guha, Dimitrios Gunopulos, Michail Vlachos

IPC classification: G06F7/00, G06F17/30

CPC classification: G06F17/30469, Y10S707/99932

Abstract: The present invention relates to a method and apparatus for optimizing queries. The present invention discloses an efficient method for providing answers to queries under parametric aggregation constraints.

6.

Granted invention patent
Apparatus and method for correlating synchronous and asynchronous data streams (legal status: in force)

Publication No.: US08131792B1

Publication date: 2012-03-06

Application No.: US12125973

Filing date: 2008-05-23

Applicants: Nikolaos Koudas, Sudipto Guha

Inventors: Nikolaos Koudas, Sudipto Guha

IPC classification: G06F17/15, G06F11/00, G06F12/14, G06F12/16, G08B23/00

CPC classification: G06K9/00536

Abstract: Certain exemplary embodiments provide a method comprising: automatically: receiving a plurality of elements for each of a plurality of continuous data streams; treating the plurality of elements as a first data stream matrix that defines a first dimensionality; reducing the first dimensionality of the first data stream matrix to obtain a second data stream matrix; computing a singular value decomposition of the second data stream matrix; and based on the singular value decomposition of the second data stream matrix, quantifying approximate linear correlations between the plurality of elements.
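
A rough illustration of the pipeline in the abstract above, under several assumptions: dimensionality is reduced by averaging blocks of consecutive samples (the abstract does not name a reduction), and instead of a full singular value decomposition the sketch estimates only the largest singular value by power iteration on the Gram matrix. The share of total energy carried by that singular value is used as a simple correlation indicator. All function names are hypothetical.

```python
import math


def reduce_columns(matrix, block):
    """Average each run of `block` consecutive samples in every stream (row)."""
    return [[sum(row[i:i + block]) / len(row[i:i + block])
             for i in range(0, len(row), block)]
            for row in matrix]


def top_singular_share(matrix, iters=100):
    """Fraction of squared singular values carried by the largest one.

    Values near 1.0 mean the streams are close to multiples of one another.
    """
    n = len(matrix)
    gram = [[sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
            for i in range(n)]
    trace = sum(gram[i][i] for i in range(n))  # sum of squared singular values
    if trace == 0:
        return 0.0
    v = [1.0 / (i + 1) for i in range(n)]      # arbitrary starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        gv = [sum(g * x for g, x in zip(row, v)) for row in gram]
        lam = sum(x * y for x, y in zip(v, gv))  # Rayleigh quotient
    return lam / trace


base = [1, 2, 3, 4, 5, 6, 7, 8]
streams = [base, [2 * x for x in base], [-x for x in base]]
print(top_singular_share(reduce_columns(streams, block=2)))
```

The three example streams are exact multiples of one another, so nearly all of the energy falls on the first singular direction; unrelated streams would spread it across several directions.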

7.

Granted invention patent
Method and apparatus for using histograms to produce data summaries (legal status: lapsed)

Publication No.: US07177282B1

Publication date: 2007-02-13

Application No.: US10114655

Filing date: 2002-04-02

Applicants: Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Ioannis Kotidis, Shanmugavelayutham Muthukrishnan, Martin J. Strauss

Inventors: Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Ioannis Kotidis, Shanmugavelayutham Muthukrishnan, Martin J. Strauss

IPC classification: H04J1/16, H04J3/14

CPC classification: H04L43/06, H04L43/045, Y10S707/99934, Y10S707/99943

Abstract: A system and method are provided for monitoring dynamic data from distributed sources through the use of histograms. In the method, an array sketch of the digital signal is determined, a robust histogram is constructed from the array sketch, and an output histogram is constructed from the array sketch and the robust histogram via a hybrid histogram. Dyadic intervals of a representation of the array sketch are used in constructing the robust histogram.
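
The abstract above mentions dyadic intervals without saying how they are used. As background only, the sketch below shows the standard decomposition of an integer index range into dyadic intervals, that is, intervals of the form [j * 2^l, (j + 1) * 2^l). The function name is hypothetical, and nothing here is taken from the patent's construction of the robust histogram.

```python
def dyadic_cover(lo, hi):
    """Split the half-open range [lo, hi) of non-negative integers into
    the fewest dyadic intervals, returned as (start, end) pairs."""
    out = []
    while lo < hi:
        size = 1
        # Grow the block while it stays aligned to its size and inside the range.
        while lo % (size * 2) == 0 and lo + size * 2 <= hi:
            size *= 2
        out.append((lo, lo + size))
        lo += size
    return out


print(dyadic_cover(3, 11))  # [(3, 4), (4, 8), (8, 10), (10, 11)]
```

Any range over n positions splits into O(log n) such pieces, which is why summaries kept per dyadic interval can answer arbitrary range queries from a small number of stored values.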

8.

Granted invention patent
Apparatus and method for correlating synchronous and asynchronous data streams (legal status: in force)

Publication No.: US07437397B1

Publication date: 2008-10-14

Application No.: US10822316

Filing date: 2004-04-12

Applicants: Nikolaos Koudas, Sudipto Guha

Inventors: Nikolaos Koudas, Sudipto Guha

IPC classification: G06F17/15

CPC classification: G06K9/00536

Abstract: Certain exemplary embodiments provide a method comprising: automatically: receiving a plurality of elements for each of a plurality of continuous data streams; treating the plurality of elements as a first data stream matrix that defines a first dimensionality; reducing the first dimensionality of the first data stream matrix to obtain a second data stream matrix; computing a singular value decomposition of the second data stream matrix; and based on the singular value decomposition of the second data stream matrix, quantifying approximate linear correlations between the plurality of elements.

9.

Granted invention patent
Apparatus and method for merging results of approximate matching operations (legal status: in force)

Publication No.: US07415461B1

Publication date: 2008-08-19

Application No.: US11195888

Filing date: 2005-08-03

Applicants: Sudipto Guha, Nikolas Koudas, Amit Marathe, Divesh Srivastava

Inventors: Sudipto Guha, Nikolas Koudas, Amit Marathe, Divesh Srivastava

IPC classification: G06F7/00

CPC classification: G06F17/30696, Y10S707/99933, Y10S707/99934, Y10S707/99935, Y10S707/99937

Abstract: A device and a method are provided. Approximate match operations are performed for each of a group of attributes for each of a group of tuples with respect to a query to create a respective ranking for each of the group of attributes. The rankings of the group of attributes are combined to provide a ranking score for each of the group of tuples. Data representing a ranking score of each of the group of tuples is generated according to a position of a respective ranking of each one of the group of tuples for a first k positions of the ranking. K of top ranked ones of the group of tuples are identified based at least in part on the generated data, wherein a number of the group of tuples is n and k
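
The rank-merging flow in the abstract above can be sketched as follows. Two things are assumptions rather than details from the abstract: the approximate match is done with Python's `difflib.SequenceMatcher`, and the per-attribute rankings are combined by summing each tuple's rank positions over all positions (the abstract refers to the first k positions but does not give the scoring rule). Function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Approximate string match score in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def attribute_rankings(tuples, query):
    """One ranking per attribute: tuple index -> position (0 is the best match)."""
    rankings = []
    for col, wanted in enumerate(query):
        order = sorted(range(len(tuples)),
                       key=lambda i: -similarity(tuples[i][col], wanted))
        rankings.append({idx: pos for pos, idx in enumerate(order)})
    return rankings


def top_k(tuples, query, k):
    """Combine the rankings by summed position and return the k best indices."""
    rankings = attribute_rankings(tuples, query)
    score = {i: sum(r[i] for r in rankings) for i in range(len(tuples))}
    return sorted(score, key=lambda i: (score[i], i))[:k]


# Hypothetical records with two attributes: name and city.
records = [("Alice Jones", "Boston"),
           ("John Smith", "New York"),
           ("Jon Smyth", "Newark")]
print(top_k(records, ("john smith", "new york"), k=2))
```

Combining positions rather than raw match scores means attributes whose similarity functions use different scales can still be merged on equal terms.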

10.

Invention patent application
Method and apparatus for optimizing queries under parametric aggregation constraints (legal status: under examination, published)

Publication No.: US20080052268A1

Publication date: 2008-02-28

Application No.: US11927100

Filing date: 2007-10-29

Applicants: Nikolaos Koudas, Divesh Srivastava, Sudipto Guha, Dimitrios Gunopulos, Michail Vlachos

Inventors: Nikolaos Koudas, Divesh Srivastava, Sudipto Guha, Dimitrios Gunopulos, Michail Vlachos

IPC classification: G06F17/30

CPC classification: G06F16/24545, Y10S707/99932

Abstract: The present invention relates to a method and apparatus for optimizing queries. The present invention discloses an efficient method for providing answers to queries under parametric aggregation constraints.
