Patent search ap:("Surajit Chaudhuri" OR "Rajeev Motwani" OR "Vivek Narasayya") AND inv:"Rajeev Motwani" Page 2

11.

Granted invention patent
Robust detector of fuzzy duplicates (in force)

Publication (announcement) number: US07516149B2

Publication (announcement) date: 2009-04-07

Application number: US10929514

Application date: 2004-08-30

Applicant: Rajeev Motwani , Surajit Chaudhuri , Venkatesh Ganti

Inventor: Rajeev Motwani , Surajit Chaudhuri , Venkatesh Ganti

IPC: G06F7/00 , G06F17/00

CPC classification number: G06F17/30303 , Y10S707/99932 , Y10S707/99933 , Y10S707/99937 , Y10S707/99942 , Y10S707/99943 , Y10S707/99945

Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

12.

Granted invention patent
Sampling for queries (in force)

Publication (announcement) number: US07493316B2

Publication (announcement) date: 2009-02-17

Application number: US11296036

Application date: 2005-12-07

Applicant: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

Inventor: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30536 , G06F17/30489 , Y10S707/99931 , Y10S707/99932 , Y10S707/99933 , Y10S707/99942

Abstract: A method of estimating the results of a database query. The results are estimated by sampling weighted tuples in a database based on a probability of usage of the tuples required in executing a workload. A probability is associated with each sampled tuple. An aggregate is computed over the values in each sampled tuple, with each value multiplied by the inverse of the probability associated with that tuple.
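
The inverse-probability scaling described in this abstract can be illustrated with a minimal sketch. It assumes independent (Poisson) sampling, where each tuple is kept with its own probability; the function names and the example probabilities are hypothetical and are not taken from the patent.

```python
import random

def sample_weighted(values, probs, rng):
    """Keep value i independently with probability probs[i] (assumed scheme).

    Returns (value, probability) pairs so the probability travels with the tuple.
    """
    return [(v, p) for v, p in zip(values, probs) if rng.random() < p]

def estimate_sum(sample):
    """Estimate SUM by scaling each sampled value by 1 / its probability."""
    return sum(v / p for v, p in sample)

if __name__ == "__main__":
    rng = random.Random(0)
    values = [10.0, 20.0, 30.0, 40.0]
    # Hypothetical usage probabilities, e.g. derived from a past workload.
    probs = [0.9, 0.5, 0.5, 0.8]
    print(estimate_sum(sample_weighted(values, probs, rng)))
```

A single estimate can be far from the true sum of 100, but averaged over many independent samples it converges to that sum, which is the point of dividing by the probability.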

13.

Granted invention patent
Sampling for queries (in force)

Publication (announcement) number: US07577638B2

Publication (announcement) date: 2009-08-18

Application number: US11296034

Application date: 2005-12-07

Applicant: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

Inventor: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30536 , G06F17/30489 , Y10S707/99931 , Y10S707/99932 , Y10S707/99933 , Y10S707/99942

Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.
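
The steps in this abstract (form sub-relations, compute a variance for each, keep the high-variance ones, extract their outliers) can be sketched as follows. This is only an illustration: representing each selection or group-by condition as a Python predicate, using population variance, and defining an outlier as the row farthest from the sub-relation mean are all assumptions, and every name is hypothetical.

```python
from statistics import mean, pvariance

def build_outlier_index(rows, sub_relations, column, top_m, per_relation):
    """Return {sub-relation name: outlier rows} for the top_m highest-variance
    sub-relations. sub_relations maps a name to a predicate over a row."""
    scored = []
    for name, predicate in sub_relations.items():
        members = [r for r in rows if predicate(r)]
        if len(members) >= 2:
            variance = pvariance([r[column] for r in members])
            scored.append((variance, name, members))
    # Highest variance first; the key avoids comparing rows directly.
    scored.sort(key=lambda item: item[0], reverse=True)

    index = {}
    for _, name, members in scored[:top_m]:
        center = mean([r[column] for r in members])
        ranked = sorted(members, key=lambda r: abs(r[column] - center), reverse=True)
        index[name] = ranked[:per_relation]
    return index

if __name__ == "__main__":
    rows = [{"region": "east", "amount": a} for a in (10, 11, 12, 500)]
    rows += [{"region": "west", "amount": a} for a in (20, 21, 22, 23)]
    workload = {
        "east": lambda r: r["region"] == "east",
        "west": lambda r: r["region"] == "west",
    }
    print(build_outlier_index(rows, workload, "amount", top_m=1, per_relation=1))
```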

14.

Granted invention patent
Database aggregation query result estimator (in force)

Publication (announcement) number: US07293037B2

Publication (announcement) date: 2007-11-06

Application number: US11246354

Application date: 2005-10-07

Applicant: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

Inventor: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30489 , G06F17/30536 , G06F2216/03 , Y10S707/957 , Y10S707/99932 , Y10S707/99933 , Y10S707/99935 , Y10S707/99942 , Y10S707/99943

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
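
A rough sketch of the procedure in this abstract, for a SUM query over a single column, might look like the code below. It assumes the values are sorted before the window slides over them, that the number of outliers is fixed in advance, and that the remaining data is sampled uniformly without replacement; these choices and all function names are illustrative and not taken from the patent.

```python
import random
from statistics import pvariance

def split_outliers(values, num_outliers):
    """Slide a window of len(values) - num_outliers over the sorted values and
    keep the window with the lowest variance; everything outside is an outlier."""
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    best_start = min(range(num_outliers + 1),
                     key=lambda s: pvariance(ordered[s:s + window]))
    inliers = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return inliers, outliers

def estimate_sum_with_outliers(values, num_outliers, sample_size, rng):
    """Exact sum over the outliers plus an extrapolated sum over a uniform sample."""
    inliers, outliers = split_outliers(values, num_outliers)
    sample = rng.sample(inliers, sample_size)
    extrapolated = sum(sample) * len(inliers) / sample_size
    return sum(outliers) + extrapolated

if __name__ == "__main__":
    data = [5, 6, 5, 7, 6, 1000, 900]
    print(split_outliers(data, 2))
    print(estimate_sum_with_outliers(data, 2, 3, random.Random(1)))
```

Because the two large values are aggregated exactly, the sampling error comes only from the low-variance remainder.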

15.

Granted invention patent
Sampling for queries (in force)

Publication (announcement) number: US07287020B2

Publication (announcement) date: 2007-10-23

Application number: US09759804

Application date: 2001-01-12

Applicant: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

Inventor: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30536 , G06F17/30489 , Y10S707/99931 , Y10S707/99932 , Y10S707/99933 , Y10S707/99942

Abstract: This disclosure describes leveraging workload information associated with executed database queries for estimating the result of a current database query. The workload information is analyzed to determine the usage of tuples in a database during query execution, such as how often a tuple is accessed and the number of different queries that accessed the tuple. A tuple is assigned a weight value that is based on the analyzed workload information. The particular tuples sampled for estimating a result for the current query are chosen based on each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries or that identifies outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values along with the outlier tuples from the outlier index.
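
One way to picture the weight assignment described in this abstract is the sketch below. It assumes past queries are available as predicates over rows, uses the access count plus one as the weight (so tuples never accessed can still be sampled), and turns weights into capped inclusion probabilities; all of these are illustrative assumptions and hypothetical names, not the patented method.

```python
from collections import Counter

def tuple_weights(rows, past_queries):
    """Weight each row by how many past queries accessed it, plus one."""
    hits = Counter()
    for predicate in past_queries:
        for i, row in enumerate(rows):
            if predicate(row):
                hits[i] += 1
    return [hits[i] + 1 for i in range(len(rows))]

def inclusion_probabilities(weights, expected_sample_size):
    """Scale weights so they sum to the requested sample size, capped at 1."""
    total = sum(weights)
    return [min(1.0, expected_sample_size * w / total) for w in weights]

if __name__ == "__main__":
    rows = [{"year": 2001}, {"year": 2002}, {"year": 2003}]
    past_queries = [
        lambda r: r["year"] >= 2002,
        lambda r: r["year"] == 2003,
    ]
    weights = tuple_weights(rows, past_queries)
    print(weights, inclusion_probabilities(weights, 2))
```

Capping at 1 means the expected sample size can fall below the requested value when a few tuples dominate the workload.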

16.

Invention patent application
Robust detector of fuzzy duplicates (in force)

Publication (announcement) number: US20060053129A1

Publication (announcement) date: 2006-03-09

Application number: US10929514

Application date: 2004-08-30

Applicant: Rajeev Motwani , Surajit Chaudhuri , Venkatesh Ganti

Inventor: Rajeev Motwani , Surajit Chaudhuri , Venkatesh Ganti

IPC: G06F7/00

CPC classification number: G06F17/30303 , Y10S707/99932 , Y10S707/99933 , Y10S707/99937 , Y10S707/99942 , Y10S707/99943 , Y10S707/99945

Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

17.

Granted invention patent
Sampling for aggregation queries (in force)

Publication (announcement) number: US06842753B2

Publication (announcement) date: 2005-01-11

Application number: US09759799

Application date: 2001-01-12

Applicant: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

Inventor: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30489 , G06F17/30536 , G06F2216/03 , Y10S707/957 , Y10S707/99932 , Y10S707/99933 , Y10S707/99935 , Y10S707/99942 , Y10S707/99943

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.

18.

Granted invention patent
Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer (in force)

Publication (announcement) number: US06907380B2

Publication (announcement) date: 2005-06-14

Application number: US10726254

Application date: 2003-12-01

Applicant: Nina Mishra , Liadan O'Callaghan , Sudipto Guha , Rajeev Motwani

Inventor: Nina Mishra , Liadan O'Callaghan , Sudipto Guha , Rajeev Motwani

IPC: G06K9/62 , G06F101/14 , G06F17/18 , G06F17/30

CPC classification number: G06K9/6218 , Y10S707/99936 , Y10S707/99937

Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, ..., SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
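
The five steps in this abstract can be followed in a small sketch. It assumes one-dimensional points, takes a weighted Lloyd-style k-means with a squared-error metric as the clustering method A, and uses a simple deterministic initialization; these are illustrative choices with hypothetical names, and the patent does not prescribe them.

```python
def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))

def weighted_kmeans(points, weights, k, iterations=20):
    """Assumed clustering method A: Lloyd's k-means on weighted 1-D points."""
    if not points:
        return []
    distinct = sorted(set(points))
    # Deterministic start: k evenly spaced positions among the distinct values.
    centers = [distinct[(2 * i + 1) * len(distinct) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        sums = [0.0] * k
        mass = [0.0] * k
        for x, w in zip(points, weights):
            j = nearest(x, centers)
            sums[j] += w * x
            mass[j] += w
        # A center with no assigned mass keeps its previous position.
        centers = [s / m if m > 0 else c for s, m, c in zip(sums, mass, centers)]
    return centers

def divide_and_conquer(points, k, num_pieces):
    pieces = [points[i::num_pieces] for i in range(num_pieces)]  # step 1
    all_centers, all_weights = [], []
    for piece in pieces:
        centers = weighted_kmeans(piece, [1.0] * len(piece), k)  # step 2
        counts = [0] * len(centers)
        for x in piece:                                          # step 3
            counts[nearest(x, centers)] += 1
        all_centers += centers
        all_weights += counts                                    # step 4
    return sorted(weighted_kmeans(all_centers, all_weights, k))  # step 5

if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8, 1.1, 9.9]
    print(divide_and_conquer(data, k=2, num_pieces=2))
```

Only the intermediate centers and their counts are passed to the final step, which is why each piece can be processed independently of the others.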

19.

Granted invention patent
Database aggregation query result estimator (in force)

Publication (announcement) number: US07191181B2

Publication (announcement) date: 2007-03-13

Application number: US10873569

Application date: 2004-06-22

Applicant: Sarajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

Inventor: Sarajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30489 , G06F17/30536 , G06F2216/03 , Y10S707/957 , Y10S707/99932 , Y10S707/99933 , Y10S707/99935 , Y10S707/99942 , Y10S707/99943

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.

20.

Granted invention patent
Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer (in force)

Publication (announcement) number: US06684177B2

Publication (announcement) date: 2004-01-27

Application number: US09854212

Application date: 2001-05-10

Applicant: Nina Mishra , Liadan O'Callaghan , Sudipto Guha , Rajeev Motwani

Inventor: Nina Mishra , Liadan O'Callaghan , Sudipto Guha , Rajeev Motwani

IPC: G06F101/14

CPC classification number: G06K9/6218 , Y10S707/99936 , Y10S707/99937

Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, ..., SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
