Patent search ap:"Rajeev Motwani" Page 2

11.

Granted patent
Database aggregation query result estimator (status: in force)

Publication number: US07293037B2

Publication date: 2007-11-06

Application number: US11246354

Filing date: 2005-10-07

Applicant: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar

Inventor: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30489, G06F17/30536, G06F2216/03, Y10S707/957, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99942, Y10S707/99943

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
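
The steps in this abstract can be sketched for a SUM query as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `split_outliers` and `estimate_sum`, the fixed outlier budget `num_outliers`, and the use of uniform sampling without replacement for the non-outlier data are all choices made here for the example.

```python
import random
from statistics import pvariance


def split_outliers(values, num_outliers):
    """Slide a window of len(values) - num_outliers over the sorted values and
    keep the window with the lowest variance; everything outside is an outlier.
    Assumes 0 <= num_outliers < len(values)."""
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    best_start = min(
        range(num_outliers + 1),
        key=lambda s: pvariance(ordered[s:s + window]),
    )
    rest = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return outliers, rest


def estimate_sum(values, num_outliers, sample_size, rng=None):
    """Exact sum over the outliers plus an extrapolated sum over a sample
    of the remaining values."""
    rng = rng or random.Random()
    outliers, rest = split_outliers(values, num_outliers)
    sample = rng.sample(rest, min(sample_size, len(rest)))
    # Scale the sample sum up to the size of the non-outlier data.
    scale = len(rest) / len(sample) if sample else 0.0
    return sum(outliers) + scale * sum(sample)
```

Recomputing the variance for every window position keeps the sketch short; a running sum and sum of squares would avoid the repeated work on large inputs.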

12.

Granted patent
Sampling for queries (status: in force)

Publication number: US07287020B2

Publication date: 2007-10-23

Application number: US09759804

Filing date: 2001-01-12

Applicant: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar

Inventor: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30536, G06F17/30489, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99942

Abstract: This disclosure describes leveraging workload information associated with executed database queries for estimating the result of a current database query. The workload information is analyzed to determine the usage of tuples in a database during query execution, such as how often a tuple is accessed and the number of different queries that accessed the tuple. A tuple is assigned a weight value that is based on the analyzed workload information. The particular tuples sampled for estimating a result for the current query are based on each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries or that identifies outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values along with the outlier tuples from the outlier index.
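
One way to picture the weighting step is the sketch below. It assumes a workload can be represented as a list of row predicates, one per executed query, and that a tuple's weight is simply the number of queries that touched it plus a smoothing constant; the names `tuple_weights` and `weighted_sample` and the `smoothing` parameter are hypothetical and do not come from the patent.

```python
import random
from collections import Counter


def tuple_weights(rows, workload, smoothing=1.0):
    """Weight each row by how many workload queries accessed it.

    `workload` is a list of predicates (row -> bool), one per executed query.
    `smoothing` keeps rows that no query touched at a nonzero weight."""
    hits = Counter()
    for predicate in workload:
        for i, row in enumerate(rows):
            if predicate(row):
                hits[i] += 1
    return [hits[i] + smoothing for i in range(len(rows))]


def weighted_sample(rows, weights, k, rng=None):
    """Draw k row indices with replacement, with probability proportional
    to each row's weight."""
    rng = rng or random.Random()
    return rng.choices(range(len(rows)), weights=weights, k=k)
```

Rows that the workload touches often are drawn more often, so the sample concentrates on the data that future queries are most likely to need.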

13.

Patent application
Robust detector of fuzzy duplicates (status: in force)

Publication number: US20060053129A1

Publication date: 2006-03-09

Application number: US10929514

Filing date: 2004-08-30

Applicant: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti

Inventor: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti

IPC: G06F7/00

CPC classification number: G06F17/30303, Y10S707/99932, Y10S707/99933, Y10S707/99937, Y10S707/99942, Y10S707/99943, Y10S707/99945

Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

14.

Granted patent
Sampling for aggregation queries (status: in force)

Publication number: US06842753B2

Publication date: 2005-01-11

Application number: US09759799

Filing date: 2001-01-12

Applicant: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar

Inventor: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30489, G06F17/30536, G06F2216/03, Y10S707/957, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99942, Y10S707/99943

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.

15.

Granted patent
Sampling for database systems (status: lapsed)

Publication number: US06532458B1

Publication date: 2003-03-11

Application number: US09268590

Filing date: 1999-03-15

Applicant: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

Inventor: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

IPC: G06F17/30

CPC classification number: G06F17/30536, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99934, Y10S707/99935

Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
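
The three sampling semantics named in this abstract can be illustrated in a single pass over a stream of records. The sketch below covers only the unweighted case and uses textbook reservoir sampling for WoR and WR, which is a standard technique substituted here for illustration and not necessarily the method the patent claims; the function names are hypothetical.

```python
import random


def sample_cf(stream, f, rng):
    """Coin flips (CF): keep each record independently with probability f.
    The sample size is random."""
    return [x for x in stream if rng.random() < f]


def sample_wor(stream, k, rng):
    """Without replacement (WoR): reservoir sampling keeps k distinct
    positions of the stream, each subset of size k equally likely."""
    reservoir = []
    for n, x in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(x)
        else:
            j = rng.randrange(n)  # uniform in [0, n)
            if j < k:
                reservoir[j] = x
    return reservoir


def sample_wr(stream, k, rng):
    """With replacement (WR): k independent size-1 reservoirs, so the same
    record may appear more than once."""
    slots = [None] * k
    for n, x in enumerate(stream, start=1):
        for i in range(k):
            if rng.randrange(n) == 0:  # replace slot i with probability 1/n
                slots[i] = x
    return slots
```

None of the three functions needs the stream length in advance or a second pass, which is what makes them usable on non-materialized records produced by a query pipeline.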

16.

Granted patent
Sampling for database systems (status: lapsed)

Publication number: US07567949B2

Publication date: 2009-07-28

Application number: US10238175

Filing date: 2002-09-10

Applicant: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

Inventor: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

IPC: G06F17/30, G06F7/00

CPC classification number: G06F17/30536, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99934, Y10S707/99935

Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.

17.

Granted patent
Robust detector of fuzzy duplicates (status: in force)

Publication number: US07516149B2

Publication date: 2009-04-07

Application number: US10929514

Filing date: 2004-08-30

Applicant: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti

Inventor: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti

IPC: G06F7/00, G06F17/00

CPC classification number: G06F17/30303, Y10S707/99932, Y10S707/99933, Y10S707/99937, Y10S707/99942, Y10S707/99943, Y10S707/99945

Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

18.

Granted patent
Sampling for queries (status: in force)

Publication number: US07493316B2

Publication date: 2009-02-17

Application number: US11296036

Filing date: 2005-12-07

Applicant: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar

Inventor: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30536, G06F17/30489, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99942

Abstract: A method of estimating the results of a database query. The results are estimated by sampling weighted tuples in a database based on a probability of usage of the tuples required in executing a workload. A probability is associated with each sampled tuple. An aggregate is computed over the values in each sampled tuple, with each value multiplied by the inverse of the probability associated with that tuple.
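
The inverse-probability step can be written down compactly for a SUM aggregate. The sketch below assumes sampling with replacement and uses the standard estimator that averages value / probability over the draws; the function name `inverse_probability_sum` and the choice of with-replacement sampling are assumptions made for this example, not details taken from the patent.

```python
import random


def inverse_probability_sum(values, weights, k, rng=None):
    """Estimate sum(values) from k weighted draws with replacement.

    Each draw picks index i with probability p_i = weights[i] / sum(weights).
    Dividing the drawn value by p_i corrects for the unequal chance of
    selection, so the average over the draws is an unbiased estimate."""
    rng = rng or random.Random()
    total_weight = sum(weights)
    probs = [w / total_weight for w in weights]
    picks = rng.choices(range(len(values)), weights=weights, k=k)
    return sum(values[i] / probs[i] for i in picks) / k
```

When the weights happen to be proportional to the values, every term value / probability equals the true sum, so the estimate has no sampling error; in general, the closer the weights track the values, the lower the variance.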

19.

Patent application
Sampling for queries (status: in force)

Publication number: US20060085463A1

Publication date: 2006-04-20

Application number: US11296034

Filing date: 2005-12-07

Applicant: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar

Inventor: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar

IPC: G06F7/00

CPC classification number: G06F17/30536, G06F17/30489, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99942

Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.
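
A rough sketch of this procedure is given below, assuming rows are dictionaries, a sub-relation is the set of rows sharing one group-by key, and an outlier is a row whose value lies farthest from its group's mean. The function name `outlier_index` and the parameters `top_groups` and `per_group` are hypothetical; the abstract does not say how many sub-relations or outliers are selected.

```python
from collections import defaultdict
from statistics import mean, pvariance


def outlier_index(rows, key, value, top_groups, per_group):
    """Pick the `top_groups` sub-relations with the highest variance in
    `value`, then return the `per_group` rows of each that deviate most
    from the sub-relation's mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Rank sub-relations by the variance of the aggregated column.
    ranked = sorted(
        groups,
        key=lambda g: pvariance([r[value] for r in groups[g]]),
        reverse=True,
    )

    index = []
    for g in ranked[:top_groups]:
        members = groups[g]
        center = mean(r[value] for r in members)
        by_deviation = sorted(
            members, key=lambda r: abs(r[value] - center), reverse=True
        )
        index.extend(by_deviation[:per_group])
    return index
```

Selection conditions in the workload could be handled the same way by grouping on the result of each predicate instead of on a column value.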

20.

Patent application
Database aggregation query result estimator (status: in force)

Publication number: US20060036600A1

Publication date: 2006-02-16

Application number: US11246355

Filing date: 2005-10-07

Applicant: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar

Inventor: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar

IPC: G06F7/00

CPC classification number: G06F17/30489, G06F17/30536, G06F2216/03, Y10S707/957, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99942, Y10S707/99943

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
