Patent search ap:("Arvind Arasu" OR "Surajit Chaudhuri") AND inv:"Surajit Chaudhuri" Page 6

51.

发明授权
Robust detector of fuzzy duplicates 有权
Title translation: 强大的模糊检测器

公开(公告)号：US07516149B2

公开(公告)日：2009-04-07

申请号：US10929514

申请日：2004-08-30

Applicant: Rajeev Motwani , Surajit Chaudhuri , Venkatesh Ganti

Inventor： Rajeev Motwani , Surajit Chaudhuri , Venkatesh Ganti

IPC: G06F7/00 , G06F17/00

CPC classification number: G06F17/30303 , Y10S707/99932 , Y10S707/99933 , Y10S707/99937 , Y10S707/99942 , Y10S707/99943 , Y10S707/99945

Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

Abstract translation: 本文描述的至少一个实施例检测模糊重复并消除这种重复。模糊重复是代表相同的真实世界实体或现象的数据库中的多个看似独特的元组（即，记录）。

52.

发明申请
KEYWORD SEARCH OVER HEAVY-TAILED DATA AND MULTI-KEYWORD QUERIES 审中-公开
Title translation: 关键字搜索超重数据和多关键字查询

公开(公告)号：US20090083214A1

公开(公告)日：2009-03-26

申请号：US11858920

申请日：2007-09-21

Applicant: Arnd C. Konig , Surajit Chaudhuri , Kenneth Church , Liying Sui

Inventor： Arnd C. Konig , Surajit Chaudhuri , Kenneth Church , Liying Sui

IPC: G06F17/30

CPC classification number: G06F16/3331 , G06F16/313

Abstract: Index structures and query processing framework that enforces a given threshold on the overhead of computing conjunctive keyword queries. This includes a keyword processing algorithm, logic to determine which indexes to materialize, and a probabilistic approach to reducing the overhead for determining which indexes to build. The index structures leverage the fact that the frequency distribution of natural-language text follows a power law. Given a document collection, a set of indexes is proposed for materialization so that the time for intersecting keywords does not exceed a given threshold Δ. When considering the associated space requirement, the additional indexes are limited. Materialization of such a set of indexes for reasonable values of Δ (e.g., the time required to scan 20% of the largest inverted index), at least for a collection of short documents is distributed by the power law.

Abstract translation: 索引结构和查询处理框架，其对计算关键词查询的开销执行给定的阈值。这包括关键字处理算法，确定要实现哪些索引的逻辑，以及减少用于确定构建哪些索引的开销的概率方法。指数结构利用了自然语言文本的频率分布遵循幂律的事实。给定文档集合，提出了一组索引用于实现，以便关键字相交的时间不超过给定的阈值Delta。在考虑相关空间需求时，附加指标有限。对于合理的Delta值（例如，扫描20％的最大倒排指数所需的时间），至少对于短文件的收集，这种一组索引的实现是通过权力法分配的。

53.

发明授权
Sampling for queries 有权
Title translation: 查询抽样

公开(公告)号：US07493316B2

公开(公告)日：2009-02-17

申请号：US11296036

申请日：2005-12-07

Applicant: Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

Inventor： Surajit Chaudhuri , Vivek R. Narasayya , Rajeev Motwani , Mayur D. Datar

IPC: G06F17/30

CPC classification number: G06F17/30536 , G06F17/30489 , Y10S707/99931 , Y10S707/99932 , Y10S707/99933 , Y10S707/99942

Abstract: A method of estimating results of a database query, the results are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. An aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.

Abstract translation: 一种估计数据库查询结果的方法，通过基于在执行工作负载中所需的元组的使用概率对数据库中的加权元组进行抽样来估计结果。每个元组采样的概率相关。根据每个采样元组中的值计算聚合，同时乘以与每个元组采样相关联的概率的反转。

54.

发明授权
Dynamic physical database design 有权
Title translation: 动态物理数据库设计

公开(公告)号：US07483918B2

公开(公告)日：2009-01-27

申请号：US10914901

申请日：2004-08-10

Applicant: Surajit Chaudhuri , Arnd Christian Konig , Vivek R. Narasayya

Inventor： Surajit Chaudhuri , Arnd Christian Konig , Vivek R. Narasayya

IPC: G06F7/00

CPC classification number: G06F17/30312 , Y10S707/99933 , Y10S707/99945 , Y10S707/99948

Abstract: A monitoring component of a database server collects a subset of a query workload along with related statistics. A remote index tuning component uses the workload subset and related statistics to determine a physical design that minimizes the cost of executing queries in the workload subset while ensuring that queries omitted from the subset do not degrade in performance.

Abstract translation: 数据库服务器的监视组件收集查询工作负载的一部分以及相关统计信息。远程索引调整组件使用工作负载子集和相关统计信息来确定最小化在工作负载子集中执行查询的成本的物理设计，同时确保从子集中省略的查询不会降低性能。

55.

发明授权
Integrating horizontal partitioning into physical database design 有权
Title translation: 将水平分区整合到物理数据库设计中

公开(公告)号：US07472107B2

公开(公告)日：2008-12-30

申请号：US10601416

申请日：2003-06-23

Applicant: Sanjay Agrawal , Surajit Chaudhuri , Vivek Narasayya

Inventor： Sanjay Agrawal , Surajit Chaudhuri , Vivek Narasayya

IPC: G06F17/30

CPC classification number: G06F17/30312 , Y10S707/99932 , Y10S707/99933 , Y10S707/99945

Abstract: Integrating the partitioning of physical design structures with the physical design process can result in more efficient query execution. When candidate structures are evaluated for their relative benefit, one or more partitioning methods is associated with each structure so that the benefits of various partitioning methods are taken into consideration when the structures are selected for use by the database. A pool of partitioned candidate structures is formed by proposing and evaluating the benefit of candidate structures with associated partitioning on a per query basis. The selected partitioned candidates are then used to construct generalized structures with associated partitioning methods that are evaluated for their benefit over the workload. Those generalized structures are added to the pool of partitioned candidate structures. From this augmented pool of partitioned candidate structures, an optimal set of partitioned structures is enumerated for use by the database system.

Abstract translation: 将物理设计结构的分区与物理设计过程集成可以实现更有效的查询执行。当评估候选结构的相对效益时，一个或多个分区方法与每个结构相关联，以便在选择结构以供数据库使用时考虑各种分区方法的优点。通过在每个查询的基础上提出并评估具有关联划分的候选结构的优点来形成分区候选结构池。然后，所选择的分区候选者用于构建具有相关分区方法的通用结构，该方法被评估为其对工作负载的好处。那些广义结构被添加到分区候选结构的池中。从这个扩展的分区候选结构池中，列举了一组最佳的分区结构，供数据库系统使用。

56.

发明申请
LIGHTWEIGHT PHYSICAL DESIGN ALERTER 有权
Title translation: 轻型物理设计报警器

公开(公告)号：US20080183644A1

公开(公告)日：2008-07-31

申请号：US11669782

申请日：2007-01-31

Applicant: Nicolas Bruno , Surajit Chaudhuri

Inventor： Nicolas Bruno , Surajit Chaudhuri

IPC: G06F15/18 , G06F12/00 , G06F13/00

CPC classification number: G06F17/30306

Abstract: A lightweight physical design alerter can analyze a workload and determine whether a comprehensive tuning session would result in a configuration improvement over the current configuration. The alerter provides a low-overhead procedure that can run during normal operation of a database management system and produce a notification if a current configuration is less than optimal. The alerter can report lower and upper bounds on the improvements that could be obtained if a comprehensive tuning tool is launched. A lower bound can be justified by generating feasible configurations. The disclosed embodiments can be extended to query updates, materialized views, and other physical design features (e.g., partitioning).

Abstract translation: 轻量级物理设计报警器可以分析工作负载并确定综合调优会话是否会导致配置改进超过当前配置。报警器提供了一个低开销的过程，可以在数据库管理系统的正常操作期间运行，并在当前配置不太适合的情况下产生通知。报警器可以报告如果启动综合调整工具可以获得的改进的上下限。可以通过生成可行的配置来证明下限。所公开的实施例可以扩展到查询更新，物化视图和其他物理设计特征（例如，分区）。

57.

发明授权
Method and apparatus for exploiting statistics on query expressions for optimization 有权
Title translation: 利用查询表达式进行统计优化的方法和装置

公开(公告)号：US07363289B2

公开(公告)日：2008-04-22

申请号：US11177598

申请日：2005-07-07

Applicant: Surajit Chaudhuri , Nicolas Bruno

Inventor： Surajit Chaudhuri , Nicolas Bruno

IPC: G06F17/30

CPC classification number: G06F17/30463 , G06F17/30536 , Y10S707/99932 , Y10S707/99933 , Y10S707/99942 , Y10S707/99943 , Y10S707/99944 , Y10S707/99945

Abstract: A method for evaluating a user query on a relational database having records stored therein, a workload made up of a set of queries that have been executed on the database, and a query optimizer that generates a query execution plan for the user query. Each query plan includes a plurality of intermediate query plan components that verify a subset of records from the database meeting query criteria. The method accesses the query plan and a set of stored intermediate statistics for records verified by query components, such as histograms that summarize the cardinality of the records that verify the query component. The method forms a transformed query plan based on the selected intermediate statistics (possibly by rewriting the query plan) and estimates the cardinality of the transformed query plan to arrive at a more accurate cardinality estimate for the query. If additional intermediate statistics are necessary, a pool of intermediate statistics may be generated based on the queries in the workload by evaluating the benefit of a given statistic over the workload and adding intermediate statistics to the pool that provide relatively great benefit.

Abstract translation: 一种用于评估具有存储在其中的记录的关系数据库的用户查询的方法，由在数据库上执行的一组查询组成的工作负载以及生成用户查询的查询执行计划的查询优化器。每个查询计划包括多个中间查询计划组件，其从数据库会议查询条件验证记录的子集。该方法访问查询计划和一组存储的中间统计信息，用于查询组件验证的记录，例如总结验证查询组件的记录的基数的直方图。该方法基于所选择的中间统计（可能通过重写查询计划）形成转换的查询计划，并且估计转换后的查询计划的基数以得到查询的更准确的基数估计。如果需要额外的中间统计数据，则可以根据工作负载中的查询生成中间统计数据池，方法是评估给定统计量对工作负载的好处，并将中间统计信息添加到提供相对较大收益的池中。

58.

发明授权
Automated layout of relational databases 有权
Title translation: 关系数据库的自动布局

公开(公告)号：US07249141B2

公开(公告)日：2007-07-24

申请号：US10426235

申请日：2003-04-30

Applicant: Sanjay Agrawal , Surajit Chaudhuri , Abhinandan Das , Vivek Narasayya

Inventor： Sanjay Agrawal , Surajit Chaudhuri , Abhinandan Das , Vivek Narasayya

IPC: G06F17/30 , G06F7/00

CPC classification number: G06F17/30595 , Y10S707/99932 , Y10S707/99933 , Y10S707/99943

Abstract: Layout in a database system is performed using workload information. Execution information for a workload is obtained. Cumulative access and co-access information for database objects is then assembled. A cost model is developed for quantitatively capturing the value of different layouts, and a search is performed for a recommended database layout. In one embodiment, a greedy search is performed which initially attempts provide a layout that minimizes co-location of objects on storage objects, and then attempts to improve that layout via a greedy search.

Abstract translation: 使用工作负载信息执行数据库系统中的布局。获取工作负载的执行信息。然后组合数据库对象的累积访问和共存信息。开发了一种成本模型，用于定量捕获不同布局的值，并为推荐的数据库布局执行搜索。在一个实施例中，执行贪婪搜索，其最初尝试提供使存储对象上的对象的共同定位最小化的布局，然后尝试通过贪婪搜索来改进该布局。

59.

发明申请
Sampling for queries 有权

公开(公告)号：US20060085463A1

公开(公告)日：2006-04-20

申请号：US11296034

申请日：2005-12-07

Applicant: Surajit Chaudhuri , Vivek Narasayya , Rajeev Motwani , Mayur Datar

Inventor： Surajit Chaudhuri , Vivek Narasayya , Rajeev Motwani , Mayur Datar

IPC: G06F7/00

CPC classification number: G06F17/30536 , G06F17/30489 , Y10S707/99931 , Y10S707/99932 , Y10S707/99933 , Y10S707/99942

Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.

60.

发明申请
Database aggregation query result estimator 有权

公开(公告)号：US20060036600A1

公开(公告)日：2006-02-16

申请号：US11246355

申请日：2005-10-07

Applicant: Surajit Chaudhuri , Vivek Narasayya , Rajeev Motwani , Mayur Datar

Inventor： Surajit Chaudhuri , Vivek Narasayya , Rajeev Motwani , Mayur Datar

IPC: G06F7/00

CPC classification number: G06F17/30489 , G06F17/30536 , G06F2216/03 , Y10S707/957 , Y10S707/99932 , Y10S707/99933 , Y10S707/99935 , Y10S707/99942 , Y10S707/99943

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification