Patent search ap:("Arvind Arasu" OR "Surajit Chaudhuri") AND inv:"Surajit Chaudhuri" Page 7

61.

Patent application
Query progress estimation (status: in force)

Publication number: US20050222965A1

Publication date: 2005-10-06

Application number: US10813963

Application date: 2004-03-31

Applicant: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy

Inventor: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy

IPC: G06F7/00, G06F17/30

CPC classification number: G06F17/30522, G06F17/30306, Y10S707/99932, Y10S707/99945, Y10S707/99948

Abstract: A query progress indicator that provides an indication to a user of the progress of a query being executed on a database. The indication of the progress of the query allows the user to decide whether the query should be allowed to complete or should be aborted. One method that may be used to estimate the progress of a query that is being executed on a database defines a model of work performed during execution of a query. The total amount of work that will be performed during execution of the query is estimated according to the model. The amount of work performed at a given point during execution of the query is estimated according to the model. The progress of the query is estimated using the estimated amount of work at the given point in time and the estimated total amount of work. This estimated progress of query execution may be provided to the user.
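
The estimate described in this abstract is a ratio of work done so far to estimated total work. A minimal sketch in Python follows. The abstract does not specify the work model, so the sketch assumes one unit of work per row produced by a plan operator; `OperatorStats`, `estimate_progress`, and the sample plan are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class OperatorStats:
    name: str
    rows_done: int       # rows produced so far (observed at run time)
    rows_estimated: int  # estimated total rows (assumed to come from an optimizer)


def estimate_progress(operators):
    """Return the estimated fraction of work completed, between 0 and 1."""
    done = sum(op.rows_done for op in operators)
    # If an operator has already produced more rows than estimated,
    # the estimate was too low, so use the observed count instead.
    total = sum(max(op.rows_estimated, op.rows_done) for op in operators)
    return done / total if total else 0.0


plan = [
    OperatorStats("scan_orders", rows_done=5000, rows_estimated=10000),
    OperatorStats("hash_join", rows_done=1000, rows_estimated=2000),
]
progress = estimate_progress(plan)  # 6000 / 12000
```

Clamping the total to at least the observed work keeps the reported progress from exceeding 100% when the initial estimate is too low.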

62.

Patent application
Optimization based method for estimating the results of aggregate queries (status: lapsed)

Publication number: US20050033759A1

Publication date: 2005-02-10

Application number: US10935803

Application date: 2004-09-08

Applicant: Surajit Chaudhuri, Vivek Narasayya, Gantam Das

Inventor: Surajit Chaudhuri, Vivek Narasayya, Gantam Das

IPC: G06F17/30, G06F17/00

CPC classification number: G06F17/30536, G06F17/30489, Y10S707/99933, Y10S707/99934, Y10S707/99936, Y10S707/99937, Y10S707/99943, Y10S707/99945

Abstract: A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload that includes a set of queries that can be executed on the database. An expected workload is derived comprising a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.

63.

Granted patent
Sampling over joins for database systems (status: in force)

Publication number: US06542886B1

Publication date: 2003-04-01

Application number: US09268275

Application date: 1999-03-15

Applicant: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

Inventor: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

IPC: G06F17/30

CPC classification number: G06F17/3061, G06F17/30498, G06F17/30536, G06F2216/03, Y10S707/99932, Y10S707/99937

Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
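
The three sampling semantics named in this abstract can each be run in a single sequential pass over a stream. The Python sketch below shows unweighted versions only and uses standard textbook techniques (independent coin flips and reservoir sampling), which are not necessarily the methods claimed in the patent. It does not cover weighted sampling or sampling over joins. The function names are hypothetical.

```python
import random


def sample_cf(stream, f, rng):
    """Coin flips (CF): keep each record independently with probability f."""
    return [rec for rec in stream if rng.random() < f]


def sample_wor(stream, r, rng):
    """Without replacement (WoR): reservoir sampling of r distinct positions."""
    reservoir = []
    for n, rec in enumerate(stream, start=1):
        if n <= r:
            reservoir.append(rec)
        else:
            j = rng.randrange(n)  # uniform over 0 .. n-1
            if j < r:
                reservoir[j] = rec
    return reservoir


def sample_wr(stream, r, rng):
    """With replacement (WR): r independent size-1 reservoirs."""
    slots = [None] * r  # stays None only if the stream is empty
    for n, rec in enumerate(stream, start=1):
        for i in range(r):
            # The n-th record takes over slot i with probability 1/n.
            if rng.random() < 1.0 / n:
                slots[i] = rec
    return slots


rng = random.Random(42)
wor = sample_wor(iter(range(100)), 10, rng)  # 10 distinct records
wr = sample_wr(iter(range(100)), 5, rng)     # 5 records, repeats possible
cf = sample_cf(iter(range(100)), 0.1, rng)   # size varies from run to run
```

None of the three functions needs to know the stream length in advance, which is what allows sampling of non-materialized records produced by a pipeline.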

64.

Granted patent
Identifying indexes on materialized views for database workload (status: in force)

Publication number: US06356891B1

Publication date: 2002-03-12

Application number: US09629412

Application date: 2000-08-01

Applicant: Sanjay Agrawal, Surajit Chaudhuri, Vivek R. Narasayya

Inventor: Sanjay Agrawal, Surajit Chaudhuri, Vivek R. Narasayya

IPC: G06F17/30

CPC classification number: G06F17/30336, Y10S707/99932, Y10S707/99933, Y10S707/99935

Abstract: An index and materialized view selection wizard produces a fast and reasonable recommendation for a configuration of indexes, materialized views, and indexes on materialized views which are beneficial given a specified workload for a given database and database server. Candidate materialized views and indexes are obtained, and a joint enumeration of the combined materialized views and indexes is performed to obtain a recommended configuration. The configuration includes indexes, materialized views and indexes on materialized views. Candidate materialized views are obtained by first determining which subsets of tables are referenced in queries in the workload and then finding interesting table subsets. Next, interesting subsets are considered on a per query basis to determine which are syntactically relevant for a query. Materialized views which are likely to be used for the workload are then generated along with a set of merged materialized views. Clustered indexes and non-clustered indexes on materialized views are then generated. The indexes, materialized views and indexes on materialized views are then enumerated together to form the recommended configuration.

65.

Granted patent
Histogram construction using adaptive random sampling with cross-validation for database systems (status: in force)

Publication number: US06278989B1

Publication date: 2001-08-21

Application number: US09139835

Application date: 1998-08-25

Applicant: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

Inventor: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

IPC: G06F17/30

CPC classification number: G06F17/30463, G06F17/30536, Y10S707/99932, Y10S707/99933, Y10S707/99942

Abstract: Using adaptive random sampling with cross-validation helps determine when enough data of a database has been sampled to construct histograms on one or more columns of one or more tables of the database within a desired or predetermined degree of accuracy. An adaptive random sampling histogram construction tool constructs an approximate equi-height k-histogram using an initial sample of data values from the database and iteratively updates the histogram using an additional sample of data values from the database until the histogram is within the desired degree of accuracy. The accuracy of the histogram is cross-validated against the additional sample at each iteration, and the additional sample is used to update the histogram to help improve its accuracy. The accuracy of the histogram may be measured by an error in distribution of the additional sample over the histogram as compared to a threshold error using a suitable error metric. By attempting to sample only the number of data values necessary to construct the histogram within the desired degree of accuracy, the adaptive random sampling histogram construction tool attempts to avoid any cost increases in time and memory from sampling too many data values.
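
The loop described in this abstract (build from an initial sample, validate against an additional sample, merge and rebuild until the error is small enough) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented procedure. The error metric (largest deviation of a bucket count from the ideal 1/k share), the doubling sample schedule, and all function names are choices made for this example.

```python
import bisect
import random


def equi_height_boundaries(sample, k):
    """Return k-1 separators that split the sorted sample into k equal buckets."""
    s = sorted(sample)
    return [s[(i * len(s)) // k] for i in range(1, k)]


def distribution_error(values, boundaries):
    """Largest relative deviation of a bucket count from the ideal 1/k share."""
    k = len(boundaries) + 1
    counts = [0] * k
    for v in values:
        counts[bisect.bisect_right(boundaries, v)] += 1
    ideal = len(values) / k
    return max(abs(c - ideal) for c in counts) / ideal


def build_histogram(data, k, initial_size, threshold, rng):
    """Grow the sample until a fresh sample validates the histogram."""
    sample = rng.sample(data, initial_size)
    boundaries = equi_height_boundaries(sample, k)
    while len(sample) < len(data):
        # Draw an additional sample and cross-validate the current histogram.
        extra = rng.sample(data, min(len(sample), len(data)))
        if distribution_error(extra, boundaries) <= threshold:
            break
        # Not accurate enough: fold the additional sample in and rebuild.
        sample.extend(extra)
        boundaries = equi_height_boundaries(sample, k)
    return boundaries, len(sample)


data = list(range(10000))
boundaries, rows_sampled = build_histogram(
    data, k=10, initial_size=200, threshold=0.5, rng=random.Random(0)
)
```

The loop stops as soon as validation passes, so the number of rows sampled adapts to how hard the column's distribution is to approximate.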

66.

Granted patent
Index tuner for given workload (status: in force)

Publication number: US06266658B1

Publication date: 2001-07-24

Application number: US09553070

Application date: 2000-04-20

Applicant: Atul Adya, Sanjay Agrawal, Surajit Chaudhuri, Vivek R. Narasayya

Inventor: Atul Adya, Sanjay Agrawal, Surajit Chaudhuri, Vivek R. Narasayya

IPC: G06F17/30

CPC classification number: G06F17/30312, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99935

Abstract: An index tuning wizard produces a fast and reasonable recommendation identifying database indexes to use given a specified workload. A query optimizer is used to determine the expected usefulness of potential indexes for the specified workload by taking cost of queries in the workload into account. A cost based pruning of indexes is then performed to provide an intermediate set of proposed indexes. Indexes having most benefit based on storage constraints are then selected. The optimizer is then used again, and further pruning is done on a benefits basis. An index is not recommended unless it has a significant impact on the workload.
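
The pipeline in this abstract (cost-based pruning, selection under a storage constraint, then a second benefit-based pruning) can be illustrated with a small Python sketch. It is a simplified greedy reading of the abstract, not the patented algorithm. The `workload_cost` callable stands in for a real query optimizer, and `toy_cost`, the index names, the sizes, and the `min_gain` threshold are all invented for the example.

```python
def recommend_indexes(candidates, workload_cost, sizes, budget, min_gain):
    """Pick indexes for a workload using a cost function as the only oracle."""
    base = workload_cost(frozenset())
    # Step 1: cost-based pruning. Keep candidates that lower the cost alone.
    benefit = {c: base - workload_cost(frozenset([c])) for c in candidates}
    useful = [c for c in candidates if benefit[c] > 0]
    # Step 2: take the most beneficial indexes that fit the storage budget.
    chosen, used = [], 0
    for c in sorted(useful, key=lambda c: benefit[c], reverse=True):
        if used + sizes[c] <= budget:
            chosen.append(c)
            used += sizes[c]
    # Step 3: re-cost the chosen set and drop indexes with insignificant gain.
    for c in list(chosen):
        without = frozenset(x for x in chosen if x != c)
        gain = workload_cost(without) - workload_cost(frozenset(chosen))
        if gain < min_gain * base:
            chosen.remove(c)
    return chosen


def toy_cost(config):
    """Hypothetical stand-in for an optimizer's workload cost estimate."""
    cost = 100.0
    if "idx_orders_date" in config:
        cost -= 40
    if "idx_orders_cust" in config:
        cost -= 30
    if "idx_items_sku" in config:
        cost -= 1
    return cost


sizes = {"idx_orders_date": 5, "idx_orders_cust": 5,
         "idx_items_sku": 5, "idx_unused": 1}
picked = recommend_indexes(list(sizes), toy_cost, sizes, budget=15, min_gain=0.05)
```

With this toy cost model, `idx_unused` is pruned in step 1 because it never lowers the cost, and `idx_items_sku` is dropped in step 3 because its gain of 1 is below 5% of the base cost.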

67.

Granted patent
Database system index selection using cost evaluation of a workload for multiple candidate index configurations (status: lapsed)

Publication number: US5926813A

Publication date: 1999-07-20

Application number: US980829

Application date: 1997-12-01

Applicant: Surajit Chaudhuri, Vivek Narasayya

Inventor: Surajit Chaudhuri, Vivek Narasayya

IPC: G06F17/30

CPC classification number: G06F17/30312, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99942, Y10S707/99953

Abstract: An index selection tool helps reduce costs in time and memory in selecting an index configuration or set of indexes for use by a database server in accessing a database in accordance with a workload of queries. The index selection tool attempts to reduce the number of indexes to be considered, the number of index configurations to be enumerated, and the number of invocations of a query optimizer in selecting an index configuration for the workload.

68.

Granted patent
Database system multi-column index selection for a workload (status: lapsed)

Publication number: US5913206A

Publication date: 1999-06-15

Application number: US980831

Application date: 1997-12-01

Applicant: Surajit Chaudhuri, Vivek Narasayya

Inventor: Surajit Chaudhuri, Vivek Narasayya

IPC: G06F17/30

CPC classification number: G06F17/30312, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99935, Y10S707/99942, Y10S707/99953

Abstract: An index selection tool helps reduce costs in time and memory in selecting an index configuration or set of indexes for use by a database server in accessing a database in accordance with a workload of queries. The index selection tool attempts to reduce the number of indexes to be considered, the number of index configurations to be enumerated, and the number of invocations of a query optimizer in selecting an index configuration for the workload.

69.

Granted patent
Entity augmentation service from latent relational data (status: in force)

Publication number: US09171081B2

Publication date: 2015-10-27

Application number: US13413179

Application date: 2012-03-06

Applicant: Kris K. Ganjam, Kaushik Chakrabarti, Mohamed A. Yakout, Surajit Chaudhuri

Inventor: Kris K. Ganjam, Kaushik Chakrabarti, Mohamed A. Yakout, Surajit Chaudhuri

IPC: G06F17/30, G06F7/00

CPC classification number: G06F17/30864, G06F17/30539, G06F2216/03

Abstract: The subject disclosure is directed towards providing data for augmenting an entity-attribute-related task. Pre-processing is performed on entity-attribute tables extracted from the web, e.g., to provide indexes that are accessible to find data that completes augmentation tasks. The indexes are based on both direct mappings and indirect mappings between tables. Example augmentation tasks include queries for augmented data based on an attribute name or examples, or finding synonyms for augmentation. An online query is efficiently processed by accessing the indexes to return augmented data related to the task.

70.

Granted patent
Finding data in connected corpuses using examples (status: in force)

Publication number: US08983954B2

Publication date: 2015-03-17

Application number: US13443681

Application date: 2012-04-10

Applicant: John C. Platt, Surajit Chaudhuri, Lev Novik, Henricus Johannes Maria Meijer, Efim Hudis, Kunal Mukerjee, Christopher Alan Hays

Inventor: John C. Platt, Surajit Chaudhuri, Lev Novik, Henricus Johannes Maria Meijer, Efim Hudis, Kunal Mukerjee, Christopher Alan Hays

IPC: G06F17/30

CPC classification number: G06F17/30758, G06F17/30303, G06F17/30395, G06F17/3053, G06F17/30539, G06F17/30595, G06F17/30722, G06F17/30867

Abstract: In one embodiment, datasets are stored in a catalog. The datasets are enriched by establishing relationships among the domains in different datasets. A user searches for relevant datasets by providing examples of the domains of interest. The system identifies datasets corresponding to the user-provided examples. The system then identifies connected subsets of the datasets that are directly linked or indirectly linked through other domains. The user provides known relationship examples to filter the connected subsets and to identify the connected subsets that are most relevant to the user's query. The selected connected subsets may be further analyzed by business intelligence/analytics to create pivot tables or to process the data.
