Query progress estimation
    82.
    Patent application
    Status: In force

    Publication No.: US20050222965A1

    Publication date: 2005-10-06

    Application No.: US10813963

    Filing date: 2004-03-31

    IPC classification: G06F7/00 G06F17/30

    Abstract: A query progress indicator provides an indication to a user of the progress of a query being executed on a database. The indication of the progress of the query allows the user to decide whether the query should be allowed to complete or should be aborted. One method that may be used to estimate the progress of a query that is being executed on a database defines a model of work performed during execution of a query. The total amount of work that will be performed during execution of the query is estimated according to the model. The amount of work performed at a given point during execution of the query is estimated according to the model. The progress of the query is estimated using the estimated amount of work at the given point in time and the estimated total amount of work. This estimated progress of query execution may be provided to the user.
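The ratio described in the abstract above (work done so far over estimated total work) can be sketched as follows. This is a minimal illustration, not the patented method: the `OperatorProgress` structure, the choice of "rows processed" as the unit of work, and the operator names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class OperatorProgress:
    """Work counters for one operator in a query plan (hypothetical structure)."""
    name: str
    rows_processed: int        # work observed so far at run time
    estimated_total_rows: int  # total work predicted by the model


def estimate_progress(operators: list[OperatorProgress]) -> float:
    """Return estimated progress in [0, 1] as work done / estimated total work."""
    done = sum(op.rows_processed for op in operators)
    # An estimate can undershoot, so the total never drops below observed work.
    total = sum(max(op.estimated_total_rows, op.rows_processed) for op in operators)
    return done / total if total > 0 else 0.0


plan = [
    OperatorProgress("scan_orders", rows_processed=500, estimated_total_rows=1000),
    OperatorProgress("scan_customers", rows_processed=100, estimated_total_rows=200),
    OperatorProgress("hash_join", rows_processed=0, estimated_total_rows=800),
]
print(f"{estimate_progress(plan):.0%}")  # prints 30%
```

Clamping the total to the observed work keeps the reported value at or below 100% when the model underestimates an operator.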

    Optimization based method for estimating the results of aggregate queries
    83.
    Patent application
    Status: Lapsed

    Publication No.: US20050033759A1

    Publication date: 2005-02-10

    Application No.: US10935803

    Filing date: 2004-09-08

    IPC classification: G06F17/30 G06F17/00

    Abstract: A method for estimating the result of a query on a database having data records arranged in tables. An expected workload, comprising a set of queries that can be executed on the database, is derived for the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.
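The final step of the abstract above, executing a query on a sample and returning an estimated result, can be sketched as below. This is an assumption-laden illustration: the sample here is drawn uniformly at random, which is a stand-in for the workload-driven, error-minimizing selection the abstract describes, and the function names `build_uniform_sample` and `estimate_sum` are hypothetical.

```python
import random


def build_uniform_sample(records, fraction, seed=0):
    """Keep each record with probability `fraction`; attach weight 1/fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    weight = 1.0 / fraction
    return [(rec, weight) for rec in records if rng.random() < fraction]


def estimate_sum(sample, predicate, value):
    """Run SUM(value) WHERE predicate on the sample, scaling each row by its weight."""
    return sum(w * value(rec) for rec, w in sample if predicate(rec))


records = [{"region": "east" if i % 2 else "west", "amount": i} for i in range(1000)]
sample = build_uniform_sample(records, fraction=0.1)
approx = estimate_sum(sample, lambda r: r["region"] == "east", lambda r: r["amount"])
exact = sum(r["amount"] for r in records if r["region"] == "east")
print(approx, exact)
```

Storing a weight with each sampled record means a non-uniform sample (for example, one biased toward records the workload touches often) could reuse `estimate_sum` unchanged.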

    Sampling over joins for database systems
    84.
    Granted patent
    Status: In force

    Publication No.: US06542886B1

    Publication date: 2003-04-01

    Application No.: US09268275

    Filing date: 1999-03-15

    IPC classification: G06F17/30

    Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
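The three sampling semantics named in the abstract above can each be realized in a single pass over a stream of unknown length. The sketch below uses textbook techniques (coin flips, reservoir sampling, and independent size-1 reservoirs) for the unweighted case only; it is not taken from the patent, and it does not cover weighted sampling or sampling over joins.

```python
import random


def sample_cf(stream, p, rng):
    """Coin flips (CF): keep each tuple independently with probability p."""
    return [t for t in stream if rng.random() < p]


def sample_wor(stream, r, rng):
    """Without replacement (WoR): reservoir sampling of r tuples in one pass."""
    reservoir = []
    for n, t in enumerate(stream, start=1):
        if n <= r:
            reservoir.append(t)
        else:
            j = rng.randrange(n)  # uniform in [0, n)
            if j < r:
                reservoir[j] = t
    return reservoir


def sample_wr(stream, r, rng):
    """With replacement (WR): r independent size-1 reservoirs in one pass."""
    slots = [None] * r
    for n, t in enumerate(stream, start=1):
        for i in range(r):
            if rng.randrange(n) == 0:  # replace slot i with probability 1/n
                slots[i] = t
    return slots


rng = random.Random(42)
print(sample_cf(iter(range(20)), 0.25, rng))
print(sample_wor(iter(range(20)), 5, rng))
print(sample_wr(iter(range(20)), 5, rng))
```

None of the functions needs the stream length in advance, which is what makes them usable on non-materialized pipeline output.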

    Identifying indexes on materialized views for database workload
    85.
    Granted patent
    Status: In force

    Publication No.: US06356891B1

    Publication date: 2002-03-12

    Application No.: US09629412

    Filing date: 2000-08-01

    IPC classification: G06F17/30

    Abstract: An index and materialized view selection wizard produces a fast and reasonable recommendation for a configuration of indexes, materialized views, and indexes on materialized views which are beneficial given a specified workload for a given database and database server. Candidate materialized views and indexes are obtained, and a joint enumeration of the combined materialized views and indexes is performed to obtain a recommended configuration. The configuration includes indexes, materialized views and indexes on materialized views. Candidate materialized views are obtained by first determining which subsets of tables are referenced in queries in the workload and then finding interesting table subsets. Next, interesting subsets are considered on a per query basis to determine which are syntactically relevant for a query. Materialized views which are likely to be used for the workload are then generated along with a set of merged materialized views. Clustered indexes and non-clustered indexes on materialized views are then generated. The indexes, materialized views and indexes on materialized views are then enumerated together to form the recommended configuration.

    Histogram construction using adaptive random sampling with cross-validation for database systems
    86.
    Granted patent
    Status: In force

    Publication No.: US06278989B1

    Publication date: 2001-08-21

    Application No.: US09139835

    Filing date: 1998-08-25

    IPC classification: G06F17/30

    Abstract: Using adaptive random sampling with cross-validation helps determine when enough data of a database has been sampled to construct histograms on one or more columns of one or more tables of the database within a desired or predetermined degree of accuracy. An adaptive random sampling histogram construction tool constructs an approximate equi-height k-histogram using an initial sample of data values from the database and iteratively updates the histogram using an additional sample of data values from the database until the histogram is within the desired degree of accuracy. The accuracy of the histogram is cross-validated against the additional sample at each iteration, and the additional sample is used to update the histogram to help improve its accuracy. The accuracy of the histogram may be measured by an error in distribution of the additional sample over the histogram as compared to a threshold error using a suitable error metric. By attempting to sample only the number of data values necessary to construct the histogram within the desired degree of accuracy, the adaptive random sampling histogram construction tool attempts to avoid any cost increases in time and memory from sampling too many data values.
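The loop described in the abstract above (build a histogram from an initial sample, cross-validate it against a fresh sample, fold that sample in, and stop once the error is within a threshold) can be sketched as follows. Several details are assumptions made for the example rather than taken from the patent: the error metric (maximum deviation of a bucket's share from 1/k), doubling the sample each round, sampling with replacement, and the `max_rounds` cap.

```python
import random
from bisect import bisect_right


def equi_height_separators(sample, k):
    """Return k-1 separators splitting the sorted sample into k near-equal buckets."""
    s = sorted(sample)
    return [s[(i * len(s)) // k] for i in range(1, k)]


def distribution_error(separators, values, k):
    """Max deviation of any bucket's share of `values` from the ideal 1/k."""
    counts = [0] * k
    for v in values:
        counts[bisect_right(separators, v)] += 1
    return max(abs(c / len(values) - 1.0 / k) for c in counts)


def adaptive_histogram(column, k, initial_size, threshold, rng, max_rounds=10):
    """Grow the sample until the histogram cross-validates within `threshold`."""
    sample = rng.choices(column, k=initial_size)
    separators = equi_height_separators(sample, k)
    for _ in range(max_rounds):
        extra = rng.choices(column, k=len(sample))  # fresh validation sample
        error = distribution_error(separators, extra, k)
        sample += extra  # the validation sample also refines the histogram
        separators = equi_height_separators(sample, k)
        if error <= threshold:
            break
    return separators, len(sample)


rng = random.Random(0)
column = list(range(10_000))
separators, rows_sampled = adaptive_histogram(column, k=4, initial_size=200,
                                              threshold=0.05, rng=rng)
print(separators, rows_sampled)
```

Because the error is measured before the extra sample is merged, each validation uses data the current histogram has not seen.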

    Index tuner for given workload
    87.
    Granted patent
    Status: In force

    Publication No.: US06266658B1

    Publication date: 2001-07-24

    Application No.: US09553070

    Filing date: 2000-04-20

    IPC classification: G06F17/30

    Abstract: An index tuning wizard produces a fast and reasonable recommendation identifying database indexes to use given a specified workload. A query optimizer is used to determine the expected usefulness of potential indexes for the specified workload by taking cost of queries in the workload into account. A cost based pruning of indexes is then performed to provide an intermediate set of proposed indexes. Indexes having the most benefit based on storage constraints are then selected. The optimizer is then used again, and further pruning is done on a benefits basis. An index is not recommended unless it has a significant impact on the workload.
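The prune-then-select flow in the abstract above can be illustrated with a simple greedy pass under a storage budget. This is only a rough sketch: the benefit figures below are hard-coded stand-ins for what a query optimizer would report, each index's benefit is treated as independent of the others, and the function name, candidate names, and thresholds are hypothetical.

```python
def recommend_indexes(candidates, storage_budget, min_benefit):
    """Pick indexes greedily; candidates are (name, benefit, size) tuples."""
    # Prune candidates whose estimated benefit to the workload is insignificant.
    useful = [c for c in candidates if c[1] >= min_benefit]
    # Prefer the highest estimated benefit per unit of storage.
    useful.sort(key=lambda c: c[1] / c[2], reverse=True)
    chosen, used = [], 0
    for name, _benefit, size in useful:
        if used + size <= storage_budget:
            chosen.append(name)
            used += size
    return chosen


candidates = [
    ("idx_a", 100.0, 50),  # (name, estimated cost reduction, size in MB)
    ("idx_b", 90.0, 10),
    ("idx_c", 1.0, 5),
    ("idx_d", 60.0, 45),
]
print(recommend_indexes(candidates, storage_budget=60, min_benefit=5.0))
# prints ['idx_b', 'idx_a']
```

A real tuner would re-invoke the optimizer after selection, since the benefit of one index usually changes once another is present.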

    Database system multi-column index selection for a workload
    89.
    Granted patent
    Status: Lapsed

    Publication No.: US5913206A

    Publication date: 1999-06-15

    Application No.: US980831

    Filing date: 1997-12-01

    IPC classification: G06F17/30

    Abstract: An index selection tool helps reduce costs in time and memory in selecting an index configuration or set of indexes for use by a database server in accessing a database in accordance with a workload of queries. The index selection tool attempts to reduce the number of indexes to be considered, the number of index configurations to be enumerated, and the number of invocations of a query optimizer in selecting an index configuration for the workload.

    Integrated fuzzy joins in database management systems
    90.
    Granted patent
    Status: In force

    Publication No.: US09317544B2

    Publication date: 2016-04-19

    Application No.: US13253315

    Filing date: 2011-10-05

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30303 G06F17/30533

    Abstract: A fuzzy joins system that is integrated in a database system generates fuzzy joins between records from two datasets. The fuzzy joins system includes a tokenizer to generate tokens for data records and a transformer to find transforms for the tokens. The fuzzy joins system invokes a signature generator, running within a runtime layer of the database system, to generate signatures for data records based on the tokens and their transforms. Subsequently, an equi-join operation joins the records from the two datasets with at least one equal signature. A similarity calculator, running within a runtime layer of the database system, computes a similarity measure using the token information of the joined records. If the similarity measure for any two records is above a threshold, the fuzzy joins system generates a fuzzy join between such two records.
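The pipeline in the abstract above (tokenize, generate signatures, equi-join on shared signatures, then filter by similarity) can be sketched in a few lines. The simplifications are assumptions of this example, not details from the patent: tokens are lowercased whitespace-separated words, every token doubles as a signature, token transforms are omitted, and Jaccard similarity is used as the similarity measure.

```python
from collections import defaultdict


def tokenize(text):
    """Split a record into a set of lowercase word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def fuzzy_join(left, right, threshold):
    """Return (left, right) record pairs whose similarity exceeds `threshold`."""
    # Signature generation: each token acts as one signature (a simplification).
    index = defaultdict(set)
    for j, rec in enumerate(right):
        for tok in tokenize(rec):
            index[tok].add(j)
    matches = []
    for rec in left:
        tokens = tokenize(rec)
        # Equi-join on signatures: candidates share at least one signature.
        candidates = set().union(*(index.get(tok, set()) for tok in tokens))
        for j in sorted(candidates):
            # Similarity check on the token sets of the joined pair.
            if jaccard(tokens, tokenize(right[j])) > threshold:
                matches.append((rec, right[j]))
    return matches


left = ["Microsoft Corp", "Apple Inc"]
right = ["microsoft corp redmond", "apple computer inc", "oracle"]
print(fuzzy_join(left, right, threshold=0.5))
```

The signature equi-join keeps the expensive similarity computation off pairs that share no token at all, such as "Apple Inc" and "oracle".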
