Techniques for maintaining statistics in a database system

    Publication No.: US11468073B2

    Publication date: 2022-10-11

    Application No.: US16533614

    Filing date: 2019-08-06

    Abstract: Techniques are provided for gathering statistics in a database system. The techniques involve gathering some statistics using an “on-the-fly” technique, some statistics through a “high-frequency” technique, and yet other statistics using a “prediction” technique. The technique used to gather each statistic is based, at least in part, on the overhead required to gather the statistic. For example, low-overhead statistics may be gathered “on-the-fly” using the same process that is performing the operation that affects the statistic, while statistics whose gathering incurs greater overhead may be gathered in the background, while the database is live, using the high-frequency technique. The prediction technique may be used for relatively-high overhead statistics that can be predicted based on historical data and the current value of predictor statistics.
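The split between the three gathering techniques can be illustrated with a minimal sketch. Everything named here (`Table`, `TableStats`, `refresh_expensive_stats`, `predict_ndv`) is hypothetical and not taken from the patent. The choice of row count, minimum, and maximum as the low-overhead statistics, the number of distinct values (NDV) as the costlier one, and a simple historical ratio as the predictor are assumptions made for illustration only.

```python
class TableStats:
    """Statistics kept for a single-column toy table (hypothetical layout)."""
    def __init__(self):
        self.row_count = 0
        self.min_value = None
        self.max_value = None
        self.ndv = 0               # number of distinct values, refreshed later
        self.ndv_is_stale = False


class Table:
    def __init__(self):
        self.values = []
        self.stats = TableStats()

    def insert(self, value):
        """On-the-fly: cheap statistics are updated by the inserting call itself."""
        self.values.append(value)
        s = self.stats
        s.row_count += 1
        s.min_value = value if s.min_value is None else min(s.min_value, value)
        s.max_value = value if s.max_value is None else max(s.max_value, value)
        s.ndv_is_stale = True      # the costlier statistic is only marked stale

    def refresh_expensive_stats(self):
        """High-frequency: meant to be run by a background job, not by insert()."""
        self.stats.ndv = len(set(self.values))
        self.stats.ndv_is_stale = False


def predict_ndv(stats, history):
    """Prediction: estimate NDV from the current row count (the predictor)
    and historical (row_count, ndv) pairs, using their average ratio."""
    ratios = [ndv / rows for rows, ndv in history if rows]
    if not ratios:
        return stats.ndv
    return round(stats.row_count * sum(ratios) / len(ratios))


t = Table()
for v in (5, 3, 5, 9):
    t.insert(v)
print(t.stats.row_count, t.stats.min_value, t.stats.max_value)  # cheap stats are current
t.refresh_expensive_stats()
print(t.stats.ndv)
```

In a real system the background refresh would be scheduled and would likely sample rather than scan; the sketch only shows which code path owns which statistic.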

    TECHNIQUES FOR MAINTAINING STATISTICS IN A DATABASE SYSTEM

    Publication No.: US20200042522A1

    Publication date: 2020-02-06

    Application No.: US16533614

    Filing date: 2019-08-06

    Abstract: Techniques are provided for gathering statistics in a database system. The techniques involve gathering some statistics using an “on-the-fly” technique, some statistics through a “high-frequency” technique, and yet other statistics using a “prediction” technique. The technique used to gather each statistic is based, at least in part, on the overhead required to gather the statistic. For example, low-overhead statistics may be gathered “on-the-fly” using the same process that is performing the operation that affects the statistic, while statistics whose gathering incurs greater overhead may be gathered in the background, while the database is live, using the high-frequency technique. The prediction technique may be used for relatively-high overhead statistics that can be predicted based on historical data and the current value of predictor statistics.

    Statistics based query transformation

    Publication No.: US11561973B2

    Publication date: 2023-01-24

    Application No.: US16147521

    Filing date: 2018-09-28

    Abstract: Techniques are described for responding to aggregate queries using optimizer statistics already available in the data dictionary of the database in which the database object targeted by the aggregate query resides, without the user creating any additional objects (e.g. materialized views) and without requiring the objects to be loaded into volatile memory in a columnar fashion. The user query is rewritten to produce a transformed query that targets the dictionary tables to form the aggregate result without scanning the user tables. “Accuracy indicators” may be maintained to indicate whether those statistics are accurate. Only accurate statistics are used to answer queries that require accurate answers. The accuracy check can be made during runtime, allowing the query plan of the transformed query to be used regardless of the accuracy of the statistics. For queries that request approximations, inaccurate statistics may be used so long as the statistics are “accurate enough”.
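The runtime accuracy check can be sketched roughly as follows. The dictionary layout, the `answer_aggregate` function, and the `scan` callback are all hypothetical names introduced for illustration. The sketch also simplifies "accurate enough" to "any available statistic is acceptable when an approximation is requested", which is an assumption and not something the abstract specifies.

```python
def answer_aggregate(agg, column, dictionary_stats, scan, approximate=False):
    """Answer an aggregate from stored statistics when allowed, else scan.

    dictionary_stats maps a column name to
    {"values": {aggregate name: value}, "accurate": bool} (hypothetical layout).
    Returns (result, source) so the caller can see which path was taken.
    """
    entry = dictionary_stats.get(column)
    if entry is not None and agg in entry["values"]:
        # The accuracy indicator is checked at run time, so the same
        # transformed plan works whether or not the statistics are current.
        if entry["accurate"] or approximate:
            return entry["values"][agg], "statistics"
    return scan(agg, column), "table scan"


# Toy "user table" and a fallback that scans it.
table = {"price": [5, 12, 7, 30]}
AGGREGATES = {"min": min, "max": max, "count": len}

def scan(agg, column):
    return AGGREGATES[agg](table[column])

# Statistics gathered before the row with price 30 was inserted, hence inaccurate.
stats = {"price": {"values": {"min": 5, "max": 12, "count": 3}, "accurate": False}}

print(answer_aggregate("max", "price", stats, scan))                    # falls back to the scan
print(answer_aggregate("max", "price", stats, scan, approximate=True))  # uses the stale statistic
```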

    Leveraging query-specific statistics in non-equivalent queries

    Publication No.: US11321317B2

    Publication date: 2022-05-03

    Application No.: US15666380

    Filing date: 2017-08-01

    Abstract: Techniques for processing queries are provided. In one approach, an execution plan for a query includes multiple sub-plans, one or more of which are selected at runtime while one or more other sub-plans are not executed during execution of the execution plan. In another approach, data about misestimates is generated and stored persistently for subsequent queries. In another approach, statistics for a database object are generated automatically and efficiently while the database object is created or data items are added thereto. In another approach, a hybrid histogram is created that includes a feature of frequency histograms and a feature of height-balanced histograms. In another approach, computer jobs are executed in such a way as to avoid deadlock. In another approach, changes to a database object trigger a hard parse of a query even though an execution plan already exists for the query.

    APPROXIMATE DISTINCT COUNTING IN A BOUNDED MEMORY

    Type: Invention application (status: in force)

    Publication No.: US20170024387A1

    Publication date: 2017-01-26

    Application No.: US14818663

    Filing date: 2015-08-05

    CPC classification number: G06F17/30489

    Abstract: A table is processed to determine an approximate number of distinct values (NDV) for a plurality of groups. For each row, a group is identified based on one or more group-by columns. A hashed value is generated by applying a uniform hash function to a value in an NDV column. The hashed value is assigned to a particular bucket based on the values at a first set of bit positions in a binary representation of the hashed value. A bit position value is determined based on a remaining portion of the binary representation of the hashed value. The bit position value is based on a number of ordered bits in the hashed value that match a particular bit pattern. For each group identified, a maximum bit position (MBP) table is generated. The MBP table stores, for one or more buckets, the maximum bit position value determined for hashed values assigned to a particular bucket.
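The per-group MBP table described in the abstract can be sketched as below. Several details are assumptions not stated in the abstract: the 64-bit `blake2b` hash, the use of the top 7 bits to pick one of 128 buckets, and "length of the leading run of zero bits, plus one" as the bit pattern. The final `estimate_ndv` step borrows the standard HyperLogLog estimator, which the abstract does not describe. All function names are hypothetical.

```python
import hashlib
import math
from collections import defaultdict

HASH_BITS = 64
BUCKET_BITS = 7                       # assumption: first 7 bits select the bucket
NUM_BUCKETS = 1 << BUCKET_BITS        # 128 buckets
REST_BITS = HASH_BITS - BUCKET_BITS


def hash64(value):
    """Roughly uniform 64-bit hash of a value (illustrative choice of hash)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bucket_and_position(h):
    """Split a hash into (bucket, bit position value)."""
    bucket = h >> REST_BITS
    rest = h & ((1 << REST_BITS) - 1)
    # Leading zero bits in the remaining portion, plus one.
    position = REST_BITS - rest.bit_length() + 1
    return bucket, position


def build_mbp_tables(rows, group_cols, ndv_col):
    """One pass over the rows; returns {group key: MBP table (list of ints)}."""
    tables = defaultdict(lambda: [0] * NUM_BUCKETS)
    for row in rows:
        group = tuple(row[c] for c in group_cols)
        bucket, position = bucket_and_position(hash64(row[ndv_col]))
        mbp = tables[group]
        if position > mbp[bucket]:
            mbp[bucket] = position      # keep only the maximum per bucket
    return dict(tables)


def estimate_ndv(mbp):
    """Standard HyperLogLog estimate from an MBP table (assumed estimator)."""
    m = len(mbp)
    alpha = 0.7213 / (1 + 1.079 / m)    # bias constant valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -p for p in mbp)
    zeros = mbp.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)  # small-range (linear counting) correction
    return raw


rows = [{"region": "a", "user": i % 1000} for i in range(3000)]
rows += [{"region": "b", "user": i % 3} for i in range(30)]
for group, mbp in build_mbp_tables(rows, ["region"], "user").items():
    print(group, round(estimate_ndv(mbp)))
```

Memory per group is bounded by the fixed bucket count regardless of how many rows or distinct values the group contains, and repeated values never change the table because each bucket stores only a maximum.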


    TRIGGERING HARD PARSES

    Type: Invention application (status: in force)

    Publication No.: US20140095475A1

    Publication date: 2014-04-03

    Application No.: US14041952

    Filing date: 2013-09-30

    CPC classification number: G06F17/30466 G06F17/30463

    Abstract: Techniques for processing queries are provided. In one approach, an execution plan for a query includes multiple sub-plans, one or more of which are selected at runtime while one or more other sub-plans are not executed during execution of the execution plan. In another approach, data about misestimates is generated and stored persistently for subsequent queries. In another approach, statistics for a database object are generated automatically and efficiently while the database object is created or data items are added thereto. In another approach, a hybrid histogram is created that includes a feature of frequency histograms and a feature of height-balanced histograms. In another approach, computer jobs are executed in such a way as to avoid deadlock. In another approach, changes to a database object trigger a hard parse of a query even though an execution plan already exists for the query.


    Techniques for maintaining statistics in a database system

    Publication No.: US12147440B2

    Publication date: 2024-11-19

    Application No.: US17483326

    Filing date: 2021-09-23

    Abstract: Techniques are provided for gathering statistics in a database system. The techniques involve gathering some statistics using an “on-the-fly” technique, some statistics through a “high-frequency” technique, and yet other statistics using a “prediction” technique. The technique used to gather each statistic is based, at least in part, on the overhead required to gather the statistic. For example, low-overhead statistics may be gathered “on-the-fly” using the same process that is performing the operation that affects the statistic, while statistics whose gathering incurs greater overhead may be gathered in the background, while the database is live, using the high-frequency technique. The prediction technique may be used for relatively-high overhead statistics that can be predicted based on historical data and the current value of predictor statistics.

    HISTOGRAM-AUGMENT DYNAMIC SAMPLING FOR JOIN CARDINALITY ESTIMATION

    Publication No.: US20250139092A1

    Publication date: 2025-05-01

    Application No.: US18384231

    Filing date: 2023-10-26

    Abstract: A histogram-augmented dynamic sampling approach is provided for determining cardinality of a two-table join. The approach has a pre-processing phase in which data structures are created that will be used during a compilation phase for cardinality estimation. These data structures include a row histogram and a key histogram, which are created for selected columns of a first table. A cardinality estimation phase uses the data structures to estimate the cardinality of various joins at the time of query compilation. In this phase, the system executes queries that join the histograms with a second table, to perform the cardinality estimation.
