Database group-by query cardinality estimation

    Publication No.: US12045233B2

    Publication Date: 2024-07-23

    Application No.: US17979643

    Application Date: 2022-11-02

    Applicant: SAP SE

    CPC classification number: G06F16/24537 G06F16/24545

    Abstract: Mechanisms are disclosed for estimating the cardinality of group-by queries. A trained machine learning model supplies the probability of occurrence of values, in columns of the queried tables, that satisfy the query. A range selectivity is calculated from the conditional probability of occurrence of those values. A set of valid sample tuples is generated from the trained machine learning model. While progressive sampling proceeds, a group-by selectivity is calculated by retaining the conditional probability of occurrence, which yields the probability that the result set will contain specific group-by column values of the tables. A sampling probability is calculated by normalizing the group-by selectivity, that is, by dividing it by the range selectivity. Samples whose sampling probability falls below a sampling probability threshold are filtered out. A sampling-based estimator is then applied to the filtered sample set to estimate the cardinality.
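    The steps in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the record. The "trained machine learning model" is replaced by a hypothetical EmpiricalARModel that reads conditional distributions off an in-memory table, standing in for a learned autoregressive density model. Range selectivity uses progressive sampling in the style of autoregressive estimators such as Naru. The group-by selectivity keeps the sampled group-by value's own conditional probability instead of the mass of all valid values. The sampling probability divides it by the averaged range selectivity. The final sampling-based estimator is GEE (the Guaranteed-Error Estimator), chosen only for illustration because the abstract does not name one. All function names, the sample count, and the threshold default are hypothetical.

```python
import math
import random
from collections import Counter


class EmpiricalARModel:
    """Hypothetical stand-in for a trained autoregressive model.

    conditional(prefix) returns P(x_k | x_0..x_{k-1}) for the next column k,
    computed here from an in-memory table instead of a learned network.
    """

    def __init__(self, rows):
        self.rows = [tuple(r) for r in rows]

    def conditional(self, prefix):
        k = len(prefix)
        nxt = [r[k] for r in self.rows if r[:k] == tuple(prefix)]
        counts = Counter(nxt)
        return {v: c / len(nxt) for v, c in counts.items()}


def progressive_sample(model, predicates, group_cols, num_cols, rng):
    """Generate one sample tuple column by column (assumed procedure).

    Returns (tuple, range_weight, groupby_weight). range_weight multiplies the
    probability mass of all predicate-satisfying values; groupby_weight does
    the same, except that group-by columns contribute the probability of the
    specific value that was sampled.
    """
    prefix, range_w, group_w = [], 1.0, 1.0
    for i in range(num_cols):
        dist = model.conditional(prefix)
        pred = predicates.get(i, lambda v: True)
        valid = {v: p for v, p in dist.items() if pred(v)}
        mass = sum(valid.values())
        if mass == 0.0:
            return None, 0.0, 0.0  # no value of column i satisfies the predicate
        values = list(valid)
        value = rng.choices(values, weights=[valid[v] for v in values])[0]
        range_w *= mass
        group_w *= valid[value] if i in group_cols else mass
        prefix.append(value)
    return tuple(prefix), range_w, group_w


def gee_distinct(group_values, population_size):
    """GEE distinct-value estimate: sqrt(n / r) * f1 + sum_{j >= 2} f_j."""
    r = len(group_values)
    if r == 0:
        return 0.0
    freq = Counter(Counter(group_values).values())  # f_j = number of groups seen j times
    f1 = freq.get(1, 0)
    rest = sum(c for j, c in freq.items() if j >= 2)
    # Clamp n >= r: model samples are drawn with replacement, so r may exceed n.
    return math.sqrt(max(population_size, r) / r) * f1 + rest


def estimate_groupby_cardinality(model, table_size, predicates, group_cols,
                                 num_cols, num_samples=1000, threshold=0.01,
                                 seed=0):
    rng = random.Random(seed)
    samples = [progressive_sample(model, predicates, group_cols, num_cols, rng)
               for _ in range(num_samples)]
    # Range selectivity: average of the per-sample progressive-sampling weights.
    range_sel = sum(w for _, w, _ in samples) / num_samples
    if range_sel == 0.0:
        return 0.0
    # Sampling probability = group-by selectivity / range selectivity;
    # samples below the threshold are dropped.
    kept = [tuple(t[i] for i in sorted(group_cols))
            for t, _, g in samples
            if t is not None and g / range_sel >= threshold]
    # The estimated result size serves as the population size for GEE.
    return gee_distinct(kept, range_sel * table_size)
```

    For example, take the rows [("EU", "a"), ("EU", "a"), ("EU", "b"), ("US", "c"), ("US", "c"), ("US", "d")] with the predicate region == "EU" on column 0 and group-by column 1. With the default threshold, both EU products survive the filter and the estimate is 2.0. Raising the threshold to 0.5 discards product b, whose sampling probability is 1/3, and the estimate drops to 1.0.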

    DATABASE GROUP-BY QUERY CARDINALITY ESTIMATION

    Publication No.: US20240143586A1

    Publication Date: 2024-05-02

    Application No.: US17979643

    Application Date: 2022-11-02

    Applicant: SAP SE

    CPC classification number: G06F16/24537 G06F16/24545

