Workload-aware data placement advisor for OLAP database systems

    Publication No.: US12229135B2

    Publication date: 2025-02-18

    Application No.: US17699607

    Filing date: 2022-03-21

    Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automatic data placement recommendations for partitioning data across multiple nodes in a database system. The system is configured to extract workload-specific features of a database workload running on a database system and dataset-specific features of a database running on the database system. The workload-specific features characterize utilization of the database workload. The dataset-specific features characterize how data is organized within the database. The system identifies a plurality of candidate keys for determining how to partition data stored in the database across nodes. Based at least in part on the workload-specific features, the dataset-specific features, and the plurality of candidate keys, the system generates a set of candidate key combinations for partitioning data. Using a machine learning model, the system determines a particular candidate key combination that optimizes the query execution performance benefit based on the workload-specific features and the dataset-specific features. The system then generates data placement commands to allocate the database tables across the nodes.
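
The flow described in the abstract (candidate keys, candidate key combinations, model-based selection, placement commands) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the table and column names, the feature names `join_frequency` and `distinct_ratio`, the hand-written scoring rule that stands in for the machine learning model, and the command strings, which are placeholders rather than the DDL of any real system.

```python
from itertools import product

# Hypothetical candidate partitioning keys per table (illustrative only).
candidate_keys = {
    "orders": ["customer_id", "order_date"],
    "customers": ["customer_id", "region"],
}

# Hypothetical workload-specific feature: how often each column is used in joins.
workload_features = {
    "join_frequency": {
        ("orders", "customer_id"): 40,
        ("orders", "order_date"): 5,
        ("customers", "customer_id"): 40,
        ("customers", "region"): 10,
    }
}

# Hypothetical dataset-specific feature: distinct values / row count per column.
dataset_features = {
    "distinct_ratio": {
        ("orders", "customer_id"): 0.2,
        ("orders", "order_date"): 0.01,
        ("customers", "customer_id"): 1.0,
        ("customers", "region"): 0.001,
    }
}


def candidate_combinations(keys_by_table):
    """Yield every combination that assigns one candidate key to each table."""
    tables = sorted(keys_by_table)
    for choice in product(*(keys_by_table[t] for t in tables)):
        yield dict(zip(tables, choice))


def predicted_benefit(combo, workload, dataset):
    """Stand-in for the trained model: a hand-written score, not a real predictor."""
    score = 0.0
    for table, key in combo.items():
        joins = workload["join_frequency"].get((table, key), 0)
        distinct = dataset["distinct_ratio"].get((table, key), 0.0)
        score += joins * distinct
    return score


def recommend(keys_by_table, workload, dataset):
    """Pick the combination with the highest predicted benefit, without trial runs."""
    return max(
        candidate_combinations(keys_by_table),
        key=lambda combo: predicted_benefit(combo, workload, dataset),
    )


def placement_commands(combo):
    """Render placeholder command strings (not the syntax of any real system)."""
    return [f"PARTITION TABLE {t} BY ({k})" for t, k in sorted(combo.items())]


best = recommend(candidate_keys, workload_features, dataset_features)
commands = placement_commands(best)
```

The point of the sketch is the prediction-driven shape: each combination is scored from extracted features alone, so no candidate layout has to be loaded and benchmarked. A real advisor would replace `predicted_benefit` with a trained model and would likely prune the combination space rather than enumerate it exhaustively.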

    WORKLOAD-AWARE DATA PLACEMENT ADVISOR FOR OLAP DATABASE SYSTEMS

    Publication No.: US20230297573A1

    Publication date: 2023-09-21

    Application No.: US17699607

    Filing date: 2022-03-21

