Clustering key selection based on machine-learned key selection models for data processing service

    Publication (Announcement) No.: US12229169B1

    Publication (Announcement) Date: 2025-02-18

    Application No.: US18501830

    Filing Date: 2023-11-03

    Abstract: The disclosed configurations provide a method (and/or a computer-readable medium or system) for determining, from a table schema describing keys of a data table, one or more clustering keys that can be used to cluster data files of a data table. The method includes generating features for the data table, generating tokens from the features, generating a prediction for each token by applying to the token a machine-learned transformer model trained to predict a likelihood that the key associated with the token is a clustering key for the data table, determining clustering keys based on the predictions, and clustering data records of the data table into data files based on key-values for the clustering keys.
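The pipeline in the abstract runs in a fixed order: features, then tokens, then per-token predictions, then clustering-key selection, then clustering records into files. The sketch below walks that order under stated assumptions and is not the patented implementation.

- All names are hypothetical: `generate_tokens`, `stand_in_model`, `select_clustering_keys`, `cluster_into_files`, and the two features (distinct-value ratio, filter frequency).
- The machine-learned transformer is replaced by a fixed logistic scorer with made-up weights.
- Clustering is simplified to a lexicographic sort on the chosen keys, followed by fixed-size chunking into "files".

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Mapping, Sequence, Tuple

Record = Mapping[str, object]


@dataclass(frozen=True)
class KeyToken:
    """Hypothetical token: one per schema key, carrying that key's features."""
    key: str
    features: Tuple[float, float]  # (distinct_ratio, filter_freq)


def generate_tokens(schema: Sequence[str], records: Sequence[Record],
                    filter_counts: Mapping[str, int]) -> List[KeyToken]:
    """Compute assumed per-key features for the table and wrap each key's features as a token."""
    n = max(len(records), 1)
    tokens = []
    for key in schema:
        distinct_ratio = len({r[key] for r in records}) / n  # fraction of distinct values
        filter_freq = float(filter_counts.get(key, 0))        # how often queries filter on the key
        tokens.append(KeyToken(key, (distinct_ratio, filter_freq)))
    return tokens


def stand_in_model(token: KeyToken) -> float:
    """Stand-in for the trained transformer: a logistic score in (0, 1).
    The weights are invented for illustration, not learned."""
    distinct_ratio, filter_freq = token.features
    z = 1.0 * distinct_ratio + 0.8 * filter_freq - 3.0
    return 1.0 / (1.0 + math.exp(-z))


def select_clustering_keys(tokens: Sequence[KeyToken],
                           model: Callable[[KeyToken], float] = stand_in_model,
                           threshold: float = 0.5, max_keys: int = 2) -> List[str]:
    """Keep keys whose predicted likelihood meets the threshold, highest score first."""
    scored = [(model(t), t.key) for t in tokens]
    kept = sorted((pk for pk in scored if pk[0] >= threshold),
                  key=lambda pk: pk[0], reverse=True)
    return [key for _, key in kept[:max_keys]]


def cluster_into_files(records: Sequence[Record], clustering_keys: Sequence[str],
                       records_per_file: int = 2) -> List[List[Record]]:
    """Sort records by their clustering-key values, then chunk them into fixed-size 'files'."""
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in clustering_keys))
    return [list(ordered[i:i + records_per_file])
            for i in range(0, len(ordered), records_per_file)]


if __name__ == "__main__":
    schema = ["event_date", "user_id", "payload"]
    records = [
        {"event_date": "2023-11-02", "user_id": 7, "payload": "a"},
        {"event_date": "2023-11-01", "user_id": 3, "payload": "b"},
        {"event_date": "2023-11-03", "user_id": 1, "payload": "c"},
        {"event_date": "2023-11-01", "user_id": 9, "payload": "d"},
        {"event_date": "2023-11-02", "user_id": 2, "payload": "e"},
        {"event_date": "2023-11-03", "user_id": 5, "payload": "f"},
    ]
    tokens = generate_tokens(schema, records, {"event_date": 5, "user_id": 3})
    keys = select_clustering_keys(tokens)
    print(keys)  # ['event_date', 'user_id']
    for i, f in enumerate(cluster_into_files(records, keys)):
        print(i, [r["payload"] for r in f])
```

With this toy data, `event_date` and `user_id` clear the 0.5 threshold. `payload` is never filtered on, so it scores about 0.12 and is dropped. Each resulting "file" then holds records from a single date.

A production system would substitute the trained model for `stand_in_model`. It might also use a multi-dimensional ordering such as a Z-order curve in place of the plain sort.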
