Scaling delta table optimize command

    公开(公告)号:US12079167B1

    公开(公告)日:2024-09-03

    申请号:US18093916

    申请日:2023-01-06

    CPC classification number: G06F16/172 G06F16/2282

    Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.

    Scaling delta table optimize command

    公开(公告)号:US11567900B1

    公开(公告)日:2023-01-31

    申请号:US17384486

    申请日:2021-07-23

    Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.

    Clustering key selection based on machine-learned key selection models for data processing service

    公开(公告)号:US12229169B1

    公开(公告)日:2025-02-18

    申请号:US18501830

    申请日:2023-11-03

    Abstract: The disclosed configurations provide a method (and/or a computer-readable medium or system) for determining, from a table schema describing keys of a data table, one or more clustering keys that can be used to cluster data files of a data table. The method includes generating features for the data table, generating tokens from the features, generating a prediction for each token by applying to the token a machine-learned transformer model trained to predict a likelihood that the key associated with the token is a clustering key for the data table, determining clustering keys based on the predictions, and clustering data records of the data table into data files based on key-values for the clustering keys.

Patent Agency Ranking