EFFICIENT MERGE OF TABULAR DATA USING MIXING
    Invention publication

    Publication No.: US20240070155A1

    Publication date: 2024-02-29

    Application No.: US17895882

    Filing date: 2022-08-25

    CPC classification number: G06F16/2456 G06F16/2282

    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing a first job and a second job; obtaining other resulting files based at least in part on a second set of unmatched rows among the target table and the source table, which results from a first set of unmatched rows having been processed in the second job; and obtaining a resulting table based on (i) the second job resulting file(s) and (ii) the other resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each matching target table file, the particular set of rows that have matching rows. Performing the second job includes performing a first matching action based on matched rows and a second matching action based on a subset of unmatched rows.
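    The two-job flow in this abstract resembles a SQL-style MERGE. The following is a minimal Python sketch of that general idea under stated assumptions, not the patented method: the target table is a dict mapping hypothetical file names to row lists, source keys are unique, the matched action is an update, and unmatched source rows are inserted. `find_matches` plays the role of the first job, and the loop in `merge` writes matched updates and carried-over unmatched target rows into the same rewritten file.

```python
from typing import Dict, List, Set

Row = Dict[str, object]
Table = Dict[str, List[Row]]  # hypothetical: file name -> rows in that file


def find_matches(target: Table, source: List[Row], key: str) -> Dict[str, List[int]]:
    """First job (sketch): for each target file, record positions of rows matched by the source."""
    source_keys = {r[key] for r in source}
    matches: Dict[str, List[int]] = {}
    for name, rows in target.items():
        hit = [i for i, r in enumerate(rows) if r[key] in source_keys]
        if hit:
            matches[name] = hit
    return matches


def merge(target: Table, source: List[Row], key: str) -> Table:
    """Second job plus inserts (sketch); assumes unique keys in `source`."""
    matches = find_matches(target, source, key)
    by_key = {r[key]: r for r in source}
    matched_keys: Set[object] = set()
    result: Table = {}
    for name, rows in target.items():
        if name not in matches:
            result[name] = rows  # untouched file is kept as-is
            continue
        hit = set(matches[name])
        rewritten: List[Row] = []
        for i, r in enumerate(rows):
            if i in hit:
                rewritten.append({**r, **by_key[r[key]]})  # matched action: update
                matched_keys.add(r[key])
            else:
                rewritten.append(r)  # unmatched target row is carried over
        result[name + ".rewritten"] = rewritten
    # remaining unmatched source rows become a separate insert file
    inserts = [r for r in source if r[key] not in matched_keys]
    if inserts:
        result["inserted"] = inserts
    return result
```

    Files with no matching rows are reused unchanged; recording match information in the first job is what lets the second job rewrite only the touched files.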

    EFFICIENT MERGE OF TABULAR DATA USING A PROCESSING FILTER

    Publication No.: US20240069863A1

    Publication date: 2024-02-29

    Application No.: US17895872

    Filing date: 2022-08-25

    CPC classification number: G06F7/14 G06F16/148 G06F16/16

    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing a first, a second, and a third job, and obtaining a resulting table based at least in part on the second job resulting file(s) and the third job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each matching target table file, the particular set of rows that have matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s). Performing the third job includes determining unmatched rows for the target table files and storing the unmatched rows in the third job resulting file(s).
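    A minimal Python sketch of the three-job split, assuming a dict-of-files target table, unique source keys, and an update as the matched action. All names here (`merge_with_filter`, the `.job2`/`.job3` file suffixes) are hypothetical, and this illustrates the general idea rather than the patented method. Job 2 writes only the updated matched rows; job 3 acts as the filter that copies the unmatched rows of each touched file into separate output files.

```python
from typing import Dict, List, Set

Row = Dict[str, object]
Table = Dict[str, List[Row]]  # hypothetical: file name -> rows in that file


def merge_with_filter(target: Table, source: List[Row], key: str) -> Table:
    by_key = {r[key]: r for r in source}  # assumes unique source keys
    # Job 1 (sketch): per target file, positions of rows whose key matches the source
    matches: Dict[str, Set[int]] = {}
    for name, rows in target.items():
        hit = {i for i, r in enumerate(rows) if r[key] in by_key}
        if hit:
            matches[name] = hit
    # Files without matches are kept as-is
    result: Table = {n: rows for n, rows in target.items() if n not in matches}
    for name, hit in matches.items():
        rows = target[name]
        # Job 2: apply the matched action (update) to matched rows only
        result[name + ".job2"] = [{**rows[i], **by_key[rows[i][key]]} for i in sorted(hit)]
        # Job 3: filter the unmatched rows of the same file into their own output
        unmatched = [r for i, r in enumerate(rows) if i not in hit]
        if unmatched:
            result[name + ".job3"] = unmatched
    return result
```

    Keeping matched and unmatched outputs in separate jobs means job 2 never has to copy rows it does not modify; the resulting table is the union of the untouched, job 2, and job 3 files.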

    K-D TREE BALANCED SPLITTING
    Invention publication

    Publication No.: US20230359602A1

    Publication date: 2023-11-09

    Application No.: US17738609

    Filing date: 2022-05-06

    CPC classification number: G06F16/2246

    Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide them with instructions.
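    A simple way to obtain split points that yield files of roughly equal size is to split recursively at the median, cycling through the dimensions as a k-d tree does. The sketch below assumes points are equal-length numeric tuples and that `target_size` (a hypothetical parameter) is the maximum number of rows per file; it illustrates balanced k-d splitting in general, not the claimed system.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def kd_split(points: List[Point], target_size: int, depth: int = 0) -> List[List[Point]]:
    """Split at the median of one dimension per level until each group fits target_size."""
    assert target_size >= 1, "target_size must be positive or recursion never ends"
    if len(points) <= target_size:
        return [points]
    dim = depth % len(points[0])             # cycle through dimensions level by level
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2                  # median position is the split point
    return (kd_split(ordered[:mid], target_size, depth + 1)
            + kd_split(ordered[mid:], target_size, depth + 1))
```

    Because every split is at the median, sibling groups differ in size by at most one row, which keeps the resulting files close to a common size.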

    CONCURRENT OPTIMISTIC TRANSACTIONS FOR TABLES WITH DELETION VECTORS

    Publication No.: US20250103580A1

    Publication date: 2025-03-27

    Application No.: US18928982

    Filing date: 2024-10-28

    Abstract: A disclosed configuration receives a first indication that a first transaction has been committed to update a first subset of records in a data table at a first version, generating a second version of the data table, and receives a second indication to commit a second transaction that updates a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes the content of one or more records in the second subset, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. If both prerequisites are satisfied, the configuration commits the second transaction, generating a third version of the data table by updating elements of a deletion vector.
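    The two prerequisites can be illustrated with a small optimistic-concurrency check. In this hypothetical Python model, each table version maps data file names to a deletion vector, represented as a set of deleted row positions. The logical check looks for rows touched by both transactions, and the physical check confirms that the files targeted by the pending transaction still exist in the latest version. `TableVersion` and `try_commit` are illustrative names, and this is a sketch under those assumptions, not the disclosed configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TableVersion:
    version: int
    # data file name -> deletion vector (row positions marked deleted)
    files: Dict[str, Set[int]] = field(default_factory=dict)


def try_commit(latest: TableVersion,
               winner_rows: Dict[str, Set[int]],
               txn_rows: Dict[str, Set[int]]) -> TableVersion:
    """Commit txn_rows (read at an older version) on top of `latest`, or raise on conflict."""
    # Logical prerequisite: the already-committed transaction changed none of our records
    for f, rows in txn_rows.items():
        if rows & winner_rows.get(f, set()):
            raise RuntimeError(f"logical conflict in {f}")
    # Physical prerequisite: our records still live in data files of the latest version
    for f in txn_rows:
        if f not in latest.files:
            raise RuntimeError(f"physical conflict: {f} was removed or rewritten")
    # Both hold: produce the next version by setting bits in the deletion vectors
    new_files = {f: set(dv) for f, dv in latest.files.items()}
    for f, rows in txn_rows.items():
        new_files[f] |= rows
    return TableVersion(latest.version + 1, new_files)
```

    Copying the deletion vectors before updating them leaves the earlier version untouched, mirroring how each commit produces a new table version.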

    Fetching Query Results Through Cloud Object Stores

    Publication No.: US20240394271A1

    Publication date: 2024-11-28

    Application No.: US18614380

    Filing date: 2024-03-22

    Abstract: The system is configured to: 1) receive a client request; 2) determine one or more executors to generate a response to the client request; 3) provide each executor with an indication; 4) receive, for each indication, a response whose output is either a cloud output or an in-line output, yielding a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all of the outputs; and 6) in response to the group of in-line outputs not comprising all of the outputs for the client request: a) convert the group of in-line outputs into a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request that includes the metadata for the converted group of cloud outputs and the group of cloud outputs.
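    The in-line versus cloud decision in steps 5 and 6 can be sketched as follows. A plain dict stands in for the cloud object store, and `InlineOutput`, `CloudOutput`, `upload`, `build_response`, and the `mem://` URLs are hypothetical names chosen for illustration, not an actual API. Outputs are converted in place so that result chunks keep their original order.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class InlineOutput:
    data: bytes


@dataclass
class CloudOutput:
    url: str
    size: int


def upload(store: Dict[str, bytes], data: bytes) -> CloudOutput:
    """Hypothetical stand-in for writing an object to a cloud store."""
    url = f"mem://results/{uuid.uuid4().hex}"
    store[url] = data
    return CloudOutput(url, len(data))


def build_response(outputs: List[Union[InlineOutput, CloudOutput]],
                   store: Dict[str, bytes]) -> dict:
    # Step 5: if every executor returned an in-line output, answer in-line
    if all(isinstance(o, InlineOutput) for o in outputs):
        return {"inline": b"".join(o.data for o in outputs)}
    # Step 6a: convert in-line outputs to cloud outputs, preserving chunk order
    links = [upload(store, o.data) if isinstance(o, InlineOutput) else o for o in outputs]
    # Steps 6b-6c: return metadata (location and size) for every cloud output
    return {"links": [{"url": c.url, "size": c.size} for c in links]}
```

    Converting everything to one form means the client only has to handle a single response shape: either one in-line payload or a uniform list of links.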

    Data ingestion using data file clustering with KD-epsilon trees

    Publication No.: US12072863B1

    Publication date: 2024-08-27

    Application No.: US18218400

    Filing date: 2023-07-05

    CPC classification number: G06F16/2246 G06F16/2358 G06F16/245 G06F16/285

    Abstract: A data tree for managing the data files of a data table and performing one or more transaction operations on the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition on the key-values of a respective key. A leaf node of the data tree may correspond to a data file of the data table that includes a subset of records whose key-values satisfy the condition for the node and the conditions associated with its parent nodes. A parent node may correspond to a file that includes a buffer storing changes to the data files reachable from that parent node, as well as dedicated storage for pointers to its child nodes. By using the data tree, a data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.
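    The buffered, deferred-rewrite behavior can be illustrated with a toy tree. In this hypothetical Python sketch, internal nodes hold a split condition and a buffer of pending records, leaves stand in for data files, and a buffer is flushed to its children only when it exceeds an assumed capacity (`BUFFER_CAPACITY`), so leaf files are not rewritten on every insert. It is a simplification of the buffering idea, not the patented KD-epsilon tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BUFFER_CAPACITY = 4  # hypothetical per-node buffer limit


@dataclass
class Node:
    key_index: int = 0                 # which key an internal node splits on
    split: float = 0.0                 # records with key < split go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    buffer: List[tuple] = field(default_factory=list)  # pending changes (internal nodes)
    rows: List[tuple] = field(default_factory=list)    # contents of a leaf "data file"

    def is_leaf(self) -> bool:
        return self.left is None


def insert(node: Node, record: tuple) -> None:
    if node.is_leaf():
        node.rows.append(record)       # the data file is rewritten only here
        return
    node.buffer.append(record)         # defer the change in the parent's buffer
    if len(node.buffer) > BUFFER_CAPACITY:
        flush(node)


def flush(node: Node) -> None:
    """Route every buffered record to the child whose condition it satisfies."""
    pending, node.buffer = node.buffer, []
    for r in pending:
        child = node.left if r[node.key_index] < node.split else node.right
        insert(child, r)
```

    Batching changes in the parent buffer means each leaf is touched once per flush rather than once per record, which is the source of the reduced rewrite count.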

    Fetching query results through cloud object stores

    Publication No.: US11960494B1

    Publication date: 2024-04-16

    Application No.: US17841946

    Filing date: 2022-06-16

    CPC classification number: G06F16/2471 G06F11/3419 G06F16/244 G06F16/256

    Abstract: The system is configured to: 1) receive a client request; 2) determine one or more executors to generate a response to the client request; 3) provide each executor with an indication; 4) receive, for each indication, a response whose output is either a cloud output or an in-line output, yielding a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all of the outputs; and 6) in response to the group of in-line outputs not comprising all of the outputs for the client request: a) convert the group of in-line outputs into a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request that includes the metadata for the converted group of cloud outputs and the group of cloud outputs.
