EFFICIENT MERGE OF TABULAR DATA WITH DELETION INDICATIONS

    公开(公告)号:US20240070138A1

    公开(公告)日:2024-02-29

    申请号:US17895890

    申请日:2022-08-25

    CPC classification number: G06F16/2282 G06F9/4881

    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, and persist, in one or more deletion vector files, one or more deletion vectors for corresponding rows of the one or more target table files, and obtaining a resulting table based at least in part on the second job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and one or more deletion of vectors associated with previously removed rows of the matching target table files and obtaining the second job resulting file(s).

    UPDATE AND QUERY OF A LARGE COLLECTION OF FILES THAT REPRESENT A SINGLE DATASET STORED ON A BLOB STORE

    公开(公告)号:US20230394029A1

    公开(公告)日:2023-12-07

    申请号:US18236516

    申请日:2023-08-22

    CPC classification number: G06F16/2358 G06F16/148 G06F16/2282

    Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.

    Dataflow graph processing
    55.
    发明授权

    公开(公告)号:US11567998B2

    公开(公告)日:2023-01-31

    申请号:US17362450

    申请日:2021-06-29

    Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.

    Scaling delta table optimize command

    公开(公告)号:US11567900B1

    公开(公告)日:2023-01-31

    申请号:US17384486

    申请日:2021-07-23

    Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.

    UPDATE AND QUERY OF A LARGE COLLECTION OF FILES THAT REPRESENT A SINGLE DATASET STORED ON A BLOB STORE

    公开(公告)号:US20220253424A1

    公开(公告)日:2022-08-11

    申请号:US17695411

    申请日:2022-03-15

    Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.

    Update and query of a large collection of files that represent a single dataset stored on a blob store

    公开(公告)号:US11308071B2

    公开(公告)日:2022-04-19

    申请号:US16941227

    申请日:2020-07-28

    Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.

    MODEL ML REGISTRY AND MODEL SERVING

    公开(公告)号:US20220092043A1

    公开(公告)日:2022-03-24

    申请号:US17324907

    申请日:2021-05-19

    Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.

    QUERY WATCHDOG
    60.
    发明申请

    公开(公告)号:US20220083410A1

    公开(公告)日:2022-03-17

    申请号:US17537124

    申请日:2021-11-29

    Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.

Patent Agency Ranking