-
Publication No.: US12072863B1
Publication Date: 2024-08-27
Application No.: US18218400
Filing Date: 2023-07-05
Applicant: Databricks, Inc.
Inventor: Prakhar Jain, Frederick Ryan Johnson, Bart Samwel
IPC: G06F16/20, G06F16/22, G06F16/23, G06F16/245, G06F16/28
CPC classification number: G06F16/2246, G06F16/2358, G06F16/245, G06F16/285
Abstract: A data tree for managing data files of a data table and performing one or more transaction operations on the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and the conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable from this parent node, and also includes dedicated storage for pointers to the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.
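The structure described in this abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class names `Leaf` and `Internal`, the `BUFFER_CAPACITY` threshold, and the flush-on-overflow policy are all assumptions made for illustration, and lists stand in for data files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Record = Dict[str, int]


@dataclass
class Leaf:
    """Stands in for one data file holding records that satisfy all ancestor conditions."""
    records: List[Record] = field(default_factory=list)


@dataclass
class Internal:
    """Splitting condition `record[key] < split`, child pointers, and a change buffer."""
    key: str
    split: int
    left: "Node"
    right: "Node"
    buffer: List[Record] = field(default_factory=list)


Node = Union[Leaf, Internal]

BUFFER_CAPACITY = 2  # hypothetical threshold, chosen only for this sketch


def flush(node: Internal) -> None:
    """Push buffered changes one level down, routing each by the splitting condition."""
    pending, node.buffer = node.buffer, []
    for record in pending:
        child = node.left if record[node.key] < node.split else node.right
        insert(child, record)


def insert(node: Node, record: Record) -> None:
    """Buffer the change at an internal node; only an overflow touches the children."""
    if isinstance(node, Leaf):
        node.records.append(record)  # the only point where a "data file" is rewritten
        return
    node.buffer.append(record)
    if len(node.buffer) > BUFFER_CAPACITY:
        flush(node)
```

In this sketch, inserts are absorbed by the parent's buffer until it overflows, so leaves are rewritten in batches rather than once per change, which mirrors the abstract's stated goal of reducing the number of rewritten data files.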
-
Publication No.: US12056126B2
Publication Date: 2024-08-06
Application No.: US17895877
Filing Date: 2022-08-25
Applicant: Databricks, Inc.
Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain
IPC: G06F17/30, G06F11/34, G06F16/22, G06F16/2453, G06F16/28
CPC classification number: G06F16/24544, G06F11/3409, G06F16/2282, G06F16/285
Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).
-
Publication No.: US12032573B2
Publication Date: 2024-07-09
Application No.: US17976361
Filing Date: 2022-10-28
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
IPC: G06F16/2453, G06F16/2455
CPC classification number: G06F16/24542, G06F16/24568
Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
-
Publication No.: US11960494B1
Publication Date: 2024-04-16
Application No.: US17841946
Filing Date: 2022-06-16
Applicant: Databricks, Inc.
Inventor: Bogdan Ionut Ghit, Juliusz Sompolski, Shi Xin, Bart Samwel
IPC: G06F16/2458, G06F11/34, G06F16/242, G06F16/25
CPC classification number: G06F16/2471, G06F11/3419, G06F16/244, G06F16/256
Abstract: The system is configured to: 1) receive a client request; 2) determine executor(s) to generate a response to the client request; 3) provide each of the executor(s) with an indication; 4) receive for each indication a response including an output of either a cloud output or an in-line output to generate a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all outputs; and 6) in response to the group of in-line outputs not comprising all the outputs for the client request: a) convert the group of in-line outputs to a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request including the metadata for the converted group of cloud outputs and the group of cloud outputs.
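Steps 4 through 6 of this abstract can be sketched roughly as follows. The `Output` type, the `upload` helper, the `cloud://` location strings, and the response dictionary shape are hypothetical names introduced for illustration. The all-in-line branch is also an assumption, since the abstract only spells out the mixed case.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Output:
    kind: str     # "inline" or "cloud"
    payload: str  # in-line data, or a storage location for a cloud output


def upload(index: int, payload: str) -> str:
    """Hypothetical stand-in for writing in-line data to cloud storage."""
    return f"cloud://converted/{index}"


def build_response(outputs: List[Output]) -> Dict[str, object]:
    # Step 4: group executor outputs by kind.
    inline = [o for o in outputs if o.kind == "inline"]
    cloud = [o for o in outputs if o.kind == "cloud"]

    # Step 5: check whether the in-line group covers every output.
    if len(inline) == len(outputs):
        # Assumption: when everything is in-line, return the data directly.
        return {"mode": "inline", "data": [o.payload for o in inline]}

    # Step 6a: convert the in-line outputs to cloud outputs.
    converted = [Output("cloud", upload(i, o.payload)) for i, o in enumerate(inline)]
    # Steps 6b and 6c: respond with metadata for converted and original cloud outputs.
    return {"mode": "cloud", "metadata": [o.payload for o in converted + cloud]}
```

The point of the mixed-case conversion is that the client receives one uniform kind of result (cloud locations) rather than having to handle both kinds in a single response.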
-
Publication No.: US20240070153A1
Publication Date: 2024-02-29
Application No.: US17895877
Filing Date: 2022-08-25
Applicant: Databricks, Inc.
Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain
IPC: G06F16/2453, G06F11/34, G06F16/22, G06F16/28
CPC classification number: G06F16/24544, G06F11/3409, G06F16/2282, G06F16/285
Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).
-
Publication No.: US11874832B2
Publication Date: 2024-01-16
Application No.: US18158258
Filing Date: 2023-01-23
Applicant: Databricks, Inc.
Inventor: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hovell tot Westerflier
IPC: G06F16/2453, G06F16/2458, G06F16/25
CPC classification number: G06F16/24542, G06F16/2471, G06F16/258
Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
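The classification step in this abstract can be sketched as below. The set of operators the second engine is assumed to support, the `PlanNode` type, and the plan representation as a list of pairs are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical: operators the second engine is assumed to be able to execute.
SECOND_ENGINE_SUPPORTED = {"scan", "filter", "project"}


@dataclass
class PlanNode:
    op: str


def engine_type(node: PlanNode) -> str:
    """Label a node 'second' if the second engine can execute it, else 'first'."""
    return "second" if node.op in SECOND_ENGINE_SUPPORTED else "first"


def generate_plan(nodes: List[PlanNode]) -> List[Tuple[str, str]]:
    """Build a plan that records, for each node, which engine type will run it."""
    return [(node.op, engine_type(node)) for node in nodes]
```

For example, `generate_plan([PlanNode("scan"), PlanNode("window")])` assigns the scan to the second engine and falls back to the first engine for the window operator, which this sketch assumes is unsupported there.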
-
Publication No.: US11853277B2
Publication Date: 2023-12-26
Application No.: US18162579
Filing Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Aaron Daniel Davidson, Tomas Nykodym, Clemens Mewald
IPC: G06F16/00, G06F16/21, G06F16/955, G06N5/022
CPC classification number: G06F16/219, G06F16/955, G06N5/022
Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
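The store, start, and update-proxy sequence in this abstract can be sketched in memory as follows. The `Proxy` class, the `deploy` function, and the use of a plain callable in place of a separately running process are assumptions for illustration, not the patented design.

```python
from typing import Callable, Dict

Model = Callable[[float], float]


class Proxy:
    """Maps a model version to the process that serves it."""

    def __init__(self) -> None:
        self._routes: Dict[str, Model] = {}

    def update(self, version: str, process: Model) -> None:
        self._routes[version] = process

    def invoke(self, version: str, x: float) -> float:
        # Redirect the invocation to the process registered for this version.
        return self._routes[version](x)


def deploy(proxy: Proxy, store: Dict[str, Model], version: str, model: Model) -> None:
    """Store the model version, start a serving 'process', then update the proxy."""
    store[version] = model  # store the version received from the registry

    def process(x: float) -> float:  # stand-in for a started process
        return store[version](x)

    proxy.update(version, process)
```

Because callers only ever address the proxy, a newly deployed version becomes reachable as soon as `update` runs, without callers needing to know where the process lives.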
-
Publication No.: US20230244982A1
Publication Date: 2023-08-03
Application No.: US17587793
Filing Date: 2022-01-28
Applicant: Databricks Inc.
Inventor: Benjamin Thomas Wilson, Corey Zumar
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: The present application discloses a method, system, and computer system for tuning a set of models. The method includes determining a set of one or more models to optimize, determining a plurality of optimizer modules with which to optimize the set of one or more models, causing the plurality of optimizer modules to respectively perform a respective optimizing process with respect to at least one model of the set of one or more models, and deploying an optimized model obtained based at least in part on optimizing metrics of the set of the one or more models.
-
Publication No.: US20230244720A1
Publication Date: 2023-08-03
Application No.: US17587820
Filing Date: 2022-01-28
Applicant: Databricks Inc.
Inventor: Benjamin Thomas Wilson, Corey Zumar
IPC: G06F16/903, G06N20/00
CPC classification number: G06F16/90335, G06N20/00
Abstract: The present application discloses a method, system, and computer system for querying a model associated with a dataset. The method includes providing an input interface via which a first entity inputs a dataset, receiving the dataset, and providing a selection interface that exposes to a second entity the plurality of models determined for the dataset and/or the plurality of results corresponding to the plurality of models using the index entries. The dataset comprises a plurality of keys and a plurality of key-value relationships, and the dataset is formatted according to a predefined format, wherein index entries are generated for a plurality of models and a plurality of results corresponding to the plurality of models.
-
Publication No.: US11693837B2
Publication Date: 2023-07-04
Application No.: US17324907
Filing Date: 2021-05-19
Applicant: Databricks, Inc.
Inventor: Aaron Daniel Davidson, Tomas Nykodym, Clemens Mewald
IPC: G06F16/00, G06F16/21, G06F16/955, G06N5/022
CPC classification number: G06F16/219, G06F16/955, G06N5/022
Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.