-
Publication No.: US20240070153A1
Publication Date: 2024-02-29
Application No.: US17895877
Filing Date: 2022-08-25
Applicant: Databricks, Inc.
Inventor: Bart Samwel , Tathagata Das , Lars Kroll , Yijia Cui , Juliusz Sompolski , Tom Van Bussel , Prakhar Jain
IPC: G06F16/2453 , G06F11/34 , G06F16/22 , G06F16/28
CPC classification number: G06F16/24544 , G06F11/3409 , G06F16/2282 , G06F16/285
Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).
-
Publication No.: US11874832B2
Publication Date: 2024-01-16
Application No.: US18158258
Filing Date: 2023-01-23
Applicant: Databricks, Inc.
Inventor: Shi Xin , Alexander Behm , Shoumik Palkar , Herman Rudolf Petrus Catharina van Hovell tot Westerflier
IPC: G06F16/2453 , G06F16/2458 , G06F16/25
CPC classification number: G06F16/24542 , G06F16/2471 , G06F16/258
Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
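The node-classification step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Node` class, the `SECOND_ENGINE_OPS` set, and the engine labels `"first"` and `"second"` are all hypothetical names chosen for illustration. Each plan node is tagged by checking whether the second engine is able to execute its operator, and a plan is then generated from the tagged nodes.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of operators the second engine can execute (assumption).
SECOND_ENGINE_OPS = {"scan", "filter", "project"}


@dataclass
class Node:
    op: str
    children: List["Node"] = field(default_factory=list)
    engine: str = ""  # filled in by assign_engines


def assign_engines(node: Node) -> Node:
    """Tag every node 'second' if the second engine supports its operator, else 'first'."""
    for child in node.children:
        assign_engines(child)
    node.engine = "second" if node.op in SECOND_ENGINE_OPS else "first"
    return node


def plan(node: Node) -> List[str]:
    """Flatten the tagged tree bottom-up into a list of 'op@engine' steps."""
    steps: List[str] = []
    for child in node.children:
        steps.extend(plan(child))
    steps.append(f"{node.op}@{node.engine}")
    return steps


query = Node("sort", [Node("filter", [Node("scan")])])
print(plan(assign_engines(query)))
```

In this toy example the scan and filter nodes are assigned to the second engine, while the sort node, which the assumed second engine does not support, falls back to the first engine.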
-
Publication No.: US11853277B2
Publication Date: 2023-12-26
Application No.: US18162579
Filing Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Aaron Daniel Davidson , Tomas Nykodym , Clemens Mewald
IPC: G06F16/00 , G06F16/21 , G06F16/955 , G06N5/022
CPC classification number: G06F16/219 , G06F16/955 , G06N5/022
Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
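The proxy behaviour described in this abstract can be sketched as follows. This is an illustrative assumption, not the patented design: `ModelProxy`, `start_process`, and the version strings are hypothetical, and a plain Python callable stands in for a real process running a model version.

```python
from typing import Callable, Dict


class ModelProxy:
    """Routes an invocation for a model version to the process serving it."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[[float], float]] = {}

    def update(self, version: str, process: Callable[[float], float]) -> None:
        # Record which process should handle invocations of this version.
        self._routes[version] = process

    def invoke(self, version: str, x: float) -> float:
        # Redirect the invocation to the process registered for the version.
        return self._routes[version](x)


def start_process(scale: float) -> Callable[[float], float]:
    # Stand-in for starting a real serving process (assumption):
    # the "model" here simply multiplies its input by a constant.
    return lambda x: scale * x


proxy = ModelProxy()
proxy.update("v1", start_process(2.0))
proxy.update("v2", start_process(3.0))
print(proxy.invoke("v2", 10.0))
```

Keeping the routing table inside the proxy means callers only need a version identifier, and a new version can be brought online by starting its process and then calling `update`.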
-
Publication No.: US20230244982A1
Publication Date: 2023-08-03
Application No.: US17587793
Filing Date: 2022-01-28
Applicant: Databricks Inc.
Inventor: Benjamin Thomas Wilson , Corey Zumar
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: The present application discloses a method, system, and computer system for tuning a set of models. The method includes determining a set of one or more models to optimize, determining a plurality of optimizer modules with which to optimize the set of one or more models, causing the plurality of optimizer modules to respectively perform a respective optimizing process with respect to at least one model of the set of one or more models, and deploying an optimized model obtained based at least in part on optimizing metrics of the set of the one or more models.
-
Publication No.: US20230244720A1
Publication Date: 2023-08-03
Application No.: US17587820
Filing Date: 2022-01-28
Applicant: Databricks Inc.
Inventor: Benjamin Thomas Wilson , Corey Zumar
IPC: G06F16/903 , G06N20/00
CPC classification number: G06F16/90335 , G06N20/00
Abstract: The present application discloses a method, system, and computer system for querying a model associated with a dataset. The method includes providing an input interface via which a first entity inputs a dataset, receiving the dataset, and providing a selection interface that exposes to a second entity the plurality of models determined for the dataset and/or the plurality of results corresponding to the plurality of models using the index entries. The dataset comprises a plurality of keys and a plurality of key-value relationships, and the dataset is formatted according to a predefined format, wherein index entries are generated for a plurality of models and a plurality of results corresponding to the plurality of models.
-
Publication No.: US11693837B2
Publication Date: 2023-07-04
Application No.: US17324907
Filing Date: 2021-05-19
Applicant: Databricks, Inc.
Inventor: Aaron Daniel Davidson , Tomas Nykodym , Clemens Mewald
IPC: G06F16/00 , G06F16/21 , G06F16/955 , G06N5/022
CPC classification number: G06F16/219 , G06F16/955 , G06N5/022
Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
-
Publication No.: US20230140169A1
Publication Date: 2023-05-04
Application No.: US18089349
Filing Date: 2022-12-27
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust , Andreas Neumann , Mukul Murthy , Jonathan Mio
IPC: G06F16/901 , G06F16/245 , G06F16/215 , G06F16/22
Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
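The dependency and DAG steps in this abstract can be illustrated with a small sketch. It covers only the ordering of queries by their dependencies, not the in-lining of view nodes, and the query names in `dependencies` are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical input: each query name maps to the queries it reads from.
dependencies: Dict[str, Set[str]] = {
    "raw_events": set(),
    "clean_events": {"raw_events"},
    "daily_counts": {"clean_events"},
    "report": {"daily_counts", "clean_events"},
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every query follows the queries it depends on.

    Raises graphlib.CycleError if the dependencies do not form a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A cycle among the queries means no valid dataflow graph exists, which is why the sketch lets `CycleError` propagate rather than returning a partial order.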
-
Publication No.: US11586624B2
Publication Date: 2023-02-21
Application No.: US17237979
Filing Date: 2021-04-22
Applicant: Databricks Inc.
Inventor: Shi Xin , Alexander Behm , Shoumik Palkar , Herman Rudolf Petrus Catharina van Hövell tot Westerflier
IPC: G06F16/2453 , G06F16/2458 , G06F16/25
Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
-
Publication No.: US11481398B1
Publication Date: 2022-10-25
Application No.: US17116230
Filing Date: 2020-12-09
Applicant: Databricks Inc.
Inventor: Alexander Behm , Ankur Dave , Ryan Deng , Shoumik Palkar
IPC: G06F16/2455 , G06F16/22
Abstract: A system for spilling comprises an interface and a processor. The interface is configured to receive an indication to perform a GROUP BY operation, wherein the indication comprises an input table and a grouping column. The processor is configured to: for each input table entry of the input table, determine a key, wherein the key is based at least in part on the input table entry and the grouping column; add the key to a grouping hash table, wherein adding the key to the grouping hash table comprises last-in, first-out (LIFO) spilling when necessary; create an output table based at least in part on the grouping hash table; and provide the output table.
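A toy sketch of a GROUP BY with last-in, first-out spilling follows. It rests on one reading of the abstract and is not the patented method: the assumption is that when the grouping hash table is full, the most recently inserted group is evicted to spill storage and merged back when the output is built. The function name `group_by_count`, the `max_groups` limit, and the in-memory `spilled` list (standing in for disk) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def group_by_count(rows: List[dict], key_col: str, max_groups: int) -> Dict[Hashable, int]:
    """Count rows per grouping key, holding at most max_groups groups in memory."""
    table: Dict[Hashable, int] = {}             # grouping hash table (insertion-ordered)
    spilled: List[Tuple[Hashable, int]] = []    # stand-in for spill storage

    for row in rows:
        key = row[key_col]
        if key not in table and len(table) >= max_groups:
            # LIFO spill: dict.popitem() removes the most recently inserted group.
            spilled.append(table.popitem())
        table[key] = table.get(key, 0) + 1

    # Build the output by merging spilled partial counts with in-memory counts.
    result: Dict[Hashable, int] = defaultdict(int)
    for key, count in spilled:
        result[key] += count
    for key, count in table.items():
        result[key] += count
    return dict(result)


rows = [{"c": "a"}, {"c": "b"}, {"c": "c"}, {"c": "a"}, {"c": "b"}]
print(group_by_count(rows, "c", max_groups=2))
```

With `max_groups=2`, group `b` is spilled when `c` arrives and `c` is spilled when `b` reappears; the final merge still yields the same counts as an unbounded hash table would.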
-
Publication No.: US11468369B1
Publication Date: 2022-10-11
Application No.: US17587806
Filing Date: 2022-01-28
Applicant: Databricks Inc.
Inventor: Benjamin Thomas Wilson , Corey Zumar
Abstract: The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a dataset, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.