-
Publication Number: US12147412B2
Publication Date: 2024-11-19
Application Number: US18156109
Application Date: 2023-01-18
Applicant: Databricks, Inc.
Inventor: Bart Samwel , Christos Stavrakakis
Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. If the prerequisites are satisfied, the configuration commits the second transaction to generate a third version of the data table by updating elements of a deletion vector.
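The two prerequisites can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical model in which a table version maps each live data file to its deletion vector (the set of row positions marked deleted) and a record is identified by a `(file name, row position)` pair; `TableVersion` and `commit_second` are illustrative names, not Databricks or Delta Lake APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

Record = Tuple[str, int]  # hypothetical record id: (data file name, row position)

@dataclass
class TableVersion:
    version: int
    # live data file name -> deletion vector (row positions marked deleted)
    deletion_vectors: Dict[str, Set[int]]

def logical_prerequisite(first_changed: Set[Record], second_targets: Set[Record]) -> bool:
    # Holds if the committed first transaction changed none of the records
    # that the second transaction wants to update.
    return first_changed.isdisjoint(second_targets)

def physical_prerequisite(latest: TableVersion, second_targets: Set[Record]) -> bool:
    # Holds if every target record still sits at the same position in a data
    # file that is live in the latest version and is not already deleted.
    for file_name, row in second_targets:
        dv = latest.deletion_vectors.get(file_name)
        if dv is None or row in dv:
            return False
    return True

def commit_second(latest: TableVersion, first_changed: Set[Record],
                  second_targets: Set[Record]) -> Optional[TableVersion]:
    """Return the next table version, or None if either prerequisite fails."""
    if not (logical_prerequisite(first_changed, second_targets)
            and physical_prerequisite(latest, second_targets)):
        return None
    new_dvs = {f: set(rows) for f, rows in latest.deletion_vectors.items()}
    for file_name, row in second_targets:
        # Updating a record marks its old row deleted; writing the new row
        # version to a fresh data file is omitted from this sketch.
        new_dvs[file_name].add(row)
    return TableVersion(latest.version + 1, new_dvs)
```

Here `latest` plays the role of the second version (after the first commit), while the second transaction's targets were read at the first version. The copy of each deletion vector keeps earlier versions immutable.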
-
Publication Number: US20240362215A1
Publication Date: 2024-10-31
Application Number: US18140323
Application Date: 2023-04-27
Applicant: Databricks, Inc.
Inventor: Venkata Sai Akhil Gudesa , Herman Rudolf Petrus Catharina van Hövell tot Westerflier , Supun Chathuranga Nakandala
IPC: G06F16/2453 , G06F9/48 , G06F11/34 , G06F16/28
CPC classification number: G06F16/2453 , G06F9/4887 , G06F11/3419 , G06F16/285
Abstract: A cluster computing system maintains a first set of queues for short queries and a second set of queues for longer queries. The first set is allocated a majority of the cluster's processing resources and processes queries on a first-in, first-out basis. The second set is allocated a minority of the cluster's processing resources, which are shared among the queries in the second set. Accordingly, the system assigns each query to the first set of queues for a fixed amount of resource time. While a query is processing, the system monitors the query's resource time and reassigns the query to the second set of queues if it has not completed within the allotted resource time. Thus, short queries receive the resources they need to complete quickly without getting stuck behind longer queries, while longer queries continue to make progress.
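The demotion scheme can be sketched as a small discrete-time simulation in Python. It assumes resource time is counted in integer units, that each tick gives `short_slots` units to the short tier (served FIFO) and `long_slots` units to the long tier (shared round-robin); `Query`, `simulate`, and all parameters are hypothetical illustrations, not the patented scheduler.

```python
from collections import deque
from dataclasses import dataclass
from typing import List

@dataclass
class Query:
    name: str
    work: int      # resource-time units the query needs (unknown to the scheduler)
    used: int = 0  # resource time consumed so far

def simulate(queries: List[Query], budget: int = 2,
             short_slots: int = 3, long_slots: int = 1) -> List[str]:
    """Return query names in completion order."""
    short = deque(queries)  # majority share, strict FIFO
    long_q = deque()        # minority share, round-robin
    finished = []
    while short or long_q:
        # Short tier: each unit goes to the query at the head of the FIFO.
        for _ in range(short_slots):
            if not short:
                break
            q = short[0]
            q.used += 1
            if q.used >= q.work:
                finished.append(short.popleft().name)
            elif q.used >= budget:
                # Budget exhausted without finishing: demote to the long tier.
                long_q.append(short.popleft())
        # Long tier: share the remaining units round-robin.
        for _ in range(long_slots):
            if not long_q:
                break
            q = long_q.popleft()
            q.used += 1
            if q.used >= q.work:
                finished.append(q.name)
            else:
                long_q.append(q)
    return finished
```

With `simulate([Query("A", 10), Query("B", 1), Query("C", 1)])`, the long query `A` is demoted after two units, so `B` and `C` finish first while `A` still progresses in the long tier; a plain FIFO would have made them wait behind `A`.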
-
Publication Number: US12079167B1
Publication Date: 2024-09-03
Application Number: US18093916
Application Date: 2023-01-06
Applicant: Databricks, Inc.
Inventor: Rahul Shivu Mahadev , Burak Yavuz , Tathagata Das
IPC: G06F16/172 , G06F16/22
CPC classification number: G06F16/172 , G06F16/2282
Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.
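The bin and batch logic reads naturally as a generator. This Python sketch assumes file sizes are known up front and uses total bytes as the "measure" of a batch; both are assumptions, since the abstract leaves the measure unspecified, and `plan_optimize` is a hypothetical name.

```python
from typing import Iterable, Iterator, List, Tuple

def plan_optimize(files: Iterable[Tuple[str, int]], bin_threshold: int,
                  batch_threshold: int) -> Iterator[List[List[str]]]:
    """Yield batches of bins (lists of file names) for compaction."""
    current: List[str] = []
    current_size = 0
    batch: List[List[str]] = []
    batch_size = 0  # assumed batch measure: total bytes of its closed bins
    for name, size in files:
        if current and current_size + size > bin_threshold:
            # Adding the file would overflow the bin: close it and batch it.
            batch.append(current)
            batch_size += current_size
            current, current_size = [], 0
            if batch_size > batch_threshold:
                yield batch  # hand the batch off for processing
                batch, batch_size = [], 0
        # The file joins the current bin (or starts the next one).
        current.append(name)
        current_size += size
    if current:
        batch.append(current)
    if batch:
        yield batch
```

Because batches are yielded as soon as they cross the threshold, compaction of early batches can start before the whole file listing has been scanned.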
-
Publication Number: US12072843B1
Publication Date: 2024-08-27
Application Number: US17580475
Application Date: 2022-01-20
Applicant: Databricks, Inc.
Inventor: Prakhar Jain , Bart Samwel , Burak Yavuz
IPC: G06F16/174
CPC classification number: G06F16/174
Abstract: The present application discloses a method, system, and computer system for managing data in a storage system. The method includes receiving a first transaction that modifies or deletes first data stored in a storage system; determining that the first data is subject to an intervening re-arrangement transaction; and, in response to determining that the first data is subject to the intervening re-arrangement transaction, rolling back the re-arrangement transaction at least with respect to the first data and committing the first transaction.
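A minimal Python sketch of the conflict resolution, assuming the table state is just a set of live file names and the re-arrangement (for example, a compaction) is recorded as the files it removed and added. For simplicity it rolls back the whole re-arrangement, which the abstract's "at least with respect to the first data" permits; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set

@dataclass(frozen=True)
class Rearrangement:
    removed: FrozenSet[str]  # input files the re-arrangement replaced
    added: FrozenSet[str]    # output files it wrote

def commit_with_rollback(live: Set[str], rearr: Optional[Rearrangement],
                         touched: Set[str], written: Set[str]) -> Set[str]:
    """Commit a modify/delete transaction that read its snapshot before `rearr`."""
    live = set(live)
    if rearr is not None and touched & rearr.removed:
        # The intervening re-arrangement moved the data this transaction
        # modifies: undo it by dropping its outputs and restoring its inputs.
        live -= rearr.added
        live |= rearr.removed
    if not touched <= live:
        raise ValueError("transaction touches files that are no longer live")
    # Commit the first transaction: replace the files it rewrote.
    live -= touched
    live |= written
    return live
```

For example, after a compaction of `f1` and `f2` into `c1`, a transaction that rewrote `f1` as `f1b` leaves `{"f1b", "f2", "f3"}` live: the compaction is undone and the user change wins, which is the priority the abstract describes.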
-
Publication Number: US20240256360A1
Publication Date: 2024-08-01
Application Number: US18162659
Application Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Shuo Chen , Yuming Qiao , Anders Liu
CPC classification number: G06F9/5077 , G06F9/45558 , G06F2009/4557 , G06F2009/45583
Abstract: Disclosed herein is a method for resource management in a web-based container orchestration environment. The disclosed method initializes a set of micro-virtual machines (VMs) within a macro-VM environment. The method runs each container within a micro-VM-based sandbox. The method assigns a virtual central processing unit (vCPU) to a micro-VM based on an estimated memory required by the micro-VM and the estimated available memory associated with the vCPU. The method pins the vCPU to a physical CPU based on the pod location of the physical CPU, an estimated available memory associated with the vCPU, and an available local memory of the physical CPU. The method maintains the state of the vCPU and the physical CPU in a resource manager.
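The assignment and pinning steps can be sketched in Python as a toy resource manager. It assumes memory is tracked in MiB, that vCPU assignment uses best fit, that "pod" is a locality label on each physical CPU, and that pinning reserves the vCPU's full memory estimate on the chosen CPU. These policies and all names are hypothetical; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class PhysicalCPU:
    pod: str             # locality group of the physical CPU (assumed meaning)
    local_mem_free: int  # MiB of local memory not yet reserved

class ResourceManager:
    """Tracks vCPU and physical-CPU state (hypothetical model)."""

    def __init__(self, vcpu_mem: Dict[str, int], pcpus: Dict[str, PhysicalCPU]):
        self.vcpu_capacity = dict(vcpu_mem)  # estimated memory behind each vCPU
        self.vcpu_free = dict(vcpu_mem)      # memory still unclaimed by micro-VMs
        self.pcpus = pcpus
        self.vm_to_vcpu: Dict[str, str] = {}
        self.vcpu_to_pcpu: Dict[str, str] = {}

    def assign_vcpu(self, vm: str, est_mem: int) -> Optional[str]:
        # Best fit: the vCPU with the least free memory that still fits the VM.
        fits = [v for v, free in self.vcpu_free.items() if free >= est_mem]
        if not fits:
            return None
        best = min(fits, key=lambda v: self.vcpu_free[v])
        self.vcpu_free[best] -= est_mem
        self.vm_to_vcpu[vm] = best
        return best

    def pin_vcpu(self, vcpu: str, pod: str) -> Optional[str]:
        # Pin to a physical CPU in the requested pod whose local memory can
        # back the vCPU's estimated memory; prefer the most headroom.
        need = self.vcpu_capacity[vcpu]
        candidates = [p for p, cpu in self.pcpus.items()
                      if cpu.pod == pod and cpu.local_mem_free >= need]
        if not candidates:
            return None
        best = max(candidates, key=lambda p: self.pcpus[p].local_mem_free)
        self.pcpus[best].local_mem_free -= need
        self.vcpu_to_pcpu[vcpu] = best
        return best
```

Keeping both maps (`vm_to_vcpu`, `vcpu_to_pcpu`) in one object mirrors the abstract's point that vCPU and physical-CPU state live in a single resource manager.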
-
Publication Number: US20240241877A1
Publication Date: 2024-07-18
Application Number: US18156109
Application Date: 2023-01-18
Applicant: Databricks, Inc.
Inventor: Bart Samwel , Christos Stavrakakis
IPC: G06F16/23
CPC classification number: G06F16/2315 , G06F16/2358 , G06F16/2379
Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. If the prerequisites are satisfied, the configuration commits the second transaction to generate a third version of the data table by updating elements of a deletion vector.
-
Publication Number: US12019682B2
Publication Date: 2024-06-25
Application Number: US18089349
Application Date: 2022-12-27
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust , Andreas Neumann , Mukul Murthy , Jonathan Mio
IPC: G06F16/901 , G06F16/215 , G06F16/22 , G06F16/245
CPC classification number: G06F16/9024 , G06F16/215 , G06F16/2282 , G06F16/245
Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
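A minimal Python sketch of the view-inlining idea, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to order the DAG. It assumes each definition is a `(kind, template, deps)` tuple whose template names its inputs as `{placeholders}`; tables are referenced by name, while views are expanded in place as parenthesised subqueries. The representation and `build_dataflow` are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

Definition = Tuple[str, str, List[str]]  # (kind: "table" | "view", template, deps)

def build_dataflow(defs: Dict[str, Definition]) -> Dict[str, str]:
    """Return table name -> expression with every referenced view inlined."""
    graph = {name: set(deps) for name, (_, _, deps) in defs.items()}
    expanded: Dict[str, str] = {}
    # static_order() yields dependencies first and raises CycleError on cycles.
    for name in TopologicalSorter(graph).static_order():
        if name not in defs:
            expanded[name] = name  # external source: referenced by name
            continue
        _, template, deps = defs[name]
        subs = {}
        for dep in deps:
            is_view = dep in defs and defs[dep][0] == "view"
            # Views are not materialised: splice in their (already expanded) body.
            subs[dep] = f"({expanded[dep]})" if is_view else dep
        expanded[name] = template.format(**subs)
    return {n: e for n, e in expanded.items() if n in defs and defs[n][0] == "table"}
```

For instance, a view `clean` over `raw` feeding a table `daily` yields a `daily` expression that reads `raw` directly through an inlined subquery, so only `daily` (and any downstream tables) need to be materialised.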
-
Publication Number: US12008040B2
Publication Date: 2024-06-11
Application Number: US17362456
Application Date: 2021-06-29
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust , Andreas Neumann , Mukul Murthy , Jonathan Mio
IPC: G06F16/901 , G06F16/215 , G06F16/22 , G06F16/245
CPC classification number: G06F16/9024 , G06F16/215 , G06F16/2282 , G06F16/245
Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries. The processor is coupled to the communication interface and is configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; insert a node in the DAG of nodes to generate an updated DAG to enforce an expectation; determine a dataflow graph based on the updated DAG; and provide the dataflow graph.
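Inserting an expectation node amounts to splicing a check between a dataset and its consumers. This Python sketch assumes a DAG stored as `node -> set of dependencies`, single-input nodes, and an expectation that fails the run when any row violates a predicate; `insert_expectation`, `expect_all`, and `run` are hypothetical helpers, not a Databricks API.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

def insert_expectation(dag: Dict[str, Set[str]], target: str,
                       check_name: str) -> Dict[str, Set[str]]:
    """Return an updated DAG where consumers of `target` read `check_name` instead."""
    updated = {node: set(deps) for node, deps in dag.items()}
    for deps in updated.values():
        if target in deps:
            deps.discard(target)
            deps.add(check_name)
    updated[check_name] = {target}
    return updated

def expect_all(predicate: Callable[[int], bool], label: str) -> Callable[[List[int]], List[int]]:
    # Expectation node: pass rows through unchanged, or fail on any violation.
    def check(rows: List[int]) -> List[int]:
        bad = [r for r in rows if not predicate(r)]
        if bad:
            raise ValueError(f"expectation {label!r} failed for {len(bad)} row(s)")
        return rows
    return check

def run(dag: Dict[str, Set[str]], funcs: Dict[str, Callable], sources: Dict[str, list]) -> dict:
    results = dict(sources)
    for node in TopologicalSorter(dag).static_order():
        if node in results:
            continue  # source data supplied by the caller
        (dep,) = dag[node]  # this sketch assumes single-input nodes
        results[node] = funcs[node](results[dep])
    return results
```

Because the check becomes an ordinary DAG node, downstream tables cannot be computed from data that has not passed it; a real system might instead drop or quarantine failing rows rather than abort.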
-
Publication Number: US11693723B2
Publication Date: 2023-07-04
Application Number: US17537124
Application Date: 2021-11-29
Applicant: Databricks, Inc.
Inventor: Alicja Luszczak , Srinath Shankar , Shi Xin
CPC classification number: G06F11/0757 , G06F11/076 , G06F11/0721 , G06F11/0793 , G06F11/3024 , G06F11/3419 , G06F2201/81 , G06F2201/88
Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
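A cooperative watchdog can be sketched in Python as a generator wrapped around per-record processing. The two criteria used here, an output-to-input row ratio and a wall-clock limit, are assumptions chosen for illustration (the abstract does not state which criterion is used), and "killing" is modelled as raising an exception that aborts the task; `watched` and `WatchdogKilled` are hypothetical names.

```python
import time
from typing import Callable, Iterable, Iterator

class WatchdogKilled(RuntimeError):
    """Raised to kill processing of a data instance."""

def watched(process_record: Callable[[object], Iterable[object]],
            records: Iterable[object],
            max_output_ratio: float = 1000.0,
            max_seconds: float = 60.0,
            clock: Callable[[], float] = time.monotonic) -> Iterator[object]:
    start = clock()
    n_in = n_out = 0
    for rec in records:
        n_in += 1
        for out in process_record(rec):
            n_out += 1
            # Criterion 1 (assumed): output rows explode relative to input rows.
            if n_out > max_output_ratio * n_in:
                raise WatchdogKilled(f"output/input ratio exceeded {max_output_ratio}")
            yield out
        # Criterion 2 (assumed): the instance has run for too long.
        if clock() - start > max_seconds:
            raise WatchdogKilled(f"ran longer than {max_seconds}s")
```

The `clock` parameter is injectable so the time criterion can be tested deterministically. A cooperative check like this cannot stop code stuck inside a single call; a production watchdog would typically kill the task from outside.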
-
公开(公告)号:US20230141556A1
公开(公告)日:2023-05-11
申请号:US17976361
申请日:2022-10-28
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust , Tathagata Das , Shi Xin , Matei Zaharia
IPC: G06F16/2453 , G06F16/2455
CPC classification number: G06F16/24542 , G06F16/24568
Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
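A physical plan with per-operator input and output modes can be sketched in Python. The mode names below borrow Spark Structured Streaming's append/update/complete output modes as an assumption (the abstract does not name the modes), and the compatibility rule, that each operator's input mode must equal the previous operator's output mode, is likewise an illustrative choice; `Operator`, `validate`, `run_batch`, and `make_counter` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List

class Mode(Enum):
    APPEND = "append"      # only new rows
    UPDATE = "update"      # rows whose value changed in this batch
    COMPLETE = "complete"  # the full result on every batch

@dataclass
class Operator:
    name: str
    input_mode: Mode
    output_mode: Mode
    fn: Callable[[list], list]

def validate(plan: List[Operator]) -> None:
    # Each operator must consume rows in the mode its upstream produces.
    for up, down in zip(plan, plan[1:]):
        if up.output_mode != down.input_mode:
            raise ValueError(f"{up.name} emits {up.output_mode.value} rows, "
                             f"but {down.name} expects {down.input_mode.value}")

def run_batch(plan: List[Operator], batch: list) -> list:
    rows = batch
    for op in plan:  # operators run in plan order on each micro-batch
        rows = op.fn(rows)
    return rows

def make_counter() -> Callable[[list], list]:
    counts: Dict[str, int] = {}  # state kept across micro-batches
    def count(rows: list) -> list:
        changed: Dict[str, int] = {}
        for key in rows:
            counts[key] = counts.get(key, 0) + 1
            changed[key] = counts[key]
        return sorted(changed.items())  # UPDATE mode: only keys touched this batch
    return count
```

For example, a filter (append in, append out) followed by a running count (append in, update out) emits `[("a", 2), ("b", 1)]` for the batch `["a", "b", "a", "skip"]` and then only `[("b", 2)]` for the next batch `["b"]`.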
-