-
Publication No.: US20250103580A1
Publication Date: 2025-03-27
Application No.: US18928982
Filing Date: 2024-10-28
Applicant: Databricks, Inc.
Inventor: Bart Samwel, Christos Stavrakakis
IPC: G06F16/23
Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. The configuration commits the second transaction to generate a third version of the data table by updating elements of the deletion vector if the prerequisites are satisfied.
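The two prerequisite checks in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the record identifier `(file name, row index)`, the `TableVersion` class, and all function names are hypothetical, and "updating a record" is reduced to marking its row in a per-file deletion vector.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

Row = Tuple[str, int]  # hypothetical record id: (data file name, row index)


@dataclass
class TableVersion:
    # One deletion vector per data file: the row indexes marked as deleted.
    deletion_vectors: Dict[str, Set[int]]


def logical_prerequisite(first_rows: Set[Row], second_rows: Set[Row]) -> bool:
    # Satisfied when the already committed transaction changed none of the
    # rows that the second transaction wants to update.
    return first_rows.isdisjoint(second_rows)


def physical_prerequisite(version: TableVersion, second_rows: Set[Row]) -> bool:
    # Satisfied when every targeted row still maps to a data file that is
    # present in the newer table version.
    return all(name in version.deletion_vectors for name, _ in second_rows)


def commit_second(v2: TableVersion, first_rows: Set[Row],
                  second_rows: Set[Row]) -> Optional[TableVersion]:
    if not (logical_prerequisite(first_rows, second_rows)
            and physical_prerequisite(v2, second_rows)):
        return None  # a real system would retry or abort here
    # Copy the vectors so the second version stays unchanged.
    vectors = {name: set(rows) for name, rows in v2.deletion_vectors.items()}
    for name, index in second_rows:
        vectors[name].add(index)
    return TableVersion(vectors)  # the third version
```

A commit that targets a row already changed by the first transaction, or a file that no longer exists in the second version, returns `None` in this sketch.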
-
Publication No.: US12248818B1
Publication Date: 2025-03-11
Application No.: US17514988
Filing Date: 2021-10-29
Applicant: Databricks, Inc.
Inventor: Yandong Mao, Aaron Daniel Davidson
Abstract: The present application discloses a method, system, and computer system for starting up and maintaining a cluster in a warmed up state, and/or allocating clusters from a warmed up state. The method includes instantiating a set of virtual machines, wherein instantiating the set of virtual machines includes setting a temporary security credential for each virtual machine of the set of virtual machines; receiving a virtual machine allocation request associated with a workspace, a customer, or a tenant; and, in response to the virtual machine allocation request, allocating a virtual machine, wherein allocating the virtual machine comprises replacing the temporary security credential with a security credential associated with the workspace, the customer, or the tenant.
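As a rough illustration of the credential swap described above, the following sketch keeps a pool of pre-started machines that each hold a throwaway credential until they are allocated. The `VirtualMachine` and `WarmPool` names, the `temp-` prefix, and the string credentials are assumptions for the example, not details from the patent.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualMachine:
    vm_id: int
    credential: str
    tenant: Optional[str] = None


class WarmPool:
    def __init__(self, size: int) -> None:
        # Each warm machine starts with its own temporary credential.
        self.idle = [
            VirtualMachine(i, "temp-" + secrets.token_hex(8)) for i in range(size)
        ]

    def allocate(self, tenant: str, tenant_credential: str) -> VirtualMachine:
        if not self.idle:
            raise RuntimeError("no warm virtual machines available")
        vm = self.idle.pop()
        # Allocation replaces the temporary credential with the tenant's own.
        vm.credential = tenant_credential
        vm.tenant = tenant
        return vm
```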
-
Publication No.: US12242485B2
Publication Date: 2025-03-04
Application No.: US18162616
Filing Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy
IPC: G06F16/24, G06F11/34, G06F16/22, G06F16/2455
Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator and a request to return information about a value of interest in a columnar dataset stored on cloud storage. At least one column in the columnar dataset is based on a dictionary. The dictionary maps one or more values for a column to one or more respective identifiers. The method determines whether to perform dictionary filtering for the query by calculating a metric based on one or more factors. Responsive to the metric being below a threshold, which may be predetermined, the method performs the dictionary filtering.
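The threshold decision can be sketched as follows. The abstract does not say which factors feed the metric, so this example assumes a single hypothetical one: the ratio of dictionary entries to encoded rows. The function name and the default threshold are likewise illustrative.

```python
from typing import Callable, Dict, List, Tuple


def filter_rows(dictionary: Dict[str, int], encoded: List[int],
                predicate: Callable[[str], bool],
                threshold: float = 0.5) -> Tuple[List[int], bool]:
    """Return matching row numbers and whether dictionary filtering was used."""
    # Assumed metric: few distinct values relative to rows makes it cheap to
    # test the predicate once per dictionary entry.
    metric = len(dictionary) / max(len(encoded), 1)
    if metric < threshold:
        matching_ids = {ident for value, ident in dictionary.items()
                        if predicate(value)}
        rows = [row for row, ident in enumerate(encoded) if ident in matching_ids]
        return rows, True
    # Otherwise decode every row and test the predicate on the decoded value.
    decode = {ident: value for value, ident in dictionary.items()}
    rows = [row for row, ident in enumerate(encoded) if predicate(decode[ident])]
    return rows, False
```

Both paths return the same rows; only the number of predicate evaluations differs.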
-
Publication No.: US12242441B1
Publication Date: 2025-03-04
Application No.: US18162562
Filing Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Tao Feng, Menglei Sun, Zhuoying Wang
IPC: G06F16/28, G06F11/07, G06F16/215, G06F16/22, G06F16/23, G06F16/906, G06F17/18
Abstract: The present application discloses a method, system, and computer system for managing lineage data for data entities. The method includes generating lineage data, and storing and indexing, in a data structure, the lineage data in association with a selected data entity. Generating the lineage data includes selecting the selected data entity, obtaining a query tree that was used to generate the selected data entity, and determining lineage data for the selected data entity based at least in part on the query tree.
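One simple way to derive lineage from a query tree is to collect the tables read at its leaves and index them under the generated entity. The sketch below assumes a hypothetical `QueryNode` shape with `scan` leaves and uses a plain dict as the index; the patent does not specify either.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class QueryNode:
    op: str                                  # e.g. "scan", "filter", "join"
    table: Optional[str] = None              # set only on scan nodes
    children: List["QueryNode"] = field(default_factory=list)


def source_tables(node: QueryNode) -> Set[str]:
    # Walk the tree and gather every table read by a scan node.
    found = {node.table} if node.op == "scan" and node.table else set()
    for child in node.children:
        found |= source_tables(child)
    return found


def record_lineage(index: Dict[str, List[str]], entity: str,
                   tree: QueryNode) -> None:
    # Store the upstream tables in association with the generated entity.
    index[entity] = sorted(source_tables(tree))
```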
-
Publication No.: US12210528B2
Publication Date: 2025-01-28
Application No.: US18162607
Filing Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy
IPC: G06F16/2455, G06F11/34, G06F16/22
Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator for a columnar dataset on cloud storage. At least one column in the dataset is based on a dictionary, and the dictionary maps one or more values for a column to one or more respective identifiers. The method evaluates the operator on one or more values of the dictionary to generate an updated dictionary comprising updated values. The method may decode the updated dictionary into an updated column comprising updated data values.
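The idea of evaluating an operator once per dictionary entry instead of once per row can be sketched as below. The function names are hypothetical, and keying the updated dictionary by identifier is an assumption made here so that two values mapping to the same result do not collide.

```python
from typing import Callable, Dict, List


def apply_to_dictionary(dictionary: Dict[str, int],
                        op: Callable[[str], str]) -> Dict[int, str]:
    # Evaluate the operator once per distinct value. The result is keyed by
    # identifier so the encoded column can be reused unchanged.
    return {ident: op(value) for value, ident in dictionary.items()}


def decode(updated: Dict[int, str], encoded: List[int]) -> List[str]:
    # Expand the encoded column into updated data values.
    return [updated[ident] for ident in encoded]
```

For a column with two distinct values and three rows, the operator runs twice rather than three times; the saving grows with the number of rows per distinct value.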
-
Publication No.: US12210521B2
Publication Date: 2025-01-28
Application No.: US18140323
Filing Date: 2023-04-27
Applicant: Databricks, Inc.
Inventor: Venkata Sai Akhil Gudesa, Herman Rudolf Petrus Catharina van Hövell tot Westerflier, Supun Chathuranga Nakandala
IPC: G06F16/24, G06F9/48, G06F11/34, G06F16/2453, G06F16/28
Abstract: A cluster computing system maintains a first set of queues for short queries and a second set of queues for longer queries. The first set is allocated a majority of the cluster's processing resources and processes queries on a first-in, first-out basis. The second set is allocated a minority of the cluster's processing resources, which are shared among queries in the second set. Accordingly, the system assigns each query to the first set of queues for a fixed amount of resource time. While a query is processing, the system monitors the query's resource time and reassigns the query to the second set of queues if the query has not completed within the allotted amount of resource time. Thus, short queries receive the necessary resources to complete quickly without getting stuck behind longer queries, while longer queries continue to make progress.
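The demotion rule can be shown with a simplified, single-threaded simulation. It assumes each query is described by a known amount of work, which a real scheduler would not know in advance, and it models resource sharing in the second set as round-robin time slices. The function name and parameters are illustrative only.

```python
from collections import deque
from typing import List, Tuple


def run_queries(queries: List[Tuple[str, int]], budget: int,
                time_slice: int = 1) -> List[str]:
    """Return query names in completion order."""
    fast = deque(queries)   # first set: FIFO, each query gets `budget` units
    slow = deque()          # second set: remaining work is shared
    finished = []
    while fast:
        name, work = fast.popleft()
        if work <= budget:
            finished.append(name)
        else:
            # Not done within the allotted resource time: demote.
            slow.append((name, work - budget))
    while slow:
        name, work = slow.popleft()
        work -= time_slice
        if work <= 0:
            finished.append(name)
        else:
            slow.append((name, work))   # back of the line; others get a turn
    return finished
```

In this sketch a short query submitted after a long one still finishes first, because the long query is moved out of its way once its budget is spent.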
-
Publication No.: US12204510B2
Publication Date: 2025-01-21
Application No.: US18144647
Filing Date: 2023-05-08
Applicant: Databricks, Inc.
Inventor: Vijayan Prabhakaran, Himanshu Raja, Rahul Potharaju, Naga Raju Bhanoori, Lin Ma, Rajesh Parangi Sharabhalingappa, Jintian Liang, Zachary Vaughn Schuermann, Kam Cheung Ting
Abstract: Disclosed is a configuration for managing the organization of data tables in cloud-based storage. The configuration receives metrics for data processing operations on the data table. Metrics include at least one of a size of the data table, a size of each file in the data table, and metadata describing the data table. The configuration automatically executes a cost-benefit analysis based on the one or more metrics for each candidate maintenance operation in a plurality of candidate maintenance operations. The configuration automatically selects a maintenance operation from the candidate maintenance operations to automate based on the cost-benefit analysis of the one or more candidate maintenance operations. The selected maintenance operation is automated and scheduled on the data table.
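A cost-benefit selection over candidate maintenance operations might look like the sketch below. The scoring formulas, the metric keys (`file_sizes`, `target_file_size`, `stale_files`), and the operation names are all invented for illustration; the abstract does not specify how cost or benefit is computed.

```python
from typing import Callable, Dict, Optional

Metrics = Dict[str, object]


def compaction_score(metrics: Metrics) -> float:
    # Hypothetical model: benefit is the number of small files merged away,
    # cost is the rewritten volume measured in target-file units.
    target = metrics["target_file_size"]
    small = [size for size in metrics["file_sizes"] if size < target]
    return len(small) - sum(small) / target


def vacuum_score(metrics: Metrics) -> float:
    # Hypothetical model: fixed cost of 1.0, benefit of 0.5 per stale file.
    return metrics["stale_files"] * 0.5 - 1.0


def select_operation(metrics: Metrics,
                     candidates: Dict[str, Callable[[Metrics], float]]
                     ) -> Optional[str]:
    # Score every candidate and keep the best one only if it pays off.
    scores = {name: score(metrics) for name, score in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```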
-
Publication No.: US12197400B1
Publication Date: 2025-01-14
Application No.: US18473992
Filing Date: 2023-09-25
Applicant: Databricks, Inc.
Inventor: William Chau, Abhijit Chakankar, Stephen Michael Mahoney, Daniel Seth Morris, Itai Shlomo Weiss
Abstract: A data processing service receives a request from a first collaborator to create a clean room for data sharing collaboration with at least a second collaborator. In response, the data processing service creates an execution environment separate from the data environment of the first collaborator and the second collaborator. The first and second collaborators can then add content into the clean room in the form of data tables and executable notebooks. Approval from each collaborator is required before a notebook can be executed using any data table shared into the clean room. Upon receiving notebook approval from each collaborator, the data processing service creates a notebook job to execute the notebook on one or more cluster computing resources of the data processing service to generate an output.
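The approval gate can be sketched as a small class that refuses to run a notebook until every collaborator has approved it. The `CleanRoom` class, its method names, and the use of a plain callable as the "notebook job" are assumptions for this example.

```python
from typing import Callable, Dict, Iterable, Set


class CleanRoom:
    def __init__(self, collaborators: Iterable[str]) -> None:
        self.collaborators = frozenset(collaborators)
        self.approvals: Dict[str, Set[str]] = {}  # notebook -> approvers

    def add_notebook(self, name: str) -> None:
        self.approvals[name] = set()

    def approve(self, name: str, collaborator: str) -> None:
        if collaborator not in self.collaborators:
            raise ValueError(f"{collaborator} is not part of this clean room")
        self.approvals[name].add(collaborator)

    def run(self, name: str, job: Callable[[], object]) -> object:
        # Execution requires approval from each collaborator.
        if self.approvals[name] != self.collaborators:
            raise PermissionError(f"{name} is missing approvals")
        return job()
```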
-
Publication No.: US12189628B2
Publication Date: 2025-01-07
Application No.: US18162366
Filing Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy
IPC: G06F16/00, G06F16/2453, G06F16/28
Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed; determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics; indicating to process the first file using the first processing engine; determining whether a particular error in processing the first file using the first processing engine has been detected; in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine; and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.
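The engine switch can be illustrated with two line parsers: a fast one that gives up on quoted fields and a slower one built on Python's standard `csv` module. The `UnsupportedInput` error, both parser functions, and the choice of CSV as the file format are assumptions for this sketch; the heuristic for picking the first engine is not modeled.

```python
import csv
from typing import Iterable, List


class UnsupportedInput(Exception):
    """The particular error that triggers the switch to the second engine."""


def fast_parse(line: str) -> List[str]:
    # First engine: plain comma split, cannot handle quoted fields.
    if '"' in line:
        raise UnsupportedInput(line)
    return line.split(",")


def robust_parse(line: str) -> List[str]:
    # Second engine: full CSV parsing of a single line.
    return next(csv.reader([line]))


def parse_file(lines: Iterable[str]) -> List[List[str]]:
    rows = []
    engine = fast_parse
    for line in lines:
        try:
            rows.append(engine(line))
        except UnsupportedInput:
            # Stop using the first engine and continue with the second,
            # starting from the line that failed.
            engine = robust_parse
            rows.append(engine(line))
    return rows
```

Rows parsed before the error are kept, so the result combines output from both engines.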
-
Publication No.: US12189607B2
Publication Date: 2025-01-07
Application No.: US18236516
Filing Date: 2023-08-22
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
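The retry logic in this abstract follows an optimistic concurrency pattern, sketched below with an in-memory log. The `TransactionLog` class, the `ConflictError` exception, and the representation of a log entry as a set of file names are assumptions for illustration.

```python
from typing import Dict, Set


class ConflictError(Exception):
    pass


class TransactionLog:
    def __init__(self) -> None:
        self.entries: Dict[int, Set[str]] = {}  # position -> updated files

    def try_write(self, position: int, files: Set[str]) -> bool:
        # A position can be claimed only once.
        if position in self.entries:
            return False
        self.entries[position] = set(files)
        return True


def commit(log: TransactionLog, current_position: int,
           read_set: Set[str], updated_files: Set[str]) -> int:
    position = current_position + 1          # first attempt at N+1
    while not log.try_write(position, updated_files):
        # Another transaction already holds this position: check whether
        # it updated anything this transaction read.
        if read_set & log.entries[position]:
            raise ConflictError(f"conflict at position {position}")
        position += 1                        # no overlap, try N+2, and so on
    return position
```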
-