Patent search ap:"Databricks Inc." Page 12

111.

Patent application
Dataflow Graph Processing with Expectations [Status: Valid]

Publication (announcement) No.: US20250005076A1

Publication (announcement) date: 2025-01-02

Application No.: US18658418

Application date: 2024-05-08

Applicant: Databricks, Inc.

Inventor: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio

IPC: G06F16/901, G06F16/215, G06F16/22, G06F16/245

Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries. The processor is coupled to the communication interface and is configured to: determine the dependencies of each query in the set of queries on other queries; determine a directed acyclic graph (DAG) of nodes based at least in part on the dependencies; insert a node into the DAG of nodes to generate an updated DAG that enforces an expectation; determine a dataflow graph based on the updated DAG; and provide the dataflow graph.
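
A minimal Python sketch of the graph rewrite described above, not the patented implementation. It assumes a hypothetical input shape in which each dataset name maps to the datasets its query reads. An expectation node is spliced in after one dataset so that every former consumer reads through it, and the standard-library `graphlib.TopologicalSorter` produces an execution order. The function names `build_dag`, `insert_expectation`, and `dataflow_order` are illustrative, and the sketch models only the graph structure, not the evaluation of the expectation itself.

```python
from graphlib import TopologicalSorter


def build_dag(queries: dict[str, set[str]]) -> dict[str, set[str]]:
    """Hypothetical input: dataset name -> names of the datasets its query reads."""
    return {name: set(deps) for name, deps in queries.items()}


def insert_expectation(dag: dict[str, set[str]], dataset: str, expectation: str) -> dict[str, set[str]]:
    """Return an updated DAG in which `expectation` reads `dataset` and every
    former consumer of `dataset` reads from `expectation` instead."""
    updated = {name: set(deps) for name, deps in dag.items()}  # copy; input stays unchanged
    for deps in updated.values():
        if dataset in deps:
            deps.discard(dataset)
            deps.add(expectation)
    updated[expectation] = {dataset}
    return updated


def dataflow_order(dag: dict[str, set[str]]) -> list[str]:
    """Topological order: every node appears after all of its dependencies."""
    return list(TopologicalSorter(dag).static_order())


queries = {"raw": set(), "clean": {"raw"}, "report": {"clean"}}
dag = build_dag(queries)
updated = insert_expectation(dag, "clean", "expect_clean_valid")
print(dataflow_order(updated))  # ['raw', 'clean', 'expect_clean_valid', 'report']
```

Rewiring consumers (rather than just adding a side branch) is one way to make the expectation a gate that downstream queries cannot bypass; whether the patent does this is not stated in the abstract.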

112.

Granted patent
Data sharing for network connected systems [Status: Valid]

Publication (announcement) No.: US12182292B1

Publication (announcement) date: 2024-12-31

Application No.: US18162353

Application date: 2023-01-31

Applicant: Databricks, Inc.

Inventor: Matei Zaharia, Shixiong Zhu, Xiaotong Sun, Ramesh Chandra, Michael Paul Armbrust, Ali Ghodsi

IPC: G06F21/62, G06F21/00, G06F21/60

Abstract: The present application discloses a method, system, and computer system for providing access to data. The method includes: receiving, by a data manager service from a data requesting service, a request that uses an identifier for a high-level data object to access a set of data associated with the high-level data object; determining, by the data manager service, one or more low-level data objects corresponding to the set of data based on the identifier for the high-level data object; determining whether a user associated with the request has permission to access at least a subset of the low-level data object(s); and, in response to determining that the user associated with the request has permission to access at least the subset of the low-level data object(s), generating, by the data manager service, a uniform resource locator (URL) via which the user can access that subset of the one or more low-level data objects.

113.

Patent application
Fetching Query Results Through Cloud Object Stores [Status: Valid]

Publication (announcement) No.: US20240394271A1

Publication (announcement) date: 2024-11-28

Application No.: US18614380

Application date: 2024-03-22

Applicant: Databricks, Inc.

Inventor: Bogdan Ionut Ghit, Juliusz Sompolski, Shi Xin, Bart Samwel

IPC: G06F16/2458, G06F11/34, G06F16/242, G06F16/25

Abstract: The system is configured to: 1) receive a client request; 2) determine one or more executors to generate a response to the client request; 3) provide each of the executor(s) with an indication; 4) receive, for each indication, a response whose output is either a cloud output or an in-line output, thereby generating a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all outputs; and 6) in response to the group of in-line outputs not comprising all the outputs for the client request: a) convert the group of in-line outputs to a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request that includes the metadata for the converted group of cloud outputs and the group of cloud outputs.
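
Steps 4) through 6) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: `InlineOutput`, `CloudOutput`, `assemble_response`, the `cloud://` URL scheme, and the use of a plain dict as a stand-in for a cloud object store are all assumptions. If every executor returned its result in-line, the data is returned directly; otherwise the in-line chunks are uploaded so that the client receives uniform metadata (one link per chunk, in executor order).

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class InlineOutput:
    data: bytes  # result bytes returned directly by the executor


@dataclass
class CloudOutput:
    url: str   # location of a result chunk already written to the object store
    size: int  # chunk size in bytes


Output = Union[InlineOutput, CloudOutput]


def assemble_response(outputs: list[Output], object_store: dict[str, bytes],
                      prefix: str = "results/") -> dict:
    """Build the client response from executor outputs (listed in executor order)."""
    inline = [o for o in outputs if isinstance(o, InlineOutput)]
    if len(inline) == len(outputs):
        # Every output is in-line: return the data itself.
        return {"inline": [o.data for o in inline]}
    links = []
    for i, o in enumerate(outputs):
        if isinstance(o, InlineOutput):
            # Convert the in-line output into a cloud output by uploading it.
            key = f"{prefix}chunk-{i}"
            object_store[key] = o.data
            o = CloudOutput(url=f"cloud://{key}", size=len(o.data))
        links.append({"index": i, "url": o.url, "bytes": o.size})
    # Metadata covers both converted and original cloud outputs.
    return {"cloud": links}
```

Converting the in-line chunks, instead of mixing two result formats in one response, means the client needs only a single code path whenever any part of the result is large.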

114.

Granted patent
Data sharing for network connected systems [Status: Valid]

Publication (announcement) No.: US12147555B1

Publication (announcement) date: 2024-11-19

Application No.: US17733485

Application date: 2022-04-29

Applicant: Databricks, Inc.

Inventor: Matei Zaharia, Shixiong Zhu, Xiaotong Sun, Ramesh Chandra, Michael Paul Armbrust, Ali Ghodsi

IPC: G06F21/62, G06F21/00, G06F21/60

Abstract: The present application discloses a method, system, and computer system for providing access to data. The method includes: receiving, by a data manager service from a data requesting service, a request that uses an identifier for a high-level data object to access a set of data associated with the high-level data object; determining, by the data manager service, one or more low-level data objects corresponding to the set of data based on the identifier for the high-level data object; determining whether a user associated with the request has permission to access at least a subset of the low-level data object(s); and, in response to determining that the user associated with the request has permission to access at least the subset of the low-level data object(s), generating, by the data manager service, a uniform resource locator (URL) via which the user can access that subset of the one or more low-level data objects.

115.

Granted patent
Adaptive approach to lazy materialization in database scans using pushed filters [Status: Valid]

Publication (announcement) No.: US12124450B2

Publication (announcement) date: 2024-10-22

Application No.: US18160861

Application date: 2023-01-27

Applicant: Databricks, Inc.

Inventor: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy

IPC: G06F16/2453, G06F11/34, G06F16/22

CPC classification number: G06F16/24545, G06F11/3409, G06F16/221

Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.
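
A minimal Python sketch of the decision step, assuming a hypothetical scoring rule: the likelihood grows linearly with the fraction of rows the pushed filter discarded on the first task, and the technique is skipped if the filter itself ran slowly. The abstract names the inputs (percentage discarded, run-time) but not the formula, so `lazy_materialization_likelihood`, `use_lazy_materialization`, and the thresholds `min_discard`, `threshold`, and `max_filter_seconds` are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class FilterResult:
    rows_scanned: int      # rows of the filter column read in the first task
    rows_discarded: int    # rows the pushed filter rejected
    filter_seconds: float  # run-time of applying the filter


def lazy_materialization_likelihood(result: FilterResult, min_discard: float = 0.5) -> float:
    """Hypothetical heuristic: map the discard fraction to a score in [0, 1].
    The more rows the filter drops, the more non-filter column decoding lazy
    materialization can skip. Below `min_discard` the score is 0."""
    if result.rows_scanned == 0:
        return 0.0
    discarded = result.rows_discarded / result.rows_scanned
    if discarded < min_discard:
        return 0.0
    return (discarded - min_discard) / (1.0 - min_discard)


def use_lazy_materialization(result: FilterResult, threshold: float = 0.5,
                             max_filter_seconds: float = 5.0) -> bool:
    """Decide for the next task. Assumption: a slow filter suggests the extra
    pass would cost more than it saves."""
    if result.filter_seconds > max_filter_seconds:
        return False
    return lazy_materialization_likelihood(result) >= threshold
```

For example, a first task that discards 900 of 1,000 rows in 0.2 s scores 0.8 and enables the technique for the second task, while one that discards only 100 rows scores 0 and keeps eager materialization.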

116.

Granted patent
State rebalancing in structured streaming [Status: Valid]

Publication (announcement) No.: US12099525B2

Publication (announcement) date: 2024-09-24

Application No.: US18219314

Application date: 2023-07-07

Applicant: Databricks, Inc.

Inventor: Alexander Balikov, Tathagata Das, Karthikeyan Ramasamy

IPC: G06F16/27, G06F16/2455

CPC classification number: G06F16/278, G06F16/24568

Abstract: A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across the available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method is also performed such that the total number of stateful tasks is balanced per executor, as long as this rebalancing does not unbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.
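
The two goals in the abstract (spread each operator's partitions first, balance total tasks per executor second) can be sketched as a greedy assignment in Python. This is a minimal sketch, not the patented algorithm: the input tuples, the use of state size as the "statistic", and the function name `rebalance` are assumptions, and it ignores the caching cost the abstract mentions.

```python
from collections import defaultdict


def rebalance(partitions: list[tuple[int, int, int]], executors: list[str]) -> dict[tuple[int, int], str]:
    """partitions: (operator_id, partition_id, state_bytes) tuples (hypothetical shape).
    Returns {(operator_id, partition_id): executor}.

    Primary goal: each operator's partitions are spread evenly across executors.
    Secondary goal (used only as a tie-break, so it never worsens the primary
    goal): total task count per executor stays balanced."""
    by_op = defaultdict(list)
    for op, pid, size in partitions:
        by_op[op].append((pid, size))
    total = {e: 0 for e in executors}
    assignment = {}
    for op in sorted(by_op):
        per_op = {e: 0 for e in executors}
        # Place partitions with larger state first so they land on different executors.
        for pid, size in sorted(by_op[op], key=lambda p: -p[1]):
            # Fewest partitions of this operator, then fewest tasks overall, then name.
            target = min(executors, key=lambda e: (per_op[e], total[e], e))
            assignment[(op, pid)] = target
            per_op[target] += 1
            total[target] += 1
    return assignment
```

Ordering the sort key as `(per_op, total, name)` encodes the abstract's priority directly: the total-count balance is consulted only among executors that are already tied on the per-operator count.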

117.

Granted patent
Data ingestion using data file clustering with KD-epsilon trees [Status: Valid]

Publication (announcement) No.: US12072863B1

Publication (announcement) date: 2024-08-27

Application No.: US18218400

Application date: 2023-07-05

Applicant: Databricks, Inc.

Inventor: Prakhar Jain, Frederick Ryan Johnson, Bart Samwel

IPC: G06F16/20, G06F16/22, G06F16/23, G06F16/245, G06F16/28

CPC classification number: G06F16/2246, G06F16/2358, G06F16/245, G06F16/285

Abstract: A data tree for managing the data files of a data table and for performing one or more transaction operations on the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to the key-values of a respective key. A leaf node of the data tree may correspond to a data file of the data table that includes a subset of records whose key-values satisfy the condition for the node and the conditions associated with the node's parent nodes. A parent node may correspond to a file that includes a buffer storing changes to the data files reachable from this parent node, as well as dedicated storage for pointers to its child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.
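
A minimal Python sketch of the buffering idea, under an explicit assumption: the abstract does not define what "epsilon" adds to the k-d tree, and this sketch reads it as B-epsilon-tree-style write buffers on internal nodes. Each internal node splits on one key, leaves stand in for data files, and new records collect in a node's buffer until it fills, at which point they move down one level in a single batch. `Node`, `insert`, `flush`, and `BUFFER_CAPACITY` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

BUFFER_CAPACITY = 4  # hypothetical flush threshold for an internal node's buffer


@dataclass
class Node:
    # Internal node: records with record[key] < value go left, the rest go right.
    # Leaf node (key is None): stands in for one data file of the table.
    key: Optional[str] = None
    value: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    buffer: list = field(default_factory=list)     # pending changes (internal nodes)
    file_rows: list = field(default_factory=list)  # rows in the leaf's data file
    rewrites: int = 0                              # times the leaf's file was rewritten

    def is_leaf(self) -> bool:
        return self.key is None


def insert(root: Node, record: dict) -> None:
    """Buffer a new record at the root; flush only when the buffer fills up."""
    if root.is_leaf():
        root.file_rows.append(record)
        root.rewrites += 1
        return
    root.buffer.append(record)
    if len(root.buffer) >= BUFFER_CAPACITY:
        flush(root)


def flush(node: Node) -> None:
    """Move buffered records one level down, writing each child in a single batch."""
    pending, node.buffer = node.buffer, []
    left = [r for r in pending if r[node.key] < node.value]
    right = [r for r in pending if r[node.key] >= node.value]
    for child, batch in ((node.left, left), (node.right, right)):
        if not batch:
            continue
        if child.is_leaf():
            child.file_rows.extend(batch)
            child.rewrites += 1  # one file rewrite per batch, not per record
        else:
            child.buffer.extend(batch)
            if len(child.buffer) >= BUFFER_CAPACITY:
                flush(child)
```

With a root splitting on `x < 10`, inserting four records rewrites each leaf file once instead of once per record, which illustrates how buffering can reduce the number of data files that are rewritten while keeping records clustered by key.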

118.

Granted patent
Efficient merging of tabular data with post-processing compaction [Status: Valid]

Publication (announcement) No.: US12056126B2

Publication (announcement) date: 2024-08-06

Application No.: US17895877

Application date: 2022-08-25

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain

IPC: G06F17/30, G06F11/34, G06F16/22, G06F16/2453, G06F16/28

CPC classification number: G06F16/24544, G06F11/3409, G06F16/2282, G06F16/285

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files contains fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information that indicates, for each file in the set of matching target table files, the particular set of rows that have matching rows. Performing the second job includes performing a matching action based on the matched rows and obtaining the resulting file(s) of the second job.
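
A minimal Python sketch of the two-job merge followed by compaction, under a hypothetical data model: target files are lists of row dicts keyed by `id`, the source maps `id` to a new value, the matching action is an update, and unmatched source rows are inserts. `merge` and `rows_per_file` are illustrative names, not the patented implementation.

```python
def merge(target_files: dict[str, list[dict]], source: dict, rows_per_file: int = 3):
    """Returns (untouched_files, processed_files). Hypothetical data model."""
    # Job 1: find target files containing matches and record which rows matched.
    touched = {}
    for name, rows in target_files.items():
        hits = {i for i, r in enumerate(rows) if r["id"] in source}
        if hits:
            touched[name] = hits
    # Job 2: rewrite only the touched files, applying the update to matched rows.
    resulting = []
    matched_ids = set()
    for name, hits in touched.items():
        new_rows = []
        for i, r in enumerate(target_files[name]):
            if i in hits:
                matched_ids.add(r["id"])
                r = {"id": r["id"], "value": source[r["id"]]}
            new_rows.append(r)
        resulting.append(new_rows)
    # Unmatched source rows become inserts, each written as its own small file here.
    for key in sorted(set(source) - matched_ids):
        resulting.append([{"id": key, "value": source[key]}])
    # Post-processing compaction: repack the resulting rows into fewer, fuller files.
    all_rows = [r for f in resulting for r in f]
    processed = [all_rows[i:i + rows_per_file] for i in range(0, len(all_rows), rows_per_file)]
    untouched = {n: rows for n, rows in target_files.items() if n not in touched}
    return untouched, processed
```

Recording matched row positions in the first job lets the second job skip files with no matches entirely, and compacting afterwards keeps the many small insert files from fragmenting the table.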

119.

Granted patent
Structured cluster execution for data streams [Status: Valid]

Publication (announcement) No.: US12032573B2

Publication (announcement) date: 2024-07-09

Application No.: US17976361

Application date: 2022-10-28

Applicant: Databricks, Inc.

Inventor: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia

IPC: G06F16/2453, G06F16/2455

CPC classification number: G06F16/24542, G06F16/24568

Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
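
A minimal Python sketch of a physical plan as an ordered list of operators, each carrying an input mode and an output mode. The three mode names follow Spark Structured Streaming's output modes (append, update, complete); the compatibility rule that each operator must accept the mode its upstream operator emits is an assumption, as are the names `Mode`, `Operator`, `validate`, `run_trigger`, and `count_by_key`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    APPEND = "append"      # only rows that are new since the last trigger
    UPDATE = "update"      # only result rows that changed since the last trigger
    COMPLETE = "complete"  # the entire result on every trigger


@dataclass
class Operator:
    name: str
    input_mode: Mode
    output_mode: Mode
    fn: Callable[[list, dict], list]  # (input rows, operator state) -> output rows


def validate(plan: list) -> None:
    """Each operator must consume rows in the mode its upstream operator emits."""
    for up, down in zip(plan, plan[1:]):
        if up.output_mode is not down.input_mode:
            raise ValueError(
                f"{up.name} emits {up.output_mode.value} rows, "
                f"but {down.name} expects {down.input_mode.value} rows"
            )


def run_trigger(plan: list, new_rows: list, state: dict) -> list:
    """Run one micro-batch through the ordered operators, keeping per-operator state."""
    validate(plan)
    rows = new_rows
    for op in plan:
        rows = op.fn(rows, state.setdefault(op.name, {}))
    return rows


def count_by_key(rows: list, st: dict) -> list:
    """Running count per key; emits only the keys whose count changed (update mode)."""
    changed = set()
    for r in rows:
        st[r["k"]] = st.get(r["k"], 0) + 1
        changed.add(r["k"])
    return [{"k": k, "count": st[k]} for k in sorted(changed)]


filter_op = Operator("filter", Mode.APPEND, Mode.APPEND,
                     lambda rows, st: [r for r in rows if r["v"] > 0])
agg_op = Operator("count", Mode.APPEND, Mode.UPDATE, count_by_key)
```

Running `[filter_op, agg_op]` twice with a shared `state` dict yields incremental counts across triggers, while the reversed plan `[agg_op, filter_op]` is rejected because the filter cannot consume update-mode rows under the assumed rule.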

120.

Granted patent
Fetching query results through cloud object stores [Status: Valid]

Publication (announcement) No.: US11960494B1

Publication (announcement) date: 2024-04-16

Application No.: US17841946

Application date: 2022-06-16

Applicant: Databricks, Inc.

Inventor: Bogdan Ionut Ghit, Juliusz Sompolski, Shi Xin, Bart Samwel

IPC: G06F16/2458, G06F11/34, G06F16/242, G06F16/25

CPC classification number: G06F16/2471, G06F11/3419, G06F16/244, G06F16/256

Abstract: The system is configured to: 1) receive a client request; 2) determine one or more executors to generate a response to the client request; 3) provide each of the executor(s) with an indication; 4) receive, for each indication, a response whose output is either a cloud output or an in-line output, thereby generating a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all outputs; and 6) in response to the group of in-line outputs not comprising all the outputs for the client request: a) convert the group of in-line outputs to a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request that includes the metadata for the converted group of cloud outputs and the group of cloud outputs.
