Patent search ap:("Databricks Inc.") AND inv:"Bart Samwel" Page 1

1.

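Invention application
DATA FILE CLUSTERING WITH KD-EPSILON TREES (In force)

Publication (announcement) number: US20250013619A1

Publication (announcement) date: 2025-01-09

Application number: US18218766

Application date: 2023-07-06

Applicant: Databricks, Inc.

Inventor: Prakhar Jain , Frederick Ryan Johnson , Bart Samwel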

IPC: G06F16/22 , G06F16/2453 , G06F16/28

Abstract: A data tree for managing data files of a data table and performing one or more transaction operations to the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable by this parent node, and also includes dedicated storage for pointers to the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.
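
The buffering idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names `Leaf` and `Inner`, the `capacity` limit, and the flush rule are all assumptions made for illustration. Changes accumulate in a parent node's buffer and are pushed down in batches, so each leaf (standing in for a data file) is rewritten once per batch rather than once per change.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Stands in for one data file of the table (hypothetical)."""
    records: list = field(default_factory=list)
    rewrites: int = 0

    def apply(self, changes):
        # Applying a batch rewrites the file once, however many rows arrive.
        self.records.extend(changes)
        self.rewrites += 1


@dataclass
class Inner:
    """Parent node: a splitting condition, child pointers, and a change buffer."""
    key: str
    threshold: float
    left: object
    right: object
    capacity: int = 2  # hypothetical buffer limit, not taken from the patent
    buffer: list = field(default_factory=list)

    def apply(self, changes):
        # Buffer the changes; only push them down once the buffer overflows.
        self.buffer.extend(changes)
        if len(self.buffer) > self.capacity:
            self.flush()

    def flush(self):
        # Route buffered changes to the child whose condition they satisfy.
        low = [r for r in self.buffer if r[self.key] < self.threshold]
        high = [r for r in self.buffer if r[self.key] >= self.threshold]
        self.buffer = []
        if low:
            self.left.apply(low)
        if high:
            self.right.apply(high)


root = Inner(key="x", threshold=10, left=Leaf(), right=Leaf())
for x in (1, 20, 5):
    root.apply([{"x": x}])  # three single-row changes, one flush
```

In this run the three changes trigger a single flush, so the left leaf receives two rows in one rewrite instead of two separate rewrites.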

2.

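Granted invention patent
Efficient merge of tabular data with deletion indications (In force)

Publication (announcement) number: US12045220B2

Publication (announcement) date: 2024-07-23

Application number: US17895890

Application date: 2022-08-25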

Applicant: Databricks, Inc.

Inventor: Bart Samwel , Tathagata Das , Lars Kroll , Yijia Cui , Juliusz Sompolski , Christos Stavrakakis

IPC: G06F17/30 , G06F9/48 , G06F16/22

CPC classification number: G06F16/2282 , G06F9/4881

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, persisting, in one or more deletion vector files, one or more deletion vectors for corresponding rows of the one or more target table files, and obtaining a resulting table based at least in part on the second job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and one or more deletion vectors associated with previously removed rows of the matching target table files and obtaining the second job resulting file(s).
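
The two-job structure described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method: files are modeled as in-memory lists of dicts, deletion vectors as sets of row indices, the only matching action is "update", and the function names `find_matches`, `apply_merge`, and `read_table` are hypothetical. Unmatched source rows (inserts) are deliberately left out.

```python
def find_matches(target_files, source, key):
    """Job 1: record, per target file, the indices of rows that match the source."""
    source_keys = {row[key] for row in source}
    matches = {}
    for name, rows in target_files.items():
        idx = [i for i, row in enumerate(rows) if row[key] in source_keys]
        if idx:
            matches[name] = idx
    return matches


def apply_merge(target_files, source, key, matches, deletion_vectors):
    """Job 2: emit updated rows and mark the superseded rows as deleted."""
    by_key = {row[key]: row for row in source}
    new_rows = []
    for name, idx in matches.items():
        dv = deletion_vectors.setdefault(name, set())
        for i in idx:
            if i in dv:
                continue  # row was already removed by an earlier operation
            new_rows.append(by_key[target_files[name][i][key]])
            dv.add(i)  # the target file itself is left untouched
    return new_rows


def read_table(target_files, deletion_vectors, extra_files):
    """Resulting table: surviving target rows plus rows from the new files."""
    rows = [
        row
        for name, file_rows in target_files.items()
        for i, row in enumerate(file_rows)
        if i not in deletion_vectors.get(name, set())
    ]
    for file_rows in extra_files:
        rows.extend(file_rows)
    return rows


target = {
    "f1": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "f2": [{"id": 3, "v": "c"}],
}
source = [{"id": 2, "v": "B"}]
dvs = {}
matches = find_matches(target, source, "id")
job2_file = apply_merge(target, source, "id", matches, dvs)
result = sorted(read_table(target, dvs, [job2_file]), key=lambda r: r["id"])
```

Only the deletion vector for `f1` and one new file are written; neither existing target file is rewritten.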

3.

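Invention publication
EFFICIENT MERGE OF TABULAR DATA WITH DELETION INDICATIONS (Under examination, published)

Publication (announcement) number: US20240070138A1

Publication (announcement) date: 2024-02-29

Application number: US17895890

Application date: 2022-08-25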

Applicant: Databricks Inc.

Inventor: Bart Samwel , Tathagata Das , Lars Kroll , Yijia Cui , Juliusz Sompolski , Christos Stavrakakis

IPC: G06F16/22 , G06F9/48

CPC classification number: G06F16/2282 , G06F9/4881

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, persisting, in one or more deletion vector files, one or more deletion vectors for corresponding rows of the one or more target table files, and obtaining a resulting table based at least in part on the second job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and one or more deletion vectors associated with previously removed rows of the matching target table files and obtaining the second job resulting file(s).

4.

Invention application
Efficient Merging of Tabular Data with Post-Processing Compaction (In force)

Publication (announcement) number: US20250013644A1

Publication (announcement) date: 2025-01-09

Application number: US18769269

Application date: 2024-07-10

Applicant: Databricks, Inc.

Inventor: Bart Samwel , Tathagata Das , Lars Kroll , Yijia Cui , Juliusz Sompolski , Tom Van Bussel , Prakhar Jain

IPC: G06F16/2453 , G06F11/34 , G06F16/22 , G06F16/28

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).
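
One way to picture the post-processing step is a greedy compaction pass that packs small resulting files into fewer, larger ones. The abstract does not say how the processed files are formed, so the strategy below, the function name `compact`, and the row-count limit `target_rows` are assumptions chosen only to show the "fewer files out than in" property.

```python
def compact(files, target_rows):
    """Greedily concatenate consecutive files until adding one would exceed target_rows."""
    out, current = [], []
    for rows in files:
        if current and len(current) + len(rows) > target_rows:
            out.append(current)  # close the file being built
            current = []
        current = current + rows
    if current:
        out.append(current)
    return out


resulting_files = [[1], [2, 3], [4], [5, 6, 7], [8]]
processed_files = compact(resulting_files, target_rows=4)
```

Here five small resulting files become two processed files holding the same eight rows.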

5.

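Granted invention patent
K-D tree balanced splitting (In force)

Publication (announcement) number: US12061586B2

Publication (announcement) date: 2024-08-13

Application number: US17738609

Application date: 2022-05-06

Applicant: Databricks, Inc.

Inventor: Bart Samwel , Prakhar Jain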

IPC: G06F16/22 , G06F16/28

CPC classification number: G06F16/2246 , G06F16/285

Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.
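
A textbook median-split k-d partition gives a feel for steps 2) and 3). This is a generic sketch, not necessarily the split-point selection the patent claims: the function name `split`, the cycling through dimensions by depth, and measuring file size in rows are all assumptions.

```python
def split(points, target_size, depth=0):
    """Recursively split at the median of one dimension until chunks fit target_size."""
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    if len(points) <= target_size:
        return [points]  # this chunk becomes one file
    dim = depth % len(points[0])  # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2  # median position is the split point
    return (split(ordered[:mid], target_size, depth + 1)
            + split(ordered[mid:], target_size, depth + 1))


points = [(x, y) for x in range(4) for y in range(2)]
files = split(points, target_size=2)
```

Splitting at the median keeps the two halves within one row of each other, which is what makes every resulting file land near the target size.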

6.

Invention application
DATA FILE CLUSTERING WITH KD-CLASSIFIER TREES (In force)

Publication (announcement) number: US20250013606A1

Publication (announcement) date: 2025-01-09

Application number: US18218410

Application date: 2023-07-05

Applicant: Databricks, Inc.

Inventor: Prakhar Jain , Frederick Ryan Johnson , Terry Kim , Vijayan Prabhakaran , Bart Samwel

IPC: G06F16/16 , G06F16/13

Abstract: A data processing service generates a data classifier tree for managing data files of a data table. The data classifier tree may be configured as a KD-classifier tree and includes a plurality of nodes and edges. A node of the data classifier tree may represent a splitting condition with respect to key-values for a respective key. A node of the data classifier tree may be associated with one or more data files assigned to the node. The data files assigned to the node each include a subset of records having key-values that satisfy the conditions represented by the node and parent nodes of the node. The data processing service may efficiently cluster the data in the data table while reducing the number of data files that are rewritten when data is modified or added to the data table.
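
A rough sketch of how a classifier tree can avoid rewrites: new records are routed by the splitting conditions and written as an additional file on the node they reach, leaving files already assigned to that node untouched. The dict-based tree layout and the names `classify` and `add_batch` are hypothetical, and the sketch assigns files to leaf nodes only, which is narrower than what the abstract allows.

```python
def classify(node, record):
    """Follow splitting conditions from the root down to a leaf node."""
    while "key" in node:
        side = "left" if record[node["key"]] < node["threshold"] else "right"
        node = node[side]
    return node


def add_batch(tree, records):
    """Group new records by destination leaf and add one new file per leaf."""
    groups = {}
    for r in records:
        leaf = classify(tree, r)
        groups.setdefault(id(leaf), (leaf, []))[1].append(r)
    for leaf, rows in groups.values():
        leaf["files"].append(rows)  # existing files on the leaf are not rewritten


tree = {
    "key": "x",
    "threshold": 10,
    "left": {"files": []},
    "right": {"files": []},
}
add_batch(tree, [{"x": 1}, {"x": 12}, {"x": 3}])
add_batch(tree, [{"x": 2}])
```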

7.

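Granted invention patent
Concurrent optimistic transactions for tables with deletion vectors (In force)

Publication (announcement) number: US12147412B2

Publication (announcement) date: 2024-11-19

Application number: US18156109

Application date: 2023-01-18

Applicant: Databricks, Inc.

Inventor: Bart Samwel , Christos Stavrakakis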

IPC: G06F16/00 , G06F16/23

Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. The configuration commits the second transaction to generate a third version of the data table by updating elements of the deletion vector if the prerequisites are satisfied.
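
The two prerequisites can be approximated in a few lines. This is a deliberately simplified reading of the abstract, not the claimed logic: rows are identified as `(file, row_index)` pairs, the logical check is reduced to "the two row sets do not overlap", the physical check to "the files still exist in the second version", and the name `try_commit` and the table layout are hypothetical.

```python
def try_commit(table_v2, first_txn_rows, second_txn_rows):
    """Return version 3 of the table, or None if a prerequisite fails."""
    # Logical prerequisite (simplified): the first transaction did not
    # change any row the second transaction wants to update.
    logical_ok = not (first_txn_rows & second_txn_rows)
    # Physical prerequisite (simplified): every targeted row still lives
    # in a data file that is present in version 2.
    physical_ok = all(f in table_v2["files"] for f, _ in second_txn_rows)
    if not (logical_ok and physical_ok):
        return None
    # Commit by adding entries to the deletion vectors, not by rewriting files.
    dvs = {f: set(rows) for f, rows in table_v2["deletion_vectors"].items()}
    for f, i in second_txn_rows:
        dvs.setdefault(f, set()).add(i)
    return {"files": set(table_v2["files"]), "deletion_vectors": dvs}


v2 = {"files": {"f1", "f2"}, "deletion_vectors": {"f1": {0}}}
first = {("f1", 0)}
v3 = try_commit(v2, first, {("f1", 1)})
conflict = try_commit(v2, first, {("f1", 0)})
missing_file = try_commit(v2, first, {("f9", 0)})
```

Because both transactions only add entries to deletion vectors, two writers touching different rows of the same file can both commit in this model.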

8.

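Granted invention patent
Data maintenance transaction rollbacks (In force)

Publication (announcement) number: US12072843B1

Publication (announcement) date: 2024-08-27

Application number: US17580475

Application date: 2022-01-20

Applicant: Databricks, Inc.

Inventor: Prakhar Jain , Bart Samwel , Burak Yavuz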

IPC: G06F16/174

CPC classification number: G06F16/174

Abstract: The present application discloses a method, system, and computer system for managing data in a storage system. The method includes receiving a first transaction that modifies or deletes first data stored in a storage system, determining that the first data is subject to an intervening re-arrangement transaction, and in response to determining that the first data is subject to the intervening re-arrangement transaction, rolling back the re-arrangement transaction at least with respect to the first data and committing the first transaction.
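
The rollback idea can be shown with a toy model, assuming the re-arrangement is a compaction that only moves rows between files without changing them. The function name `commit_with_rollback`, the shape of the `rearrangement` record, and the choice to roll the whole re-arrangement back (rather than only the part touching the first data) are assumptions made for illustration.

```python
def commit_with_rollback(files, rearrangement, txn_file, txn_delete_keys):
    """Undo an intervening re-arrangement if it touched txn_file, then apply the delete."""
    files = dict(files)
    if rearrangement and txn_file in rearrangement["removed"]:
        # The re-arrangement only moved data, so undoing it loses nothing:
        # drop the files it added and restore the files it removed.
        for name in rearrangement["added"]:
            files.pop(name, None)
        files.update(rearrangement["removed"])
    # Commit the first transaction against the restored file.
    files[txn_file] = [k for k in files[txn_file] if k not in txn_delete_keys]
    return files


current = {"f3": [1, 2, 3, 4]}  # state after compaction
rearrangement = {"removed": {"f1": [1, 2], "f2": [3, 4]}, "added": ["f3"]}
result = commit_with_rollback(current, rearrangement, "f1", {2})
```

The design choice is that the user's modification wins and the maintenance work is discarded, since the maintenance can be redone later at no cost to correctness.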

9.

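Invention publication
CONCURRENT OPTIMISTIC TRANSACTIONS FOR TABLES WITH DELETION VECTORS (Under examination, published)

Publication (announcement) number: US20240241877A1

Publication (announcement) date: 2024-07-18

Application number: US18156109

Application date: 2023-01-18

Applicant: Databricks, Inc.

Inventor: Bart Samwel , Christos Stavrakakis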

IPC: G06F16/23

CPC classification number: G06F16/2315 , G06F16/2358 , G06F16/2379

Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. The configuration commits the second transaction to generate a third version of the data table by updating elements of the deletion vector if the prerequisites are satisfied.

10.

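Invention application
K-D Tree Balanced Splitting (In force)

Publication (announcement) number: US20250086155A1

Publication (announcement) date: 2025-03-13

Application number: US18772758

Application date: 2024-07-15

Applicant: Databricks, Inc.

Inventor: Bart Samwel , Prakhar Jain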

IPC: G06F16/22 , G06F16/28

Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.
