Patent search ap:("Databricks Page Inc.") AND inv:"Prakhar Jain"

1.

Invention application
Efficient Merging of Tabular Data with Post-Processing Compaction (In force)

Publication (announcement) number: US20250013644A1

Publication (announcement) date: 2025-01-09

Application number: US18769269

Application date: 2024-07-10

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain

IPC: G06F16/2453, G06F11/34, G06F16/22, G06F16/28

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has less files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).
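The post-processing step in this abstract, which leaves fewer files than the merge jobs produced, can be illustrated with a minimal sketch. This is not the patented method: the function name `plan_compaction`, the greedy smallest-first grouping, and the byte-size threshold are all assumptions made for illustration.

```python
from typing import Dict, List


def plan_compaction(file_sizes: Dict[str, int], target_size: int) -> List[List[str]]:
    """Group files so each group can be rewritten as one file of roughly target_size.

    Hypothetical helper: greedily packs files, smallest first, and starts a new
    group whenever adding the next file would exceed target_size.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1]):
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Four files written by the merge jobs collapse into two output files.
resulting_files = {"a": 10, "b": 20, "c": 30, "d": 90}
plan = plan_compaction(resulting_files, target_size=100)
```

Each inner list names the files that would be rewritten together, so the number of groups is the number of files after compaction.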

2.

Granted invention patent
K-D tree balanced splitting (In force)

Publication (announcement) number: US12061586B2

Publication (announcement) date: 2024-08-13

Application number: US17738609

Application date: 2022-05-06

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Prakhar Jain

IPC: G06F16/22, G06F16/28

CPC classification number: G06F16/2246, G06F16/285

Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.
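The three steps in this abstract (decide to cluster, pick split points per dimension, store items by split point) can be sketched as a recursive median split. This is only an illustration under stated assumptions, not the claimed algorithm: `balanced_kd_split` is a hypothetical name, the target size is expressed as a row count, and dimensions are cycled in round-robin order.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def balanced_kd_split(rows: Sequence[Row], target_rows: int, depth: int = 0) -> List[List[Row]]:
    """Split rows into groups of at most target_rows, one group per output file.

    Each level sorts by one dimension (chosen round-robin, an assumption) and
    uses the median as the split point, so sibling groups stay balanced.
    """
    if target_rows < 1:
        raise ValueError("target_rows must be at least 1")
    if len(rows) <= target_rows:
        return [list(rows)]
    dim = depth % len(rows[0])
    ordered = sorted(rows, key=lambda r: r[dim])
    mid = len(ordered) // 2  # the median position is the split point for this dimension
    return (balanced_kd_split(ordered[:mid], target_rows, depth + 1)
            + balanced_kd_split(ordered[mid:], target_rows, depth + 1))


# Sixteen 2-D points clustered into files of at most four rows each.
points = [(float(x), float(y)) for x in range(4) for y in range(4)]
files = balanced_kd_split(points, target_rows=4)
```

Because every split is at the median, the resulting groups differ in size by at most one row per level, which is one simple way to get files of approximately equal size.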

3.

Invention application
DATA FILE CLUSTERING WITH KD-EPSILON TREES (In force)

Publication (announcement) number: US20250013619A1

Publication (announcement) date: 2025-01-09

Application number: US18218766

Application date: 2023-07-06

Applicant: Databricks, Inc.

Inventor: Prakhar Jain, Frederick Ryan Johnson, Bart Samwel

IPC: G06F16/22, G06F16/2453, G06F16/28

Abstract: A data tree for managing data files of a data table and performing one or more transaction operations to the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable by this parent node, and also includes dedicated storage to pointers of the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.

4.

Invention application
DATA FILE CLUSTERING WITH KD-CLASSIFIER TREES (In force)

Publication (announcement) number: US20250013606A1

Publication (announcement) date: 2025-01-09

Application number: US18218410

Application date: 2023-07-05

Applicant: Databricks, Inc.

Inventor: Prakhar Jain, Frederick Ryan Johnson, Terry Kim, Vijayan Prabhakaran, Bart Samwel

IPC: G06F16/16, G06F16/13

Abstract: A data processing service generates a data classifier tree for managing data files of a data table. The data classifier tree may be configured as a KD-classifier tree and includes a plurality of nodes and edges. A node of the data classifier tree may represent a splitting condition with respect to key-values for a respective key. A node of the data classifier tree may be associated with one or more data files assigned to the node. The data files assigned to the node each include a subset of records having key-values that satisfy the conditions represented by the node and parent nodes of the node. The data processing service may efficiently cluster the data in the data table while reducing the number of data files that are rewritten when data is modified or added to the data table.
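One way to picture the node-to-file association described in this abstract is the sketch below. It is an interpretation, not the patented design: the `Node` layout, the `assign_file` helper, and the use of per-key minimum and maximum values to summarize a file are assumptions. A file is attached to the deepest node whose splitting conditions all of its records satisfy, so a file that straddles a split stays at the higher node without being rewritten.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Node:
    key_index: Optional[int] = None   # key tested by the splitting condition; None marks a leaf
    split_value: float = 0.0          # records with key < split_value go left, the rest go right
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    files: List[str] = field(default_factory=list)  # data files assigned to this node


def assign_file(root: Node, file_name: str, mins: Sequence[float], maxs: Sequence[float]) -> Node:
    """Attach file_name to the deepest node whose conditions hold for every record in the file.

    mins and maxs are the per-key minimum and maximum values found in the file.
    """
    node = root
    while node.key_index is not None:
        k = node.key_index
        if maxs[k] < node.split_value:
            node = node.left
        elif mins[k] >= node.split_value:
            node = node.right
        else:
            break  # the file straddles this split, so it stays at the current node
    node.files.append(file_name)
    return node


# Root splits key 0 at 10; its right child splits key 1 at 5.
right = Node(key_index=1, split_value=5.0, left=Node(), right=Node())
root = Node(key_index=0, split_value=10.0, left=Node(), right=right)

narrow = assign_file(root, "narrow.parquet", mins=(12.0, 0.0), maxs=(20.0, 3.0))
wide = assign_file(root, "wide.parquet", mins=(5.0, 0.0), maxs=(15.0, 1.0))
```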

5.

Granted invention patent
Data maintenance transaction rollbacks (In force)

Publication (announcement) number: US12072843B1

Publication (announcement) date: 2024-08-27

Application number: US17580475

Application date: 2022-01-20

Applicant: Databricks, Inc.

Inventor: Prakhar Jain, Bart Samwel, Burak Yavuz

IPC: G06F16/174

CPC classification number: G06F16/174

Abstract: The present application discloses a method, system, and computer system for managing a data in a storage system. The method includes receiving a first transaction that modifies or deletes first data stored in a storage system, determining that the first data is subject to an intervening re-arrangement transaction, and in response to determining that the first data is subject to the intervening re-arrangement transaction, rolling back the re-arrangement transaction at least with respect to the first data and committing the first transaction.

6.

Granted invention patent
Data ingestion using data file clustering with KD-epsilon trees (In force)

Publication (announcement) number: US12072863B1

Publication (announcement) date: 2024-08-27

Application number: US18218400

Application date: 2023-07-05

Applicant: Databricks, Inc.

Inventor: Prakhar Jain, Frederick Ryan Johnson, Bart Samwel

IPC: G06F16/20, G06F16/22, G06F16/23, G06F16/245, G06F16/28

CPC classification number: G06F16/2246, G06F16/2358, G06F16/245, G06F16/285

Abstract: A data tree for managing data files of a data table and performing one or more transaction operations to the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable by this parent node, and also includes dedicated storage to pointers of the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.

7.

Granted invention patent
Efficient merging of tabular data with post-processing compaction (In force)

Publication (announcement) number: US12056126B2

Publication (announcement) date: 2024-08-06

Application number: US17895877

Application date: 2022-08-25

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain

IPC: G06F17/30, G06F11/34, G06F16/22, G06F16/2453, G06F16/28

CPC classification number: G06F16/24544, G06F11/3409, G06F16/2282, G06F16/285

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has less files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).

8.

Invention publication
EFFICIENT MERGING OF TABULAR DATA WITH POST-PROCESSING COMPACTION (Under examination, published)

Publication (announcement) number: US20240070153A1

Publication (announcement) date: 2024-02-29

Application number: US17895877

Application date: 2022-08-25

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain

IPC: G06F16/2453, G06F11/34, G06F16/22, G06F16/28

CPC classification number: G06F16/24544, G06F11/3409, G06F16/2282, G06F16/285

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has less files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).

9.

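Invention application
K-D Tree Balanced Splitting (In force)

Publication (announcement) number: US20250086155A1

Publication (announcement) date: 2025-03-13

Application number: US18772758

Application date: 2024-07-15

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Prakhar Jain

IPC: G06F16/22, G06F16/28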

Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.

10.

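Invention publication
K-D TREE BALANCED SPLITTING (Under examination, published)

Publication (announcement) number: US20230359602A1

Publication (announcement) date: 2023-11-09

Application number: US17738609

Application date: 2022-05-06

Applicant: Databricks Inc.

Inventor: Bart Samwel, Prakhar Jain

IPC: G06F16/22

CPC classification number: G06F16/2246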

Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.
