-
Publication No.: US12153558B1
Publication Date: 2024-11-26
Application No.: US18162093
Filing Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Alexander Behm, Ankur Dave
IPC: G06F16/00, G06F16/13, G06F16/22, G06F16/242, G06F16/2455, G06F16/28
Abstract: A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregated hash table entry of the preaggregated hash table: provide the preaggregated hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.
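The flow in this abstract (local preaggregation, redistribution by a distribution hash value, then final aggregation) can be illustrated with a minimal single-process sketch. This is not the patented implementation: the function names, the use of SUM as the aggregate, Python's built-in `hash` as the distribution hash, and the in-memory lists standing in for network transfer are all assumptions made for illustration.

```python
from collections import defaultdict


def preaggregate(rows, key_cols, value_col):
    """Phase 1: build a local preaggregation hash table (group key -> partial sum)."""
    table = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        table[key] += row[value_col]
    return table


def distribute(table, num_units):
    """Route each preaggregated entry to a unit chosen by a distribution hash value."""
    outboxes = [[] for _ in range(num_units)]
    for key, partial in table.items():
        # Assumption: hash(key) modulo the unit count picks the target unit.
        outboxes[hash(key) % num_units].append((key, partial))
    return outboxes


def aggregate(received):
    """Phase 2: build the final aggregation hash table from received entries."""
    table = defaultdict(int)
    for key, partial in received:
        table[key] += partial
    return dict(table)


def rollup(partitions, key_cols, value_col):
    """Simulate every computing unit in one process; lists stand in for the network."""
    num_units = len(partitions)
    inboxes = [[] for _ in range(num_units)]
    for rows in partitions:
        local = preaggregate(rows, key_cols, value_col)
        for target, entries in enumerate(distribute(local, num_units)):
            inboxes[target].extend(entries)
    return [aggregate(inbox) for inbox in inboxes]
```

Because every entry for a given key is routed by the same hash value, each group ends up on exactly one unit, so the per-unit results can be concatenated without a further merge step. Preaggregating first means each unit sends at most one entry per distinct key rather than one per input row.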
-
Publication No.: US11675767B1
Publication Date: 2023-06-13
Application No.: US17099467
Filing Date: 2020-11-16
Applicant: Databricks, Inc.
Inventor: Alexander Behm, Ankur Dave
IPC: G06F16/00, G06F16/22, G06F16/28, G06F16/242, G06F16/2455, G06F16/13
CPC classification number: G06F16/2255, G06F16/134, G06F16/2272, G06F16/244, G06F16/24556, G06F16/285
Abstract: A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregated hash table entry of the preaggregated hash table: provide the preaggregated hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.
-
Publication No.: US11481398B1
Publication Date: 2022-10-25
Application No.: US17116230
Filing Date: 2020-12-09
Applicant: Databricks Inc.
Inventor: Alexander Behm, Ankur Dave, Ryan Deng, Shoumik Palkar
IPC: G06F16/2455, G06F16/22
Abstract: A system for spilling comprises an interface and a processor. The interface is configured to receive an indication to perform a GROUP BY operation, wherein the indication comprises an input table and a grouping column. The processor is configured to: for each input table entry of the input table, determine a key, wherein the key is based at least in part on the input table entry and the grouping column; add the key to a grouping hash table, wherein adding the key to the grouping hash table comprises last-in, first-out (LIFO) spilling when necessary; create an output table based at least in part on the grouping hash table; and provide the output table.
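The abstract does not spell out how LIFO spilling works, so the following is only one possible reading, sketched under stated assumptions: the grouping hash table has a fixed capacity, and when a new key arrives while the table is full, the most recently inserted group is evicted first. The function name, the `max_in_memory` parameter, the row-count aggregate, and the Python list standing in for spill storage are all hypothetical.

```python
def group_by_with_lifo_spill(rows, grouping_col, max_in_memory=2):
    """Group rows by one column, evicting the newest group when memory is full.

    Returns (output_table, number_of_spilled_entries).
    """
    table = {}    # grouping hash table: key -> row count (dicts keep insertion order)
    spilled = []  # assumption: a list stands in for on-disk spill storage

    for row in rows:
        key = row[grouping_col]
        if key not in table and len(table) >= max_in_memory:
            # dict.popitem() removes the most recently inserted entry (LIFO).
            spilled.append(table.popitem())
        table[key] = table.get(key, 0) + 1

    # Build the output table by merging spilled partial counts back in.
    output = dict(table)
    for key, count in spilled:
        output[key] = output.get(key, 0) + count
    return output, len(spilled)
```

A key that was spilled can reappear later and be spilled again, so the final merge has to combine several partial counts for the same group. Evicting the newest entry keeps the oldest groups resident, which is the behavior this sketch takes "last-in, first-out" to mean.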
-