-
Publication (Announcement) No.: US12105690B1
Publication (Announcement) Date: 2024-10-01
Application No.: US17875176
Application Date: 2022-07-27
Applicant: Databricks Inc.
Inventor: Timothy Armstrong, Arvind Sai Krishnan, Khayyam Guliyev
IPC: G06F16/00, G06F16/22, G06F16/2455
CPC classification number: G06F16/2246, G06F16/24552
Abstract: A system for multipass sort includes a communication interface and a processor. The communication interface is configured to receive from a client device a request to sort a dataset that includes a plurality of rows. The processor is configured to perform a first sort pass on the dataset in part by: extracting prefixes associated with a first schema element associated with the dataset for the plurality of rows; and sorting the extracted prefixes utilizing an integer sort algorithm based on a sort order included in the request to sort the dataset, where sorting the extracted prefixes includes utilizing NULL values to resolve a tied range that includes at least two rows of the plurality of rows having a same extracted prefix.
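The first sort pass summarized in this abstract (extract a prefix of the first sort column, integer-sort the prefixes, then handle rows whose prefixes tie) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented method. It assumes a single nullable string column, a hypothetical 4-byte prefix (`PREFIX_BYTES`), and an LSD radix sort as the integer sort algorithm. NULLs get a reserved key, so they never share a tied range with non-NULL rows. A tied range made up entirely of NULLs is treated as already final, and any other tied range is resolved with an ordinary comparison sort on the full values. The abstract does not say exactly how NULL values are used to resolve ties, so that step is an assumption. All names (`extract_prefix`, `radix_sort_indices`, `tied_ranges`, `multipass_sort`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

PREFIX_BYTES = 4                          # hypothetical prefix width (bytes)
KEY_MAX = 1 << (8 * PREFIX_BYTES)         # largest non-NULL key after the +1 shift
NULL_LAST_KEY = KEY_MAX + 1               # reserved key for NULLs when they sort last
KEY_BITS = 8 * PREFIX_BYTES + 8           # 8-bit digits needed to cover NULL_LAST_KEY


def extract_prefix(value: Optional[str], descending: bool, nulls_first: bool) -> int:
    """Map a nullable string to an integer whose order matches the requested sort order."""
    if value is None:
        return 0 if nulls_first else NULL_LAST_KEY
    # Pad with 0x00 so a shorter string never sorts after a longer one it prefixes.
    raw = value.encode("utf-8")[:PREFIX_BYTES].ljust(PREFIX_BYTES, b"\x00")
    key = int.from_bytes(raw, "big") + 1  # shift into [1, KEY_MAX] so 0 stays free for NULL
    return (KEY_MAX + 1) - key if descending else key


def radix_sort_indices(keys: Sequence[int], key_bits: int, radix_bits: int = 8) -> List[int]:
    """Stable LSD radix sort; returns row indices ordered by key."""
    order = list(range(len(keys)))
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):
        buckets: List[List[int]] = [[] for _ in range(1 << radix_bits)]
        for i in order:
            buckets[(keys[i] >> shift) & mask].append(i)
        order = [i for bucket in buckets for i in bucket]
    return order


def tied_ranges(keys: Sequence[int], order: Sequence[int]) -> List[Tuple[int, int]]:
    """Half-open [start, end) spans of `order` where two or more rows share a prefix key."""
    ranges: List[Tuple[int, int]] = []
    start = 0
    for pos in range(1, len(order) + 1):
        if pos == len(order) or keys[order[pos]] != keys[order[start]]:
            if pos - start > 1:
                ranges.append((start, pos))
            start = pos
    return ranges


def multipass_sort(values: Sequence[Optional[str]], descending: bool = False,
                   nulls_first: bool = True) -> List[int]:
    """First pass: integer-sort the prefixes; then resolve each tied range."""
    keys = [extract_prefix(v, descending, nulls_first) for v in values]
    order = radix_sort_indices(keys, KEY_BITS)
    for start, end in tied_ranges(keys, order):
        if values[order[start]] is None:
            continue  # NULL key is reserved, so the whole range is NULL and already final
        # Non-NULL tie: fall back to comparing the full UTF-8 bytes.
        order[start:end] = sorted(order[start:end],
                                  key=lambda i: values[i].encode("utf-8"),
                                  reverse=descending)
    return order
```

For example, `multipass_sort(["pear", None, "apple", "applesauce"])` returns row indices, so the caller can reorder whole rows rather than just the sort column. Here "apple" and "applesauce" share the prefix `appl` and form a tied range that the fallback comparison resolves.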
-
Publication (Announcement) No.: US12298952B1
Publication (Announcement) Date: 2025-05-13
Application No.: US17875180
Application Date: 2022-07-27
Applicant: Databricks, Inc.
Inventor: Timothy Armstrong, Arvind Sai Krishnan, Khayyam Guliyev
IPC: G06F16/22, G06F16/2453
Abstract: A system for multipass sort with subsplitting includes a communication interface and a processor. The communication interface is configured to receive from a client device a request to sort a dataset that includes a plurality of rows, where the size of the dataset is greater than a threshold size. The processor is configured to: subdivide the dataset into a plurality of data subsets; sort each of the plurality of data subsets; merge the plurality of sorted data subsets utilizing a binary merge tree to generate a sorted dataset; and provide the sorted dataset to the client device.
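The subsplitting flow in this abstract (subdivide a large dataset, sort each subset, then combine the sorted subsets with a binary merge tree) can be sketched in Python as follows. This is a minimal sketch, not the patented implementation. The parameters `threshold` and `subset_size` and the helpers `merge_two` and `sort_with_subsplitting` are hypothetical, since the abstract gives no concrete sizes. The sketch sorts subsets sequentially, although a real system might sort them in parallel.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def merge_two(left: List[T], right: List[T]) -> List[T]:
    """Stable two-way merge of two sorted runs (ties take the left run first)."""
    out: List[T] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def sort_with_subsplitting(rows: Sequence[T], threshold: int, subset_size: int) -> List[T]:
    """Sort rows; above `threshold` rows, split into subsets, sort each, merge via a binary tree."""
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    if len(rows) <= threshold or not rows:
        return sorted(rows)  # small input: sort directly
    # 1. Subdivide the dataset into contiguous data subsets.
    subsets = [list(rows[i:i + subset_size]) for i in range(0, len(rows), subset_size)]
    # 2. Sort each subset independently (sequentially in this sketch).
    runs = [sorted(s) for s in subsets]
    # 3. Binary merge tree: each level merges adjacent pairs, roughly halving the run count.
    while len(runs) > 1:
        next_level: List[List[T]] = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                next_level.append(merge_two(runs[k], runs[k + 1]))
            else:
                next_level.append(runs[k])  # odd run carried up unchanged
        runs = next_level
    return runs[0]
```

With k sorted subsets the tree has about ⌈log2 k⌉ levels, and every merge step takes exactly two inputs. A single k-way heap merge (for example Python's `heapq.merge`) is the common alternative when many runs must be combined at once.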
-