Patent search case: "Databricks Inc." Page 5

41.

Granted invention patent
Multiple pass sort (In force)

Publication (announcement) number: US12105690B1

Publication (announcement) date: 2024-10-01

Application number: US17875176

Application date: 2022-07-27

Applicant: Databricks Inc.

Inventor: Timothy Armstrong, Arvind Sai Krishnan, Khayyam Guliyev

IPC: G06F16/00, G06F16/22, G06F16/2455

CPC classification number: G06F16/2246, G06F16/24552

Abstract: A system for multipass sort includes a communication interface and a processor. The communication interface is configured to receive from a client device a request to sort a dataset that includes a plurality of rows. The processor is configured to perform a first sort pass on the dataset in part by: extracting prefixes associated with a first schema element associated with the dataset for the plurality of rows; and sorting the extracted prefixes utilizing an integer sort algorithm based on a sort order included in the request to sort the dataset, where sorting the extracted prefixes includes utilizing NULL values to resolve a tied range that includes at least two rows of the plurality of rows having a same extracted prefix.
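The first pass described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the 4-byte prefix width, the choice to give NULL the same prefix as an empty string, the NULLs-first ordering, and the names `extract_prefix` and `multipass_sort` are all assumptions made for illustration, and Python's built-in sort on integer keys stands in for a dedicated integer sort such as radix sort.

```python
from itertools import groupby

PREFIX_BYTES = 4  # assumed prefix width; the abstract does not give one


def extract_prefix(value):
    """Pack the first PREFIX_BYTES bytes of a string into one integer."""
    if value is None:
        return 0  # NULL shares prefix 0 with "" and is separated in pass 2
    raw = value.encode("utf-8")[:PREFIX_BYTES].ljust(PREFIX_BYTES, b"\x00")
    return int.from_bytes(raw, "big")


def multipass_sort(rows, key):
    """Sort rows ascending by rows[i][key], NULLs first (an assumed order)."""
    # Pass 1: sort row indices on the integer prefixes only.
    prefixes = [extract_prefix(row[key]) for row in rows]
    order = sorted(range(len(rows)), key=prefixes.__getitem__)

    # Pass 2: visit each tied range (rows with the same prefix) and resolve it
    # using the NULL flag first, then the full value.
    result = []
    for _, group in groupby(order, key=prefixes.__getitem__):
        tied = list(group)
        if len(tied) > 1:
            tied.sort(key=lambda i: (
                rows[i][key] is not None,
                b"" if rows[i][key] is None else rows[i][key].encode("utf-8"),
            ))
        result.extend(rows[i] for i in tied)
    return result
```

In this sketch "pear" and "pearl" share a prefix and form a tied range, as do a NULL and an empty string; only those ranges need the slower full comparison.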

42.

Published invention application
MULTI-CLUSTER QUERY RESULT CACHING (Under examination, published)

Publication (announcement) number: US20240265010A1

Publication (announcement) date: 2024-08-08

Application number: US18221735

Application date: 2023-07-13

Applicant: Databricks, Inc.

Inventor: Saksham Garg, Bogdan Ionut Ghit, Christopher Stevens, Christian Stuart

IPC: G06F16/2453, G06F16/25, G06F16/28

CPC classification number: G06F16/24539, G06F16/24542, G06F16/256, G06F16/285

Abstract: A multi-cluster computing system which includes a query result caching system is presented. The multi-cluster computing system may include a data processing service and client devices communicatively coupled over a network. The data processing service may include a control layer and a data layer. The control layer may be configured to receive and process requests from the client devices and manage resources in the data layer. The data layer may be configured to include instances of clusters of computing resources for executing jobs. The data layer may include a data storage system, which further includes a remote query result cache store. The query result cache store may include a cloud storage query result cache which stores data associated with results of previously executed requests. As such, when a cluster encounters a previously executed request, the cluster may efficiently retrieve the cached result of the request from the in-memory query result cache or the cloud storage query result cache.
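The two-tier lookup in this abstract (in-memory cache first, then the shared cloud storage cache) might look roughly like the sketch below. Everything here is hypothetical: the class name `QueryResultCache`, the use of a plain dict as a stand-in for cloud storage, and the cache key built from the query text plus a table version are assumptions, not details from the application.

```python
import hashlib


class QueryResultCache:
    """Per-cluster cache backed by a store shared across clusters."""

    def __init__(self, remote_store):
        self.local = {}             # in-memory cache private to this cluster
        self.remote = remote_store  # shared store; a dict stands in for cloud storage

    @staticmethod
    def cache_key(sql, table_version):
        # Assumed key: query text plus the version of the data it reads.
        text = f"{table_version}:{sql.strip()}"
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_run(self, sql, table_version, run_query):
        """Return (result, source), where source says which tier answered."""
        key = self.cache_key(sql, table_version)
        if key in self.local:
            return self.local[key], "memory"
        if key in self.remote:
            self.local[key] = self.remote[key]  # promote to the faster tier
            return self.local[key], "remote"
        result = run_query(sql)
        self.local[key] = result
        self.remote[key] = result
        return result, "executed"
```

With two `QueryResultCache` objects sharing one store, a query executed on the first cluster is answered from the remote tier on the second cluster and from memory on any repeat.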

43.

Published invention application
STATIC APPROACH TO LAZY MATERIALIZATION IN DATABASE SCANS USING PUSHED FILTERS (Under examination, published)

Publication (announcement) number: US20240256539A1

Publication (announcement) date: 2024-08-01

Application number: US18160850

Application date: 2023-01-27

Applicant: Databricks, Inc.

Inventor: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy

IPC: G06F16/2453, G06F16/22

CPC classification number: G06F16/24539, G06F16/221

Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. The method includes receiving a request to perform a new query in a columnar database containing a plurality of columns. A step in the method includes accessing a set of data in a column of the plurality of columns based on the query. The method includes generating an input to a machine-learned model comprising characteristics of the set of data in the column. From the machine-learned model, the method includes generating a likelihood value indicative of whether a filter of a first portion of the set of data in the column has greater efficiency than a download followed by a filter of the set of data in the column. The method further includes comparing the likelihood value to a threshold value. Based on the comparison, the method includes filtering the first portion of the set of data before downloading the set of data if the likelihood value is equal to or above the threshold value.
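The threshold comparison at the end of this abstract can be sketched as follows. The logistic model, the feature names, and the weights in the test are placeholders chosen for illustration; the abstract only says that a machine-learned model produces a likelihood value that is compared to a threshold.

```python
import math


def likelihood(features, weights, bias):
    """Hypothetical model: logistic regression over column characteristics."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def choose_scan_strategy(features, weights, bias, threshold=0.5):
    """Filter before downloading when the likelihood meets the threshold."""
    p = likelihood(features, weights, bias)
    if p >= threshold:  # "equal to or above", as the abstract states
        return "filter_then_download"
    return "download_then_filter"
```

A call such as `choose_scan_strategy({"selectivity": 0.05}, {"selectivity": -4.0}, 1.0)` would use an assumed feature named `selectivity`; real inputs would be whatever column statistics the model was trained on.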

44.

Granted invention patent
Efficient merge of tabular data with deletion indications (In force)

Publication (announcement) number: US12045220B2

Publication (announcement) date: 2024-07-23

Application number: US17895890

Application date: 2022-08-25

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Christos Stavrakakis

IPC: G06F17/30, G06F9/48, G06F16/22

CPC classification number: G06F16/2282, G06F9/4881

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, persisting, in one or more deletion vector files, one or more deletion vectors for corresponding rows of the one or more target table files, and obtaining a resulting table based at least in part on the second job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and one or more deletion vectors associated with previously removed rows of the matching target table files and obtaining the second job resulting file(s).
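One way to picture the two jobs is the in-memory sketch below, where each target table file is a list of row dicts and each deletion vector is a set of row positions. The function names, the `id` join key, and the update-only matching action are assumptions; the patent covers a more general operation and persists files rather than Python objects.

```python
def find_matches(target_files, source_keys):
    """Job 1: per target file, record the row positions that match the source."""
    matches = {}
    for name, rows in target_files.items():
        hit = {i for i, row in enumerate(rows) if row["id"] in source_keys}
        if hit:
            matches[name] = hit
    return matches


def apply_update(target_files, deletion_vectors, matches, source):
    """Job 2: write updated rows to a new file and mark the old rows deleted."""
    new_rows = []
    new_vectors = {name: set(dv) for name, dv in deletion_vectors.items()}
    for name, positions in matches.items():
        already_deleted = deletion_vectors.get(name, set())
        # Rows removed by an earlier operation must not be matched again.
        for i in sorted(positions - already_deleted):
            row = target_files[name][i]
            new_rows.append({**row, **source[row["id"]]})
            new_vectors.setdefault(name, set()).add(i)
    return new_rows, new_vectors


def read_table(target_files, deletion_vectors, extra_rows=()):
    """Resulting table: live rows of the old files plus the new file's rows."""
    live = [row
            for name, rows in target_files.items()
            for i, row in enumerate(rows)
            if i not in deletion_vectors.get(name, set())]
    return live + list(extra_rows)
```

The point of the design is that the original target files are never rewritten: an update costs one new file plus a small deletion vector per touched file.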

45.

Published invention application
EFFICIENTLY VECTORIZED IMPLEMENTATION OF OPERATIONS IN A GLOBAL GRID INDEXING LIBRARY (Under examination, published)

Publication (announcement) number: US20240152338A1

Publication (announcement) date: 2024-05-09

Application number: US18501839

Application date: 2023-11-03

Applicant: Databricks, Inc.

Inventor: Desmond Cheong Zhi Xi, Menelaos Karavelas

IPC: G06F8/41

CPC classification number: G06F8/452

Abstract: A data processing service generates code for iteratively applying a geospatial function to geospatial data. The generated code includes at least a first iterative loop and a second iterative loop. The data processing service compiles the generated code to generate compiled code that vectorizes at least the second iterative loop. The data processing service receives a request from a client device to perform one or more data processing operations including applying the geospatial function to a data table of geospatial cell indices. The data processing service compiles the request into one or more tasks including at least a vectorized operation based on the compiled code and executes the one or more tasks by at least invoking the vectorized operation on a set of worker nodes.

46.

Published invention application
EFFICIENT MERGE OF TABULAR DATA WITH DELETION INDICATIONS (Under examination, published)

Publication (announcement) number: US20240070138A1

Publication (announcement) date: 2024-02-29

Application number: US17895890

Application date: 2022-08-25

Applicant: Databricks Inc.

Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Christos Stavrakakis

IPC: G06F16/22, G06F9/48

CPC classification number: G06F16/2282, G06F9/4881

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, persisting, in one or more deletion vector files, one or more deletion vectors for corresponding rows of the one or more target table files, and obtaining a resulting table based at least in part on the second job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and one or more deletion vectors associated with previously removed rows of the matching target table files and obtaining the second job resulting file(s).

47.

Published invention application
UPDATE AND QUERY OF A LARGE COLLECTION OF FILES THAT REPRESENT A SINGLE DATASET STORED ON A BLOB STORE (Under examination, published)

Publication (announcement) number: US20230394029A1

Publication (announcement) date: 2023-12-07

Application number: US18236516

Application date: 2023-08-22

Applicant: Databricks, Inc.

Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz

IPC: G06F16/23, G06F16/14, G06F16/22

CPC classification number: G06F16/2358, G06F16/148, G06F16/2282

Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction log associated with a further position N+2.
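The commit loop in this abstract is a form of optimistic concurrency control, and a minimal sketch of it follows. The `TransactionLog` class, its in-memory list, and the `max_attempts` limit are hypothetical; a real blob-store log would rely on an atomic put-if-absent write rather than the length check used here.

```python
class TransactionLog:
    """In-memory stand-in for a log of numbered commit entries."""

    def __init__(self):
        self.entries = []  # entries[i] holds the files updated at position i + 1

    def try_write(self, position, updated_files):
        """Succeed only if `position` is the next free slot (put-if-absent)."""
        if position != len(self.entries) + 1:
            return False
        self.entries.append(set(updated_files))
        return True


def commit(log, current_position, read_set, updated_files, max_attempts=10):
    """Try position N+1; on a lost race, check for overlap, then try N+2, etc."""
    position = current_position + 1
    for _ in range(max_attempts):
        if log.try_write(position, updated_files):
            return position
        # A simultaneous transaction already owns this position.
        winner_files = log.entries[position - 1]
        if winner_files & set(read_set):
            raise RuntimeError(f"conflict at position {position}: read set was modified")
        position += 1
    raise RuntimeError("gave up after too many attempts")
```

A transaction that loses the race for N+1 only aborts if the winner touched a file it read; otherwise its update is still valid and can be written at the next position.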

48.

Published invention application
QUERY WATCHDOG (Under examination, published)

Publication (announcement) number: US20230359516A1

Publication (announcement) date: 2023-11-09

Application number: US18200316

Application date: 2023-05-22

Applicant: Databricks, Inc.

Inventor: Alicja Luszczak, Srinath Shankar, Shi Xin

IPC: G06F11/07, G06F11/34, G06F11/30

CPC classification number: G06F11/0757, G06F11/0721, G06F11/0793, G06F11/3419, G06F11/3024, G06F11/076, G06F2201/88, G06F2201/81

Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
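The abstract does not say what the watchdog criterion is. As one possible illustration, the sketch below assumes the criterion is a ratio of output rows to input rows, checked while a task runs; the names `run_with_watchdog` and `TaskKilled` and both thresholds are hypothetical.

```python
class TaskKilled(Exception):
    """Raised when processing satisfies the (assumed) watchdog criterion."""


def run_with_watchdog(records, expand, max_output_ratio=1000.0, min_output_rows=1000):
    """Apply `expand` to each record, killing the task if output explodes."""
    produced = 0
    results = []
    for consumed, record in enumerate(records, start=1):
        for item in expand(record):
            produced += 1
            # Assumed criterion: enough output to matter, and too much of it
            # relative to the input consumed so far.
            if produced >= min_output_rows and produced / consumed > max_output_ratio:
                raise TaskKilled(f"{produced} output rows from {consumed} input rows")
            results.append(item)
    return results
```

The minimum row count keeps small tasks from being killed just because their first few records happen to expand a lot.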

49.

Granted invention patent
Function creation for database execution of deep learning model (In force)

Publication (announcement) number: US11599783B1

Publication (announcement) date: 2023-03-07

Application number: US15610062

Application date: 2017-05-31

Applicant: Databricks, Inc.

Inventor: Sue Ann Hong, Shi Xin, Timothee Hunter, Ali Ghodsi

IPC: G06N3/08, G06N3/063, G06N5/02, G06N3/04, G06N5/022, G06F16/14, G06F16/22

Abstract: A function creation method is disclosed. The method comprises defining one or more database function inputs, defining cluster processing information, defining a deep learning model, and defining one or more database function outputs. A database function is created based at least in part on the one or more database function inputs, the cluster set-up information, the deep learning model, and the one or more database function outputs. In some embodiments, the database function enables a non-technical user to utilize deep learning models.

50.

Granted invention patent
Dataflow graph processing (In force)

Publication (announcement) number: US11567998B2

Publication (announcement) date: 2023-01-31

Application number: US17362450

Application date: 2021-06-29

Applicant: Databricks, Inc.

Inventor: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio

IPC: G06F16/901, G06F16/245, G06F16/22

Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
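The dependency and in-lining steps in this abstract can be sketched with Python's standard `graphlib` module (Python 3.9+). The sketch makes strong simplifying assumptions: dependencies are found by scanning each query's text for other node names rather than by parsing SQL, and a view is in-lined by textual substitution as a subquery. The function name `build_dataflow` and the node format are hypothetical.

```python
import re
from graphlib import TopologicalSorter


def build_dataflow(nodes):
    """nodes maps name -> {"kind": "table" or "view", "sql": query text}."""
    names = set(nodes)
    # Assumed dependency rule: a query depends on every other node it names.
    deps = {
        name: {other for other in names - {name}
               if re.search(rf"\b{re.escape(other)}\b", spec["sql"])}
        for name, spec in nodes.items()
    }
    # Raises graphlib.CycleError if the dependencies do not form a DAG.
    order = list(TopologicalSorter(deps).static_order())

    expanded = {}
    for name in order:
        sql = nodes[name]["sql"]
        view_deps = sorted((d for d in deps[name] if nodes[d]["kind"] == "view"),
                           key=len, reverse=True)
        if view_deps:
            # Replace all view references in a single pass so that in-lined
            # text is never rescanned.
            pattern = r"\b(" + "|".join(map(re.escape, view_deps)) + r")\b"
            sql = re.sub(pattern,
                         lambda m: f"({expanded[m.group(1)]}) AS {m.group(1)}",
                         sql)
        expanded[name] = sql

    tables = {n: expanded[n] for n in order if nodes[n]["kind"] == "table"}
    return order, tables
```

Because nodes are expanded in topological order, a view that reads another view is already fully in-lined by the time a table references it, so only table nodes remain in the resulting graph.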
