Patent search aee:"Databricks Inc." Page 5

41.

Invention application
PIPELINED EXECUTION OF DATABASE QUERIES PROCESSING STREAMING DATA [Status: Active]

Publication (announcement) number: US20250165477A1

Publication (announcement) date: 2025-05-22

Application number: US18511902

Application date: 2023-11-16

Applicant: Databricks, Inc.

Inventor: Michael Paul Armbrust, Alexander Balikov, Boyang Peng

IPC: G06F16/2455, G06F9/48, G06F16/2453

Abstract: A database system performs pipelined execution of queries that process batches of streaming data. The database system compiles a database query to generate an execution plan and determines a set of stages based on the execution plan. The database query processes streaming data comprising batches. A scheduler schedules pipelined execution stages of the database query. Accordingly, the database system performs execution of a particular stage processing a batch of the streaming data in parallel with subsequent stages of the database query processing previous batches of the streaming data. The system further maintains watermarks for different stages of the database query.

42.

Invention application
UPDATE AND QUERY OF A LARGE COLLECTION OF FILES THAT REPRESENT A SINGLE DATASET STORED ON A BLOB STORE [Status: Active]

Publication (announcement) number: US20250156394A1

Publication (announcement) date: 2025-05-15

Application number: US18985397

Application date: 2024-12-18

Applicant: Databricks, Inc.

Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz

IPC: G06F16/23, G06F16/14, G06F16/22

Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
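The commit protocol in this abstract can be illustrated with a minimal sketch. It assumes an in-memory dictionary as a stand-in for the transaction log on the blob store; the names `TransactionLog`, `try_write`, `commit`, and `ConflictError` are hypothetical and do not come from the patent.

```python
class ConflictError(Exception):
    """Raised when a concurrent commit updated files this transaction read."""


class TransactionLog:
    """In-memory stand-in (an assumption) for numbered log entries on a blob store."""

    def __init__(self):
        self.entries = {}  # position -> set of file names updated by that commit

    def latest_position(self):
        return max(self.entries, default=0)

    def try_write(self, position, updated_files):
        """Write the entry only if the position is still free (put-if-absent)."""
        if position in self.entries:
            return False
        self.entries[position] = set(updated_files)
        return True


def commit(log, read_set, updated_files, start_position=None):
    """Try position N+1, then N+2, and so on, while no conflicting commit is found."""
    n = log.latest_position() if start_position is None else start_position
    position = n + 1
    while not log.try_write(position, updated_files):
        # A simultaneous transaction already holds this position:
        # compare the files it updated with the files we read.
        if log.entries[position] & set(read_set):
            raise ConflictError(f"conflict with commit at position {position}")
        position += 1  # no overlap, so retry at the next position
    return position
```

Passing `start_position` simulates a transaction that observed position N before another writer committed N+1. With disjoint file sets the transaction lands at N+2; with overlapping sets it fails.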

43.

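Granted invention patent
Query watchdog [Status: Active]

Publication (announcement) number: US12287698B2

Publication (announcement) date: 2025-04-29

Application number: US18200316

Application date: 2023-05-22

Applicant: Databricks, Inc.

Inventor: Alicja Luszczak, Srinath Shankar, Shi Xin

IPC: G06F11/07, G06F11/30, G06F11/34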

Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
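As a rough illustration of the watchdog idea, the sketch below drives a simulated task and kills it once a criterion is satisfied. The abstract does not define the watchdog criterion; the output-to-input row ratio used here, and the names `output_ratio_exceeds` and `run_with_watchdog`, are assumptions made for the example.

```python
def output_ratio_exceeds(limit):
    """Hypothetical criterion: output rows exceed `limit` times the input rows."""
    def criterion(rows_in, rows_out):
        return rows_in > 0 and rows_out > limit * rows_in
    return criterion


def run_with_watchdog(task, criterion):
    """Drive `task`, a generator yielding (rows_in, rows_out) progress snapshots.

    Returns "killed" if a snapshot satisfies the criterion, otherwise "finished".
    """
    for rows_in, rows_out in task:
        if criterion(rows_in, rows_out):
            task.close()  # stop the generator, i.e. kill the processing
            return "killed"
    return "finished"


def exploding_join():
    # Simulated task whose output grows much faster than its input.
    for step in range(1, 6):
        yield step * 10, step * step * 1000


def well_behaved_scan():
    # Simulated task whose output stays smaller than its input.
    for step in range(1, 6):
        yield step * 10, step * 5
```

Here `run_with_watchdog(exploding_join(), output_ratio_exceeds(50))` stops the task at its first snapshot, while the same criterion lets `well_behaved_scan()` run to completion.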

44.

Invention application
DATA ASSET SHARING BETWEEN ACCOUNTS AT A DATA PROCESSING SERVICE USING CLOUD TOKENS [Status: Active]

Publication (announcement) number: US20250131070A1

Publication (announcement) date: 2025-04-24

Application number: US18491500

Application date: 2023-10-20

Applicant: Databricks, Inc.

Inventor: Xiaotong Sun, Abhijit Chakankar, Ramesh Chandra

IPC: G06F21/31

Abstract: A data processing service receives indication that a recipient will request access to data assets of a provider and provides a request for credentials from a recipient governance module. The recipient governance module stores a recipient metastore including an object for a provider metastore. In response to determining that the assets are associated with the provider metastore, the service provides a request for credentials to a provider governance module. The provider governance module stores the provider metastore describing data assets of the provider and permissions for accessing data assets. The provider metastore includes a recipient object attached to the data assets with an identifier for the recipient metastore. In response to verifying that the recipient was provided access to the data assets, the service provides a token to the recipient governance module. The service then provides the token to a computing resource to provide access to the data assets.

45.

Invention application
MODEL ML REGISTRY AND MODEL SERVING [Status: Active]

Publication (announcement) number: US20250021536A1

Publication (announcement) date: 2025-01-16

Application number: US18885322

Application date: 2024-09-13

Applicant: Databricks, Inc.

Inventor: Aaron Daniel Davidson, Clemens Mewald, Tomas Nykodym

IPC: G06F16/21, G06F16/955, G06N5/022

Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.

46.

Invention application
DATA FILE CLUSTERING WITH KD-EPSILON TREES [Status: Active]

Publication (announcement) number: US20250013619A1

Publication (announcement) date: 2025-01-09

Application number: US18218766

Application date: 2023-07-06

Applicant: Databricks, Inc.

Inventor: Prakhar Jain, Frederick Ryan Johnson, Bart Samwel

IPC: G06F16/22, G06F16/2453, G06F16/28

Abstract: A data tree for managing data files of a data table and performing one or more transaction operations to the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable by this parent node, and also includes dedicated storage to pointers of the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.

47.

Granted invention patent
Hash based rollup with passthrough [Status: Active]

Publication (announcement) number: US12153558B1

Publication (announcement) date: 2024-11-26

Application number: US18162093

Application date: 2023-01-31

Applicant: Databricks, Inc.

Inventor: Alexander Behm, Ankur Dave

IPC: G06F16/00, G06F16/13, G06F16/22, G06F16/242, G06F16/2455, G06F16/28

Abstract: A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregated hash table entry of the preaggregated hash table: provide the preaggregated hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.
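The two-phase flow in this abstract (local preaggregation, redistribution by hash, final aggregation) can be sketched in a single process. This is only an illustration under assumptions: lists stand in for computing units, the aggregate is a sum, and the function names are hypothetical. It does not model rollup grouping sets or the passthrough mechanism named in the title.

```python
from collections import defaultdict


def preaggregate(rows, key_columns, value_column):
    """Phase 1: build a local hash table keyed by the grouping columns."""
    table = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        table[key] += row[value_column]
    return table


def shuffle(tables, num_units):
    """Route each preaggregated entry to a unit chosen by a hash of its key."""
    inboxes = [[] for _ in range(num_units)]
    for table in tables:
        for key, partial in table.items():
            inboxes[hash(key) % num_units].append((key, partial))
    return inboxes


def aggregate(received):
    """Phase 2: merge the partial aggregates received from all units."""
    table = defaultdict(int)
    for key, partial in received:
        table[key] += partial
    return dict(table)


def distributed_sum(partitions, key_columns, value_column):
    """Simulate one computing unit per partition and combine their final tables."""
    tables = [preaggregate(p, key_columns, value_column) for p in partitions]
    inboxes = shuffle(tables, len(partitions))
    result = {}
    for inbox in inboxes:
        result.update(aggregate(inbox))  # keys are disjoint across units
    return result
```

Because every entry with the same key is routed to the same unit, each group is finalized in exactly one place, and preaggregation reduces the number of entries that have to cross the network.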

48.

Granted invention patent
Multiple pass sort [Status: Active]

Publication (announcement) number: US12105690B1

Publication (announcement) date: 2024-10-01

Application number: US17875176

Application date: 2022-07-27

Applicant: Databricks Inc.

Inventor: Timothy Armstrong, Arvind Sai Krishnan, Khayyam Guliyev

IPC: G06F16/00, G06F16/22, G06F16/2455

CPC classification number: G06F16/2246, G06F16/24552

Abstract: A system for multipass sort includes a communication interface and a processor. The communication interface is configured to receive from a client device a request to sort a dataset that includes a plurality of rows. The processor is configured to perform a first sort pass on the dataset in part by: extracting prefixes associated with a first schema element associated with the dataset for the plurality of rows; and sorting the extracted prefixes utilizing an integer sort algorithm based on a sort order included in the request to sort the dataset, where sorting the extracted prefixes includes utilizing NULL values to resolve a tied range that includes at least two rows of the plurality of rows having a same extracted prefix.

49.

Invention publication
MULTI-CLUSTER QUERY RESULT CACHING [Status: Under examination, published]

Publication (announcement) number: US20240265010A1

Publication (announcement) date: 2024-08-08

Application number: US18221735

Application date: 2023-07-13

Applicant: Databricks, Inc.

Inventor: Saksham Garg, Bogdan Ionut Ghit, Christopher Stevens, Christian Stuart

IPC: G06F16/2453, G06F16/25, G06F16/28

CPC classification number: G06F16/24539, G06F16/24542, G06F16/256, G06F16/285

Abstract: A multi-cluster computing system which includes a query result caching system is presented. The multi-cluster computing system may include a data processing service and client devices communicatively coupled over a network. The data processing service may include a control layer and a data layer. The control layer may be configured to receive and process requests from the client devices and manage resources in the data layer. The data layer may be configured to include instances of clusters of computing resources for executing jobs. The data layer may include a data storage system, which further includes a remote query result cache store. The query result cache store may include a cloud storage query result cache which stores data associated with results of previously executed requests. As such, when a cluster encounters a previously executed request, the cluster may efficiently retrieve the cached result of the request from the in-memory query result cache or the cloud storage query result cache.
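The lookup order suggested by this abstract (in-memory cache first, then the shared cloud storage cache, then execution) can be sketched as follows. This is a simplified illustration: a shared dictionary stands in for the cloud storage cache, the key is a hash of the whitespace-normalized query text, and the names `ClusterCache`, `cache_key`, and `get_or_compute` are assumptions. A real system would also have to invalidate entries when the underlying data changes.

```python
import hashlib


def cache_key(query):
    """Hypothetical key: SHA-256 of the whitespace-normalized query text."""
    normalized = " ".join(query.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ClusterCache:
    """Per-cluster in-memory cache backed by a cache shared across clusters."""

    def __init__(self, remote_store):
        self.local = {}             # in-memory query result cache
        self.remote = remote_store  # shared dict standing in for cloud storage

    def get_or_compute(self, query, execute):
        """Return (result, source), where source names the tier that answered."""
        key = cache_key(query)
        if key in self.local:
            return self.local[key], "local"
        if key in self.remote:
            self.local[key] = self.remote[key]  # warm the local tier
            return self.local[key], "remote"
        result = execute(query)
        self.local[key] = result
        self.remote[key] = result
        return result, "executed"
```

Two `ClusterCache` instances built over the same `remote_store` dictionary model two clusters: a query executed on one is served from the shared tier on the other, and from the local tier on repeat requests.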

50.

Invention publication
STATIC APPROACH TO LAZY MATERIALIZATION IN DATABASE SCANS USING PUSHED FILTERS [Status: Under examination, published]

Publication (announcement) number: US20240256539A1

Publication (announcement) date: 2024-08-01

Application number: US18160850

Application date: 2023-01-27

Applicant: Databricks, Inc.

Inventor: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy

IPC: G06F16/2453, G06F16/22

CPC classification number: G06F16/24539, G06F16/221

Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. The method includes receiving a request to perform a new query in a columnar database containing a plurality of columns. A step in the method includes accessing a set of data in a column of the plurality of columns based on the query. The method includes generating an input to a machine-learned model comprising characteristics of the set of data in the column. From the machine-learned model, the method includes generating a likelihood value indicative of whether a filter of a first portion of the set of data in the column has greater efficiency than a download followed by a filter of the set of data in the column. The method further includes comparing the likelihood value to a threshold value. Based on the comparison, the method includes filtering the first portion of the set of data before downloading the set of data if the likelihood value is equal to or above the threshold value.
