Patent search ap:"Databricks Inc." Page 3

21.

Invention application
Automated Processing of Multiple Prediction Generation Including Model Tuning (In force)

Publication (announcement) No.: US20250061378A1

Publication (announcement) date: 2025-02-20

Application No.: US18738025

Application date: 2024-06-09

Applicant: Databricks, Inc.

Inventor: Benjamin Thomas Wilson, Corey Zumar

IPC: G06N20/00, G06F18/20, G06F18/2132

Abstract: The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a dataset, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.

22.

Invention application
STATE REBALANCING IN STRUCTURED STREAMING (In force)

Publication (announcement) No.: US20250061132A1

Publication (announcement) date: 2025-02-20

Application No.: US18822023

Application date: 2024-08-30

Applicant: Databricks, Inc.

Inventor: Alexander Balikov, Tathagata Das, Karthikeyan Ramasamy

IPC: G06F16/27, G06F16/2455

Abstract: A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method for rebalancing stateful tasks is also performed such that the total number of stateful tasks are balanced per executor as long as this rebalancing does not imbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.

23.

Invention application
MESSAGING DEDUPLICATION IN PUBLISH / SUBSCRIBE SYSTEM (In force)

Publication (announcement) No.: US20250028686A1

Publication (announcement) date: 2025-01-23

Application No.: US18224981

Application date: 2023-07-21

Applicant: Databricks, Inc.

Inventor: Pranav Anand, Praveen Gattu, Anish Shrigondekar, Huanli Wang

IPC: G06F16/174, G06F16/14, G06F16/16

Abstract: A device for using message identifiers for publish/subscribe messaging deduplication is described. The system may fetch one or more sets of data records from a data source, and each data record is associated with a message identifier. The system may store the one or more sets of data records in a data file, which is associated with metadata comprising the message identifier, a file path and a row number for each data record. The system may determine whether one or more of the data records are duplicated based on the associated message identifiers. In response to determining that the one or more data records are duplicated, the system may generate second metadata comprising the file paths and row numbers associated with the duplicated data records.
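The duplicate check in this abstract can be illustrated with a minimal Python sketch. The function name `find_duplicates`, the dictionary field names, and the choice to treat every occurrence after the first as the duplicate are assumptions made for illustration; the abstract does not specify them.

```python
def find_duplicates(metadata):
    """Return (file_path, row_number) pairs for duplicated records.

    metadata: list of dicts with the keys "message_id", "file_path",
    and "row_number" (hypothetical field names).
    Assumption: the first record seen for a message identifier is kept,
    and every later record with the same identifier is a duplicate.
    """
    seen = set()
    duplicates = []
    for entry in metadata:
        if entry["message_id"] in seen:
            # Same message identifier seen before: record where the copy lives.
            duplicates.append((entry["file_path"], entry["row_number"]))
        else:
            seen.add(entry["message_id"])
    return duplicates
```

The returned list plays the role of the "second metadata": it points at the duplicated rows by file path and row number rather than rewriting the data file.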

24.

Invention application
DATA FILE CLUSTERING WITH KD-CLASSIFIER TREES (In force)

Publication (announcement) No.: US20250013606A1

Publication (announcement) date: 2025-01-09

Application No.: US18218410

Application date: 2023-07-05

Applicant: Databricks, Inc.

Inventor: Prakhar Jain, Frederick Ryan Johnson, Terry Kim, Vijayan Prabhakaran, Bart Samwel

IPC: G06F16/16, G06F16/13

Abstract: A data processing service generates a data classifier tree for managing data files of a data table. The data classifier tree may be configured as a KD-classifier tree and includes a plurality of nodes and edges. A node of the data classifier tree may represent a splitting condition with respect to key-values for a respective key. A node of the data classifier tree may be associated with one or more data files assigned to the node. The data files assigned to the node each include a subset of records having key-values that satisfy the conditions represented by the node and parent nodes of the node. The data processing service may efficiently cluster the data in the data table while reducing the number of data files that are rewritten when data is modified or added to the data table.
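As a rough illustration of the tree described in this abstract, the Python sketch below routes a record through per-key splitting conditions to the node whose files should hold it. The `Node` fields, the `route` function, and the use of a single numeric threshold per split are assumptions; the abstract only says that a node represents a splitting condition on key-values and may have data files assigned to it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    key: Optional[str] = None          # key tested by this split; None for a leaf
    threshold: Optional[float] = None  # hypothetical split point for that key
    left: Optional["Node"] = None      # records with record[key] < threshold
    right: Optional["Node"] = None     # records with record[key] >= threshold
    files: list = field(default_factory=list)  # data files assigned to the node


def route(node, record):
    """Follow the splitting conditions from the root down to a leaf node."""
    while node.key is not None:
        node = node.left if record[node.key] < node.threshold else node.right
    return node
```

Because each record satisfies the conditions of exactly one root-to-leaf path in this sketch, adding or changing a record touches only the files on that path, which is the rewrite-limiting property the abstract highlights.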

25.

Granted invention patent
Concurrent optimistic transactions for tables with deletion vectors (In force)

Publication (announcement) No.: US12147412B2

Publication (announcement) date: 2024-11-19

Application No.: US18156109

Application date: 2023-01-18

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Christos Stavrakakis

IPC: G06F16/00, G06F16/23

Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table and receiving a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records and determining a physical prerequisite on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. The configuration commits the second transaction to generate a third version of the data table by updating elements of the deletion vector if the prerequisites are satisfied.
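The two prerequisites named in this abstract can be sketched in Python as follows. This is an illustrative reading only: the function `try_commit_second`, the representation of rows as `(file_path, row_index)` pairs, and the representation of a deletion vector as a set of row indexes per file are all assumptions, not the patented method.

```python
def try_commit_second(first_changed, second_targets, v2_files, deletion_vectors):
    """Commit the second transaction if both prerequisites hold.

    first_changed: set of (file_path, row_index) whose content the first
        (already committed) transaction changed.
    second_targets: set of (file_path, row_index) the second transaction updates.
    v2_files: set of file paths present in the second table version.
    deletion_vectors: dict mapping file_path to a set of deleted row indexes;
        updated in place when the commit succeeds.
    """
    # Logical prerequisite: the first transaction did not change any record
    # that the second transaction read and wants to update.
    logical_ok = first_changed.isdisjoint(second_targets)
    # Physical prerequisite: every targeted row still lives in a data file
    # that exists in the second version of the table.
    physical_ok = all(path in v2_files for path, _ in second_targets)
    if not (logical_ok and physical_ok):
        return False
    # Commit by marking the old rows in the deletion vectors.
    for path, row in second_targets:
        deletion_vectors.setdefault(path, set()).add(row)
    return True
```

When either check fails, the sketch simply reports a conflict; what a real system does next (retry, abort) is outside what the abstract states.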

26.

Invention publication
SHORT QUERY PRIORITIZATION FOR DATA PROCESSING SERVICE (Under examination, published)

Publication (announcement) No.: US20240362215A1

Publication (announcement) date: 2024-10-31

Application No.: US18140323

Application date: 2023-04-27

Applicant: Databricks, Inc.

Inventor: Venkata Sai Akhil Gudesa, Herman Rudolf Petrus Catharina van Hövell tot Westerflier, Supun Chathuranga Nakandala

IPC: G06F16/2453, G06F9/48, G06F11/34, G06F16/28

CPC classification number: G06F16/2453, G06F9/4887, G06F11/3419, G06F16/285

Abstract: A cluster computing system maintains a first set of queues for short queries and a second set for longer queries. The first set is allocated a majority of the cluster's processing resources and processes queries on a first in first out basis. The second set is allocated a minority of the cluster's processing resources which are shared among queries in the second set. Accordingly, the system assigns each query to the first set of queues for a fixed amount of resource time. While a query is processing, the system monitors the query's resource time and reassigns the query to the second set of queues if the query has not completed within the allotted amount of resource time. Thus, short queries receive the necessary resources to complete quickly without getting stuck behind longer queries while ensuring that longer queries continue to make progress.
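The demotion rule in this abstract can be illustrated with a small single-threaded simulation in Python. It is a simplification under stated assumptions: the function name `schedule`, the round-robin time slice standing in for "shared" resources, and draining the short-query queue before the long-query queue are all illustrative choices, whereas the abstract describes both queue sets running concurrently on separate resource shares.

```python
from collections import deque


def schedule(queries, allotment, slow_quantum=1):
    """Return query names in completion order.

    queries: list of (name, required_resource_time) in arrival order.
    allotment: resource time each query may use in the short-query queue.
    slow_quantum: hypothetical time slice used to share resources
        among demoted queries.
    """
    fast = deque(queries)
    slow = deque()
    finished = []
    # Short-query queue: first in, first out, up to `allotment` per query.
    while fast:
        name, remaining = fast.popleft()
        if remaining <= allotment:
            finished.append(name)
        else:
            # Not done within the allotment: reassign to the long-query queue.
            slow.append((name, remaining - allotment))
    # Long-query queue: demoted queries take turns until each completes.
    while slow:
        name, remaining = slow.popleft()
        remaining -= slow_quantum
        if remaining <= 0:
            finished.append(name)
        else:
            slow.append((name, remaining))
    return finished
```

For example, with an allotment of 3, queries needing 2 and 1 units finish in the first queue, while queries needing 10 and 6 units are demoted and finish later without blocking the short ones.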

27.

Granted invention patent
Scaling delta table optimize command (In force)

Publication (announcement) No.: US12079167B1

Publication (announcement) date: 2024-09-03

Application No.: US18093916

Application date: 2023-01-06

Applicant: Databricks, Inc.

Inventor: Rahul Shivu Mahadev, Burak Yavuz, Tathagata Das

IPC: G06F16/172, G06F16/22

CPC classification number: G06F16/172, G06F16/2282

Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.
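The binning and batching steps in this abstract can be sketched in Python as a streaming generator. Several details are assumptions: the names `Bin` and `pack_into_batches`, file size as the bin measure, bin count as the batch measure, keeping an oversized file alone in an empty bin, and the final flush of leftover bins, none of which the abstract spells out.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    files: list = field(default_factory=list)
    size: int = 0


def pack_into_batches(files, bin_threshold, batch_threshold):
    """Yield batches (lists) of closed bins.

    files: iterable of (file_name, size) pairs.
    bin_threshold: maximum total size of one bin.
    batch_threshold: a batch is handed off once it holds more bins than this.
    """
    current = Bin()
    batch = []
    for name, size in files:
        if current.files and current.size + size > bin_threshold:
            # Adding the file would overflow: close the current bin,
            # add it to the batch, and put the file in the next bin.
            batch.append(current)
            current = Bin()
            if len(batch) > batch_threshold:
                # The batch measure exceeds its threshold: provide it for processing.
                yield batch
                batch = []
        current.files.append(name)
        current.size += size
    # Flush leftovers so no file is dropped (not described in the abstract).
    if current.files:
        batch.append(current)
    if batch:
        yield batch
```

Yielding batches as they fill means processing can start before the full file list has been scanned, which appears to be the scaling benefit the title refers to.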

28.

Granted invention patent
Data maintenance transaction rollbacks (In force)

Publication (announcement) No.: US12072843B1

Publication (announcement) date: 2024-08-27

Application No.: US17580475

Application date: 2022-01-20

Applicant: Databricks, Inc.

Inventor: Prakhar Jain, Bart Samwel, Burak Yavuz

IPC: G06F16/174

CPC classification number: G06F16/174

Abstract: The present application discloses a method, system, and computer system for managing data in a storage system. The method includes receiving a first transaction that modifies or deletes first data stored in a storage system, determining that the first data is subject to an intervening re-arrangement transaction, and in response to determining that the first data is subject to the intervening re-arrangement transaction, rolling back the re-arrangement transaction at least with respect to the first data and committing the first transaction.

29.

Invention publication
NUMA AWARENESS ARCHITECTURE FOR VM-BASED CONTAINER IN KUBERNETES ENVIRONMENT (Under examination, published)

Publication (announcement) No.: US20240256360A1

Publication (announcement) date: 2024-08-01

Application No.: US18162659

Application date: 2023-01-31

Applicant: Databricks, Inc.

Inventor: Shuo Chen, Yuming Qiao, Anders Liu

IPC: G06F9/50, G06F9/455

CPC classification number: G06F9/5077, G06F9/45558, G06F2009/4557, G06F2009/45583

Abstract: Disclosed herein is a method for resource management in a web-based container orchestrating environment. A disclosed method includes initializing a set of micro-virtual machines (VMs) within a macro-VM environment. The method each container within a micro-VM based sandbox. The method assigns a virtual central processing unit (vCPU) to a micro-VM based on an estimated memory required by the micro-VM and the estimated available memory associated with the vCPU. The method pins the vCPU with a physical CPU based on the pod location of the physical CPU and an estimated available memory associated with the vCPU and an available local memory of the physical CPU. The method maintains a state of the vCPU and the physical CPU in a resource manager.

30.

Invention publication
CONCURRENT OPTIMISTIC TRANSACTIONS FOR TABLES WITH DELETION VECTORS (Under examination, published)

Publication (announcement) No.: US20240241877A1

Publication (announcement) date: 2024-07-18

Application No.: US18156109

Application date: 2023-01-18

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Christos Stavrakakis

IPC: G06F16/23

CPC classification number: G06F16/2315, G06F16/2358, G06F16/2379

Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table and receiving a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records and determining a physical prerequisite on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. The configuration commits the second transaction to generate a third version of the data table by updating elements of the deletion vector if the prerequisites are satisfied.
