Patent search caee:"Databricks Inc." Page 2

11.

Invention publication
FEATURE STORE WITH INTEGRATED TRACKING (under examination, published)

Publication (announcement) number: US20230177072A1

Publication (announcement) date: 2023-06-08

Application number: US18162625

Application date: 2023-01-31

Applicant: Databricks, Inc.

Inventor: Mani Parkhe, Clemens Mewald, Matei Zaharia, Avesh Singh

IPC: G06F16/28, G06F30/27

CPC classification number: G06F16/288, G06F30/27

Abstract: The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
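The idea in this abstract, storing each feature together with an indication of the datasets it was derived from, can be illustrated with a minimal sketch. The class `FeatureStore` and its methods are hypothetical names chosen for illustration and are not taken from the patent or from any Databricks API.

```python
class FeatureStore:
    """Toy in-memory feature store that records dataset lineage (illustrative only)."""

    def __init__(self):
        # feature name -> {"values": ..., "datasets": tuple of source dataset names}
        self._features = {}

    def store(self, name, values, source_datasets):
        # Store the feature in association with the datasets it was computed from.
        self._features[name] = {
            "values": values,
            "datasets": tuple(source_datasets),
        }

    def lineage(self, name):
        # Return the dataset indication recorded for a feature.
        return self._features[name]["datasets"]


store = FeatureStore()
store.store("avg_order_value", [12.5, 40.0], ["orders", "customers"])
print(store.lineage("avg_order_value"))
```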

12.

Invention publication
INTEGRATED NATIVE VECTORIZED ENGINE FOR COMPUTATION (under examination, published)

Publication (announcement) number: US20230161767A1

Publication (announcement) date: 2023-05-25

Application number: US18158258

Application date: 2023-01-23

Applicant: Databricks, Inc.

Inventor: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hovell tot Westerflier

IPC: G06F16/2453, G06F16/2458, G06F16/25

CPC classification number: G06F16/24542, G06F16/258, G06F16/2471

Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
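The node classification step described in this abstract can be sketched roughly as follows: each node of a query is tagged with the second engine if that engine is able to execute it, and with the first engine otherwise. The operator names and the `SUPPORTED_BY_SECOND_ENGINE` set are assumptions made up for this example; the abstract does not say which operations either engine supports.

```python
# Hypothetical set of operators the second engine can execute (assumption).
SUPPORTED_BY_SECOND_ENGINE = {"scan", "filter", "project"}


def assign_engines(nodes):
    """Tag each query node with the engine type that will run it.

    A node gets the "second" engine type when the second engine can
    execute it, and the "first" engine type otherwise.
    """
    plan = []
    for op in nodes:
        engine = "second" if op in SUPPORTED_BY_SECOND_ENGINE else "first"
        plan.append((op, engine))
    return plan


print(assign_engines(["scan", "filter", "python_udf", "project"]))
```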

13.

Invention application
MANAGED METASTORAGE (in force)

Publication (announcement) number: US20220374532A1

Publication (announcement) date: 2022-11-24

Application number: US17514982

Application date: 2021-10-29

Applicant: Databricks Inc.

Inventor: Matei Zaharia, David Lewis, Cheng Lian, Yuchen Huo, Ali Ghodsi

IPC: G06F21/62, G06F3/06

Abstract: The present application discloses a method, system, and computer system for providing access to information stored on system for data storage. The method includes receiving a data request from a user, determining data corresponding to the data request, determining whether the user has requisite permissions to access the data, and in response to determining that the user has requisite permissions to access the data: determining a manner by which to provide access to the data, wherein the data comprises a filtered subset of stored data, and generating a token based at least in part on the user and the manner by which access to the data is to be provided.

14.

Invention grant
Update and query of a large collection of files that represent a single dataset stored on a blob store (in force)

Publication (announcement) number: US10769130B1

Publication (announcement) date: 2020-09-08

Application number: US15987215

Application date: 2018-05-23

Applicant: Databricks Inc.

Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz

IPC: G06F16/23, G06F16/14, G06F16/22

Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
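The commit protocol in this abstract reads like optimistic concurrency control over an append-only log. A minimal sketch of that reading follows: try to write at position N+1; if another transaction already holds that position, compare its updated files with the read set, and move on to N+2 only when they do not overlap. `TransactionLog`, `ConflictError`, `commit`, and the `max_attempts` limit are hypothetical names and choices for this example, not details from the patent.

```python
class ConflictError(Exception):
    """Raised when a simultaneous transaction touched files in the read set."""


class TransactionLog:
    """Toy in-memory log: position -> set of files updated at that position."""

    def __init__(self):
        self.entries = {}

    def try_write(self, position, updated_files):
        # The write succeeds only if no entry exists at this position yet.
        if position in self.entries:
            return False
        self.entries[position] = set(updated_files)
        return True


def commit(log, current_position, read_set, updated_files, max_attempts=10):
    """Attempt to commit at N+1, moving to later positions on non-conflicting collisions."""
    position = current_position + 1
    for _ in range(max_attempts):
        if log.try_write(position, updated_files):
            return position
        # A simultaneous transaction already owns this position.
        if log.entries[position] & set(read_set):
            raise ConflictError(f"read set overlaps commit at position {position}")
        position += 1  # no overlap: retry at the next position
    raise RuntimeError("too many concurrent commits")


log = TransactionLog()
log.try_write(1, {"part-b.parquet"})  # a simultaneous writer took position 1
print(commit(log, 0, {"part-a.parquet"}, {"part-c.parquet"}))
```

Here the simultaneous transaction at position 1 did not touch any file in the read set, so the commit lands at position 2.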

15.

Invention grant
Managed metastorage (in force)

Publication (announcement) number: US12277237B2

Publication (announcement) date: 2025-04-15

Application number: US17514982

Application date: 2021-10-29

Applicant: Databricks, Inc.

Inventor: Matei Zaharia, David Lewis, Cheng Lian, Yuchen Huo, Ali Ghodsi

IPC: G06F21/62, G06F3/06

Abstract: The present application discloses a method, system, and computer system for providing access to information stored on system for data storage. The method includes receiving a data request from a user, determining data corresponding to the data request, determining whether the user has requisite permissions to access the data, and in response to determining that the user has requisite permissions to access the data: determining a manner by which to provide access to the data, wherein the data comprises a filtered subset of stored data, and generating a token based at least in part on the user and the manner by which access to the data is to be provided.

16.

Invention application
RESOURCE MANAGEMENT WITH INTERMEDIARY NODE IN KUBERNETES ENVIRONMENT (in force)

Publication (announcement) number: US20250094195A1

Publication (announcement) date: 2025-03-20

Application number: US18368919

Application date: 2023-09-15

Applicant: Databricks, Inc.

Inventor: Aaron Daniel Davidson, Thomas Garnier, Lin Guo, Zhe He, Manlin Li, Yang Liu, Feng Wang, Hong Zhang, Weirong Zhu

IPC: G06F9/455, G06F9/54

Abstract: A resource management configuration may receive an API request from an API server. The API request specifies task information from a plurality of tenants. The configuration transmits status information of a plurality of VMs to the API server to assign tasks to one or more VMs based on the task information and the status information. Tasks assigned to a VM of the plurality of VMs are for one tenant of the plurality of tenants. The configuration configures on an untrusted network, network security groups for managing communications of tenants such that a network security group configured for a tenant permits communications between VMs assigned to the same tenant but prevents communications between VMs assigned to different tenants. The configuration pins each assigned VM of the one or more assigned VMs to perform the task based on the task information of the corresponding tenant.

17.

Invention application
Automated Processing of Multiple Prediction Generation Including Model Tuning (in force)

Publication (announcement) number: US20250061378A1

Publication (announcement) date: 2025-02-20

Application number: US18738025

Application date: 2024-06-09

Applicant: Databricks, Inc.

Inventor: Benjamin Thomas Wilson, Corey Zumar

IPC: G06N20/00, G06F18/20, G06F18/2132

Abstract: The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a data set, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.

18.

Invention application
STATE REBALANCING IN STRUCTURED STREAMING (in force)

Publication (announcement) number: US20250061132A1

Publication (announcement) date: 2025-02-20

Application number: US18822023

Application date: 2024-08-30

Applicant: Databricks, Inc.

Inventor: Alexander Balikov, Tathagata Das, Karthikeyan Ramasamy

IPC: G06F16/27, G06F16/2455

Abstract: A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method for rebalancing stateful tasks is also performed such that the total number of stateful tasks are balanced per executor as long as this rebalancing does not imbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.

19.

Invention application
MESSAGING DEDUPLICATION IN PUBLISH / SUBSCRIBE SYSTEM (in force)

Publication (announcement) number: US20250028686A1

Publication (announcement) date: 2025-01-23

Application number: US18224981

Application date: 2023-07-21

Applicant: Databricks, Inc.

Inventor: Pranav Anand, Praveen Gattu, Anish Shrigondekar, Huanli Wang

IPC: G06F16/174, G06F16/14, G06F16/16

Abstract: A device for using message identifiers for Publish/subscribe messaging deduplication is described. The system may fetch one or more sets of data records from a data source, and each data record is associated with a message identifier. The system may store the one or more sets of data records in a data file, which is associated with a metadata comprising the message identifier, a file path and a row number for each data record. The system may determine whether one or more of the data records are duplicated based on the associated message identifiers. In response to determining that the one or more data records are duplicated, the system may generate a second metadata comprising the file paths and row numbers associated with the duplicated data records.
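The deduplication step in this abstract can be sketched in a few lines: scan metadata rows of (message identifier, file path, row number) and collect the file paths and row numbers of records whose identifier has already been seen. Treating the first occurrence as the record to keep is an assumption of this sketch, and `find_duplicates` is a hypothetical helper name.

```python
def find_duplicates(metadata):
    """Return (file_path, row_number) pairs for records with a repeated message ID.

    `metadata` is an iterable of (message_id, file_path, row_number) tuples.
    The first record seen for each message ID is treated as the original
    (an assumption of this sketch); later ones are reported as duplicates.
    """
    seen = set()
    duplicates = []
    for message_id, file_path, row_number in metadata:
        if message_id in seen:
            duplicates.append((file_path, row_number))
        else:
            seen.add(message_id)
    return duplicates


metadata = [
    ("m1", "file-0.parquet", 0),
    ("m2", "file-0.parquet", 1),
    ("m1", "file-1.parquet", 0),  # same message ID as the first record
]
print(find_duplicates(metadata))
```

The returned list plays the role of the "second metadata" in the abstract: it identifies duplicated rows by location without rewriting the data files.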

20.

Invention application
DATA FILE CLUSTERING WITH KD-CLASSIFIER TREES (in force)

Publication (announcement) number: US20250013606A1

Publication (announcement) date: 2025-01-09

Application number: US18218410

Application date: 2023-07-05

Applicant: Databricks, Inc.

Inventor: Prakhar Jain, Frederick Ryan Johnson, Terry Kim, Vijayan Prabhakaran, Bart Samwel

IPC: G06F16/16, G06F16/13

Abstract: A data processing service generates a data classifier tree for managing data files of a data table. The data classifier tree may be configured as a KD-classifier tree and includes a plurality of nodes and edges. A node of the data classifier tree may represent a splitting condition with respect to key-values for a respective key. A node of the data classifier tree may be associated with one or more data files assigned to the node. The data files assigned to the node each include a subset of records having key-values that satisfy the conditions represented by the node and parent nodes of the node. The data processing service may efficiently cluster the data in the data table while reducing the number of data files that are rewritten when data is modified or added to the data table.
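A rough sketch of the tree structure in this abstract: each internal node holds a splitting condition on one key, and data files are attached to nodes so that every record in a file satisfies the conditions along the path from the root. The `Node` layout, the "less than threshold goes left" rule, and the helper `find_node` are assumptions for illustration; the abstract does not specify the form of the splitting condition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Internal nodes carry a splitting condition: record[key] < threshold.
    key: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None   # records that satisfy the condition
    right: Optional["Node"] = None  # records that do not
    files: List[str] = field(default_factory=list)  # data files assigned here


def find_node(root, record):
    """Walk from the root to the leaf whose path conditions the record satisfies."""
    node = root
    while node.key is not None:
        node = node.left if record[node.key] < node.threshold else node.right
    return node


# Split first on "x", then on "y" within the x < 10 branch.
tree = Node(
    key="x", threshold=10,
    left=Node(
        key="y", threshold=5,
        left=Node(files=["f1.parquet"]),
        right=Node(files=["f2.parquet"]),
    ),
    right=Node(files=["f3.parquet"]),
)
print(find_node(tree, {"x": 3, "y": 7}).files)
```

Because a new or modified record maps to a single node, only the files assigned to that node would need rewriting, which matches the abstract's stated goal of reducing the number of rewritten files.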
