Patent search ap:"Databricks Inc." Page 8

71.

Granted invention patent
Multi-cluster query result caching (in force)

Publication (announcement) number: US12189625B2

Publication (announcement) date: 2025-01-07

Application number: US18222343

Application date: 2023-07-14

Applicant: Databricks, Inc.

Inventor: Bogdan Ionut Ghit, Saksham Garg, Christian Stuart, Christopher Stevens

IPC: G06F16/24, G06F16/2453, G06F16/25, G06F16/28

Abstract: A multi-cluster computing system which includes a query result caching system is presented. The multi-cluster computing system may include a data processing service and client devices communicatively coupled over a network. The data processing service may include a control layer and a data layer. The control layer may be configured to receive and process requests from the client devices and manage resources in the data layer. The data layer may be configured to include instances of clusters of computing resources for executing jobs. The data layer may include a data storage system, which further includes a remote query result cache store. The query result cache store may include a cloud storage query result cache which stores data associated with results of previously executed requests. As such, when a cluster encounters a previously executed request, the cluster may efficiently retrieve the cached result of the request from the in-memory query result cache or the cloud storage query result cache.
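
The lookup order described in this abstract (in-memory cache first, then the shared cloud storage cache, then execution) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name, the use of a plain dictionary as the shared store, and the hash-of-query-text cache key are all assumptions, and cache invalidation is not modelled.

```python
import hashlib


class TwoTierResultCache:
    """Illustrative only: a per-cluster in-memory cache backed by a shared store."""

    def __init__(self, remote_store):
        self.local = {}             # in-memory cache private to one cluster
        self.remote = remote_store  # stand-in for the cloud storage cache shared by clusters

    @staticmethod
    def key(query_text):
        # Hypothetical cache key: a hash of the exact query text.
        return hashlib.sha256(query_text.encode("utf-8")).hexdigest()

    def get_or_execute(self, query_text, execute):
        k = self.key(query_text)
        if k in self.local:                  # fastest path: this cluster ran it before
            return self.local[k], "memory"
        if k in self.remote:                 # another cluster ran it before
            self.local[k] = self.remote[k]
            return self.local[k], "cloud"
        result = execute(query_text)         # cache miss: run the query, then publish
        self.local[k] = result
        self.remote[k] = result
        return result, "executed"


def run_query(query_text):
    # Placeholder for real query execution.
    return [("row_count", 3)]


shared_store = {}
cluster_a = TwoTierResultCache(shared_store)
cluster_b = TwoTierResultCache(shared_store)
first = cluster_a.get_or_execute("SELECT 1", run_query)
second = cluster_a.get_or_execute("SELECT 1", run_query)
third = cluster_b.get_or_execute("SELECT 1", run_query)
```

In this sketch the second call on cluster A is served from memory, while cluster B, which never ran the query, is served from the shared store.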

72.

Invention patent application
FEATURE FUNCTION BASED COMPUTATION OF ON-DEMAND FEATURES OF MACHINE LEARNING MODELS (in force)

Publication (announcement) number: US20240412095A1

Publication (announcement) date: 2024-12-12

Application number: US18206460

Application date: 2023-06-06

Applicant: Databricks, Inc.

Inventor: Matei Zaharia, Avesh Singh, Mani Parkhe, Maxim Lukiyanov, Xiangrui Meng, Aakrati Talati, Chenen Liang, Kasey Uhlenhuth

IPC: G06N20/00

Abstract: A system performs training and execution of machine learning models that use on-demand features using feature functions. The system receives commands for registering metadata associated with a machine learning model. The machine learning model may process a set of features including on-demand features as well as other features such as batch features. The system executes the command by storing an association between the machine learning model and the feature functions associated with any on-demand features processed by the machine learning model. The feature functions are executed using an end point of a data asset service. The use of the data asset service for invoking the feature functions ensures that the same set of instructions is executed during model training and model inferencing, thereby avoiding model skew.
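
The idea of storing a model-to-feature-function association and invoking the same functions during training and inference can be sketched as follows. This is a simplified illustration under stated assumptions: the registries are plain dictionaries standing in for the data asset service, and every name (`register_feature_function`, `price_gap`, `demo_model`) is hypothetical.

```python
# Hypothetical in-process registries; the abstract's data asset service
# endpoint is modelled here as a plain dictionary lookup.
feature_functions = {}
model_feature_functions = {}


def register_feature_function(name, fn):
    feature_functions[name] = fn


def register_model(model_name, on_demand_features):
    # Store the association between a model and its feature functions.
    model_feature_functions[model_name] = list(on_demand_features)


def compute_on_demand_features(model_name, raw_input):
    # Single code path used for both training rows and inference requests.
    return {
        name: feature_functions[name](raw_input)
        for name in model_feature_functions[model_name]
    }


register_feature_function("price_gap", lambda row: row["ask"] - row["bid"])
register_model("demo_model", ["price_gap"])

training_features = compute_on_demand_features("demo_model", {"ask": 12, "bid": 9})
serving_features = compute_on_demand_features("demo_model", {"ask": 12, "bid": 9})
```

Because both calls go through the same registered function, the training and serving values cannot drift apart in this sketch.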

73.

Invention patent application
AUTO MAINTENANCE FOR DATA TABLES IN CLOUD STORAGE (in force)

Publication (announcement) number: US20240378181A1

Publication (announcement) date: 2024-11-14

Application number: US18144647

Application date: 2023-05-08

Applicant: Databricks, Inc.

Inventor: Vijayan Prabhakaran, Himanshu Raja, Rahul Potharaju, Naga Raju Bhanoori, Lin Ma, Rajesh Parangi Sharabhalingappa, Jintian Liang, Zach Schuermann, Kam Cheung Ting

IPC: G06F16/21, G06F11/34, G06F16/22

Abstract: Disclosed is a configuration for managing the organization of data tables in cloud-based storage. The configuration receives metrics for data processing operations on a data table. The metrics include at least one of a size of the data table, a size of each file in the data table, and metadata describing the data table. The configuration automatically executes a cost-benefit analysis based on the one or more metrics for each candidate maintenance operation in a plurality of candidate maintenance operations. The configuration automatically selects a maintenance operation from the candidate maintenance operations to automate based on the cost-benefit analysis of the one or more candidate maintenance operations. The selected maintenance operation is automated and scheduled on the data table.
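
A per-candidate cost-benefit selection of the kind this abstract describes might look like the sketch below. The candidate operations (`compaction`, `reclustering`), the scoring formulas, and the 32 MiB small-file threshold are invented for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class TableMetrics:
    table_size_bytes: int
    file_sizes_bytes: list


def small_file_count(metrics, threshold=32 * 1024 * 1024):
    return sum(1 for size in metrics.file_sizes_bytes if size < threshold)


# Hypothetical candidates: each maps metrics to (estimated benefit, estimated cost).
def compaction(metrics):
    return small_file_count(metrics) * 1.0, metrics.table_size_bytes / 1e9


def reclustering(metrics):
    return len(metrics.file_sizes_bytes) * 0.1, 2 * metrics.table_size_bytes / 1e9


CANDIDATES = {"compaction": compaction, "reclustering": reclustering}


def select_operation(metrics):
    """Return the candidate with the highest positive net benefit, or None."""
    best_name, best_net = None, 0.0
    for name, estimate in CANDIDATES.items():
        benefit, cost = estimate(metrics)
        net = benefit - cost
        if net > best_net:
            best_name, best_net = name, net
    return best_name


many_small_files = TableMetrics(50 * 2**20, [2**20] * 50)
one_large_file = TableMetrics(2**30, [2**30])
```

With these made-up formulas, a table of fifty 1 MiB files selects `compaction`, while a table consisting of one 1 GiB file selects nothing because no candidate has a positive net benefit.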

74.

Published invention application
RETRIEVAL AND CACHING OF OBJECT METADATA ACROSS DATA SOURCES AND STORAGE SYSTEMS (under examination, published)

Publication (announcement) number: US20240346007A1

Publication (announcement) date: 2024-10-17

Application number: US18135078

Application date: 2023-04-14

Applicant: Databricks, Inc.

Inventor: Zhaoxing Li, Rayman Preet Singh, Fuat Can Efeoglu, Daniel Tenedorio, Sarah Cai

IPC: G06F16/23, G06F16/2455

CPC classification number: G06F16/2365, G06F16/24552

Abstract: A system for retrieving and caching metadata from a remote data source is described. The system may receive a request from a client device. The request is to perform a query operation on a set of data objects stored in the remote data source. The system may access a metadata cache storing metadata information on one or more data objects of the remote data source and identify metadata corresponding to the set of data objects for the query operation in the metadata cache. The system may determine whether the identified metadata for the set of data objects meets an update condition. In response to the identified metadata meeting the update condition, the system may fetch updated metadata for at least the set of data objects from the remote data source, and store the updated metadata in the metadata cache.
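
The check-then-refresh flow in this abstract can be sketched as below. The update condition used here (an entry is missing or older than a maximum age) is an assumption chosen for illustration; the abstract does not say what the condition is. `MetadataCache`, `fetch_remote`, and the injectable `clock` are hypothetical names.

```python
import time


class MetadataCache:
    """Illustrative metadata cache with an age-based update condition."""

    def __init__(self, fetch_remote, max_age_seconds, clock=time.monotonic):
        self.fetch_remote = fetch_remote  # callable: list of names -> {name: metadata}
        self.max_age = max_age_seconds
        self.clock = clock
        self.entries = {}                 # name -> (metadata, fetched_at)

    def needs_update(self, name):
        # Assumed update condition: entry is absent or older than max_age.
        entry = self.entries.get(name)
        return entry is None or self.clock() - entry[1] > self.max_age

    def metadata_for(self, names):
        stale = [n for n in names if self.needs_update(n)]
        if stale:
            now = self.clock()
            for name, metadata in self.fetch_remote(stale).items():
                self.entries[name] = (metadata, now)
        return {n: self.entries[n][0] for n in names}


fetch_calls = []


def fake_fetch(names):
    # Stand-in for a round trip to the remote data source.
    fetch_calls.append(list(names))
    return {n: {"schema_version": len(fetch_calls)} for n in names}


fake_now = [0.0]
cache = MetadataCache(fake_fetch, max_age_seconds=60, clock=lambda: fake_now[0])
first = cache.metadata_for(["orders"])
second = cache.metadata_for(["orders"])   # fresh entry: no remote fetch
fake_now[0] = 61.0
third = cache.metadata_for(["orders"])    # entry too old: refetched
```

The fake clock makes the behaviour deterministic: only the first and third lookups reach the remote source.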

75.

Granted invention patent
Model ML registry and model serving (in force)

Publication (announcement) number: US12117983B2

Publication (announcement) date: 2024-10-15

Application number: US18512028

Application date: 2023-11-17

Applicant: Databricks, Inc.

Inventor: Aaron Daniel Davidson, Clemens Mewald, Tomas Nykodym

IPC: G06F16/00, G06F16/21, G06F16/955, G06N5/022

CPC classification number: G06F16/219, G06F16/955, G06N5/022

Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.

76.

Granted invention patent
Scan parsing (in force)

Publication (announcement) number: US12072880B2

Publication (announcement) date: 2024-08-27

Application number: US17892376

Application date: 2022-08-22

Applicant: Databricks, Inc.

Inventor: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy

IPC: G06F9/00, G06F16/2453, G06F16/28

CPC classification number: G06F16/24542, G06F16/285

Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using the first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.
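
The start-with-one-engine, fall-back-on-error flow in this abstract can be sketched as follows. The two "engines" here are toy line parsers, the `prefer_fast` flag stands in for the predefined heuristics, and `UnsupportedInput` stands in for the particular error; none of these names or behaviours come from the patent.

```python
class UnsupportedInput(Exception):
    """Stand-in for the particular error that triggers the engine switch."""


def fast_parse(line):
    # Hypothetical first engine: handles only unsigned integer literals.
    if not line.isdigit():
        raise UnsupportedInput(line)
    return int(line)


def general_parse(line):
    # Hypothetical second engine: accepts any numeric literal.
    return float(line)


def parse_file(lines, prefer_fast=True):
    records = []
    position = 0
    if prefer_fast:  # stand-in for the predefined heuristics
        try:
            for line in lines:
                records.append(fast_parse(line))
                position += 1
        except UnsupportedInput:
            pass  # stop the first engine; the second resumes at `position`
    for line in lines[position:]:
        records.append(general_parse(line))
    return records


mixed = parse_file(["1", "2", "3.5", "4"])
clean = parse_file(["7", "8"])
```

Records produced by the first engine before the error are kept, and the second engine continues from the line that failed rather than restarting the file.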

77.

Published invention application
Multi-Cluster Query Result Caching (under examination, published)

Publication (announcement) number: US20240265011A1

Publication (announcement) date: 2024-08-08

Application number: US18222343

Application date: 2023-07-14

Applicant: Databricks, Inc.

Inventor: Saksham Garg, Bogdan Ionut Ghit, Christopher Stevens, Christian Stuart

IPC: G06F16/2453

CPC classification number: G06F16/24539

Abstract: A multi-cluster computing system which includes a query result caching system is presented. The multi-cluster computing system may include a data processing service and client devices communicatively coupled over a network. The data processing service may include a control layer and a data layer. The control layer may be configured to receive and process requests from the client devices and manage resources in the data layer. The data layer may be configured to include instances of clusters of computing resources for executing jobs. The data layer may include a data storage system, which further includes a remote query result cache store. The query result cache store may include a cloud storage query result cache which stores data associated with results of previously executed requests. As such, when a cluster encounters a previously executed request, the cluster may efficiently retrieve the cached result of the request from the in-memory query result cache or the cloud storage query result cache.

78.

Published invention application
Dictionary Filtering and Evaluation in Columnar Databases (under examination, published)

Publication (announcement) number: US20240256550A1

Publication (announcement) date: 2024-08-01

Application number: US18162616

Application date: 2023-01-31

Applicant: Databricks, Inc.

Inventor: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy

IPC: G06F16/2455, G06F11/34, G06F16/22

CPC classification number: G06F16/24558, G06F11/3409, G06F16/221

Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator and a request to return information about a value of interest in a columnar dataset stored on cloud storage. At least one column in the columnar dataset is based on a dictionary. The dictionary maps one or more values for a column to one or more respective identifiers. The method determines whether to perform dictionary filtering for the query by calculating a metric based on one or more factors. Responsive to the metric being below a threshold, which may be predetermined, the method performs the dictionary filtering.
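
The decision this abstract describes (compute a metric, and filter on the dictionary only if the metric is below a threshold) can be sketched as follows. The metric used here, dictionary entries per row, and the 0.5 threshold are assumptions for illustration; the abstract leaves the factors unspecified. The dictionary is modelled as a list whose index is the identifier.

```python
def filter_column(dictionary, encoded, predicate, threshold=0.5):
    """Return (matching row indexes, strategy used)."""
    # Hypothetical metric: dictionary entries per encoded row.
    metric = len(dictionary) / max(len(encoded), 1)
    if metric < threshold:
        # Dictionary filtering: evaluate the predicate once per distinct value,
        # then compare identifiers only.
        matching_ids = {i for i, value in enumerate(dictionary) if predicate(value)}
        rows = [r for r, i in enumerate(encoded) if i in matching_ids]
        return rows, "dictionary"
    # Otherwise decode and test every row.
    rows = [r for r, i in enumerate(encoded) if predicate(dictionary[i])]
    return rows, "row-by-row"


colors = ["red", "green", "blue"]          # identifier = list index


def is_red(value):
    return value == "red"


repetitive = filter_column(colors, [0, 1, 0, 2, 0, 1, 0, 0], is_red)
distinct = filter_column(colors, [0, 1, 2], is_red)
```

Both strategies return the same rows; the metric only decides which one is likely to be cheaper.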

79.

Published invention application
Evaluating Expressions Over Dictionary Data (under examination, published)

Publication (announcement) number: US20240256549A1

Publication (announcement) date: 2024-08-01

Application number: US18162607

Application date: 2023-01-31

Applicant: Databricks, Inc.

Inventor: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy

IPC: G06F16/2455, G06F11/34, G06F16/22

CPC classification number: G06F16/24558, G06F11/3409, G06F16/221

Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator for a columnar dataset on cloud storage. At least one column in the dataset is based on a dictionary, and the dictionary maps one or more values for a column to one or more respective identifiers. The method evaluates the operator on one or more values of the dictionary to generate an updated dictionary comprising updated values. The method may decode the updated dictionary into an updated column comprising updated data values.
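
Evaluating an operator on the dictionary values and then decoding can be illustrated with a few lines. This is a toy sketch, assuming the dictionary is a list indexed by identifier and the operator is a pure per-value function; the function names are hypothetical.

```python
def evaluate_on_dictionary(dictionary, operator):
    # Apply the operator once per distinct value instead of once per row.
    return [operator(value) for value in dictionary]


def decode(dictionary, encoded):
    # Expand identifiers back into a full column of values.
    return [dictionary[i] for i in encoded]


countries = ["us", "de", "fr"]   # identifier = list index
encoded_column = [0, 0, 2, 1, 0]

updated_dictionary = evaluate_on_dictionary(countries, str.upper)
updated_column = decode(updated_dictionary, encoded_column)
```

Here the operator runs three times (once per dictionary entry) rather than five times (once per row), and the encoded identifiers are reused unchanged.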

80.

Published invention application
STATE REBALANCING IN STRUCTURED STREAMING (under examination, published)

Publication (announcement) number: US20240202211A1

Publication (announcement) date: 2024-06-20

Application number: US18219314

Application date: 2023-07-07

Applicant: Databricks, Inc.

Inventor: Alexander Balikov, Tathagata Das, Karthikeyan Ramasamy

IPC: G06F16/27, G06F16/2455

CPC classification number: G06F16/278, G06F16/24568

Abstract: A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method for rebalancing stateful tasks is also performed such that the total number of stateful tasks is balanced per executor as long as this rebalancing does not imbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.
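
The two balancing goals in this abstract (spread each operator's partitions across executors, and keep the total task count per executor even) can be illustrated with a simple round-robin placement. This sketch ignores task statistics and the cost of moving cached state, both of which the abstract mentions; the function name and the rotating-offset scheme are assumptions, not the patented method.

```python
from collections import Counter


def place_stateful_tasks(partitions_per_operator, executors):
    """Assign (operator, partition) pairs to executors round-robin."""
    placement = {}
    offset = 0  # carry the rotation across operators so totals stay even
    for operator, count in partitions_per_operator.items():
        for partition in range(count):
            executor = executors[(offset + partition) % len(executors)]
            placement[(operator, partition)] = executor
        offset += count
    return placement


placement = place_stateful_tasks({"aggregate": 3, "join": 3}, ["exec-0", "exec-1"])
totals = Counter(placement.values())
aggregate_spread = Counter(
    executor for (operator, _), executor in placement.items() if operator == "aggregate"
)
```

Without the carried offset, both operators would start on the same executor and the totals would come out 4 and 2 instead of 3 and 3.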
