-
Publication No.: US20230177072A1
Publication Date: 2023-06-08
Application No.: US18162625
Application Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Mani Parkhe, Clemens Mewald, Matei Zaharia, Avesh Singh
CPC classification number: G06F16/288, G06F30/27
Abstract: The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
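The lineage idea can be sketched in Python as follows. This is a minimal illustration, not Databricks' implementation. The class and method names (`FeatureStore`, `register`, `lineage`, `features_from`) are hypothetical, and datasets are assumed to be in-memory lists of numbers keyed by dataset name. Each feature is stored together with the names of the datasets it was determined from, which also allows a reverse lookup from a dataset to the features that depend on it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Feature:
    name: str
    values: List[float]
    source_datasets: List[str]  # lineage: datasets the feature was determined from


class FeatureStore:
    """Hypothetical in-memory feature store that records dataset lineage."""

    def __init__(self) -> None:
        self._features: Dict[str, Feature] = {}

    def register(
        self,
        name: str,
        datasets: Dict[str, Sequence[float]],
        compute: Callable[[Dict[str, Sequence[float]]], Sequence[float]],
    ) -> Feature:
        # Determine the feature from the datasets, then store it together
        # with an indication of which datasets it came from.
        feature = Feature(name, list(compute(datasets)), sorted(datasets))
        self._features[name] = feature
        return feature

    def get(self, name: str) -> Feature:
        return self._features[name]

    def lineage(self, name: str) -> List[str]:
        return self._features[name].source_datasets

    def features_from(self, dataset: str) -> List[str]:
        # Reverse lookup: which stored features depend on this dataset?
        return sorted(n for n, f in self._features.items() if dataset in f.source_datasets)


store = FeatureStore()
store.register(
    "net_amount",
    {"orders": [10.0, 20.0], "refunds": [1.0, 5.0]},
    lambda d: [o - r for o, r in zip(d["orders"], d["refunds"])],
)
print(store.lineage("net_amount"))  # ['orders', 'refunds']
```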
-
Publication No.: US20230161767A1
Publication Date: 2023-05-25
Application No.: US18158258
Application Date: 2023-01-23
Applicant: Databricks, Inc.
Inventor: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hovell tot Westerflier
IPC: G06F16/2453, G06F16/2458, G06F16/25
CPC classification number: G06F16/24542, G06F16/258, G06F16/2471
Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
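A minimal Python sketch of classifying plan nodes by engine, assuming a linear pipeline of operator names and a hypothetical set `SECOND_ENGINE_OPS` of operators the second engine can execute (the abstract does not list them). Consecutive nodes of the same type are grouped into stages, so switching between engines happens only at stage boundaries. The grouping step is an illustrative choice, not necessarily how the patented planner builds its plan.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical capability list: operator types the second engine is able to execute.
SECOND_ENGINE_OPS = {"scan", "filter", "project", "aggregate"}


def node_type(op: str) -> str:
    """Classify a node by whether it can be executed in the second engine."""
    return "second_engine" if op in SECOND_ENGINE_OPS else "first_engine"


def generate_plan(ops: List[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive nodes of the same type into stages."""
    return [(engine, list(group)) for engine, group in groupby(ops, key=node_type)]


print(generate_plan(["scan", "filter", "python_udf", "aggregate"]))
```

Here `python_udf` falls back to the first engine, splitting the otherwise second-engine pipeline into three stages.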
-
Publication No.: US20220374532A1
Publication Date: 2022-11-24
Application No.: US17514982
Application Date: 2021-10-29
Applicant: Databricks Inc.
Inventor: Matei Zaharia, David Lewis, Cheng Lian, Yuchen Huo, Ali Ghodsi
Abstract: The present application discloses a method, system, and computer system for providing access to information stored on a system for data storage. The method includes receiving a data request from a user, determining data corresponding to the data request, determining whether the user has requisite permissions to access the data, and in response to determining that the user has requisite permissions to access the data: determining a manner by which to provide access to the data, wherein the data comprises a filtered subset of stored data, and generating a token based at least in part on the user and the manner by which access to the data is to be provided.
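The permission check and token generation can be sketched in Python, assuming hypothetical in-memory tables `PERMISSIONS` and `ROW_FILTERS`, a row filter as the "manner" of access, and an HMAC-SHA256 signed token. The abstract does not specify a token format; HMAC signing is a swapped-in technique chosen here so the token binds the user and the manner of access and cannot be altered without detection.

```python
import base64
import hashlib
import hmac
import json
from typing import Dict, Optional, Set, Tuple

SECRET_KEY = b"demo-signing-key"  # hypothetical server-side signing key

# Hypothetical permission and row-filter tables.
PERMISSIONS: Dict[str, Set[str]] = {"alice": {"sales"}}
ROW_FILTERS: Dict[Tuple[str, str], str] = {("alice", "sales"): "region = 'EU'"}


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(user: str, table: str) -> Optional[str]:
    # 1. Check that the user has the requisite permission.
    if table not in PERMISSIONS.get(user, set()):
        return None
    # 2. Determine the manner of access: a row filter yielding a subset.
    manner = {"table": table, "row_filter": ROW_FILTERS.get((user, table))}
    # 3. Generate a token bound to both the user and the manner of access.
    payload = json.dumps({"user": user, "manner": manner}, sort_keys=True).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def verify_token(token: str) -> Optional[dict]:
    body, _, signature = token.rpartition(".")
    payload = base64.urlsafe_b64decode(body.encode())
    if not hmac.compare_digest(signature, _sign(payload)):
        return None  # tampered or forged token
    return json.loads(payload)
```

`hmac.compare_digest` is used for the signature check so verification time does not leak how many characters matched.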
-
Publication No.: US10769130B1
Publication Date: 2020-09-08
Application No.: US15987215
Application Date: 2018-05-23
Applicant: Databricks Inc.
Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
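The commit protocol can be sketched in Python as optimistic concurrency over an ordered log. This is a minimal illustration assuming a hypothetical in-memory `TransactionLog` whose `try_write` behaves as put-if-absent (a real system would rely on an atomic storage primitive). If position N+1 is taken, the writer compares its read set with the files updated there and, when they do not overlap, retries at N+2, N+3, and so on.

```python
from typing import Dict, Set


class ConflictError(Exception):
    pass


class TransactionLog:
    """Hypothetical in-memory stand-in for an ordered transaction log."""

    def __init__(self) -> None:
        self.entries: Dict[int, Set[str]] = {}  # position -> files updated

    def current_position(self) -> int:
        return max(self.entries, default=-1)

    def try_write(self, position: int, files: Set[str]) -> bool:
        # Put-if-absent: at most one transaction can claim a given position.
        if position in self.entries:
            return False
        self.entries[position] = set(files)
        return True


def commit(log: TransactionLog, read_position: int,
           read_set: Set[str], written: Set[str]) -> int:
    position = read_position + 1
    while not log.try_write(position, written):
        # A simultaneous transaction already wrote this position.
        updated = log.entries[position]
        if read_set & updated:
            raise ConflictError(f"read set overlaps files updated at position {position}")
        position += 1  # no overlap: try the further position
    return position
```

Two writers that both read position 0 and touch disjoint files end up at positions 1 and 2; a third writer that read a file changed at position 1 gets a `ConflictError` and must re-read.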
-
Publication No.: US20180314732A1
Publication Date: 2018-11-01
Application No.: US15581647
Application Date: 2017-04-28
Applicant: Databricks Inc.
Inventor: Michael Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
IPC: G06F17/30
CPC classification number: G06F16/24542, G06F16/24568
Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
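A minimal Python sketch of a physical plan whose operators each carry an input mode and an output mode. The mode names `append`, `update`, and `complete`, and the operators `Filter` and `RunningSum`, are assumptions for illustration; the abstract does not define them. Before execution the plan is validated so that each operator's output mode matches the next operator's input mode; each micro-batch then flows through the ordered operators.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]


class Filter:
    """Stateless operator: consumes appended rows, emits appended rows."""
    input_mode = "append"
    output_mode = "append"

    def __init__(self, min_value: int) -> None:
        self.min_value = min_value

    def process(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if r[1] >= self.min_value]


class RunningSum:
    """Stateful aggregate: consumes appended rows; emits only keys changed in
    this batch ("update") or every key seen so far ("complete")."""
    input_mode = "append"

    def __init__(self, output_mode: str) -> None:
        self.output_mode = output_mode
        self.totals: Dict[str, int] = defaultdict(int)

    def process(self, rows: List[Row]) -> List[Row]:
        changed = set()
        for key, value in rows:
            self.totals[key] += value
            changed.add(key)
        keys = changed if self.output_mode == "update" else set(self.totals)
        return sorted((k, self.totals[k]) for k in keys)


def validate(plan: list) -> None:
    # Each operator's output mode must match the next operator's input mode.
    for up, down in zip(plan, plan[1:]):
        if up.output_mode != down.input_mode:
            raise ValueError(f"{type(up).__name__} emits {up.output_mode!r}, "
                             f"{type(down).__name__} expects {down.input_mode!r}")


def run_batch(plan: list, rows: List[Row]) -> List[Row]:
    for op in plan:
        rows = op.process(rows)
    return rows
```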
-
Publication No.: US20180314556A1
Publication Date: 2018-11-01
Application No.: US15581987
Application Date: 2017-04-28
Applicant: Databricks Inc.
Inventor: Ali Ghodsi, Srinath Shankar, Sameer Paranjpye, Shi Xin, Matei Zaharia
IPC: G06F9/50
CPC classification number: G06F9/5061, G06F2209/5011, G06F2209/505
Abstract: A system for cluster resource allocation includes an interface and a processor. The interface is configured to receive a process and input data. The processor is configured to determine an estimate for resources required for the process to process the input data; determine existing available resources in a cluster for running the process; determine whether the existing available resources are sufficient for running the process; in the event it is determined that the existing available resources are not sufficient for running the process, indicate to add new resources; determine an allocated share of resources in the cluster for running the process; and cause execution of the process using the share of resources.
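The allocation steps can be sketched in Python under two hypothetical rules not found in the abstract: the estimate is one core per 128 MiB of input (`BYTES_PER_CORE`), and the allocated share is an equal split of the (possibly grown) cluster among running processes, capped at what the process needs. A positive `cores_to_add` plays the role of "indicate to add new resources".

```python
import math
from dataclasses import dataclass

BYTES_PER_CORE = 128 * 1024 * 1024  # hypothetical sizing rule: 1 core per 128 MiB


@dataclass
class Cluster:
    total_cores: int
    used_cores: int
    running_processes: int


@dataclass
class Allocation:
    estimated_cores: int
    cores_to_add: int  # > 0 means "indicate to add new resources"
    share: int         # cores the process is allowed to use


def allocate(cluster: Cluster, input_bytes: int) -> Allocation:
    # Estimate the resources required to process the input data.
    needed = max(1, math.ceil(input_bytes / BYTES_PER_CORE))
    # Compare against existing available resources; request more if short.
    available = cluster.total_cores - cluster.used_cores
    to_add = max(0, needed - available)
    # Equal split of the grown cluster, capped at what the process needs.
    fair_share = (cluster.total_cores + to_add) // (cluster.running_processes + 1)
    return Allocation(needed, to_add, min(needed, max(1, fair_share)))
```

With 16 cores (12 in use, 3 processes running) and 1 GiB of input, the estimate is 8 cores, 4 more are requested, and the process receives a share of 5. Capping by fair share keeps one large job from taking the whole scaled-up cluster.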
-
Publication No.: US20180300354A1
Publication Date: 2018-10-18
Application No.: US15487896
Application Date: 2017-04-14
Applicant: Databricks Inc.
Inventor: Eric Keng-hao Liang, Srinath Shankar, Shi Xin
IPC: G06F17/30
Abstract: A system for directory level atomic commits includes an interface and a processor. The interface is configured to receive an indication to provide a set of files. The processor is configured to determine whether a file in a directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted and provide the file as one file of the set of files in the event that the file in the directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted.
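The file-selection rule can be sketched in Python as follows. This assumes a hypothetical `Directory` record that maps each file to the atomic write job that produced it (or `None` for a non-atomic writer), plus sets of committed jobs and of files designated as deleted. How commits and deletions are recorded on disk (for example, as marker files) is left out. A file is provided if its job committed, or if it was written non-atomically and is not marked deleted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Directory:
    # file name -> id of the atomic write job that produced it,
    # or None if the file was written by a non-atomic process
    files: Dict[str, Optional[str]]
    committed_jobs: Set[str] = field(default_factory=set)
    deleted: Set[str] = field(default_factory=set)  # files designated as deleted


def files_to_provide(d: Directory) -> List[str]:
    provided = []
    for name, job in d.files.items():
        if job is not None:
            # 1) atomically committed: the producing job's commit is recorded
            if job in d.committed_jobs:
                provided.append(name)
        elif name not in d.deleted:
            # 2) written non-atomically and not designated as deleted
            provided.append(name)
    return sorted(provided)
```

Files from an uncommitted (in-flight or failed) job stay invisible to readers, which is what makes the directory-level commit atomic.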
-
Publication No.: US20180048536A1
Publication Date: 2018-02-15
Application No.: US15682397
Application Date: 2017-08-21
Applicant: Databricks Inc.
Inventor: Ali Ghodsi, Ion Stoica, Matei Zaharia
CPC classification number: H04L41/5051, G06F11/30, G06F11/3006, G06F11/3055, H04L41/5096, H04L43/0817
Abstract: A system for cluster management comprises a status monitor and an instance replacement manager. The status monitor is for monitoring status of an instance of a set of instances on a cluster provider. The instance replacement manager is for determining a replacement strategy for the instance in the event the instance does not respond. The replacement strategy for the instance is based at least in part on management criteria for on-demand instances and spot instances on the cluster provider.
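A minimal Python sketch of the status monitor and replacement manager, assuming heartbeat timestamps for status and a hypothetical `ManagementCriteria` with two rules: keep at least `min_on_demand` on-demand instances, and avoid spot capacity above `max_spot_price`. The patent's actual criteria are not given in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    instance_id: str
    kind: str              # "spot" or "on-demand"
    last_heartbeat: float  # seconds


@dataclass
class ManagementCriteria:  # hypothetical criteria
    min_on_demand: int
    max_spot_price: float


def unresponsive(instances: List[Instance], now: float, timeout: float) -> List[Instance]:
    """Status monitor: instances whose last heartbeat is older than the timeout."""
    return [i for i in instances if now - i.last_heartbeat > timeout]


def replacement_kind(failed: Instance, instances: List[Instance],
                     criteria: ManagementCriteria, spot_price: float) -> str:
    """Replacement manager: choose the instance type for the replacement."""
    remaining_on_demand = sum(
        1 for i in instances
        if i.kind == "on-demand" and i.instance_id != failed.instance_id
    )
    if remaining_on_demand < criteria.min_on_demand:
        return "on-demand"  # restore the guaranteed on-demand baseline
    if spot_price > criteria.max_spot_price:
        return "on-demand"  # spot capacity too expensive right now
    return "spot"
```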
-
Publication No.: US20180046668A1
Publication Date: 2018-02-15
Application No.: US15675619
Application Date: 2017-08-11
Applicant: Databricks Inc.
Inventor: Ali Ghodsi, Ion Stoica, Matei Zaharia
IPC: G06F17/30
CPC classification number: G06F17/30424, G06F17/30389
Abstract: A system for exploring data in a database comprises a query parser, a parameter manager, a query submitter, and a result formatter. The query parser is to receive a base query and determine an input parameter from the base query. The parameter manager is to provide a first request for a value for the input parameter; receive the value for the input parameter; and provide a second request for the value for the input parameter. The query submitter is to determine a first query using the base query and the value for the input parameter; and provide an indication to execute the first query. The result formatter is to receive a result associated with the indication to execute the first query.
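The parse, request, and submit steps can be sketched in Python with the standard `sqlite3` module. The `{{name}}` placeholder syntax is a hypothetical choice, not taken from the patent. Rather than pasting values into SQL text, placeholders are rewritten as sqlite3 named parameters (`:name`) and bound by the driver, so the same base query can be re-run each time a new value is supplied.

```python
import re
import sqlite3
from typing import Dict, List

# Hypothetical placeholder syntax: {{name}} marks an input parameter.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def input_parameters(base_query: str) -> List[str]:
    """Query parser: input parameters in order of first appearance."""
    names: List[str] = []
    for name in PLACEHOLDER.findall(base_query):
        if name not in names:
            names.append(name)
    return names


def to_bound_query(base_query: str) -> str:
    # Rewrite placeholders as sqlite3 named parameters (:name).
    return PLACEHOLDER.sub(lambda m: ":" + m.group(1), base_query)


def run(conn: sqlite3.Connection, base_query: str, values: Dict[str, object]) -> list:
    """Query submitter: bind the supplied values and execute."""
    missing = [p for p in input_parameters(base_query) if p not in values]
    if missing:
        raise KeyError(f"no value supplied for {missing}")
    return conn.execute(to_bound_query(base_query), values).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("EU", 10), ("EU", 20), ("US", 5)])
base = "SELECT SUM(amount) FROM sales WHERE region = {{ region }}"
print(run(conn, base, {"region": "EU"}))  # [(30,)]
```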
-
Publication No.: US20250094195A1
Publication Date: 2025-03-20
Application No.: US18368919
Application Date: 2023-09-15
Applicant: Databricks, Inc.
Inventor: Aaron Daniel Davidson, Thomas Garnier, Lin Guo, Zhe He, Manlin Li, Yang Liu, Feng Wang, Hong Zhang, Weirong Zhu
Abstract: A resource management configuration may receive an API request from an API server. The API request specifies task information from a plurality of tenants. The configuration transmits status information of a plurality of VMs to the API server to assign tasks to one or more VMs based on the task information and the status information. Tasks assigned to a VM of the plurality of VMs are for one tenant of the plurality of tenants. The configuration configures, on an untrusted network, network security groups for managing communications of tenants such that a network security group configured for a tenant permits communications between VMs assigned to the same tenant but prevents communications between VMs assigned to different tenants. The configuration pins each assigned VM of the one or more assigned VMs to perform the task based on the task information of the corresponding tenant.
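The tenant-isolation rules can be sketched in Python as follows. The VM status map `vm_tenant` (VM to pinned tenant, or `None` when idle) is a hypothetical stand-in for the status information sent to the API server, and the security groups are modeled as plain sets of VMs per tenant rather than any cloud provider's API. Each task goes to an idle VM that is then pinned to the task's tenant, and two VMs may communicate only if they belong to the same tenant's group.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def assign_tasks(tasks: List[Tuple[str, str]],
                 vm_tenant: Dict[str, Optional[str]]) -> Dict[str, str]:
    """tasks: (task_id, tenant). vm_tenant: VM -> pinned tenant, or None if idle."""
    assignment: Dict[str, str] = {}
    for task_id, tenant in tasks:
        idle = sorted(vm for vm, owner in vm_tenant.items() if owner is None)
        if not idle:
            raise RuntimeError(f"no idle VM for task {task_id}")
        vm_tenant[idle[0]] = tenant  # pin: this VM now serves only this tenant
        assignment[task_id] = idle[0]
    return assignment


def security_groups(vm_tenant: Dict[str, Optional[str]]) -> Dict[str, Set[str]]:
    """One group per tenant containing that tenant's VMs."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for vm, tenant in vm_tenant.items():
        if tenant is not None:
            groups[tenant].add(vm)
    return dict(groups)


def may_communicate(src: str, dst: str, groups: Dict[str, Set[str]]) -> bool:
    # Allowed only when both VMs sit in the same tenant's group.
    return any(src in members and dst in members for members in groups.values())
```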
-