-
Publication No.: US20220374457A1
Publication date: 2022-11-24
Application No.: US17514997
Filing date: 2021-10-29
Applicant: Databricks Inc.
Inventor: Mani Parkhe, Clemens Mewald, Matei Zaharia, Avesh Singh
Abstract: The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
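As a rough illustration of the idea in this abstract, the sketch below stores a computed feature together with an indication of the datasets it was derived from. The names `FeatureRecord`, `FeatureStore`, `put`, `lineage`, and `features_from`, and the `purchases` dataset, are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureRecord:
    name: str
    values: dict             # entity key -> feature value
    source_datasets: tuple   # dataset indication (lineage)


@dataclass
class FeatureStore:
    features: dict = field(default_factory=dict)

    def put(self, name, values, source_datasets):
        # Store the feature together with the datasets it was derived from.
        self.features[name] = FeatureRecord(name, dict(values), tuple(source_datasets))

    def lineage(self, name):
        return self.features[name].source_datasets

    def features_from(self, dataset):
        # Reverse lookup: which stored features depend on a given dataset?
        return sorted(n for n, r in self.features.items() if dataset in r.source_datasets)


# Hypothetical dataset: user id -> list of purchase amounts.
purchases = {"u1": [10.0, 30.0], "u2": [5.0]}
avg_purchase = {user: sum(v) / len(v) for user, v in purchases.items()}

store = FeatureStore()
store.put("avg_purchase", avg_purchase, ["purchases"])
```

Keeping the dataset indication next to the feature makes both directions of lineage lookup cheap: from a feature to its inputs, and from a dataset to the features that depend on it.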
-
Publication No.: US20220309103A1
Publication date: 2022-09-29
Application No.: US17362450
Filing date: 2021-06-29
Applicant: Databricks Inc.
Inventor: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio
IPC: G06F16/901, G06F16/22, G06F16/245
Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a directed acyclic graph (DAG) of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph, aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
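One way to picture the steps in this abstract is the minimal sketch below: it orders the queries by their dependencies and folds view nodes into the tables that consume them as in-line expressions. The node layout, the `{name}` placeholder syntax, and `build_dataflow_graph` are assumptions made for illustration, not the patented implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical node definitions: name -> (kind, expression, dependencies).
nodes = {
    "raw": ("table", "SELECT * FROM source", []),
    "clean": ("view", "SELECT * FROM {raw} WHERE ok", ["raw"]),
    "daily": ("table", "SELECT day, count(*) FROM {clean} GROUP BY day", ["clean"]),
}


def build_dataflow_graph(nodes):
    # Map each node to its predecessors and order the DAG topologically.
    deps = {name: set(d) for name, (_, _, d) in nodes.items()}
    order = list(TopologicalSorter(deps).static_order())

    reference = {}  # how downstream expressions refer to each node
    tables = {}     # materialized tables with views in-lined
    for name in order:
        kind, expr, d = nodes[name]
        expr = expr.format(**{dep: reference[dep] for dep in d})
        if kind == "view":
            # Views are not materialized; consumers embed their expression.
            reference[name] = f"({expr})"
        else:
            tables[name] = expr
            reference[name] = name
    return order, tables


order, tables = build_dataflow_graph(nodes)
```

In this sketch only table nodes survive in the output; the `clean` view exists solely as a subexpression of `daily`.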
-
Publication No.: US11113043B2
Publication date: 2021-09-07
Application No.: US16864074
Filing date: 2020-04-30
Applicant: Databricks Inc.
Inventor: Srinath Shankar, Eric Keng-Hao Liang, Gregory George Owen
Abstract: A system for code development and execution includes a client interface and a client processor. The client interface is configured to receive user code for execution and receive an indication of a server that will perform the execution. The client processor is configured to parse the user code to identify one or more data items referred to during the execution. The client processor is also configured to provide the server with an inquiry for metadata regarding the one or more data items, receive the metadata regarding the one or more data items, determine a logical plan based at least in part on the metadata regarding the one or more data items; and provide the logical plan to the server for execution.
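The client-side flow described here could be sketched as follows, assuming the user code is a SQL string and using a naive regular expression in place of a real parser. `StubServer`, `referenced_tables`, `submit`, and the dictionary-shaped plan are hypothetical stand-ins, not the patent's interfaces.

```python
import re


def referenced_tables(sql):
    # Naive parse: collect identifiers that follow FROM or JOIN.
    pattern = r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)"
    return sorted(set(re.findall(pattern, sql, flags=re.IGNORECASE)))


class StubServer:
    """Hypothetical stand-in for the remote execution server."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.received_plan = None

    def get_metadata(self, names):
        return {name: self.catalog[name] for name in names}

    def execute(self, plan):
        self.received_plan = plan


def submit(sql, server):
    names = referenced_tables(sql)          # data items referred to by the code
    metadata = server.get_metadata(names)   # metadata inquiry to the server
    # Build a toy logical plan from the metadata, then hand it to the server.
    plan = {
        "query": sql,
        "scans": [{"table": n, "columns": metadata[n]} for n in names],
    }
    server.execute(plan)
    return plan


server = StubServer({"events": ["uid", "ts"], "users": ["id", "name"]})
plan = submit("SELECT name FROM events JOIN users ON events.uid = users.id", server)
```

The point of the split is that planning happens on the client, so the server receives a plan rather than raw user code.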
-
Publication No.: US11068447B2
Publication date: 2021-07-20
Application No.: US15487896
Filing date: 2017-04-14
Applicant: Databricks Inc.
Inventor: Eric Keng-hao Liang, Srinath Shankar, Shi Xin
IPC: G06F16/18, G06F16/16, G06F16/182
Abstract: A system for directory level atomic commits includes an interface and a processor. The interface is configured to receive an indication to provide a set of files. The processor is configured to determine whether a file in a directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted and provide the file as one file of the set of files in the event that the file in the directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted.
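The visibility rule in this abstract can be written as a small predicate. The sketch below is illustrative only; the `FileEntry` fields and the `visible_files` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str
    atomic_writer: bool    # written by an atomic (transactional) process
    committed: bool        # the atomic write has been committed
    marked_deleted: bool   # designated as deleted


def visible_files(entries):
    result = []
    for e in entries:
        atomically_committed = e.atomic_writer and e.committed
        plain_and_live = (not e.atomic_writer) and not e.marked_deleted
        # Provide the file only if one of the two conditions holds.
        if atomically_committed or plain_and_live:
            result.append(e.name)
    return result


directory = [
    FileEntry("part-0", atomic_writer=True, committed=True, marked_deleted=False),
    FileEntry("part-1", atomic_writer=True, committed=False, marked_deleted=False),
    FileEntry("legacy-0", atomic_writer=False, committed=False, marked_deleted=False),
    FileEntry("legacy-1", atomic_writer=False, committed=False, marked_deleted=True),
]
listing = visible_files(directory)
```

Here the uncommitted `part-1` and the deleted `legacy-1` are hidden from readers, while files from non-atomic writers remain readable.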
-
Publication No.: US10810051B1
Publication date: 2020-10-20
Application No.: US16188989
Filing date: 2018-11-13
Applicant: Databricks Inc.
Inventor: Srinath Shankar, Eric Keng-Hao Liang
Abstract: The allocation system comprises an interface and a processor. The interface is configured to receive an indication to deactivate idle cluster machines of a set of cluster machines. The processor is configured to determine a set of tasks executing or pending on the set of cluster machines; determine a set of idle cluster machines of the set of cluster machines that are neither running one or more tasks of the set of tasks nor storing one or more intermediate data files of a set of intermediate data files, where the set of intermediate data files is associated with a set of tasks executing or pending on the cluster machines; and deactivate each cluster machine of the set of idle cluster machines.
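The idle-machine test described above amounts to a set difference. The following is a minimal sketch under assumed data shapes: tasks and intermediate files are plain dictionaries, and a pending task has `machine` set to `None`. None of these names come from the patent.

```python
def find_idle_machines(machines, tasks, intermediate_files):
    # Tasks that are executing or pending on the cluster.
    active_task_ids = {t["id"] for t in tasks}
    # Machines currently running one of those tasks.
    busy = {t["machine"] for t in tasks if t["machine"] is not None}
    # Machines holding intermediate data that an active task still needs.
    holding = {
        f["machine"] for f in intermediate_files if f["task"] in active_task_ids
    }
    return sorted(set(machines) - busy - holding)


machines = ["m1", "m2", "m3", "m4"]
tasks = [
    {"id": "t1", "machine": "m1"},   # executing on m1
    {"id": "t2", "machine": None},   # pending, not yet placed
]
intermediate_files = [
    {"machine": "m2", "task": "t2"},  # still needed by pending task t2
    {"machine": "m3", "task": "t0"},  # t0 is finished, so the file is stale
]
idle = find_idle_machines(machines, tasks, intermediate_files)
```

Machine `m2` runs nothing but is kept alive because it stores data for a pending task, which is the case the second condition in the abstract guards against.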
-
Publication No.: US20200301684A1
Publication date: 2020-09-24
Application No.: US16864074
Filing date: 2020-04-30
Applicant: Databricks Inc.
Inventor: Srinath Shankar, Eric Keng-Hao Liang, Gregory George Owen
Abstract: A system for code development and execution includes a client interface and a client processor. The client interface is configured to receive user code for execution and receive an indication of a server that will perform the execution. The client processor is configured to parse the user code to identify one or more data items referred to during the execution. The client processor is also configured to provide the server with an inquiry for metadata regarding the one or more data items, receive the metadata regarding the one or more data items, determine a logical plan based at least in part on the metadata regarding the one or more data items; and provide the logical plan to the server for execution.
-
Publication No.: US20200241950A1
Publication date: 2020-07-30
Application No.: US16793921
Filing date: 2020-02-18
Applicant: Databricks Inc.
Inventor: Alicja Luszczak, Srinath Shankar, Shi Xin
Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
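The abstract does not say what the watchdog criterion is. The sketch below assumes one possible criterion, a cap on the number of outputs produced while processing a single data instance, and stops that instance once the cap is exceeded. `WatchdogKilled`, `process_with_watchdog`, `run_job`, and `explode` are hypothetical names.

```python
class WatchdogKilled(Exception):
    """Raised when processing a data instance trips the watchdog."""


def process_with_watchdog(instance, process, max_outputs):
    outputs = []
    for out in process(instance):
        outputs.append(out)
        # Assumed criterion: too many outputs for one input instance.
        if len(outputs) > max_outputs:
            raise WatchdogKilled(f"instance {instance!r} exceeded {max_outputs} outputs")
    return outputs


def run_job(instances, process, max_outputs):
    results, killed = {}, []
    for inst in instances:
        try:
            results[inst] = process_with_watchdog(inst, process, max_outputs)
        except WatchdogKilled:
            killed.append(inst)  # processing of this instance was stopped
    return results, killed


def explode(n):
    # Hypothetical task that lazily emits n outputs for input n.
    return (i for i in range(n))


results, killed = run_job([2, 1_000_000, 3], explode, max_outputs=10)
```

Because the task is consumed lazily, the runaway instance is abandoned after eleven outputs instead of running to completion, and the rest of the job continues.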
-
Publication No.: US20190258479A1
Publication date: 2019-08-22
Application No.: US16378353
Filing date: 2019-04-08
Applicant: Databricks Inc.
Inventor: Timothee Hunter, Ali Ghodsi, Ion Stoica
Abstract: A system for processing a notebook includes an input interface and a processor. The input interface is to receive a first notebook. The notebook comprises code for interactively querying and viewing data. The processor is to load the first notebook into a shell. The shell receives one or more parameters associated with the first notebook. The shell executes the first notebook using a cluster.
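As a toy picture of loading a parameterized notebook into a shell, the sketch below treats a notebook as a list of Python code cells and the shell as a namespace seeded with the parameters. Dispatch to a cluster is omitted, and `run_notebook` is a hypothetical helper rather than anything named in the patent.

```python
def run_notebook(cells, parameters):
    # The "shell" here is a plain namespace seeded with the parameters.
    shell = dict(parameters)
    for cell in cells:
        # Each cell runs in the same namespace, so later cells see earlier results.
        exec(cell, shell)
    return shell


# Hypothetical two-cell notebook that filters and sums its input.
notebook = [
    "rows = [r for r in data if r >= threshold]",
    "total = sum(rows)",
]
shell = run_notebook(notebook, {"data": [1, 5, 8], "threshold": 4})
```

Running the same notebook with different parameters only requires a different dictionary, which is the reuse the abstract points at.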
-
Publication No.: US20180121194A1
Publication date: 2018-05-03
Application No.: US15803604
Filing date: 2017-11-03
Applicant: Databricks Inc.
Inventor: Timothee Hunter, Ali Ghodsi, Ion Stoica
CPC classification number: G06F8/71, G06F8/54, G06F9/445, G06F9/45512, G06F9/5066, G06F17/30867
Abstract: A system for processing a notebook includes an input interface and a processor. The input interface is to receive a first notebook. The notebook comprises code for interactively querying and viewing data. The processor is to load the first notebook into a shell. The shell receives one or more parameters associated with the first notebook. The shell executes the first notebook using a cluster.
-
Publication No.: US20170220667A1
Publication date: 2017-08-03
Application No.: US15485952
Filing date: 2017-04-12
Applicant: Databricks Inc.
Inventor: Ali Ghodsi, Ion Stoica
CPC classification number: G06F17/30598, G06F9/5033, G06F9/5072, G06F2209/505
Abstract: A cluster system includes an interface and a processor. The interface is to receive a request from a user associated with one of a plurality of shells. The processor is to determine a plurality of tasks to respond to the request; determine a local set of data and a shared set of data for a task of the plurality of tasks, wherein the local set of data is associated with the one of the plurality of shells; and provide the task, a local set indication, and a shared set indication to a worker associated with the task, wherein the local set indication refers to the local set of data and the shared set indication refers to the shared set of data.
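A minimal sketch of the hand-off described here: each task sent to a worker carries an indication of the requesting shell's local data set and of the shared data set. The `TaskAssignment` type, the `local/<shell>` naming scheme, and the local-before-shared lookup order are assumptions for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAssignment:
    task: str
    local_set: str    # indication of the shell-specific data set
    shared_set: str   # indication of the data set shared by all shells


def plan_request(shell_id, num_tasks, shared_set="shared"):
    # Every task of a request points at the requesting shell's local set.
    local_set = f"local/{shell_id}"
    return [
        TaskAssignment(f"{shell_id}-task-{i}", local_set, shared_set)
        for i in range(num_tasks)
    ]


def resolve(assignment, storage, key):
    # Assumed lookup order on the worker: shell-local data first, then shared.
    local = storage.get(assignment.local_set, {})
    if key in local:
        return local[key]
    return storage[assignment.shared_set][key]


storage = {
    "shared": {"users": "shared-users-table"},
    "local/shell-a": {"tmp": "shell-a-scratch"},
}
assignments = plan_request("shell-a", num_tasks=2)
```

Passing indications rather than the data itself lets several shells share one cluster while keeping each shell's scratch data separate.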