-
Publication No.: US20240061839A1
Publication date: 2024-02-22
Application No.: US17892376
Filing date: 2022-08-22
Applicant: Databricks, Inc.
Inventor: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy
IPC: G06F16/2453, G06F16/28
CPC classification number: G06F16/24542, G06F16/285
Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed; determining, based at least in part on one or more predefined heuristics, to begin processing the first file using a first processing engine; indicating to process the first file using the first processing engine; determining whether a particular error in processing the first file using the first processing engine has been detected; in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine; and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.
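As a rough illustration of the fallback flow this abstract describes, the Python sketch below starts on a fast engine chosen by a heuristic and switches to a general engine when a specific error appears, resuming at the line where the fast engine stopped. The engine functions, the error class, and the size heuristic are all hypothetical stand-ins and are not taken from the patent.

```python
class UnsupportedInputError(Exception):
    """Hypothetical stand-in for the 'particular error' that triggers fallback."""


def fast_parse(line):
    # Hypothetical fast engine: only understands plain non-negative integers.
    if not line.strip().isdecimal():
        raise UnsupportedInputError(line)
    return int(line)


def general_parse(line):
    # Hypothetical slower engine: also accepts decimals such as "3.0".
    return int(float(line))


def prefer_fast_engine(lines, max_lines=1000):
    # Hypothetical predefined heuristic: small files start on the fast engine.
    return len(lines) <= max_lines


def process_file(lines):
    """Return (parsed values, names of the engines that were used)."""
    results, engines_used, position = [], [], 0
    if prefer_fast_engine(lines):
        engines_used.append("fast")
        try:
            for line in lines:
                results.append(fast_parse(line))
                position += 1
            return results, engines_used
        except UnsupportedInputError:
            pass  # stop the fast engine; the general engine resumes below
    engines_used.append("general")
    for line in lines[position:]:
        results.append(general_parse(line))
    return results, engines_used
```

Results produced before the error are kept, so the second engine only handles the remainder of the file.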
-
Publication No.: US20230359602A1
Publication date: 2023-11-09
Application No.: US17738609
Filing date: 2022-05-06
Applicant: Databricks Inc.
Inventor: Bart Samwel, Prakhar Jain
IPC: G06F16/22
CPC classification number: G06F16/2246
Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.
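The split-point idea can be sketched in a single dimension, which is a simplification: the abstract covers a set of dimensions. In the hypothetical Python below, split points are taken from the sorted values at every `target_rows` position, and each item is routed to a file by binary search. The function names and the row-count notion of "target size" are assumptions for illustration only.

```python
from bisect import bisect_right


def choose_split_points(values, target_rows):
    # Pick a boundary value every `target_rows` items of the sorted data,
    # so each resulting file holds roughly `target_rows` items.
    ordered = sorted(values)
    return [ordered[i] for i in range(target_rows, len(ordered), target_rows)]


def assign_to_files(values, split_points):
    # File i receives values v with split_points[i-1] <= v < split_points[i].
    files = [[] for _ in range(len(split_points) + 1)]
    for v in values:
        files[bisect_right(split_points, v)].append(v)
    return files
```

With heavily duplicated values the files can drift from the target size, which is consistent with the abstract's wording of an approximate target size.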
-
Publication No.: US11775499B2
Publication date: 2023-10-03
Application No.: US17695411
Filing date: 2022-03-15
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
CPC classification number: G06F16/2358, G06F16/148, G06F16/2282
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
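The commit protocol in this abstract resembles optimistic concurrency control over an append-only log. The Python sketch below is a minimal in-memory model under stated assumptions: `TransactionLog`, `ConflictError`, and `commit` are hypothetical names, each log entry is reduced to the set of files it updated, and the retry loop generalizes the N+1 then N+2 step to any number of intervening transactions.

```python
class ConflictError(Exception):
    """Raised when a concurrent transaction updated files this one read."""


class TransactionLog:
    def __init__(self):
        # position -> set of file names updated by the entry at that position
        self.entries = {}

    def try_write(self, position, updated_files):
        # Succeeds only if no entry exists at this position yet.
        if position in self.entries:
            return False
        self.entries[position] = set(updated_files)
        return True


def commit(log, current_position, read_set, updated_files):
    """Try position N+1; on collision, check for overlap, then try N+2, etc."""
    position = current_position + 1
    while not log.try_write(position, updated_files):
        winner_files = log.entries[position]
        if winner_files & set(read_set):
            raise ConflictError(f"conflict with entry at position {position}")
        position += 1
    return position
```

A transaction that read `b.parquet` can still commit after a concurrent writer that only touched `a.parquet`, landing one position later in the log.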
-
Publication No.: US20230244991A1
Publication date: 2023-08-03
Application No.: US17896281
Filing date: 2022-08-26
Applicant: Databricks, Inc.
Inventor: Benjamin Thomas Wilson, Corey Zumar
CPC classification number: G06N20/00, G06K9/6227, G06K9/6235, G06K2009/6237
Abstract: The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a dataset, the dataset comprising a plurality of keys and a plurality of key-value relationships; determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models; building the plurality of models; and optimizing at least one of the plurality of models.
-
Publication No.: US11675767B1
Publication date: 2023-06-13
Application No.: US17099467
Filing date: 2020-11-16
Applicant: Databricks, Inc.
Inventor: Alexander Behm, Ankur Dave
IPC: G06F16/00, G06F16/22, G06F16/28, G06F16/242, G06F16/2455, G06F16/13
CPC classification number: G06F16/2255, G06F16/134, G06F16/2272, G06F16/244, G06F16/24556, G06F16/285
Abstract: A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregated hash table entry of the preaggregated hash table: provide the preaggregated hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.
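The two-stage aggregation described here can be sketched as follows, assuming a simple sum over `(key, value)` rows and simulating the computing units as in-process lists. The function names and the use of CRC32 as the distribution hash are hypothetical choices for illustration; the patent's roll-up over a set of columns is reduced to a single grouping key.

```python
import zlib
from collections import defaultdict


def aggregate(rows):
    # Build a hash table keyed by group, summing the values for each key.
    table = defaultdict(int)
    for key, value in rows:
        table[key] += value
    return dict(table)


def distribute(preaggregated_tables, num_units):
    # Route each preaggregated entry to a unit chosen by a distribution hash.
    inboxes = [[] for _ in range(num_units)]
    for table in preaggregated_tables:
        for key, partial in table.items():
            unit = zlib.crc32(key.encode("utf-8")) % num_units
            inboxes[unit].append((key, partial))
    return inboxes


def rollup(partitions, num_units):
    # Stage 1: each unit preaggregates its local rows.
    preaggregated = [aggregate(rows) for rows in partitions]
    # Stage 2: entries are exchanged, then each unit aggregates what it received.
    return [aggregate(inbox) for inbox in distribute(preaggregated, num_units)]
```

Because every entry for a given key hashes to the same unit, each key ends up fully aggregated on exactly one unit.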
-
Publication No.: US20230177031A1
Publication date: 2023-06-08
Application No.: US18162579
Filing date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Aaron Daniel Davidson, Tomas Nykodym, Clemens Mewald
IPC: G06F16/21, G06F16/955
CPC classification number: G06F16/219, G06F16/955, G06N5/022
Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
-
Publication No.: US20220374457A1
Publication date: 2022-11-24
Application No.: US17514997
Filing date: 2021-10-29
Applicant: Databricks Inc.
Inventor: Mani Parkhe, Clemens Mewald, Matei Zaharia, Avesh Singh
Abstract: The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
-
Publication No.: US20220309103A1
Publication date: 2022-09-29
Application No.: US17362450
Filing date: 2021-06-29
Applicant: Databricks Inc.
Inventor: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio
IPC: G06F16/901, G06F16/22, G06F16/245
Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
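The dependency and DAG steps of this abstract can be illustrated with Python's standard `graphlib` module (Python 3.9+). The sketch assumes, hypothetically, that each query is named and that a query depends on another whenever the other's name appears as an identifier in its text; real dependency analysis would parse the query properly. The in-lining of view nodes is not covered.

```python
import re
from graphlib import TopologicalSorter


def find_dependencies(queries):
    # Map each query name to the set of other query names it references.
    # Naive assumption: a reference is any identifier token equal to a name.
    deps = {}
    for name, sql in queries.items():
        tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql))
        deps[name] = {other for other in queries if other != name and other in tokens}
    return deps


def execution_order(queries):
    # TopologicalSorter takes node -> predecessors and yields predecessors
    # first; it raises graphlib.CycleError if the dependencies are cyclic.
    return list(TopologicalSorter(find_dependencies(queries)).static_order())
```

For example, a `report` query that reads from `totals`, which in turn reads from `cleaned`, is ordered after both.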
-
Publication No.: US11113043B2
Publication date: 2021-09-07
Application No.: US16864074
Filing date: 2020-04-30
Applicant: Databricks Inc.
Inventor: Srinath Shankar, Eric Keng-Hao Liang, Gregory George Owen
Abstract: A system for code development and execution includes a client interface and a client processor. The client interface is configured to receive user code for execution and receive an indication of a server that will perform the execution. The client processor is configured to parse the user code to identify one or more data items referred to during the execution. The client processor is also configured to provide the server with an inquiry for metadata regarding the one or more data items, receive the metadata regarding the one or more data items, determine a logical plan based at least in part on the metadata regarding the one or more data items; and provide the logical plan to the server for execution.
-
Publication No.: US11068447B2
Publication date: 2021-07-20
Application No.: US15487896
Filing date: 2017-04-14
Applicant: Databricks Inc.
Inventor: Eric Keng-hao Liang, Srinath Shankar, Shi Xin
IPC: G06F16/18, G06F16/16, G06F16/182
Abstract: A system for directory level atomic commits includes an interface and a processor. The interface is configured to receive an indication to provide a set of files. The processor is configured to determine whether a file in a directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted and provide the file as one file of the set of files in the event that the file in the directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted.
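The visibility rule in this abstract can be written as a small predicate. The sketch below follows one reading of the condition: a file written through the atomic protocol is listed only once committed, while a file written by a non-atomic process is listed unless it is marked deleted. The `FileEntry` fields and function names are hypothetical; the patent does not specify how this state is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str
    atomic: bool            # written through the atomic commit protocol
    committed: bool = False  # only meaningful when atomic is True
    deleted: bool = False    # designated as deleted


def is_visible(entry):
    if entry.atomic:
        return entry.committed   # case 1: atomically committed
    return not entry.deleted     # case 2: non-atomic write, not deleted


def list_directory(entries):
    # Provide only the files a reader is allowed to see.
    return [entry.name for entry in entries if is_visible(entry)]
```

Under this reading, uncommitted output from an in-flight atomic job stays hidden from readers of the directory.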
-