-
Publication number: US20230385265A1
Publication date: 2023-11-30
Application number: US17827795
Filing date: 2022-05-30
Applicant: VMware, Inc.
Inventor: Christos KARAMANOLIS, Abhishek GUPTA, Richard P. SPILLANE, Marin NOZHCHEV
CPC classification number: G06F16/2365, G06F16/2282, G06F11/1435
Abstract: A version control interface provides for accessing a data lake with transactional semantics. Examples generate a plurality of tables for data objects stored in the data lake. The tables each comprise a set of name fields and map a space of columns or rows to a set of the data objects. Transactions read and write data objects and may span a plurality of tables with properties of atomicity, consistency, isolation, durability (ACID). Performing the transaction comprises: accumulating transaction-incomplete messages, indicating that the transaction is incomplete, until a transaction-complete message is received, indicating that the transaction is complete. Upon this occurring, a master branch is updated to reference the data objects according to the transaction-incomplete messages and the transaction-complete message. Tables may be grouped into data groups that provide atomicity boundaries so that different groups may be served by different master branches, thereby improving the speed of master branch updates.
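The message-accumulation step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `MasterBranch`, `TransactionLog`, `on_incomplete`, and `on_complete` are hypothetical, and a plain dictionary stands in for whatever structure the master branch actually uses. The sketch only shows the idea that writes are buffered while a transaction is incomplete and become visible together once the transaction-complete message arrives.

```python
from dataclasses import dataclass, field


@dataclass
class MasterBranch:
    """Hypothetical master branch: maps (table, key) to a data object reference."""
    refs: dict = field(default_factory=dict)


@dataclass
class TransactionLog:
    """Buffers transaction-incomplete messages until the complete message arrives."""
    master: MasterBranch
    pending: dict = field(default_factory=dict)  # txn_id -> list of buffered writes

    def on_incomplete(self, txn_id, table, key, object_ref):
        # Buffer only: the master branch is untouched, so readers see no partial state.
        self.pending.setdefault(txn_id, []).append((table, key, object_ref))

    def on_complete(self, txn_id, table, key, object_ref):
        # Assumption: the completing message carries the transaction's last write.
        writes = self.pending.pop(txn_id, []) + [(table, key, object_ref)]
        new_refs = dict(self.master.refs)
        for t, k, ref in writes:
            new_refs[(t, k)] = ref
        # A single reference swap publishes every write at once, across tables.
        self.master.refs = new_refs
```

Under the data-group idea in the abstract, one such log and master branch pair would exist per group, so groups could be updated independently.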
-
Publication number: US20230409545A1
Publication date: 2023-12-21
Application number: US17845683
Filing date: 2022-06-21
Applicant: VMware, Inc.
Inventor: Abhishek GUPTA, Christos KARAMANOLIS, Richard P. SPILLANE, Marin NOZHCHEV
CPC classification number: G06F16/219, G06F16/2219
Abstract: A version control interface provides for time travel with metadata management under a common transaction domain as the data. Examples generate a time-series of master branch snapshots for data objects stored in a data lake, with the snapshot comprising a tree data structure such as a hash tree and associated with a time indication. Readers select a master branch snapshot from the time-series, based on selection criteria (e.g., time) and use references in the selected master branch snapshot to read data objects from the data lake. This provides readers with a view of the data as of a specified time.
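Selecting a snapshot by time, as this abstract describes, can be sketched as a binary search over a time-ordered series. This is an illustration under assumptions, not the method claimed in the application: `SnapshotSeries`, `append`, and `as_of` are hypothetical names, timestamps are plain numbers, and a dictionary of references stands in for the tree structure (such as a hash tree) mentioned in the abstract.

```python
import bisect


class SnapshotSeries:
    """Hypothetical time-series of master branch snapshots."""

    def __init__(self):
        self._times = []      # snapshot timestamps, kept in ascending order
        self._snapshots = []  # snapshot i holds references valid as of _times[i]

    def append(self, timestamp, refs):
        if self._times and timestamp < self._times[-1]:
            raise ValueError("snapshots must be appended in time order")
        self._times.append(timestamp)
        self._snapshots.append(dict(refs))  # copy so later edits do not leak in

    def as_of(self, timestamp):
        # Find the latest snapshot taken at or before the requested time.
        i = bisect.bisect_right(self._times, timestamp)
        if i == 0:
            raise LookupError("no snapshot at or before the requested time")
        return self._snapshots[i - 1]
```

A reader would call `as_of(t)` and then follow the returned references to fetch data objects from the data lake.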
-
Publication number: US20190018865A1
Publication date: 2019-01-17
Application number: US15651823
Filing date: 2017-07-17
Applicant: VMware, Inc.
Inventor: Antoni IVANOV, Denitsa GENCHEVA, Marin NOZHCHEV
IPC: G06F17/30
CPC classification number: G06F16/212, G06F16/2343, G06F16/2379, G06F16/258
Abstract: Embodiments of the present disclosure relate to techniques for using distributed locks (e.g., among a plurality of database management services) for accessing a database to ensure data consistency in the database during concurrent continuous data processing and schema or data administration of the database. In particular, certain embodiments relate to not holding an exclusive lock of the database for all operations of an extract, transform, load (ETL) process to load data from a data source into the database. Further, certain embodiments relate to resolving schema changes that occur to a database schema of the database in the middle of the ETL process when the exclusive lock is not held.
-
Publication number: US20230205757A1
Publication date: 2023-06-29
Application number: US17564206
Filing date: 2021-12-28
Applicant: VMware, Inc.
Inventor: Abhishek GUPTA, Richard P. SPILLANE, Christos KARAMANOLIS, Marin NOZHCHEV
CPC classification number: G06F16/2379, G06F16/2246
Abstract: A version control interface for data provides a layer of abstraction that permits multiple readers and writers to access data lakes concurrently. An overlay file system, based on a data structure such as a tree, is used on top of one or more underlying storage instances to implement the interface. Each tree node is identified and accessed by means of a universally unique identifier. Copy-on-write with the tree data structure implements snapshots of the overlay file system. The snapshots support a long-lived master branch, with point-in-time snapshots of its history, and one or more short-lived private branches. As data objects are written to the data lake, the private branch corresponding to a writer is updated. The private branches are merged back into the master branch using any merging logic, and conflict resolution policies are implemented. Readers read from the updated master branch or from any of the private branches.
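The copy-on-write snapshots described in this abstract can be illustrated with a small path-copying tree. This is a sketch under assumptions rather than the design in the application: `Node`, `write`, and `read` are hypothetical names, identifiers come from Python's `uuid` module, and leaves hold plain strings as stand-ins for data object references. Merging and conflict resolution are left out.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Immutable tree node; children map a name to a Node or to a leaf object reference."""
    children: dict
    node_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def write(root, path, object_ref):
    """Return a new root with `path` set to `object_ref`.

    Only the nodes along `path` are copied; every other subtree is shared
    with the old root, which stays valid as a snapshot.
    """
    name, rest = path[0], path[1:]
    children = dict(root.children)  # shallow copy: sibling subtrees are shared
    if rest:
        child = children.get(name, Node({}))
        children[name] = write(child, rest, object_ref)
    else:
        children[name] = object_ref
    return Node(children)  # fresh node with a fresh identifier


def read(root, path):
    """Follow `path` from `root` and return what it points at."""
    node = root
    for name in path:
        node = node.children[name]
    return node
```

In this sketch a private branch is simply a root derived from the master root by `write`, so the master snapshot is unaffected until a merge step (not shown) replaces the master root.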
-