DATA LAKE WITH TRANSACTIONAL SEMANTICS
    Invention Publication

    Publication number: US20230385265A1

    Publication date: 2023-11-30

    Application number: US17827795

    Filing date: 2022-05-30

    Applicant: VMware, Inc.

    CPC classification number: G06F16/2365 G06F16/2282 G06F11/1435

    Abstract: A version control interface provides for accessing a data lake with transactional semantics. Examples generate a plurality of tables for data objects stored in the data lake. The tables each comprise a set of name fields and map a space of columns or rows to a set of the data objects. Transactions read and write data objects and may span a plurality of tables with properties of atomicity, consistency, isolation, durability (ACID). Performing the transaction comprises: accumulating transaction-incomplete messages, indicating that the transaction is incomplete, until a transaction-complete message is received, indicating that the transaction is complete. Upon this occurring, a master branch is updated to reference the data objects according to the transaction-incomplete messages and the transaction-complete message. Tables may be grouped into data groups that provide atomicity boundaries so that different groups may be served by different master branches, thereby improving the speed of master branch updates.
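The message flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `MasterBranch`, `TransactionManager`, `on_incomplete`, and `on_complete` are hypothetical, and the master branch is modeled as a plain in-memory dictionary from (table, key) to a data object identifier. The sketch only shows the idea that writes stay invisible until the transaction-complete message arrives, and then become visible together across several tables.

```python
from dataclasses import dataclass, field


@dataclass
class MasterBranch:
    """Hypothetical master branch: maps (table, key) to a data object id."""
    refs: dict = field(default_factory=dict)
    version: int = 0


class TransactionManager:
    """Accumulates transaction-incomplete messages per transaction id."""

    def __init__(self, master: MasterBranch) -> None:
        self.master = master
        self.pending: dict = {}  # txn_id -> list of (table, key, object_id)

    def on_incomplete(self, txn_id: str, table: str, key: str, object_id: str) -> None:
        # The transaction is still open: buffer the write, leave master untouched.
        self.pending.setdefault(txn_id, []).append((table, key, object_id))

    def on_complete(self, txn_id: str) -> int:
        # The transaction is complete: build the new reference map on the side,
        # then swap it in with a single assignment so readers never observe a
        # partially applied transaction (assumption: single-process model).
        writes = self.pending.pop(txn_id, [])
        new_refs = dict(self.master.refs)
        for table, key, object_id in writes:
            new_refs[(table, key)] = object_id
        self.master.refs = new_refs
        self.master.version += 1
        return self.master.version


master = MasterBranch()
manager = TransactionManager(master)
manager.on_incomplete("txn-1", "orders", "row-1", "obj-a")
manager.on_incomplete("txn-1", "customers", "row-9", "obj-b")
visible_before_commit = dict(master.refs)  # still empty at this point
manager.on_complete("txn-1")
```

Under the data-group idea in the abstract, one such manager and master branch could exist per group of tables, so that a commit in one group does not wait on updates in another.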

    DISTRIBUTED LOCKS FOR CONTINUOUS DATA PROCESSING AND SCHEMA ADMINISTRATION OF A DATABASE

    Publication number: US20190018865A1

    Publication date: 2019-01-17

    Application number: US15651823

    Filing date: 2017-07-17

    Applicant: VMware, Inc.

    CPC classification number: G06F16/212 G06F16/2343 G06F16/2379 G06F16/258

    Abstract: Embodiments of the present disclosure relate to techniques for using distributed locks (e.g., among a plurality of database management services) for accessing a database to ensure data consistency in the database during concurrent continuous data processing and schema or data administration of the database. In particular, certain embodiments relate to not holding an exclusive lock of the database for all operations of an extract, transform, load (ETL) process to load data from a data source into the database. Further, certain embodiments relate to resolving schema changes that occur to a database schema of the database in the middle of the ETL process when the exclusive lock is not held.
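As a rough illustration of holding the exclusive lock for only part of an ETL run, the sketch below extracts and transforms without the lock and takes it only for the load step. Everything here is an assumption made for illustration: `Database`, `alter_schema`, `transform`, and `etl_load` are hypothetical names, a `threading.Lock` stands in for a distributed lock, and the way a mid-run schema change is resolved (re-running the transform against the new columns) is one possible policy, not the method claimed in the application.

```python
import threading


class Database:
    """Toy database with a schema version; the lock stands in for a distributed lock."""

    def __init__(self, columns):
        self.lock = threading.Lock()
        self.schema_version = 1
        self.columns = list(columns)
        self.rows = []

    def alter_schema(self, columns):
        # Schema administration takes the same lock as the load step.
        with self.lock:
            self.columns = list(columns)
            self.schema_version += 1


def transform(record, columns):
    # Project a source record onto the target columns (missing values become None).
    return {column: record.get(column) for column in columns}


def etl_load(db, source_records, between_steps=None):
    # Take a brief consistent snapshot of the schema, then release the lock.
    with db.lock:
        seen_version = db.schema_version
        seen_columns = list(db.columns)

    # Extract and transform WITHOUT holding the lock.
    staged = [transform(record, seen_columns) for record in source_records]

    if between_steps is not None:
        between_steps()  # test hook: simulates concurrent schema administration

    # Hold the lock only for the load step.
    with db.lock:
        if db.schema_version != seen_version:
            # Assumed resolution policy: redo the transform against the new schema.
            staged = [transform(record, db.columns) for record in source_records]
        db.rows.extend(staged)
    return len(staged)


db = Database(["id"])
loaded = etl_load(
    db,
    [{"id": 1, "name": "x"}],
    between_steps=lambda: db.alter_schema(["id", "name"]),
)
```

Because the schema version is rechecked under the lock, rows written by the load step always match the schema that is current at load time, even though the transform ran unlocked.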

    VERSION CONTROL INTERFACE FOR ACCESSING DATA LAKES

    Publication number: US20230205757A1

    Publication date: 2023-06-29

    Application number: US17564206

    Filing date: 2021-12-28

    Applicant: VMware, Inc.

    CPC classification number: G06F16/2379 G06F16/2246

    Abstract: A version control interface for data provides a layer of abstraction that permits multiple readers and writers to access data lakes concurrently. An overlay file system, based on a data structure such as a tree, is used on top of one or more underlying storage instances to implement the interface. Each tree node is identified and accessed by means of a universally unique identifier. Copy-on-write with the tree data structure implements snapshots of the overlay file system. The snapshots support a long-lived master branch, with point-in-time snapshots of its history, and one or more short-lived private branches. As data objects are written to the data lake, the private branch corresponding to a writer is updated. The private branches are merged back into the master branch using any merging logic, and conflict resolution policies are implemented. Readers read from the updated master branch or from any of the private branches.
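The copy-on-write snapshot idea can be sketched with an immutable tree, assuming a simplified in-memory model. The names `Node`, `write`, and `read` are hypothetical, each node gets a UUID as its identifier, and the "merge" shown is only a fast-forward (valid when the master branch has not moved since the private branch was created); the abstract leaves the merging logic and conflict policy open.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node, identified by a universally unique identifier."""
    children: dict = field(default_factory=dict)
    value: Optional[str] = None
    node_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def write(node: Node, path: tuple, value: str) -> Node:
    """Return a new root; only nodes along `path` are copied, the rest are shared."""
    if not path:
        return Node(children=node.children, value=value)
    head, rest = path[0], path[1:]
    child = node.children.get(head, Node())
    new_children = dict(node.children)
    new_children[head] = write(child, rest, value)
    return Node(children=new_children, value=node.value)


def read(node: Node, path: tuple) -> Optional[str]:
    """Follow `path` from the given root; return None if it does not exist."""
    for part in path:
        node = node.children.get(part)
        if node is None:
            return None
    return node.value


# The master branch is just a reference to a root; a snapshot is an older root.
master = write(Node(), ("table1", "a"), "obj-1")
snapshot = master

# A writer works on a private branch that starts from the master root.
private = write(master, ("table2", "b"), "obj-2")

# Readers of master do not see the private write until it is merged.
seen_on_master = read(master, ("table2", "b"))

# Fast-forward merge (assumption: master is unchanged since the branch point).
master = private
```

After the merge, `snapshot` still points at the old root, so the point-in-time view remains readable, and the untouched `table1` subtree is the same object in both roots rather than a copy.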
