Partition tolerance in cluster membership management

    公开(公告)号:US09672115B2

    公开(公告)日:2017-06-06

    申请号:US14209426

    申请日:2014-03-13

    Applicant: VMware, Inc.

    Abstract: Techniques are disclosed for managing a cluster of computing nodes following a division of the cluster into at least a first and second partition, where the cluster aggregates local storage resources of the nodes to provide an object store, and objects stored in the object store are divided into data components stored across the nodes. In accordance with one method, it is determined that a majority of data components comprising a first object are stored within nodes in the first partition. It is determined that a majority of data components comprising a second object are stored within nodes in the second partition. Configuration objects are permitted to be performed on the first object in the first partition while denying access to the first object from the second partition, and on the second object in the second partition while denying access to the second object from the first partition.

    Synchronizing changes to stale components of a distributed object using synchronization bitmaps

    公开(公告)号:US11599554B2

    公开(公告)日:2023-03-07

    申请号:US16888527

    申请日:2020-05-29

    Applicant: VMware, Inc.

    Abstract: The disclosure herein describes tracking changes to a stale component using a synchronization bitmap. A first component of a plurality of mirrored components of the distributed data object becomes available from an unavailable state, and a stale log sequence number (LSN) and a last committed LSN are identified. A synchronization bitmap of the first component associated with a range of LSNs (e.g., from the stale LSN to the last committed LSN) is created and configured to track changes to data blocks of the first component. A second component is identified based on the second component including a tracking bitmap associated with an LSN that matches the stale LSN of the first component. The first component is synchronized with data from the second component based on, wherein the synchronizing includes updating the synchronization bitmap to track changes made to data blocks of the first component.

    Efficient erasure-coded storage in distributed data systems

    公开(公告)号:US11507544B2

    公开(公告)日:2022-11-22

    申请号:US16894663

    申请日:2020-06-05

    Applicant: VMware, Inc.

    Abstract: Techniques for efficiently storing client data blocks on a distributed-computing system are provided. The system includes a fast performance tier and a large capacity tier. The capacity tier stores the client data blocks in erasure encoded data stripes. The performance tier stores logical map data including an address map indicating a correspondence between logical addresses associated with a first layer of the system and physical addresses associated with a second layer. A method includes receiving a request to include additional client data blocks in the client blocks. The request indicates logical addresses for additional blocks. Corresponding physical addresses for additional block are determined. Each additional block is stored at the physical address. Additional logical map data is stored in the performance tier. Storing the additional logical map data includes updating the address map to indicate the correspondence between the logical addresses and the physical addresses for the additional blocks.

    Efficient resynchronization for stale components of geographically distributed computing systems

    公开(公告)号:US11178227B1

    公开(公告)日:2021-11-16

    申请号:US17097479

    申请日:2020-11-13

    Applicant: VMware, Inc.

    Abstract: Described herein are methods and systems for the efficient resyncing of stale components of a distributed-computing system. One method includes determining that a first base component at a remote site will go offline. After determining that the first base component at the remote site will go offline, a first delta component is created at the remote site. While the first base component at the remote site is offline, data corresponding to the offline component is collected at the first delta component at the remote site. After collecting data at the first delta component, the collected data is sent to a local site. The method includes determining that the first base component has come back online. In response to determining that the first base component has come back online, the collected data is sent from the first delta component to the first base component via an intra-site network.

    Intelligently scheduling resynchronization jobs in a distributed object-based storage system

    公开(公告)号:US11023493B2

    公开(公告)日:2021-06-01

    申请号:US16182448

    申请日:2018-11-06

    Applicant: VMware, Inc.

    Abstract: Techniques for intelligently scheduling resynchronization jobs in a distributed object-based storage system are provided. In one set of embodiments, a storage node of the system can create a resynchronization job for a component of an object maintained by the system, where the resynchronization job defines one or more input/output (I/O) operations to be carried out with respect to the component. If a number of currently running resynchronization jobs on the storage node has reached a threshold, the storage node can further determine a priority level associated with the object; add the resynchronization job to an object queue for the object; and if the added resynchronization job is a first job in the object queue, add the object queue as a new queue entry to a global priority queue corresponding to the priority level associated with the object.

    End-to-end checksum in a multi-tenant encryption storage system

    公开(公告)号:US10581602B2

    公开(公告)日:2020-03-03

    申请号:US15866185

    申请日:2018-01-09

    Applicant: VMware, Inc.

    Abstract: A multi-tenant storage system can store clear text data and associated clear text checksum received from a storage tenant using their associated cryptographic key (“cryptokey”). When the clear text data is compressible, cryptographic data (“cryptodata”) is generated from a concatenation of the clear text checksum and compressed clear text data using the cryptokey. A cryptographic checksum (“cryptochecksum”) is generated from the cryptodata. When the clear text data is uncompressible, cryptographic data (“cryptodata”) is generated by encrypting the clear text data using the cryptokey with an extra verification step to make sure the clear text checksum can be rebuilt during the read request. A cryptographic checksum (“cryptochecksum”) is generated from the cryptodata. The cryptodata and associated cryptochecksum are stored in the multi-tenant storage system, so that repairs to damaged cryptodata can be made using the associated cryptochecksum.

    Exclusive session mode resilient to failure

    公开(公告)号:US10419498B2

    公开(公告)日:2019-09-17

    申请号:US14956284

    申请日:2015-12-01

    Applicant: VMware, Inc.

    Abstract: Examples perform input/output (I/O) requests, issued by a plurality of clients to an owner-node, in a virtual storage area network (vSAN) environment. I/O requests are guaranteed, as all I/O requests are performed during non-overlapping, exclusive sessions between one client at a time and the owner node. The owner node rejects requests for simultaneous sessions, and duplicate sessions are prevented by requiring that a client refresh its memory state after termination of a previous session.

    Resumable replica resynchronization

    公开(公告)号:US10365852B2

    公开(公告)日:2019-07-30

    申请号:US15223337

    申请日:2016-07-29

    Applicant: VMware, Inc.

    Abstract: Systems and techniques are described for transferring data. A described technique includes determining that a first replica of an object stored at a first host has become available to a distributed storage system after previously being unavailable to the distributed storage system. The object includes a range of memory addresses at which data of the object is stored. In response to determining that the first replica has become available, resyncing data for the first replica is obtained. The resyncing data indicates whether each range of memory addresses is synchronized at the first replica with other replicas of the object. Tracking data for the first replica is obtained. The tracking data indicates whether data stored at the range of memory addresses of the object has been modified at a second replica while the first replica was unavailable. The resyncing data is updated based on the tracking data.

    RESUMABLE REPLICA RESYNCHRONIZATION
    40.
    发明申请

    公开(公告)号:US20180032257A1

    公开(公告)日:2018-02-01

    申请号:US15223337

    申请日:2016-07-29

    Applicant: VMware, Inc.

    CPC classification number: G06F3/065 G06F3/0617 G06F3/0619 G06F3/067

    Abstract: Systems and techniques are described for transferring data. A described technique includes determining that a first replica of an object stored at a first host has become available to a distributed storage system after previously being unavailable to the distributed storage system. The object includes a range of memory addresses at which data of the object is stored. In response to determining that the first replica has become available, resyncing data for the first replica is obtained. The resyncing data indicates whether each range of memory addresses is synchronized at the first replica with other replicas of the object. Tracking data for the first replica is obtained. The tracking data indicates whether data stored at the range of memory addresses of the object has been modified at a second replica while the first replica was unavailable. The resyncing data is updated based on the tracking data.

Patent Agency Ranking