Jitter-tolerant distributed two-phase commit (2PC) systems

    Publication No.: US11507411B1

    Publication date: 2022-11-22

    Application No.: US17382443

    Application date: 2021-07-22

    Applicant: VMware, Inc.

    Abstract: A method of ensuring atomicity of transactions across a plurality of active hosts in a distributed environment is provided. The method generally includes receiving, from a client, a second request to commit a second transaction subsequent to receiving a first request to commit a first transaction; assigning a second prepare identifier (ID) to the second transaction, wherein the second prepare ID assigned to the second transaction is greater than a first prepare ID assigned to the first transaction; transmitting, to the plurality of active hosts, instructions to prepare for committing the second transaction, the instructions including the second prepare ID; receiving, from each host, an acknowledgement indicating successful preparation for committing the second transaction; and transmitting, to the plurality of active hosts, instructions to commit the second transaction prior to receiving, from each host, an acknowledgement indicating successful preparation for committing the first transaction.
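The ordering behavior in this abstract can be illustrated with a minimal sketch. It assumes a single in-memory coordinator; the class name `Coordinator`, its methods, and the host names are hypothetical and do not come from the patent. Networking, logging, and failure recovery are left out.

```python
from itertools import count


class Coordinator:
    """Toy 2PC coordinator (hypothetical names; no networking, logging, or recovery)."""

    def __init__(self, hosts):
        self.hosts = frozenset(hosts)   # the active hosts that must all acknowledge
        self._ids = count(1)            # source of strictly increasing prepare IDs
        self.acks = {}                  # prepare ID -> hosts that acknowledged prepare
        self.committed = []             # prepare IDs, in the order commit was sent

    def begin(self):
        """Assign the next prepare ID; a later request always gets a larger ID."""
        prepare_id = next(self._ids)
        self.acks[prepare_id] = set()
        return prepare_id

    def on_prepare_ack(self, prepare_id, host):
        """Record one host's ack; return True if this ack triggers the commit."""
        self.acks[prepare_id].add(host)
        ready = self.acks[prepare_id] == self.hosts
        if ready and prepare_id not in self.committed:
            self.committed.append(prepare_id)   # stands in for sending commit instructions
            return True
        return False


coord = Coordinator({"h1", "h2"})
first, second = coord.begin(), coord.begin()
coord.on_prepare_ack(first, "h1")    # first transaction still waits on h2
coord.on_prepare_ack(second, "h1")
coord.on_prepare_ack(second, "h2")   # second transaction commits without waiting for the first
```

In this sketch the second transaction reaches the commit step while the first is still missing an acknowledgement, which is the case the abstract singles out.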

    Identifying fault domains for delta components of a distributed data object

    Publication No.: US11422904B2

    Publication date: 2022-08-23

    Application No.: US17106004

    Application date: 2020-11-27

    Applicant: VMware, Inc.

    Abstract: The disclosure herein describes placing delta components of a base component in target fault domains. One or more delta components are generated. When a first fault domain that lacks a sibling component of the base component is identified, the first fault domain is selected as a single delta target fault domain and a single delta component is placed on the single delta target fault domain. When a second fault domain that includes a first sibling component of the base component is identified and a third fault domain that includes a second sibling component of the base component is identified, the second fault domain and the third fault domain are selected as a first double delta target fault domain and a second double delta target fault domain, and a first double delta component and a second double delta component are placed on the first and second double delta target fault domains.
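One possible reading of this placement rule is sketched below. The function name, the argument shapes, and the choice to prefer a single delta target over a double delta pair are assumptions made for illustration; the abstract states the two cases but not their priority.

```python
def choose_delta_targets(candidate_domains, sibling_domains):
    """Return the fault domains that should receive delta components.

    candidate_domains: ordered fault domain names that may host a delta
                       (assumed to exclude the base component's own domain).
    sibling_domains:   set of fault domains already holding a sibling
                       component of the base component.
    """
    without_sibling = [d for d in candidate_domains if d not in sibling_domains]
    if without_sibling:
        return [without_sibling[0]]      # single delta target fault domain
    with_sibling = [d for d in candidate_domains if d in sibling_domains]
    if len(with_sibling) >= 2:
        return with_sibling[:2]          # first and second double delta targets
    return []                            # no placement found in this sketch


single = choose_delta_targets(["fd2", "fd3", "fd4"], {"fd2", "fd3"})
double = choose_delta_targets(["fd2", "fd3"], {"fd2", "fd3"})
```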

    Time-based congestion discounting for I/O fairness control

    Publication No.: US10965739B2

    Publication date: 2021-03-30

    Application No.: US15947313

    Application date: 2018-04-06

    Applicant: VMware, Inc.

    Abstract: A computer system and method for managing storage requests in a distributed storage system use congestion signals associated with storage requests; the signals are generated based on congestion at local storage of the computer system that supports a virtual storage area network. The storage requests are differentiated between a first class of storage requests and at least one other class of storage requests. For a storage request of the first class of storage requests, an actual ratio of a current average bandwidth of the first class of storage requests to a current average bandwidth of a second class of storage requests is calculated and compared with an expected ratio. The congestion signal associated with the storage request is then adjusted and transmitted to at least one source of storage requests for storage request fairness control.
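The ratio comparison can be sketched as follows. The abstract does not say how the congestion signal is adjusted, so the discount rule here (scale the signal down in proportion to how far the first class falls below its expected share) is purely an assumption, as are the function and parameter names.

```python
def adjust_congestion(congestion, bw_first, bw_second, expected_ratio):
    """Return an adjusted congestion signal for a first-class storage request.

    bw_first, bw_second: current average bandwidths of the two classes.
    expected_ratio:      target ratio of first-class to second-class bandwidth.
    The discount rule below is hypothetical, not taken from the patent.
    """
    if bw_second <= 0:
        return congestion                      # no second-class traffic to compare against
    actual_ratio = bw_first / bw_second
    if actual_ratio >= expected_ratio:
        return congestion                      # first class already has its share
    return congestion * actual_ratio / expected_ratio   # assumed discount


discounted = adjust_congestion(100, 20, 80, 0.5)
unchanged = adjust_congestion(100, 60, 40, 0.5)
```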

    Code block resynchronization for distributed multi-mirror erasure coding system

    Publication No.: US10509708B2

    Publication date: 2019-12-17

    Application No.: US15621130

    Application date: 2017-06-13

    Applicant: VMware, Inc.

    Abstract: Techniques are disclosed for resynchronizing a node of a distributed storage system with other nodes of the distributed storage system. Some embodiments presented herein include a computer-implemented method for resynchronizing a node of a distributed storage system with other nodes of the distributed storage system. The method comprises identifying an out-of-sync block of the node. The method further comprises determining that the out-of-sync block is a code block, wherein the code block is generated by performing an erasure coding operation on data blocks which are stored in the other nodes. The method further comprises locating a mirrored code block in an address space maintained for mirrored code blocks. The method further comprises storing contents of the mirrored code block in a storage location of the out-of-sync block.
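The resynchronization steps can be sketched with plain dictionaries. This assumes the mirrored code blocks live in a separate map keyed by the same block address; that keying, and all names below, are illustrative assumptions rather than details from the patent.

```python
def resync_block(node_blocks, addr, code_block_addrs, mirrored_code_blocks):
    """Repair one out-of-sync block from its mirror if it is a code block.

    Returns True when the block was restored from the mirror, False when the
    block is not a code block (this sketch does not cover data blocks).
    """
    if addr not in code_block_addrs:
        return False
    node_blocks[addr] = mirrored_code_blocks[addr]   # copy mirror contents into place
    return True


node = {0: b"data0", 1: b"stale-parity"}
mirrors = {1: b"fresh-parity"}          # address space for mirrored code blocks
restored = resync_block(node, 1, {1}, mirrors)
```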

    TWO-PHASE COMMIT USING RESERVED LOG SEQUENCE VALUES

    Publication No.: US20240111755A1

    Publication date: 2024-04-04

    Application No.: US17957941

    Application date: 2022-09-30

    Applicant: VMware, Inc.

    CPC classification number: G06F16/2379 G06F13/1668 G06F16/2308

    Abstract: A system and method for managing different classes of storage input/output (I/O) requests for a two-phase commit operation in a distributed storage system assign reserved log sequence values to each of the storage I/O requests of a first class, which are added to a two-phase commit queue. The reserved log sequence values of the storage I/O requests of the first class in the two-phase commit queue are assigned to some of the storage I/O requests of a second class, which are added to the two-phase commit queue.

    SYSTEM AND METHOD FOR DELETING PARENT SNAPSHOTS OF RUNNING POINTS OF STORAGE OBJECTS USING EXCLUSIVE NODE LISTS OF THE PARENT SNAPSHOTS

    Publication No.: US20230281084A1

    Publication date: 2023-09-07

    Application No.: US17684177

    Application date: 2022-03-01

    Applicant: VMware, Inc.

    CPC classification number: G06F11/1453 G06F2201/84

    Abstract: A system and method for deleting parent snapshots of running points of storage objects stored in a storage system respond to a request to delete a parent snapshot of a running point of a storage object by traversing a subtree of a B tree that corresponds to a logical map of the parent snapshot to find nodes of the subtree that are exclusively owned by the parent snapshot, which are added to an exclusive node list of the parent snapshot. The minimum node ownership value of the running point is then changed to the minimum node ownership value of the parent snapshot so that any node of the subtree of the B tree with a node ownership value equal to or greater than the changed minimum node ownership value is deemed to be owned by the running point. The nodes of the subtree of the B tree that are found in the exclusive node list of the parent snapshot are then deleted.

    DISTRIBUTED STORAGE SYSTEM AND METHOD FOR MANAGING STORAGE ACCESS BANDWIDTH FOR MULTIPLE CLIENTS

    Publication No.: US20190303308A1

    Publication date: 2019-10-03

    Application No.: US15944743

    Application date: 2018-04-03

    Applicant: VMware, Inc.

    Abstract: A system and method for managing storage requests issued from multiple sources in a distributed storage system use different queues at a host computer in the distributed storage system to hold different classes of storage requests for access to a virtual storage area network. The storage requests in the queues are processed using a fair scheduling algorithm. For each queue, when the storage requests in the queue exceed a threshold, a backpressure signal is generated and transmitted to at least one source of the class of storage requests held in the queue corresponding to that backpressure signal, to delay issuance of new storage requests of that class.
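The per-class queues and threshold-triggered backpressure can be sketched as below. The abstract only says "a fair scheduling algorithm"; plain round-robin is used here as a stand-in, and the class names, the `ClassQueues` type, and the boolean backpressure flag are all assumptions for illustration.

```python
from collections import deque


class ClassQueues:
    """One FIFO queue per request class, with a shared length threshold."""

    def __init__(self, classes, threshold):
        self.queues = {c: deque() for c in classes}
        self.threshold = threshold

    def enqueue(self, cls, request):
        """Queue a request; return True if a backpressure signal should be
        sent to the source of this class (queue length exceeds the threshold)."""
        q = self.queues[cls]
        q.append(request)
        return len(q) > self.threshold

    def dispatch_round(self):
        """Round-robin stand-in for fair scheduling: take at most one request
        from each non-empty queue, in class order."""
        return [q.popleft() for q in self.queues.values() if q]


queues = ClassQueues(["vm_io", "resync_io"], threshold=2)
signals = [queues.enqueue("vm_io", r) for r in ("a", "b", "c")]
queues.enqueue("resync_io", "r1")
batch = queues.dispatch_round()
```

Here only the third `vm_io` request pushes its queue past the threshold, so only that class's source would be asked to slow down.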

    Synchronizing a stale component of a distributed object using a delta component during maintenance

    Publication No.: US11947827B2

    Publication date: 2024-04-02

    Application No.: US16875624

    Application date: 2020-05-15

    Applicant: VMware, Inc.

    CPC classification number: G06F3/065 G06F3/0617 G06F3/0689

    Abstract: The disclosure herein describes enhancing data durability of a base component using a delta component. A delta component is generated based on the base component becoming unavailable. The delta component is configured to include unwritten storage space with an address space matching the base component and a tracking bitmap associated with data blocks of the address space of the delta component. Write operations targeted for the base component are routed to the delta component. Based on the routed write operations, bits associated with data blocks affected by the write operations are changed in the tracking bitmap. Based on the base component becoming available, data blocks affected by routed write operations are identified based on the tracking bitmap and the identified data blocks are synchronized from the delta component to the base component. The delta component is then removed.
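The tracking-bitmap mechanism can be sketched in a few lines. This assumes block-granular writes and represents the base component as a plain list; the `DeltaComponent` class and its method names are hypothetical.

```python
class DeltaComponent:
    """Stand-in for a delta component covering the base's address space."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks    # unwritten space, same addresses as the base
        self.bitmap = [False] * num_blocks   # one tracking bit per data block

    def write(self, addr, data):
        """Absorb a write routed away from the unavailable base component."""
        self.blocks[addr] = data
        self.bitmap[addr] = True

    def sync_to(self, base_blocks):
        """Copy only the blocks marked in the bitmap back to the base;
        return the addresses that were synchronized."""
        dirty = [addr for addr, bit in enumerate(self.bitmap) if bit]
        for addr in dirty:
            base_blocks[addr] = self.blocks[addr]
        return dirty


base = ["a", "b", "c", "d"]
delta = DeltaComponent(len(base))
delta.write(1, "B")                 # base is unavailable; writes land on the delta
delta.write(3, "D")
synced = delta.sync_to(base)        # base is back; replay only the tracked blocks
del delta                           # the delta component is then removed
```

Tracking dirty blocks in a bitmap means the resync cost scales with the number of blocks written during the outage rather than with the size of the component.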
