Data storage system with metadata check-pointing

    Publication No.: US11169723B2

    Publication date: 2021-11-09

    Application No.: US16457008

    Application date: 2019-06-28

    Abstract: A data storage system includes multiple head nodes and data storage sleds. Volume data is replicated between a primary and one or more secondary head nodes for a volume partition and is further flushed to a set of mass storage devices of the data storage sleds. Volume metadata is maintained in a primary and one or more secondary head nodes for a volume partition and is updated in response to volume data being flushed to the data storage sleds. Also, the primary and secondary head nodes store check-points of volume metadata to the data storage sleds, wherein in response to a failure of a primary or secondary head node for a volume partition, a replacement secondary head node for the volume partition recreates a secondary replica for the volume partition based, at least in part, on a stored volume metadata checkpoint.
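
    The check-point recovery path can be illustrated with a minimal sketch. It assumes metadata is a mapping from volume offset to a sled location, and that updates made after the check-point are obtained from a surviving head node. `HeadNode`, `on_flush`, `checkpoint`, and `rebuild_secondary` are hypothetical names, not terms from the patent.

```python
import copy


class HeadNode:
    """Toy head node holding volume metadata (hypothetical model)."""

    def __init__(self):
        self.metadata = {}  # volume offset -> (sled id, location on sled)
        self.version = 0    # number of metadata updates applied so far

    def on_flush(self, offset, sled_id, location):
        # Metadata is updated when volume data is flushed to a sled.
        self.metadata[offset] = (sled_id, location)
        self.version += 1

    def checkpoint(self, sled_store):
        # Store a point-in-time copy of the metadata on the sleds.
        sled_store["checkpoint"] = (self.version, copy.deepcopy(self.metadata))


def rebuild_secondary(sled_store, updates_after_checkpoint):
    """Recreate a secondary replica's metadata from the stored check-point.

    Assumption: updates newer than the check-point come from a surviving
    head node as (offset, sled_id, location) tuples.
    """
    version, metadata = sled_store["checkpoint"]
    node = HeadNode()
    node.metadata = copy.deepcopy(metadata)
    node.version = version
    for offset, sled_id, location in updates_after_checkpoint:
        node.on_flush(offset, sled_id, location)
    return node
```

    Starting from a check-point means the replacement node only has to apply the updates made since that check-point, rather than rebuilding all of the metadata.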

    Dual isolation recovery for primary-secondary server architectures

    Publication No.: US11010266B1

    Publication date: 2021-05-18

    Application No.: US16210428

    Application date: 2018-12-05

    Abstract: Generally described, one or more aspects of the present application correspond to techniques for automatic recovery from dual isolation in which both the primary and secondary replicas of a volume are stored on isolating servers. The disclosed techniques use handshakes between the client and the replicas to determine which has a better health score. The replica with the better health score becomes the primary replica, and confirms that it and the secondary replica are both in an isolating state. In response, the primary replica seeks a solo blessing, undoes the isolating state at the volume level (the server host will still be in isolating state), and continues handling I/O and peer replication until its healthy peer is complete. These techniques can avoid availability drops when the servers hosting the primary and secondary replicas of a volume enter the isolating state at around the same time.

    Data replication snapshots for persistent storage using operation numbers

    Publication No.: US10191813B2

    Publication date: 2019-01-29

    Application No.: US15694684

    Application date: 2017-09-01

    Abstract: Persistent storage for a master copy is provided using operation numbers. A master copy can include a persistent key-value store such as a B-tree with references to corresponding data. When provisioning a slave copy, the master copy sends a point-in-time copy of the B-tree to the slave copy, which stores a copy of the B-tree, allocates the necessary space, and updates the references of the B-tree to point to a local storage before the data is transferred. When writing the data to persistent storage, a snapshot created on the master copy is an operation that is replicated to the slave copy. The snapshot is generated using a volume view that includes changes to chunks of data of the master copy since a previous snapshot, as determined using the operation number for the previous snapshot. Data (and metadata) for the snapshot is written to persistent storage while new input/output operations are processed.
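
    As a rough illustration of selecting snapshot contents by operation number, the sketch below assumes each chunk records the operation number of its most recent write. The function name and the dictionary layout are hypothetical; the abstract describes a B-tree rather than a plain mapping.

```python
def chunks_for_snapshot(chunk_ops, previous_snapshot_op):
    """Return the chunk indexes changed since the previous snapshot.

    chunk_ops maps chunk index -> operation number of the last write to
    that chunk (an assumed layout, standing in for the B-tree).
    """
    return sorted(
        index for index, op in chunk_ops.items() if op > previous_snapshot_op
    )


# Chunks 1 and 3 were written after operation 5, so only they are included.
changed = chunks_for_snapshot({0: 3, 1: 7, 2: 5, 3: 9}, previous_snapshot_op=5)
```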

    Background task scheduling based on shared background bandwidth

    Publication No.: US11314547B1

    Publication date: 2022-04-26

    Application No.: US16832982

    Application date: 2020-03-27

    Abstract: Techniques for background task scheduling based on shared background bandwidth are described. A method for background task scheduling based on shared background bandwidth may include receiving a request to perform one or more background tasks on a storage server of a storage service in a provider network, determining a priority of each of the one or more background tasks, wherein each background task is associated with a size parameter and a temporal parameter, and wherein the priority of each of the one or more background tasks is based at least on its associated size parameter and temporal parameter, determining a task type associated with each background task, adding each background task to one of a plurality of task queues associated with different task types, wherein each task queue is associated with a bandwidth allocation, and scheduling the one or more background tasks to be performed based on their priority and the bandwidth allocation.
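
    The queueing scheme in this abstract can be sketched as follows. The priority formula (age divided by size) and the per-round byte budget are assumptions chosen for illustration; the abstract only states that priority depends on a size parameter and a temporal parameter. `Task`, `priority`, and `BackgroundScheduler` are hypothetical names.

```python
import heapq
import itertools
from collections import namedtuple

Task = namedtuple("Task", "name task_type size_bytes age_seconds")


def priority(task):
    """Hypothetical score: older and smaller tasks rank higher."""
    return task.age_seconds / max(task.size_bytes, 1)


class BackgroundScheduler:
    def __init__(self, allocations):
        # allocations: task type -> bytes of background bandwidth per round
        self.allocations = dict(allocations)
        self.queues = {task_type: [] for task_type in allocations}
        self._tie = itertools.count()  # keeps heap entries comparable

    def submit(self, task):
        # heapq is a min-heap, so the score is negated.
        entry = (-priority(task), next(self._tie), task)
        heapq.heappush(self.queues[task.task_type], entry)

    def schedule_round(self):
        """Pick tasks in priority order without exceeding any allocation."""
        chosen = []
        for task_type, queue in self.queues.items():
            budget = self.allocations[task_type]
            while queue and queue[0][2].size_bytes <= budget:
                _, _, task = heapq.heappop(queue)
                budget -= task.size_bytes
                chosen.append(task.name)
        return chosen
```

    Keeping one queue per task type means a backlog of one kind of background work cannot consume the bandwidth allocated to another.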

    DATA STORAGE SYSTEM WITH METADATA CHECK-POINTING

    Publication No.: US20220057951A1

    Publication date: 2022-02-24

    Application No.: US17520537

    Application date: 2021-11-05

    Abstract: A data storage system includes multiple head nodes and data storage sleds. Volume data is replicated between a primary and one or more secondary head nodes for a volume partition and is further flushed to a set of mass storage devices of the data storage sleds. Volume metadata is maintained in a primary and one or more secondary head nodes for a volume partition and is updated in response to volume data being flushed to the data storage sleds. Also, the primary and secondary head nodes store check-points of volume metadata to the data storage sleds, wherein in response to a failure of a primary or secondary head node for a volume partition, a replacement secondary head node for the volume partition recreates a secondary replica for the volume partition based, at least in part, on a stored volume metadata checkpoint.

    Systems and methods including committing a note to master and slave copies of a data volume based on sequential operation numbers

    Publication No.: US10802921B2

    Publication date: 2020-10-13

    Application No.: US16259571

    Application date: 2019-01-28

    Abstract: Systems and methods for provisioning a slave copy for redundant data storage and for writing data to persistent storage in a block-based storage system using sequential operation numbers are provided. In one embodiment, the method includes maintaining a master copy and a slave copy of a data volume, the master copy including data generated by a plurality of operations having respective sequential operation numbers, receiving a write instruction for second data to be added to the master copy, and recording the second data as a note that is not readable. The method may further include sending a copy of the note from the master copy to the slave copy, committing the note to the master copy with a sequential operation number, and committing the copy of the note to the slave copy based in part on the sequential operation number. A B-tree may be created based at least in part on an offset for a write instruction associated with the second data, a length, and an operation number included in the note.
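
    A minimal sketch of the note-then-commit flow, under stated assumptions: the volume is an in-memory byte array, a plain list of (operation number, offset, length) tuples stands in for the B-tree, and `Replica` and `replicated_write` are hypothetical names.

```python
class Replica:
    """Toy master or slave copy of a data volume."""

    def __init__(self, size):
        self.volume = bytearray(size)  # committed, readable data
        self.notes = {}                # note id -> (offset, data), not readable
        self.index = []                # (op number, offset, length) entries
        self.last_op = 0

    def record_note(self, note_id, offset, data):
        # The write is held as a note and is not yet visible to reads.
        self.notes[note_id] = (offset, bytes(data))

    def commit(self, note_id, op_number):
        if op_number != self.last_op + 1:
            raise ValueError("operation numbers must be sequential")
        offset, data = self.notes.pop(note_id)
        self.volume[offset:offset + len(data)] = data
        self.index.append((op_number, offset, len(data)))
        self.last_op = op_number

    def read(self, offset, length):
        return bytes(self.volume[offset:offset + length])


def replicated_write(master, slave, note_id, offset, data):
    """Record the note on both copies, then commit with one op number."""
    master.record_note(note_id, offset, data)
    slave.record_note(note_id, offset, data)  # copy of the note sent to slave
    op_number = master.last_op + 1
    master.commit(note_id, op_number)
    slave.commit(note_id, op_number)
    return op_number
```

    Because both copies commit under the same operation number, comparing `last_op` on the two replicas shows whether they have applied the same writes.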

    Optimized write performance at block-based storage during volume snapshot operations
    Granted invention patent (status: in force)

    Publication No.: US09405483B1

    Publication date: 2016-08-02

    Application No.: US14205046

    Application date: 2014-03-11

    Abstract: Write optimization for block-based storage performing snapshot operations may be implemented. Write requests for a particular data volume may be received for which a snapshot operation is in progress. A determination may be made as to whether a data chunk of the data volume modified as part of the write request has not yet been stored to a remote snapshot data store as part of the snapshot operation. For a data chunk that is to be modified and that has not yet been stored, the data chunk may be stored in a local in-memory volume snapshot buffer. Once the data chunk is stored in the in-memory volume snapshot buffer, the write request may be performed and acknowledged as complete. The data chunk may be sent to the remote snapshot data store asynchronously with regard to the acknowledgment of the write request.
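
    The write path during an in-progress snapshot can be sketched as below. The sketch assumes the buffered copy is the chunk's content from before the write, and it models the remote snapshot data store as a dictionary. `SnapshotAwareVolume` and its methods are hypothetical names.

```python
class SnapshotAwareVolume:
    def __init__(self, chunks):
        self.chunks = list(chunks)  # current volume data, one item per chunk
        self.pending = set()        # chunk indexes not yet stored remotely
        self.snapshot_buffer = {}   # in-memory copies of pre-write chunks
        self.remote_snapshot = {}   # stand-in for the remote snapshot store

    def start_snapshot(self):
        self.pending = set(range(len(self.chunks)))

    def write(self, index, data):
        # Preserve the snapshot view of a chunk that has not been uploaded
        # yet, then apply the write and acknowledge without waiting.
        if index in self.pending and index not in self.snapshot_buffer:
            self.snapshot_buffer[index] = self.chunks[index]
        self.chunks[index] = data
        return "ack"

    def upload_next(self):
        """Background step: send one pending chunk to the remote store."""
        if not self.pending:
            return False
        index = min(self.pending)
        self.pending.discard(index)
        data = self.snapshot_buffer.pop(index, self.chunks[index])
        self.remote_snapshot[index] = data
        return True
```

    The write is acknowledged as soon as the old chunk is buffered in memory, so write latency does not depend on the upload to the remote store.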

    Data storage system
    Granted invention patent

    Publication No.: US11301144B2

    Publication date: 2022-04-12

    Application No.: US16457095

    Application date: 2019-06-28

    Abstract: A data storage system includes multiple head nodes and data storage sleds. A control plane of the data storage system designates, for a volume partition, one of the head nodes to function as a primary head node storing a primary replica of the volume partition and designates two or more other head nodes to function as reserve head nodes storing reserve replicas of the volume partition. Additionally, the primary head node causes volume data for the volume partition to be erasure encoded and stored on multiple mass storage devices in different ones of the data storage sleds.
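
    To illustrate spreading encoded volume data across sleds, the sketch below uses single-parity XOR, a deliberately simple stand-in for whatever erasure code the patent contemplates. Each returned shard would be placed on a mass storage device in a different sled. All function names are hypothetical.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(data, data_shards):
    """Split data into equal shards and append one XOR parity shard."""
    shard_len = -(-len(data) // data_shards)  # ceiling division
    padded = data.ljust(shard_len * data_shards, b"\x00")
    shards = [
        padded[i * shard_len:(i + 1) * shard_len] for i in range(data_shards)
    ]
    parity = bytes(shard_len)
    for shard in shards:
        parity = xor_bytes(parity, shard)
    return shards + [parity]


def recover_shard(shards, missing):
    """Rebuild the shard at index `missing` from all the other shards."""
    survivors = [s for i, s in enumerate(shards) if i != missing]
    result = bytes(len(survivors[0]))
    for shard in survivors:
        result = xor_bytes(result, shard)
    return result
```

    Single parity tolerates the loss of only one shard. Production erasure codes such as Reed-Solomon tolerate more losses at the cost of additional parity shards.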
