-
Publication No.: US09600203B2
Publication date: 2017-03-21
Application No.: US14204943
Filing date: 2014-03-11
Applicant: Amazon Technologies, Inc.
Inventor: Danny Wei , Kerry Quintin Lee , James Michael Thompson , John Luther Guthrie, II , Jianhua Fan , Nandakumar Gopalakrishnan
IPC: G06F3/06
CPC classification number: G06F17/30575 , G06F3/0604 , G06F3/0611 , G06F3/0623 , G06F3/064 , G06F3/065 , G06F3/0665 , G06F3/067 , G06F3/0683 , G06F11/2069 , H04L67/1095 , H04L67/1097
Abstract: A block-based storage system may implement reducing durability state for a data volume. A determination may be made that a storage node replicating write requests for a data volume is unavailable. In response, subsequent write requests may be processed according to a reduced durability state for the data volume such that replication for the data volume may be disabled for the storage node. Write requests may then be completed at fewer storage nodes prior to acknowledging the write request as complete. Durability state for the data volume may be increased in various embodiments. A storage node may be identified and replication operations may be performed to synchronize the current data volume at the storage node with a replica of the data volume maintained at the identified storage node.
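Illustrative sketch (not taken from the patent): the reduced durability behavior described in the abstract above could be modeled roughly as follows. The `Volume` class, its method names, and the use of in-memory dictionaries as block stores are all assumptions made for this example.

```python
class Volume:
    """Toy model of a replicated volume; every name here is hypothetical."""

    def __init__(self, replicas):
        # replicas: dict of node name -> dict acting as that node's block store
        self.replicas = replicas
        self.available = set(replicas)  # nodes currently receiving replicated writes
        self.reduced_durability = False

    def mark_unavailable(self, node):
        # Disable replication to the node and enter the reduced durability state.
        self.available.discard(node)
        self.reduced_durability = True

    def write(self, block, data):
        # Complete the write at every available node, then acknowledge it.
        targets = sorted(self.available)
        if not targets:
            raise RuntimeError("no storage node available")
        for node in targets:
            self.replicas[node][block] = data
        return targets  # nodes that completed the write before the acknowledgement

    def restore(self, node):
        # Synchronize the returning node from a current replica, then re-enable it.
        source = sorted(self.available)[0]
        self.replicas[node] = dict(self.replicas[source])
        self.available.add(node)
        self.reduced_durability = len(self.available) < len(self.replicas)
```

With two nodes, a write issued after `mark_unavailable("b")` is acknowledged once node `a` alone has stored it, and `restore("b")` copies the current blocks back before replication resumes.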
-
Publication No.: US09436407B1
Publication date: 2016-09-06
Application No.: US13860343
Filing date: 2013-04-10
Applicant: Amazon Technologies, Inc.
Inventor: Jianhua Fan , Kerry Quintin Lee , Danny Wei , Tate Andrew Certain
IPC: G06F3/06
CPC classification number: G06F3/065 , G06F3/0614 , G06F3/067
Abstract: Methods and systems for cursor remirroring are disclosed. A mirroring process is initiated for a plurality of chunks stored by a master node. The mirroring process comprises visiting a sequence of one or more of the chunks and, for at least some of the chunks, copying chunk data or metadata to a slave node. During the initiated mirroring process, a request is received for a write operation on one of the chunks stored by the master node. If the chunk in the request has been visited in the mirroring process, the write operation is performed on the master node and on the slave node. If the chunk in the request has not been visited, the write operation is performed on the master node and postponed on the slave node until the chunk in the request has been visited in the mirroring process.
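Illustrative sketch (not taken from the patent): the visited / not-visited rule in the abstract above could look roughly like this. The `CursorRemirror` class and its fields are hypothetical, writes are assumed to target chunks that already exist on the master, and a postponed write is resolved simply by copying the master's current chunk when the cursor reaches it.

```python
class CursorRemirror:
    """Toy cursor-based mirroring from a master to a slave; names are hypothetical."""

    def __init__(self, master_chunks):
        self.master = dict(master_chunks)  # chunk id -> data
        self.slave = {}
        self.order = sorted(self.master)   # sequence in which chunks are visited
        self.cursor = 0                    # index of the next chunk to visit
        self.visited = set()
        self.postponed = {}                # chunk id -> latest write awaiting the cursor

    def write(self, chunk, data):
        # The master always applies the write immediately.
        self.master[chunk] = data
        if chunk in self.visited:
            # Already mirrored: apply the write on the slave as well.
            self.slave[chunk] = data
        else:
            # Not yet visited: postpone the slave-side write.
            self.postponed[chunk] = data

    def step(self):
        # Visit the next chunk; return False once every chunk has been visited.
        if self.cursor >= len(self.order):
            return False
        chunk = self.order[self.cursor]
        # Copying the master's current data also covers any postponed write.
        self.slave[chunk] = self.master[chunk]
        self.postponed.pop(chunk, None)
        self.visited.add(chunk)
        self.cursor += 1
        return True
```

Calling `step()` repeatedly until it returns `False` completes the mirroring pass; writes that arrive in between are routed according to the cursor position.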
-
Publication No.: US11941278B2
Publication date: 2024-03-26
Application No.: US17520537
Filing date: 2021-11-05
Applicant: Amazon Technologies, Inc.
Inventor: Norbert Paul Kusters , Jianhua Fan , Shuvabrata Ganguly , Danny Wei , Avram Israel Blaszka
CPC classification number: G06F3/0644 , G06F3/0617 , G06F3/0631 , G06F3/065 , G06F3/067 , G06F11/1612 , G06F11/3034
Abstract: A data storage system includes multiple head nodes and data storage sleds. Volume data is replicated between a primary and one or more secondary head nodes for a volume partition and is further flushed to a set of mass storage devices of the data storage sleds. Volume metadata is maintained in a primary and one or more secondary head nodes for a volume partition and is updated in response to volume data being flushed to the data storage sleds. Also, the primary and secondary head nodes store check-points of volume metadata to the data storage sleds, wherein in response to a failure of a primary or secondary head node for a volume partition, a replacement secondary head node for the volume partition recreates a secondary replica for the volume partition based, at least in part, on a stored volume metadata checkpoint.
-
Publication No.: US11681443B1
Publication date: 2023-06-20
Application No.: US17006502
Filing date: 2020-08-28
Applicant: Amazon Technologies, Inc.
Inventor: Sriram Venugopal , Kun Tang , Norbert Paul Kusters , Jianhua Fan
CPC classification number: G06F3/0619 , G06F3/0604 , G06F3/0641 , G06F3/0644 , G06F3/0652 , G06F3/0683 , G06F11/10 , G06F2201/84
Abstract: A data storage system includes a head node and mass storage devices. The head node is configured to store volume data and flush volume data to the mass storage devices. Additionally, the head node is configured to determine a quantity of data partitions and/or parity partitions to store for a chunk of volume data being flushed to the mass storage devices in order to satisfy a durability guarantee. For chunks of data for which complete copies are also stored in an additional data storage system, the head node is configured to reduce the quantity of data partitions and/or parity partitions stored such that required storage space is reduced while still ensuring that the durability guarantee is satisfied.
-
Publication No.: US11461156B2
Publication date: 2022-10-04
Application No.: US17239440
Filing date: 2021-04-23
Applicant: Amazon Technologies, Inc.
Inventor: Fan Ping , Andrew Boyer , Oleksandr Chychykalo , James Pinkerton , Danny Wei , Norbert Paul Kusters , Divya Ashok Kumar Jain , Jianhua Fan , Thomas Tarak Mathew Veppumthara , Sebastiano Peluso
Abstract: A block-based storage system hosts logical volumes that are implemented via multiple replicas of volume data stored on multiple resource hosts in different failure domains. Also, the block-based storage service allows multiple client computing devices to attach to a same given logical volume at the same time. In order to prevent unnecessary failovers, a primary node storing a primary replica is configured with a health check application programmatic interface (API) and a secondary node storing a secondary replica determines whether or not to initiate a failover based on the health of the primary replica.
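Illustrative sketch (not taken from the patent): one way a secondary node might use a primary health check before initiating failover. The function name, the `attempts` parameter, and the choice to treat an unreachable primary as unhealthy are assumptions made for this example; the abstract does not specify them.

```python
def should_fail_over(check_primary_health, attempts=3):
    """Return True only if every health check attempt reports an unhealthy primary.

    check_primary_health is a hypothetical callable standing in for the primary's
    health check API; it returns True when the primary replica is healthy.
    """
    for _ in range(attempts):
        try:
            if check_primary_health():
                return False  # primary is healthy, so a failover would be unnecessary
        except ConnectionError:
            pass  # assumption: an unreachable primary counts as a failed check
    return True
```

Requiring several failed checks before failing over is one simple way to avoid reacting to a single transient error.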
-
Publication No.: US20210240560A1
Publication date: 2021-08-05
Application No.: US17239440
Filing date: 2021-04-23
Applicant: Amazon Technologies, Inc.
Inventor: Fan Ping , Andrew Boyer , Oleksandr Chychykalo , James Pinkerton , Danny Wei , Norbert Paul Kusters , Divya Ashok Kumar Jain , Jianhua Fan , Thomas Tarak Mathew Veppumthara , Sebastiano Peluso
Abstract: A block-based storage system hosts logical volumes that are implemented via multiple replicas of volume data stored on multiple resource hosts in different failure domains. Also, the block-based storage service allows multiple client computing devices to attach to a same given logical volume at the same time. In order to prevent unnecessary failovers, a primary node storing a primary replica is configured with a health check application programmatic interface (API) and a secondary node storing a secondary replica determines whether or not to initiate a failover based on the health of the primary replica.
-
Publication No.: US10990464B1
Publication date: 2021-04-27
Application No.: US16560859
Filing date: 2019-09-04
Applicant: Amazon Technologies, Inc.
Inventor: Fan Ping , Andrew Boyer , Oleksandr Chychykalo , James Pinkerton , Danny Wei , Norbert Paul Kusters , Divya Ashok Kumar Jain , Jianhua Fan , Thomas Tarak Mathew Veppumthara , Sebastiano Peluso
Abstract: A block-based storage system hosts logical volumes that are implemented via multiple replicas of volume data stored on multiple resource hosts in different failure domains. Also, the block-based storage service allows multiple client computing devices to attach to a same given logical volume at the same time. In order to prevent unnecessary failovers, a primary node storing a primary replica is configured with a health check application programmatic interface (API) and a secondary node storing a secondary replica determines whether or not to initiate a failover based on the health of the primary replica.
-
Publication No.: US20190155694A1
Publication date: 2019-05-23
Application No.: US16259571
Filing date: 2019-01-28
Applicant: Amazon Technologies, Inc.
Inventor: Jianhua Fan , Benjamin Arthur Hawks , Norbert Paul Kusters , Nachiappan Arumugam , Danny Wei , John Luther Guthrie
CPC classification number: G06F11/1448 , G06F3/0605 , G06F3/0619 , G06F3/065 , G06F3/067 , G06F3/0689 , G06F11/1446 , G06F11/1464 , G06F11/1471 , G06F2201/84
Abstract: The present disclosure provides persistent storage for a master copy using operation numbers. A master copy can include a B-tree with references to corresponding data. When provisioning a slave copy, the master copy sends a point-in-time copy of the B-tree to the slave copy, which stores a copy of the B-tree, allocates the necessary space, and updates the references of the B-tree to point to a local storage before the data is transferred. When writing the data to persistent storage, a snapshot created on the master copy is an operation that is replicated to the slave copy. The snapshot is generated using a volume view that includes changes to chunks of data of the master copy since a previous snapshot, as determined using the operation number for the previous snapshot. Data (and metadata) for the snapshot is written to persistent storage while new I/O operations are processed.
-
Publication No.: US09983825B2
Publication date: 2018-05-29
Application No.: US15665063
Filing date: 2017-07-31
Applicant: Amazon Technologies, Inc.
Inventor: Danny Wei , Kerry Quintin Lee , John Luther Guthrie, II , Jianhua Fan , James Michael Thompson , Nandakumar Gopalakrishnan
IPC: G06F3/06
CPC classification number: G06F3/065 , G06F3/0614 , G06F3/0617 , G06F3/067 , G06F3/0683
Abstract: A block-based storage system may implement efficient replication for restoring a data volume from a reduced durability state. A storage node that is not replicating write requests for a data volume may determine that replication for the data volume is to be enabled. A peer storage node may be identified that maintains a stale replica of the data volume. One or more replication operations may be performed to update stale data chunks in the stale replica of the data volume with current data chunks without updating data chunks in the stale replica of the data volume that are current. Stale replicas that are no longer needed may be deleted according to timeouts or the amount of stale data in the replica.
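Illustrative sketch (not taken from the patent): updating only the stale chunks of a replica, as described in the abstract above. Tracking staleness with a per-chunk version number is an assumption made for this example, as are the function and variable names.

```python
def sync_stale_replica(current, stale):
    """Bring a stale replica up to date by copying only out-of-date chunks.

    Both arguments map chunk id -> (version, data). The version counter is a
    hypothetical staleness marker; the abstract does not say how staleness is tracked.
    Returns the list of chunk ids that were copied.
    """
    updated = []
    for chunk, (version, data) in current.items():
        # Skip chunks the stale replica already holds at the current version.
        if chunk not in stale or stale[chunk][0] < version:
            stale[chunk] = (version, data)
            updated.append(chunk)
    return updated
```

Only chunks whose version lags behind are transferred, so a replica that missed a few writes is restored without recopying the whole volume.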
-
Publication No.: US20170351462A1
Publication date: 2017-12-07
Application No.: US15673271
Filing date: 2017-08-09
Applicant: Amazon Technologies, Inc.
Inventor: Jianhua Fan , Benjamin Arthur Hawks , Norbert Paul Kusters , Nachiappan Arumugam , Danny Wei , John Luther Guthrie, II
IPC: G06F3/06
CPC classification number: G06F3/0665 , G06F3/0619 , G06F3/065 , G06F3/067 , G06F11/1435
Abstract: A slave storage is provisioned using metadata of a master B-tree and updates to references (e.g., offsets) pertaining to data operations of the master B-tree. Master-slave pairs can be used to provide data redundancy, and a master copy can include the master B-tree with references to corresponding data. When provisioning a slave copy, the master sends a B-tree copy to the slave, which stores the slave B-tree copy, allocates the necessary space on local storage, and updates respective offsets of the slave B-tree copy to point to the local storage. Data from the master can then be transferred to the slave and stored according to a note and commit process that ensures operational sequence of the data. Operations received to the master during the process can be committed to the slave copy until the slave is consistent with the master and able to take over as master in the event of a failure.