Publication No.: US10983879B1
Publication Date: 2021-04-20
Application No.: US16176188
Filing Date: 2018-10-31
Applicant: EMC IP HOLDING COMPANY LLC
Inventor: Akash Agrawal, Timothy Johnson, Jiahui Wang, Peng Yin, Stephen Richard Ives, Michael Garvey, Christopher Monti
Abstract: A method of distributed management of recovery of multi-controller NVMe drives includes detecting a path failure of a PCIe path from a first storage node to a first controller on the multi-controller NVMe drive, and initially attempting to correct the path failure using a controller-level reset. If the controller-level reset is unsuccessful, an alternative path to the controller is sought, and if that is also unsuccessful, a drive-level reset operation is coordinated by all storage nodes with controllers executing on the NVMe drive. To coordinate reset of the NVMe drive, one storage node is elected master. Each node (both slave and master) initiates quiescing of IO operations on its respective controller and, after quiescing has completed, initiates shutdown of its respective controller. Once all controllers are shut down, the master initiates reset of the NVMe drive. Timeouts are used to constrain completion of the quiescing and shutdown operations.
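The escalation described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the names `Node`, `coordinated_drive_reset`, and `recover_path` are hypothetical, the master is assumed to be the node with the lowest ID (the abstract does not say how the election works), and the sketch assumes that a quiesce or shutdown timeout aborts the drive reset (the abstract says only that timeouts constrain these operations).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical storage node owning one controller on the shared drive."""
    node_id: int
    quiesce_secs: float = 0.0   # simulated time needed to drain in-flight IO
    shutdown_secs: float = 0.0  # simulated time needed to shut the controller down
    log: List[str] = field(default_factory=list)

    def quiesce_and_shutdown(self, quiesce_timeout: float,
                             shutdown_timeout: float) -> bool:
        # Quiesce first; shutdown begins only after quiescing completes in time.
        if self.quiesce_secs > quiesce_timeout:
            self.log.append("quiesce timeout")
            return False
        self.log.append("quiesced")
        if self.shutdown_secs > shutdown_timeout:
            self.log.append("shutdown timeout")
            return False
        self.log.append("shutdown")
        return True


def coordinated_drive_reset(nodes: List[Node],
                            reset_drive: Callable[[], None],
                            quiesce_timeout: float = 5.0,
                            shutdown_timeout: float = 5.0) -> Optional[int]:
    """Return the master's node ID if the drive was reset, else None."""
    # Assumed election rule: the lowest node ID becomes master.
    master = min(nodes, key=lambda n: n.node_id)
    # Every node, master and slaves alike, quiesces and shuts down its controller.
    results = [n.quiesce_and_shutdown(quiesce_timeout, shutdown_timeout)
               for n in nodes]
    if not all(results):
        return None  # assumption: abort the reset if any node timed out
    # Only after all controllers are shut down does the master reset the drive.
    reset_drive()
    master.log.append("drive reset")
    return master.node_id


def recover_path(controller_reset: Callable[[], bool],
                 try_alternative_path: Callable[[], bool],
                 nodes: List[Node],
                 reset_drive: Callable[[], None]) -> str:
    """Escalate: controller-level reset, alternative path, then drive-level reset."""
    if controller_reset():
        return "controller_reset"
    if try_alternative_path():
        return "alternative_path"
    if coordinated_drive_reset(nodes, reset_drive) is not None:
        return "drive_reset"
    return "failed"
```

For example, calling `recover_path(lambda: False, lambda: False, [Node(2), Node(1)], reset_fn)` with some callable `reset_fn` falls through both cheaper remedies, treats node 1 as master, and invokes `reset_fn` exactly once after both simulated controllers have shut down.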