System and method for managing recovery of multi-controller NVMe drives

    Publication No.: US10983879B1

    Publication Date: 2021-04-20

    Application No.: US16176188

    Filing Date: 2018-10-31

    Abstract: A method of distributed recovery management for multi-controller NVMe drives includes detecting failure of a PCIe path from a first storage node to a first controller on the multi-controller NVMe drive and initially attempting to correct the path failure with a controller-level reset. If the controller-level reset is unsuccessful, an alternative path to the controller is sought; if that is also unsuccessful, a drive-level reset is coordinated among all storage nodes that have controllers executing on the NVMe drive. To coordinate the reset, one storage node is elected master. Each node (master and slaves alike) initiates quiescing of IO operations on its respective controller and, after quiescing has completed, initiates shutdown of that controller. Once all controllers are shut down, the master initiates the reset of the NVMe drive. Timeouts constrain completion of the quiescing and shutdown operations.
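The recovery sequence described in the abstract can be sketched in Python as below. This is a minimal, single-process simulation under stated assumptions, not the patented implementation. The names `NodeController`, `elect_master`, `coordinated_drive_reset` and `recover_path_failure` are hypothetical. Quiesce and shutdown durations are simulated numbers compared against the timeouts rather than measured times, nodes are walked sequentially rather than in parallel, and the master-election rule (lowest node id) is an assumption, since the abstract does not specify one.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical model of one storage node's controller on a shared multi-controller NVMe drive.
# Durations are simulated seconds, not measured times.
@dataclass
class NodeController:
    node_id: int
    quiesce_s: float = 0.0   # simulated time to drain in-flight IO
    shutdown_s: float = 0.0  # simulated time to shut the controller down
    state: str = "active"


def elect_master(nodes: List[NodeController]) -> NodeController:
    # Assumption: the lowest node id becomes master; the abstract does not define the election rule.
    return min(nodes, key=lambda n: n.node_id)


def coordinated_drive_reset(nodes: List[NodeController],
                            quiesce_timeout_s: float,
                            shutdown_timeout_s: float,
                            reset_drive: Callable[[int], bool]) -> bool:
    master = elect_master(nodes)
    for n in nodes:  # every node, master and slaves alike (sequential here for simplicity)
        # Quiesce IO on this node's controller; abort if it would exceed the timeout.
        if n.quiesce_s > quiesce_timeout_s:
            return False
        n.state = "quiesced"
        # Only after quiescing completes, shut the controller down, also under a timeout.
        if n.shutdown_s > shutdown_timeout_s:
            return False
        n.state = "shutdown"
    # Reaching this point means every controller is shut down; only the master resets the drive.
    return reset_drive(master.node_id)


def recover_path_failure(controller_reset: Callable[[], bool],
                         try_alternate_path: Callable[[], bool],
                         drive_reset: Callable[[], bool]) -> str:
    # Escalation order from the abstract: controller-level reset, then alternate path,
    # then coordinated drive-level reset. Each step runs only if the previous one failed.
    if controller_reset():
        return "controller_reset"
    if try_alternate_path():
        return "alternate_path"
    if drive_reset():
        return "drive_reset"
    return "unrecovered"


# Usage: controller reset and alternate path both fail, so the drive-level reset runs.
nodes = [NodeController(2), NodeController(1, quiesce_s=0.5)]
resets: List[int] = []
outcome = recover_path_failure(
    lambda: False,
    lambda: False,
    lambda: coordinated_drive_reset(nodes, 1.0, 1.0,
                                    lambda nid: resets.append(nid) or True),
)
print(outcome, resets)  # drive_reset [1]
```

Keeping the escalation ladder separate from the coordinated reset mirrors the abstract's structure: the costly, drive-wide operation is reached only after the cheaper per-controller options have failed, and it refuses to reset the drive if any node misses its quiesce or shutdown deadline.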
