DETERMINING AN OPTIMAL TIMEOUT VALUE TO MINIMIZE DOWNTIME FOR NODES IN A NETWORK-ACCESSIBLE SERVER SET

    Publication No.: US20190007278A1

    Publication Date: 2019-01-03

    Application No.: US15640114

    Application Date: 2017-06-30

    IPC Classification: H04L12/24

    Abstract: Methods, systems, and computer program products are described herein for minimizing the downtime of nodes in a network-accessible server set. The downtime may be minimized by determining an optimal timeout value, that is, how long a fabric controller waits before performing a recovery action. The optimal timeout value may be determined for each cluster in the network-accessible server set. The optimal timeout value advantageously reduces the overall downtime for customer workloads running on a node with which contact has been lost. The optimal timeout value for each cluster may be derived from a predictive model built on the observed historical patterns of the nodes within that cluster. If an optimal timeout value cannot be determined for a particular cluster (e.g., due to a lack of observed historical patterns), the fabric controller may fall back to a less-than-optimal timeout value.
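The abstract does not say which predictive model is used. The following is a minimal sketch under stated assumptions, not the patent's method. It assumes each lost-contact event in a cluster's history is recorded as the number of seconds the node took to come back on its own, and that the recovery action adds a fixed `recovery_cost` seconds of downtime. Under those assumptions, each cluster's timeout is the one that minimizes mean downtime over its history, and clusters with fewer than `min_samples` events receive a fallback value. All names (`expected_downtime`, `optimal_timeout`, `timeouts_per_cluster`, `recovery_cost`, `min_samples`, `fallback`) are hypothetical.

```python
from typing import Dict, Sequence


def expected_downtime(timeout: float, outages: Sequence[float], recovery_cost: float) -> float:
    """Mean downtime over a (hypothetical) history if the controller waits `timeout` seconds."""
    total = 0.0
    for d in outages:
        if d <= timeout:
            # Node came back on its own before the timeout fired.
            total += d
        else:
            # Controller waited the full timeout, then paid for the recovery action.
            total += timeout + recovery_cost
    return total / len(outages)


def optimal_timeout(
    outages: Sequence[float],
    recovery_cost: float,
    fallback: float,
    min_samples: int = 30,
) -> float:
    """Timeout minimizing empirical expected downtime, or `fallback` if history is too thin."""
    # Assumed rule: too little history for this cluster, so use the less-than-optimal default.
    if len(outages) < max(min_samples, 1):
        return fallback
    # Between consecutive observed durations the expected downtime is linear and
    # non-decreasing in the timeout, so the minimum lies at 0 or at an observed duration.
    candidates = [0.0] + sorted(outages)
    return min(candidates, key=lambda t: expected_downtime(t, outages, recovery_cost))


def timeouts_per_cluster(
    history: Dict[str, Sequence[float]],
    recovery_cost: float,
    fallback: float,
    min_samples: int = 30,
) -> Dict[str, float]:
    """Compute one timeout per cluster from that cluster's own history."""
    return {
        cluster: optimal_timeout(outages, recovery_cost, fallback, min_samples)
        for cluster, outages in history.items()
    }


if __name__ == "__main__":
    history = {"cluster-a": [5, 6, 7, 300, 400], "cluster-b": [5, 6]}
    print(timeouts_per_cluster(history, recovery_cost=60, fallback=900, min_samples=5))
```

Consider `cluster-a`, with a 60-second recovery action. Waiting 7 seconds catches the three quick self-recoveries and gives a mean downtime of 30.4 seconds, which beats acting immediately (60 seconds) or waiting out the long outages. `cluster-b` has too few observations and gets the 900-second fallback. Scanning only the observed durations keeps the search exact for this simple cost model. A production model would likely also weigh workload impact and variance.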