Resource management in a clustered computer system
摘要:
Methods, systems, and devices are provided for managing resources in a computing cluster. The managed resources include cluster nodes themselves, as well as sharable resources such as memory buffers and bandwidth credits that may be used by one or more nodes. Resource management includes detecting failures and possible failures by node software, node hardware, interconnects, and system area network switches and taking steps to compensate for failures and prevent problems such as uncoordinated access to a shared disk. Resource management also includes reallocating sharable resources in response to node failure, demands by application programs, or other events. Specific examples provided include failure detection by remote memory probes, emergency communication through a shared disk, and sharable resource allocation with minimal locking.
信息查询
0/0