Abstract:
A method and computer program product for monitoring errors in partitions of a computer system. Each partition is provided with a partition status indicator (PSI) denoting a RUNNING or FAIL status of the partition, and an error log area (ELA) for storing the partition's error entries. Each error entry in the ELA includes a partition identifier, an entry status indicator (ESI) indicating a READ or UNREAD status for the error entry, and an error identifier. An error procedure is performed for each first partition whose PSI indicates the FAIL status. The procedure includes: copying each error entry in the ELA of the first partition whose ESI indicates the UNREAD status into the ELA of a second (running) partition; setting the ESI to the READ status for each copied error entry in the ELA of the first partition; and setting the ESI to the UNREAD status for each copied error entry in the ELA of the second partition.
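The error procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class names, the `run_error_procedure` function, and the rule that the first RUNNING partition found serves as the second partition are all assumptions made for illustration; the abstract only requires that the second partition be running.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PartitionStatus(Enum):
    RUNNING = "RUNNING"
    FAIL = "FAIL"


class EntryStatus(Enum):
    READ = "READ"
    UNREAD = "UNREAD"


@dataclass
class ErrorEntry:
    partition_id: str   # partition identifier
    esi: EntryStatus    # entry status indicator (ESI)
    error_id: str       # error identifier


@dataclass
class Partition:
    partition_id: str
    psi: PartitionStatus                                 # partition status indicator (PSI)
    ela: List[ErrorEntry] = field(default_factory=list)  # error log area (ELA)


def run_error_procedure(partitions: List[Partition]) -> int:
    """Copy UNREAD entries from every FAIL partition into a RUNNING partition.

    Returns the number of entries copied.
    """
    # Assumption: the first RUNNING partition found acts as the second partition.
    target = next((p for p in partitions if p.psi is PartitionStatus.RUNNING), None)
    if target is None:
        return 0  # no running partition is available to receive the entries

    copied = 0
    for failed in partitions:
        if failed.psi is not PartitionStatus.FAIL:
            continue
        for entry in failed.ela:
            if entry.esi is not EntryStatus.UNREAD:
                continue
            # The copy is stored as UNREAD in the second partition's ELA.
            target.ela.append(
                ErrorEntry(entry.partition_id, EntryStatus.UNREAD, entry.error_id)
            )
            # The original is marked READ so it is not copied again.
            entry.esi = EntryStatus.READ
            copied += 1
    return copied
```

Marking the source entry READ while leaving the copy UNREAD makes the procedure safe to repeat: a second pass over the same failed partition copies nothing, while the running partition still holds the entry as pending.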
Abstract:
A method and computer program product for monitoring errors in partitions of a computer system. A global supervisor mapping (GSM) associates each supervised partition with a supervisor partition that monitors the supervised partition. A partition status buffer (PSB) denotes a status (GOOD, BAD, NOCARE) of the partition. The BAD status denotes that the partition has encountered at least one error that is currently unrepaired. The supervisor partition determines its supervised partition from the GSM and ascertains the status of its supervised partition from the PSB. If the status of the supervised partition is BAD, then a recovery procedure is performed by the supervisor partition. The recovery procedure: obtains a grant of access to physical and logical resources of the supervised partition which contains error data of the supervised partition; gathers