Abstract:
A method, apparatus, and computer instructions for processing trace data in a logical partitioned data processing system. A partition causing an exception is identified in response to detecting the exception. The partition is one within a set of partitions in the logical partitioned data processing system. The trace data for the identified partition is stored in an error log or other data structure for a machine check interrupt handler.
Abstract:
A method, apparatus, and computer instructions for preserving trace data in a logical partitioned data processing system. A call is received from a partition in a plurality of partitions to register a buffer in the partition for the trace data. The call includes a pointer to the buffer. The buffer is associated with a trace routine in platform firmware. The trace routine stores the trace data for calls made by the partition to the platform firmware in the buffer.
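The registration flow above can be illustrated with a minimal sketch. The names `PlatformFirmware`, `register_trace_buffer`, and `call` are hypothetical and do not come from the abstract, and a Python list stands in for the buffer that the pointer in the call would reference.

```python
class PlatformFirmware:
    """Toy stand-in for platform firmware (hypothetical names throughout)."""

    def __init__(self):
        # Maps a partition id to the buffer that partition registered.
        self._buffers = {}

    def register_trace_buffer(self, partition_id, buffer):
        # The abstract says the call carries a pointer to the buffer;
        # here a mutable list object plays the role of that pointer.
        self._buffers[partition_id] = buffer

    def call(self, partition_id, name, *args):
        # Trace routine: record the firmware call in the caller's own
        # buffer, if that partition has registered one.
        buf = self._buffers.get(partition_id)
        if buf is not None:
            buf.append((name, args))
        return 0  # placeholder return code


fw = PlatformFirmware()
partition_buffer = []
fw.register_trace_buffer(1, partition_buffer)
fw.call(1, "read_config", 5)   # traced into partition 1's buffer
fw.call(2, "write_config")     # partition 2 registered nothing, so no trace
```

Because the buffer lives in the partition, its contents remain available to the partition regardless of what the firmware later does with its own memory.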
Abstract:
A method, apparatus, and computer instructions for reporting errors occurring in a data processing system. Responsive to an error occurring in a host bridge in the data processing system, a determination is made as to whether a device required for generating an error report is located below the host bridge. Responsive to the device required for generating an error report being located below a host bridge, the host bridge is isolated from other portions of the data processing system, wherein only a processor analyzing the error is able to access the host bridge. An error reporting process is performed. The error reporting process is able to access the host bridge and the device.
Abstract:
A method, apparatus, and computer instructions for processing errors in a hierarchical input/output sub-system having an input/output bridge with a plurality of hardware devices in a level below the bridge. A value is read from a selected register to form a read value in response to detecting an error. The selected register is reset. Each bit in the read value associated with the error is cleared to form a cleared value. The cleared value is written into the selected register such that errors occurring since the register was cleared are preserved. The error registers below the bridge are scanned in response to an absence of an error being detected in a bridge within the input/output sub-system. A determination is made as to whether the error has previously occurred in response to a presence of an error being found by scanning the registers below the bridge. The error is reported in response to an absence of a determination that the error has previously occurred.
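The read, reset, clear, and write-back sequence can be sketched as follows. The abstract does not state the register's write semantics, so this sketch assumes that a write ORs its value into the current contents. That is one way the write-back could preserve errors latched after the reset. `StickyErrorRegister` and `handle_error` are hypothetical names.

```python
class StickyErrorRegister:
    """Toy error register. Assumption: writes OR into the current bits."""

    def __init__(self):
        self.bits = 0

    def latch(self, mask):
        # Models hardware setting error bits.
        self.bits |= mask

    def read(self):
        return self.bits

    def reset(self):
        self.bits = 0

    def write(self, value):
        # Assumed OR-write: bits latched since the reset survive.
        self.bits |= value


def handle_error(reg, error_mask):
    read_value = reg.read()            # 1. read the selected register
    reg.reset()                        # 2. reset the register
    cleared = read_value & ~error_mask # 3. clear bits tied to this error
    reg.write(cleared)                 # 4. write the cleared value back
    return cleared


# Step-by-step run with a new error arriving between reset and write-back.
reg = StickyErrorRegister()
reg.latch(0b0011)
value = reg.read()
reg.reset()
reg.latch(0b1000)                      # new error after the reset
reg.write(value & ~0b0001)             # handled error bit 0 is dropped
final_bits = reg.read()                # bits 1 and 3 remain set
```

Under the stated assumption, the unrelated error (bit 1) and the newly latched error (bit 3) both survive, while the handled error (bit 0) is gone.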
Abstract:
A system, method, and computer program product are disclosed for preventing machine crashes due to hard errors in one of multiple, different processors that are included in a logically partitioned data processing system. An error occurring in one of the processors is detected. A determination is then made regarding whether the processor has been deconfigured. The partition is then rebooted only in response to a determination that the processor has been deconfigured and will not be included in the partition processor resources. Thus, only the configured processors are rebooted. The deconfigured processor is not rebooted.
Abstract:
A system, method, and product in a logically partitioned data processing system are disclosed for preserving trace data after a partition crash. The logically partitioned data processing system includes multiple, different processors. An error is encountered in one of the processors. Data associated with the error is stored in a trace buffer. Contents of the trace buffer are stored prior to the data being overwritten.
Abstract:
A method, apparatus, and computer instructions in a logical partitioned data processing system for managing trace data. A call is received for the trace data from a calling partition within a plurality of partitions in the logical partitioned data processing system. The trace data in a buffer associated with the calling partition is identified to form identified trace data. Only the identified trace data for the calling partition is returned. The trace data for other partitions within the plurality of partitions is not returned to the calling partition.
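A minimal sketch of this per-partition filtering, assuming trace entries are kept in one buffer per partition. The class `TraceStore` and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class TraceStore:
    """Keeps trace entries in one buffer per partition (hypothetical)."""

    def __init__(self):
        self._by_partition = defaultdict(list)

    def record(self, partition_id, entry):
        self._by_partition[partition_id].append(entry)

    def get_trace(self, calling_partition_id):
        # Identify the buffer tied to the caller and return a copy of
        # its contents only. Other partitions' entries are never touched.
        return list(self._by_partition.get(calling_partition_id, []))


store = TraceStore()
store.record(1, "p1: open")
store.record(2, "p2: open")
store.record(1, "p1: close")
own_trace = store.get_trace(1)
```

Returning a copy rather than the buffer itself keeps the caller from modifying the stored trace, which is a design choice of this sketch and not something the abstract specifies.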
Abstract:
An interrupt is generated for all processors in a multiprocessor system when a critical datapath experiences an error. Serialization code in the interrupt handling routine for that interrupt suspends all processors except one and places the suspended processors in a waiting queue while the one processor handles the error. After the error has been handled, the remaining processors are allowed to execute the interrupt handler, which simply exits on detecting no error.
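The serialization described here can be approximated with threads standing in for processors and a lock standing in for the waiting queue. This is a sketch under those assumptions, and `Machine`, `interrupt_handler`, and `run` are hypothetical names.

```python
import threading


class Machine:
    def __init__(self):
        self.error_pending = True
        self.handled_by = []            # ids of CPUs that did the handling
        self._lock = threading.Lock()   # serializes entry to the handler

    def interrupt_handler(self, cpu_id):
        # Every CPU receives the interrupt. Only one holds the lock at a
        # time, and the rest wait until it is released.
        with self._lock:
            if self.error_pending:
                # The first CPU through handles the error and clears it.
                self.handled_by.append(cpu_id)
                self.error_pending = False
            # Later CPUs find no error and simply exit.


def run(n_cpus=4):
    machine = Machine()
    threads = [
        threading.Thread(target=machine.interrupt_handler, args=(i,))
        for i in range(n_cpus)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return machine


machine = run(4)
```

Which CPU ends up handling the error depends on scheduling, but exactly one of them does.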
Abstract:
A method, apparatus, and computer instructions for halting input/output error propagation in a logical partitioned data processing system. In response to detecting an error state in a bridge within a set of bridges in the logical partitioned data processing system, all components associated with the bridge are identified to form a set of failed components. An identification of the failed components is stored, and the identification is used by each partition during a boot process.
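One way to picture this is the sketch below. The abstract does not say how a partition uses the stored identification at boot, so the sketch assumes the partition skips any assigned component that appears in the failed set. The topology mapping and the function names are hypothetical.

```python
def mark_failed(topology, bridge, failed):
    """Add the bridge and every component associated with it to `failed`."""
    failed.add(bridge)
    failed.update(topology.get(bridge, ()))


def usable_at_boot(assigned, failed):
    # Assumption: a booting partition ignores components marked failed.
    return [dev for dev in assigned if dev not in failed]


# Hypothetical topology: bridge name -> components associated with it.
topology = {
    "bridge0": ["eth0", "scsi0"],
    "bridge1": ["eth1"],
}
failed_components = set()               # the stored identification
mark_failed(topology, "bridge0", failed_components)

partition_devices = ["eth0", "eth1", "scsi0"]
boot_devices = usable_at_boot(partition_devices, failed_components)
```

Here only `eth1` remains available to the partition, because the other two devices sit behind the bridge that reported the error state.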