Abstract:
A computer system implementing a fault detection and isolation technique tracks failed physical devices by error codes embedded in various components of the computer system. The computer system comprises one or more CPUs, one or more memory modules, a master control device, such as an I2C master, and a North bridge logic device coupling together the CPUs, memory modules, and master control device. The master control device also connects to the CPUs and memory modules over a serial bus, such as an I2C bus. Each component includes a nonvolatile memory coupled to the I2C bus for storing error information. If a component fails, a CPU stores an error code into that component's nonvolatile memory via the I2C bus. During initialization, the CPU creates a logical resource map that lists the logical addresses of all available (i.e., fully functional) devices. The logical resource map is provided to the computer's operating system, which isolates failed devices by permitting access only to those logical devices listed as available. The computer may also include a non-volatile memory device coupled to the CPU for storing a failed device log, which lists the ID codes of failed physical devices. After a device is determined to be non-functional, one of the CPUs stores that device's unique ID code in the failed device log. During system initialization, the information in the failed device log is compared to the error information stored in the components to create the logical resource map.
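The initialization step described above can be sketched as follows. This is a minimal illustration only: the `Component` type, the `NO_ERROR` sentinel, the rule that a device is excluded if either its stored error code or the failed device log marks it as failed, and the sequential assignment of logical addresses are all assumptions made for the example. The abstract does not specify how the comparison is resolved or how the I2C reads are performed, so the error code is modeled here as a plain field.

```python
from dataclasses import dataclass

NO_ERROR = 0x00  # assumed sentinel: no fault recorded in the component's nonvolatile memory


@dataclass
class Component:
    device_id: str              # unique ID code of the physical device
    error_code: int = NO_ERROR  # stands in for the value read over the I2C bus


def build_logical_resource_map(components, failed_device_log):
    """Return {logical_address: device_id} for devices considered available.

    Assumption: a device is excluded if its stored error code is set
    or its ID appears in the failed device log.
    """
    resource_map = {}
    next_logical = 0
    for comp in components:
        if comp.error_code != NO_ERROR or comp.device_id in failed_device_log:
            continue  # isolate the failed device by leaving it out of the map
        resource_map[next_logical] = comp.device_id
        next_logical += 1
    return resource_map


components = [
    Component("CPU-A"),
    Component("CPU-B", error_code=0x12),  # error code stored in the component
    Component("DIMM-1"),
    Component("DIMM-2"),
]
failed_device_log = {"DIMM-2"}  # ID recorded in the separate failed device log
resource_map = build_logical_resource_map(components, failed_device_log)
```

In this sketch only `CPU-A` and `DIMM-1` receive logical addresses, so an operating system restricted to the map never touches the two failed devices.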
Abstract:
A computer system is disclosed that implements a fault detection and isolation technique, tracking failed physical devices by identification (ID) codes embedded in each component of the computer for which fault detection and isolation is provided. The computer system comprises one or more CPUs, one or more memory modules, a master control device, such as an I2C master, and a North bridge logic device coupling together the CPUs, memory modules, and master control device. The master control device also connects to the CPUs and memory modules over a serial bus, such as an I2C bus. Each CPU and memory module includes an ID code that uniquely identifies that device and distinguishes it from all other devices in the computer system. The computer also includes a non-volatile memory device coupled to the CPU for storing a failed device log, which lists the ID codes of failed physical devices. After a device is determined to be non-functional, one of the CPUs stores that device's unique ID code in the failed device log. Using the list of physical devices from the failed device log, the CPU creates a logical resource map that lists the logical addresses of all available (i.e., fully functional) devices. The logical resource map is provided to the computer's operating system, which isolates failed devices by permitting access only to those logical devices listed as available in the logical resource map.
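The ID-code variant can be sketched in the same spirit. The names `FailedDeviceLog`, `build_logical_resource_map`, and `is_accessible` are hypothetical, as are the example ID strings and the sequential numbering of logical addresses; the abstract does not describe the log's storage layout or the form of the map handed to the operating system.

```python
class FailedDeviceLog:
    """In-memory stand-in for the failed device log kept in non-volatile memory."""

    def __init__(self):
        self.ids = []

    def record(self, device_id):
        # Store each failed device's unique ID code once.
        if device_id not in self.ids:
            self.ids.append(device_id)

    def __contains__(self, device_id):
        return device_id in self.ids


def build_logical_resource_map(physical_ids, log):
    """Map consecutive logical addresses to the IDs of devices not in the log."""
    available = [dev for dev in physical_ids if dev not in log]
    return dict(enumerate(available))


def is_accessible(resource_map, logical_address):
    """Assumed OS-side check: only addresses listed in the map may be used."""
    return logical_address in resource_map


log = FailedDeviceLog()
log.record("CPU-1")  # a CPU was determined to be non-functional
log.record("CPU-1")  # a repeated report does not duplicate the entry

resource_map = build_logical_resource_map(["CPU-0", "CPU-1", "MEM-0"], log)
```

Because the log is keyed by the device's own ID code rather than by a logical address, the map can be rebuilt from scratch at each initialization from the same list of physical devices.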