Autonomic recovery from hardware errors in an input/output fabric

    Publication No.: US07134052B2

    Publication date: 2006-11-07

    Application No.: US10438392

    Filing date: 2003-05-15

    IPC classification: G06F11/00

    Abstract: An apparatus, program product and method propagate errors detected in an IO fabric element from an IO fabric that is used to couple a plurality of endpoint IO resources to processing elements in a computer. In particular, such errors are propagated to the endpoint IO resources affected by the IO fabric element in connection with recovering from the errors in the IO fabric element. By doing so, a device driver or other program code used to access each affected IO resource may be permitted to asynchronously recover from the propagated error in its associated IO resource, often without requiring the recovery from the error in the IO fabric element to wait for recovery to be completed for each of the affected IO resources. In addition, an IO fabric may be dynamically configured to support both recoverable and non-recoverable endpoint IO resources. In particular, IO fabric elements within an IO fabric may be dynamically configured to enable machine check signaling in such IO fabric elements in response to detection that an endpoint IO resource is non-recoverable in nature. The IO fabric elements that are dynamically configured as such are disposed within a hardware path that is defined between the non-recoverable resource and a processor that accesses the non-recoverable resource.
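
    The propagation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the `driver_recover` hook, and the tree layout are all hypothetical, chosen only to show an error in one fabric element being pushed down to the endpoints beneath it while the element itself recovers without waiting for the drivers.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    OK = "ok"
    ERROR = "error"


@dataclass
class Endpoint:
    """Hypothetical endpoint IO resource (e.g. an adapter slot)."""
    name: str
    state: State = State.OK

    def driver_recover(self) -> None:
        # Hypothetical driver hook: called later, on the driver's own schedule.
        self.state = State.OK


@dataclass
class FabricElement:
    """Hypothetical fabric element; children are elements or endpoints."""
    name: str
    children: list = field(default_factory=list)
    state: State = State.OK

    def endpoints(self):
        """Yield every endpoint reachable below this element."""
        for child in self.children:
            if isinstance(child, Endpoint):
                yield child
            else:
                yield from child.endpoints()


def recover_fabric_error(element: FabricElement) -> list:
    """Propagate the element's error to affected endpoints, then recover it.

    Returns the endpoints left in ERROR for their drivers to handle.
    """
    affected = list(element.endpoints())
    for ep in affected:
        ep.state = State.ERROR   # propagate the error to each affected endpoint
    element.state = State.OK     # element recovery does not wait for the drivers
    return affected


# Example topology (assumed): hub -> bridge -> {slot_a, slot_b}, hub -> slot_c
slot_a, slot_b, slot_c = Endpoint("slot_a"), Endpoint("slot_b"), Endpoint("slot_c")
bridge = FabricElement("bridge", [slot_a, slot_b])
hub = FabricElement("hub", [bridge, slot_c])

bridge.state = State.ERROR
pending = recover_fabric_error(bridge)
slot_a.driver_recover()          # one driver recovers; slot_b is still pending
```

    In this sketch only the endpoints below the failed bridge are marked, so `slot_c` is untouched, and each driver clears its own endpoint independently of the others.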

    2. Autonomic recovery from hardware errors in an input/output fabric
    Type: granted invention patent
    Status: in force

    Publication No.: US07549090B2

    Publication date: 2009-06-16

    Application No.: US11466290

    Filing date: 2006-08-22

    IPC classification: G06F11/00

    Abstract: An apparatus, program product and method propagate errors detected in an IO fabric element from an IO fabric that is used to couple a plurality of endpoint IO resources to processing elements in a computer. In particular, such errors are propagated to the endpoint IO resources affected by the IO fabric element in connection with recovering from the errors in the IO fabric element. By doing so, a device driver or other program code used to access each affected IO resource may be permitted to asynchronously recover from the propagated error in its associated IO resource, often without requiring the recovery from the error in the IO fabric element to wait for recovery to be completed for each of the affected IO resources. In addition, an IO fabric may be dynamically configured to support both recoverable and non-recoverable endpoint IO resources. In particular, IO fabric elements within an IO fabric may be dynamically configured to enable machine check signaling in such IO fabric elements in response to detection that an endpoint IO resource is non-recoverable in nature. The IO fabric elements that are dynamically configured as such are disposed within a hardware path that is defined between the non-recoverable resource and a processor that accesses the non-recoverable resource.
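
    The dynamic configuration described in the abstract, in which machine check signaling is enabled only along the hardware path of a non-recoverable endpoint, might be sketched as follows. This is an illustration under stated assumptions, not the patent's mechanism: the `Node` class, the `machine_check_enabled` flag, and `register_endpoint` are hypothetical names, and the fabric is modeled as a simple parent-linked tree whose root sits nearest the processor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical fabric element or endpoint; `parent` points toward the processor."""
    name: str
    parent: Optional["Node"] = None
    machine_check_enabled: bool = False


def path_to_processor(endpoint: Node) -> list:
    """Return the fabric elements between an endpoint and the processor side."""
    path = []
    node = endpoint.parent
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def register_endpoint(endpoint: Node, recoverable: bool) -> None:
    """Enable machine check signaling on the path of a non-recoverable endpoint."""
    if recoverable:
        return  # recoverable resources leave their path unchanged
    for element in path_to_processor(endpoint):
        element.machine_check_enabled = True


# Example topology (assumed): hub -> bridge_a -> legacy, hub -> bridge_b -> modern
hub = Node("hub")
bridge_a = Node("bridge_a", parent=hub)
bridge_b = Node("bridge_b", parent=hub)
legacy = Node("legacy", parent=bridge_a)
modern = Node("modern", parent=bridge_b)

register_endpoint(modern, recoverable=True)
register_endpoint(legacy, recoverable=False)
```

    After both registrations in this sketch, `hub` and `bridge_a` have machine check signaling enabled, while `bridge_b`, which serves only the recoverable endpoint, does not.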
