Method and apparatus for enhancing input/output error analysis in hardware sub-systems
    1.
    Patent application
    Legal status: granted

    Publication No.: US20030172323A1

    Publication date: 2003-09-11

    Application No.: US10093436

    Filing date: 2002-03-07

    IPC classification: H04L001/22

    CPC classification: H04L1/22

    Abstract: A method, apparatus, and computer instructions for processing errors in a hierarchical hardware sub-system in a data processing system, in which the hierarchical hardware sub-system includes a host processor bridge having a mapping registers section and a control and status registers section. In response to detecting an error that freezes the mapping registers section in the host bridge, a component within the hierarchical hardware sub-system connected to the host bridge is identified to form a selected component. An address is written to a register within the control and status registers section of the host bridge, the address being that of an error register in the component. Data is read in response to a result from the address written in the register being placed in the control and status registers section of the host bridge.
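
    The indirect read described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class names, the `csr_address` and `csr_data` registers, and the register addresses are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    """A device below the host bridge; registers maps address -> value."""
    name: str
    registers: Dict[int, int] = field(default_factory=dict)


@dataclass
class HostBridge:
    """Toy host bridge with a freezable mapped path and an indirect CSR path."""
    components: List[Component]
    mapping_frozen: bool = False
    csr_address: Optional[int] = None  # hypothetical indirect-address register
    csr_data: Optional[int] = None     # hypothetical indirect-data register

    def mapped_read(self, address: int) -> int:
        # Normal path: unusable once the mapping registers section is frozen.
        if self.mapping_frozen:
            raise RuntimeError("mapping registers are frozen")
        return self._lookup(address)

    def write_csr_address(self, address: int) -> None:
        # Assumption: writing the address makes the bridge fetch the target
        # register and place the result in the CSR data register.
        self.csr_address = address
        self.csr_data = self._lookup(address)

    def _lookup(self, address: int) -> int:
        for component in self.components:
            if address in component.registers:
                return component.registers[address]
        raise KeyError(hex(address))


def read_error_register(bridge: HostBridge, address: int) -> int:
    """Read a component's error register through the CSR section."""
    bridge.write_csr_address(address)
    assert bridge.csr_data is not None
    return bridge.csr_data
```

    In this model the CSR path keeps working while `mapped_read` fails, which is the property the abstract relies on for analysis after the mapping registers are frozen.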

    Critical datapath error handling in a multiprocessor architecture
    4.
    Patent application
    Legal status: lapsed

    Publication No.: US20030182351A1

    Publication date: 2003-09-25

    Application No.: US10105125

    Filing date: 2002-03-21

    IPC classification: G06F009/00

    Abstract: An interrupt is generated for all processors in a multiprocessor system when a critical datapath experiences an error. Serialization code in the interrupt handling routine for that interrupt suspends all processors except one and places the suspended processors in a waiting queue while the one processor handles the error. After the error has been handled, the remaining processors are allowed to execute the interrupt handler, which simply exits upon detecting no error.
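
    The serialization pattern in this abstract can be sketched with threads standing in for processors. This is a minimal illustration, not the patented implementation: the `DatapathState` class, the function names, and the use of a lock in place of the waiting queue are all assumptions.

```python
import threading
from typing import List


class DatapathState:
    """Shared state for one critical-datapath error (hypothetical model)."""

    def __init__(self) -> None:
        self.error_pending = False
        self.handled_by: List[int] = []


def interrupt_handler(cpu_id: int, state: DatapathState, gate: threading.Lock) -> None:
    # Serialization: one processor holds the gate while the others block on
    # it, which stands in for the waiting queue described in the abstract.
    with gate:
        if not state.error_pending:
            return  # an earlier processor already handled the error
        state.handled_by.append(cpu_id)
        state.error_pending = False


def signal_datapath_error(cpu_count: int) -> DatapathState:
    """Deliver the interrupt to every simulated processor and wait for all."""
    state = DatapathState()
    state.error_pending = True
    gate = threading.Lock()
    cpus = [
        threading.Thread(target=interrupt_handler, args=(cpu, state, gate))
        for cpu in range(cpu_count)
    ]
    for cpu in cpus:
        cpu.start()
    for cpu in cpus:
        cpu.join()
    return state
```

    Whichever simulated processor acquires the gate first handles the error; every later one enters the handler, finds nothing pending, and exits.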

    Method and apparatus for analyzing hardware errors in a logical partitioned data processing system
    5.
    Patent application
    Legal status: lapsed

    Publication No.: US20030172322A1

    Publication date: 2003-09-11

    Application No.: US10093433

    Filing date: 2002-03-07

    IPC classification: H04B001/74

    Abstract: A method, apparatus, and computer instructions for processing errors in a hierarchical input/output sub-system having an input/output bridge with a plurality of hardware devices in a level below the bridge. A value is read from a selected register to form a read value in response to detecting an error. The selected register is reset. Each bit in the read value associated with the error is cleared to form a cleared value. The cleared value is written into the selected register such that errors occurring since the register was cleared are preserved. The error registers below the bridge are scanned in response to an absence of an error being detected in a bridge within the input/output sub-system. A determination is made as to whether the error has previously occurred in response to an error being found by scanning the registers below the bridge. The error is reported in response to an absence of a determination that the error has previously occurred.
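
    The read, reset, clear, and write-back sequence can be shown as bit arithmetic. The sketch below assumes that a write to the register ORs bits into its current contents, so a bit latched between the reset and the write-back survives; the abstract does not state how the write behaves, and the `ErrorRegister` class and its method names are hypothetical.

```python
from typing import Tuple


class ErrorRegister:
    """Toy error register whose writes merge (OR) into the current value."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def read(self) -> int:
        return self.value

    def reset(self) -> None:
        self.value = 0

    def latch(self, bits: int) -> None:
        # Hardware records a newly detected error.
        self.value |= bits

    def write_merge(self, bits: int) -> None:
        # Assumption: software writes set bits without clearing others.
        self.value |= bits


def acknowledge_error(register: ErrorRegister, error_mask: int) -> Tuple[int, int]:
    """Clear only the bits for the handled error; return (read, cleared)."""
    read_value = register.read()
    register.reset()
    cleared_value = read_value & ~error_mask
    register.write_merge(cleared_value)
    return read_value, cleared_value
```

    For example, with the register holding 0b0110 and an error mask of 0b0010, the cleared value is 0b0100, so the unrelated error bit stays set while the handled one is dropped.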

    Handling multiple operating system capabilities in a logical partition data processing system
    6.
    Patent application
    Legal status: granted

    Publication No.: US20030204780A1

    Publication date: 2003-10-30

    Application No.: US10132136

    Filing date: 2002-04-25

    IPC classification: G06F011/07

    Abstract: A method, computer program product, and data processing system for handling errors or other events in a logical partition (LPAR) data processing system are disclosed. When an operating system is initialized in a logical partition, it registers its capabilities for handling particular errors or other events with management software. When an error or other event affecting that logical partition occurs, the management software checks to see if the particular error or event is one that the operating system is capable of handling. If so, the operating system is notified. Otherwise, the management software directs the operating system to take other appropriate action, such as termination of the operating system and/or partition.
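
    The register-then-dispatch flow in this abstract maps naturally onto a lookup table. The following is a minimal sketch; `ManagementSoftware`, `Action`, and the event names used with it are illustrative assumptions rather than interfaces from the patent.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class Action(Enum):
    NOTIFY = "notify"        # the operating system can handle the event
    TERMINATE = "terminate"  # fallback chosen for this sketch


class ManagementSoftware:
    """Tracks which events each partition's operating system can handle."""

    def __init__(self) -> None:
        self._capabilities: Dict[int, Set[str]] = {}

    def register(self, partition_id: int, events: Iterable[str]) -> None:
        # Called when an operating system initializes in a partition.
        self._capabilities[partition_id] = set(events)

    def dispatch(self, partition_id: int, event: str) -> Action:
        # Notify only if the partition registered a handler for this event.
        if event in self._capabilities.get(partition_id, set()):
            return Action.NOTIFY
        return Action.TERMINATE
```

    A partition that never registered is treated as having no capabilities, so every event falls through to the fallback action.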

    Multiple fault location in a series of devices
    7.
    Patent application
    Legal status: under examination (published)

    Publication No.: US20030191978A1

    Publication date: 2003-10-09

    Application No.: US10116522

    Filing date: 2002-04-04

    IPC classification: G06F011/07 G06F011/273

    Abstract: A method, computer program product, and data processing system for locating hardware faults occurring in multiple devices in a data processing system are disclosed. The devices have a scanning order in which the devices (or at least information regarding the devices) are scanned to analyze any possible error condition. When a new error is detected in a device, an identification of the device is stored in a data structure. If another error is detected and causes the devices to be scanned again, the scanning process will skip over the device whose identity is stored in the data structure so that the new error can be located.
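
    The skip-on-rescan idea can be sketched in a few lines. The function and parameter names below are hypothetical, and a plain set stands in for the unspecified data structure that stores device identities.

```python
from typing import Callable, Optional, Sequence, Set


def locate_new_fault(
    scan_order: Sequence[str],
    has_error: Callable[[str], bool],
    already_reported: Set[str],
) -> Optional[str]:
    """Return the first faulty device not reported by an earlier scan."""
    for device in scan_order:
        if device in already_reported:
            continue  # skip devices whose fault was located previously
        if has_error(device):
            already_reported.add(device)  # remember it for later scans
            return device
    return None
```

    Without the skip, each rescan would stop again at the first faulty device in the scanning order and never reach a second fault further down.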

    System, method, and computer program product for preventing machine crashes due to hard errors in logically partitioned systems
    8.
    Patent application
    Legal status: granted

    Publication No.: US20030131279A1

    Publication date: 2003-07-10

    Application No.: US10045280

    Filing date: 2002-01-10

    IPC classification: G06F011/267 G06F011/30

    CPC classification: G06F11/004 G06F11/142

    Abstract: A system, method, and computer program product are disclosed for preventing machine crashes due to hard errors in one of multiple, different processors that are included in a logically partitioned data processing system. An error occurring in one of the processors is detected. A determination is then made regarding whether the processor has been deconfigured. The partition is then rebooted only in response to a determination that the processor has been deconfigured and will not be included in the partition processor resources. Thus, only the configured processors are rebooted. The deconfigured processor is not rebooted.
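
    The reboot decision in this abstract reduces to a guard followed by a filter. The sketch below is one reading of it; the function name, the use of integer processor IDs, and returning `None` to mean "do not reboot" are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Set


def plan_partition_reboot(
    failed_cpu: int,
    partition_cpus: Sequence[int],
    deconfigured: Set[int],
) -> Optional[List[int]]:
    """Return the processors to reboot, or None if a reboot is not yet safe."""
    if failed_cpu not in deconfigured:
        # The faulty processor would rejoin the partition, so hold off.
        return None
    # Reboot only with processors that are still configured.
    return [cpu for cpu in partition_cpus if cpu not in deconfigured]
```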

    Method and apparatus for reporting errors in a data processing system
    9.
    Patent application
    Legal status: granted

    Publication No.: US20040205393A1

    Publication date: 2004-10-14

    Application No.: US10411464

    Filing date: 2003-04-10

    IPC classification: H04L001/22

    Abstract: A method, apparatus, and computer instructions for reporting errors occurring in a data processing system. Responsive to an error occurring in a host bridge in the data processing system, a determination is made as to whether a device required for generating an error report is located below the host bridge. Responsive to the device required for generating an error report being located below a host bridge, the host bridge is isolated from other portions of the data processing system, wherein only a processor analyzing the error is able to access the host bridge. An error reporting process is performed. The error reporting process is able to access the host bridge and the device.
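
    The isolation step can be modeled as restricting which processors may access the bridge. This is a sketch only: the `HostBridge` fields, the `can_access` check, and the device and processor identifiers are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class HostBridge:
    """Toy host bridge; allowed_cpus of None means it is not isolated."""
    devices_below: Set[str]
    allowed_cpus: Optional[Set[int]] = None

    def can_access(self, cpu_id: int) -> bool:
        return self.allowed_cpus is None or cpu_id in self.allowed_cpus


def prepare_error_report(bridge: HostBridge, required_device: str, analyzing_cpu: int) -> bool:
    """Isolate the bridge if the reporting device sits below it."""
    if required_device in bridge.devices_below:
        # Only the processor analyzing the error keeps access.
        bridge.allowed_cpus = {analyzing_cpu}
        return True
    return False
```

    After isolation, the analyzing processor can still reach the bridge and the device below it, so the error reporting process can run while other processors are kept out.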
