Handling multiple operating system capabilities in a logical partition data processing system
    1.
    Patent application
    Status: In force

    Publication No.: US20030204780A1

    Publication date: 2003-10-30

    Application No.: US10132136

    Filing date: 2002-04-25

    IPC classification: G06F011/07

    Abstract: A method, computer program product, and data processing system for handling errors or other events in a logical partition (LPAR) data processing system is disclosed. When an operating system is initialized in a logical partition, it registers its capabilities for handling particular errors or other events with management software. When an error or other event affecting that logical partition occurs, the management software checks to see if the particular error or event is one that the operating system is capable of handling. If so, the operating system is notified. Otherwise, the management software directs the operating system to take other appropriate action, such as termination of the operating system and/or partition.
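
    The registration-and-dispatch flow described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `PartitionManager`, `Action`, and the event strings are hypothetical, and termination stands in for the "other appropriate action" the abstract mentions.

```python
from enum import Enum, auto


class Action(Enum):
    NOTIFY = auto()     # the operating system registered a handler for the event
    TERMINATE = auto()  # fallback: terminate the operating system and/or partition


class PartitionManager:
    """Hypothetical stand-in for the management software."""

    def __init__(self):
        self._capabilities = {}  # partition id -> set of event names

    def register(self, partition_id, events):
        """Called when an operating system initializes in a partition."""
        self._capabilities[partition_id] = set(events)

    def dispatch(self, partition_id, event):
        """Decide what to do when an event affects a partition."""
        if event in self._capabilities.get(partition_id, set()):
            return Action.NOTIFY
        return Action.TERMINATE
```

    A partition that never registered a capability is treated the same as one that registered none, which is a design choice of this sketch rather than something the abstract specifies.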

    Multiple fault location in a series of devices
    2.
    Patent application
    Status: Under examination (published)

    Publication No.: US20030191978A1

    Publication date: 2003-10-09

    Application No.: US10116522

    Filing date: 2002-04-04

    IPC classification: G06F011/07 G06F011/273

    Abstract: A method, computer program product, and data processing system for locating hardware faults occurring in multiple devices in a data processing system is disclosed. The devices have a scanning order in which the devices (or at least information regarding the devices) are scanned to analyze any possible error condition. When a new error is detected in a device, an identification of the device is stored in a data structure. If another error is detected and causes the devices to be scanned again, the scanning process will skip over the device whose identity is stored in the data structure so that the new error can be located.
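
    The skip-on-rescan idea in this abstract can be illustrated with a short Python sketch. The function name, the `has_error` callback, and the use of a set as the "data structure" are assumptions made for illustration only.

```python
def locate_new_fault(scan_order, has_error, known_faulty):
    """Return the first device in scan order with an error not yet recorded.

    scan_order   -- devices in the order they are scanned
    has_error    -- callable reporting whether a device shows an error
    known_faulty -- set of devices already identified; updated in place
    """
    for device in scan_order:
        if device in known_faulty:
            continue  # already recorded; skip so it cannot mask a newer error
        if has_error(device):
            known_faulty.add(device)
            return device
    return None
```

    Without the skip, a device early in the scan order that still shows its old error would be found again on every rescan, and a later device's new error would never be reached.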

    System, method, and computer program product for preventing machine crashes due to hard errors in logically partitioned systems
    3.
    Patent application
    Status: In force

    Publication No.: US20030131279A1

    Publication date: 2003-07-10

    Application No.: US10045280

    Filing date: 2002-01-10

    IPC classification: G06F011/267 G06F011/30

    CPC classification: G06F11/004 G06F11/142

    Abstract: A system, method, and computer program product are disclosed for preventing machine crashes due to hard errors in one of multiple, different processors that are included in a logically partitioned data processing system. An error occurring in one of the processors is detected. A determination is then made regarding whether the processor has been deconfigured. The partition is then rebooted only in response to a determination that the processor has been deconfigured and will not be included in the partition processor resources. Thus, only the configured processors are rebooted. The deconfigured processor is not rebooted.
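
    The reboot condition in this abstract reduces to a small decision, sketched below in Python. The function name and the representation of processors as integers are hypothetical.

```python
def plan_partition_reboot(partition_cpus, failed_cpu, deconfigured):
    """Return the processors to reboot, or None if the reboot is withheld.

    The partition reboots only once the failed processor has been
    deconfigured; deconfigured processors are left out of the reboot.
    """
    if failed_cpu not in deconfigured:
        return None  # rebooting now would bring the faulty processor back
    return [cpu for cpu in partition_cpus if cpu not in deconfigured]
```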

    Autonomic recovery from hardware errors in an input/output fabric
    4.
    Patent application
    Status: In force

    Publication No.: US20040230861A1

    Publication date: 2004-11-18

    Application No.: US10438392

    Filing date: 2003-05-15

    IPC classification: H02H003/05

    Abstract: An apparatus, program product and method propagate errors detected in an IO fabric element from an IO fabric that is used to couple a plurality of endpoint IO resources to processing elements in a computer. In particular, such errors are propagated to the endpoint IO resources affected by the IO fabric element in connection with recovering from the errors in the IO fabric element. By doing so, a device driver or other program code used to access each affected IO resource may be permitted to asynchronously recover from the propagated error in its associated IO resource, and often without requiring the recovery from the error in the IO fabric element to wait for recovery to be completed for each of the affected IO resources. In addition, an IO fabric may be dynamically configured to support both recoverable and non-recoverable endpoint IO resources. In particular, IO fabric elements within an IO fabric may be dynamically configured to enable machine check signaling in such IO fabric elements in response to detection that an endpoint IO resource is non-recoverable in nature. The IO fabric elements that are dynamically configured as such are disposed within a hardware path that is defined between the non-recoverable resource and a processor that accesses the non-recoverable resource.

    Critical datapath error handling in a multiprocessor architecture
    7.
    Patent application
    Status: Lapsed

    Publication No.: US20030182351A1

    Publication date: 2003-09-25

    Application No.: US10105125

    Filing date: 2002-03-21

    IPC classification: G06F009/00

    Abstract: An interrupt is generated for all processors in a multiprocessor system when a critical datapath experiences an error. Serialization code in the interrupt handling routine for that interrupt suspends all processors except one and places the suspended processors in a waiting queue while the one processor handles the error. After the error has been handled, the remaining processors are allowed to execute the interrupt handler, which simply exits upon detecting no error.
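
    The serialization described in this abstract can be approximated in Python with threads standing in for processors. This is only a sketch: a `threading.Lock` plays the role of the serialization code and waiting queue, and `CriticalErrorState` and `interrupt_handler` are hypothetical names.

```python
import threading


class CriticalErrorState:
    """Shared state that every simulated processor sees."""

    def __init__(self):
        self.error_pending = True     # stands in for the hardware error flag
        self.lock = threading.Lock()  # stands in for the serialization code
        self.handled_by = []          # processors that actually handled the error
        self.exited_clean = []        # processors that found nothing to do


def interrupt_handler(state, cpu_id):
    """Run by every processor when the critical-datapath interrupt fires."""
    with state.lock:  # all but one processor wait here
        if state.error_pending:
            state.handled_by.append(cpu_id)
            state.error_pending = False  # error handled exactly once
        else:
            state.exited_clean.append(cpu_id)  # no error detected: just exit
```

    Whichever thread acquires the lock first handles the error; the rest run the same handler afterwards and exit without doing any work.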

    Method and apparatus for analyzing hardware errors in a logical partitioned data processing system
    8.
    Patent application
    Status: Lapsed

    Publication No.: US20030172322A1

    Publication date: 2003-09-11

    Application No.: US10093433

    Filing date: 2002-03-07

    IPC classification: H04B001/74

    Abstract: A method, apparatus, and computer instructions for processing errors in a hierarchical input/output sub-system having an input/output bridge with a plurality of hardware devices in a level below the bridge. A value is read from a selected register to form a read value in response to detecting an error. The selected register is reset. Each bit in the read value associated with the error is cleared to form a cleared value. The cleared value is written into the selected register such that errors occurring since the register was cleared are preserved. The error registers below the bridge are scanned in response to an absence of an error being detected in a bridge within the input/output sub-system. A determination is made as to whether the error has previously occurred in response to a presence of an error being found by scanning the registers below the bridge. The error is reported in response to an absence of a determination that the error has previously occurred.

    Virtualized NVRAM access methods to provide NVRAM chrp regions for logical partitions through hypervisor system calls
    9.
    Patent application
    Status: In force

    Publication No.: US20020129212A1

    Publication date: 2002-09-12

    Application No.: US09798292

    Filing date: 2001-03-01

    IPC classification: G06F012/14

    CPC classification: G06F12/1441 G06F12/1466

    Abstract: A method, system, and computer program product for enforcing logical partitioning of a shared device to which multiple partitions within a data processing system have access is provided. In one embodiment, a firmware portion of the data processing system receives a request from a requesting device, such as a processor assigned to one of a plurality of partitions within the data processing system, to access (i.e., read from or write to) a portion of the shared device, such as an NVRAM. The request includes a virtual address corresponding to the portion of the shared device for which access is desired. If the virtual address is within a range of addresses for which the requesting device is authorized to access, the firmware provides access to the requested portion of the shared device to the requesting device. If the virtual address is not within a range of addresses for which the requesting device is authorized to access, the firmware denies the request.
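
    The range check in this abstract can be sketched in Python as follows. The sketch assumes, for illustration only, that each partition's virtual address is an offset into a contiguous region of the shared NVRAM; the class `NvramFirmware` and its methods are hypothetical and do not correspond to any real hypervisor call.

```python
class NvramFirmware:
    """Hypothetical firmware layer mediating access to a shared NVRAM."""

    def __init__(self, size):
        self._store = bytearray(size)
        self._regions = {}  # partition id -> (base offset, length)

    def assign_region(self, partition_id, base, length):
        self._regions[partition_id] = (base, length)

    def _translate(self, partition_id, virtual_addr, count):
        """Map a partition-relative address to a physical offset, or deny."""
        region = self._regions.get(partition_id)
        if region is None:
            raise PermissionError("partition has no NVRAM region")
        base, length = region
        if virtual_addr < 0 or count < 0 or virtual_addr + count > length:
            raise PermissionError("address outside the partition's NVRAM region")
        return base + virtual_addr

    def read(self, partition_id, virtual_addr, count):
        start = self._translate(partition_id, virtual_addr, count)
        return bytes(self._store[start:start + count])

    def write(self, partition_id, virtual_addr, data):
        start = self._translate(partition_id, virtual_addr, len(data))
        self._store[start:start + len(data)] = data
```

    Because every access goes through `_translate`, a partition can neither read nor overwrite bytes that belong to another partition's region.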

    Method and apparatus to power off and/or reboot logical partitions in a data processing system
    10.
    Patent application
    Status: In force

    Publication No.: US20020124194A1

    Publication date: 2002-09-05

    Application No.: US09798167

    Filing date: 2001-03-01

    Abstract: A method, apparatus, and computer implemented instructions for controlling power in a data processing system having a plurality of logical partitions. Responsive to receiving a request to turn off the power for a logical partition within the plurality of logical partitions in the data processing system, a determination is made as to whether an additional partition within the plurality of logical partitions is present in the data processing system. The power is turned off in the data processing system in response to a determination that an additional partition within the plurality of logical partitions is absent in the data processing system. The logical partition is shut down in response to a determination that an additional partition within the plurality of logical partitions is present in the data processing system. The mechanism of the present invention also provides for rebooting logical partitions. A request is received to reboot a logical partition within the plurality of logical partitions. A reset signal is activated only for each processor assigned to the logical partition.
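
    The two decisions in this abstract, power-off and reboot, can be sketched in Python. The function names, the string return values, and the processor-to-partition mapping are assumptions made for illustration.

```python
def handle_power_off(active_partitions, target):
    """Decide how to serve a power-off request for one partition.

    active_partitions -- set of partition ids currently present
    target            -- partition that asked to be powered off
    """
    others = set(active_partitions) - {target}
    if others:
        return "shutdown_partition"  # other partitions remain: stop only this one
    return "power_off_system"        # no other partition: power off the system


def processors_to_reset(assignment, target):
    """Return the processors whose reset signal is activated on reboot.

    assignment -- mapping of processor id -> partition id
    """
    return sorted(cpu for cpu, part in assignment.items() if part == target)
```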
