Synchronize error handling for a plurality of partitions
    1.
    Granted invention patent
    Status: In force

    Publication number: US08151147B2

    Publication date: 2012-04-03

    Application number: US12640971

    Filing date: 2009-12-17

    IPC classification: G06F11/00

    CPC classification: G06F11/0793 G06F11/0709

    Abstract: In accordance with at least some embodiments, a system comprises a plurality of partitions, each partition having its own error handler. The system further comprises a plurality of resources assignable to the plurality of partitions. The system further comprises management logic coupled to the plurality of partitions and the plurality of resources. The management logic comprises an error management tool that synchronizes operation of the error handlers in response to an error.
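
    A minimal, single-threaded Python sketch of "synchronizing" per-partition error handlers. It assumes that the tool involves only the partitions to which the failing resource is assigned, and that handling proceeds in three phases ("quiesce", "handle", "resume"). No handler may begin a phase until every affected handler has finished the previous one. All class names, phase names, and fields are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorHandler:
    """Hypothetical per-partition error handler; records each phase it runs."""
    partition: str
    log: list = field(default_factory=list)

    def run_phase(self, phase: str, error: str) -> None:
        self.log.append((phase, error))


@dataclass
class ErrorManagementTool:
    """Hypothetical management-logic component that coordinates the handlers."""
    handlers: dict      # partition name -> ErrorHandler
    assignments: dict   # resource name -> set of partition names it is assigned to
    timeline: list = field(default_factory=list)

    PHASES = ("quiesce", "handle", "resume")  # assumed phase names

    def on_error(self, resource: str, error: str) -> list:
        affected = sorted(self.assignments.get(resource, set()))
        # Barrier semantics: every affected handler completes a phase
        # before any handler is allowed to start the next one.
        for phase in self.PHASES:
            for name in affected:
                self.handlers[name].run_phase(phase, error)
                self.timeline.append((phase, name))
        return affected
```

    A real system would run the handlers concurrently inside their own partitions and would enforce the same ordering with a cross-partition barrier. The nested loop above models only that ordering.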

    Field replaceable unit failure determination
    2.
    Granted invention patent
    Status: In force

    Publication number: US08108724B2

    Publication date: 2012-01-31

    Application number: US12641072

    Filing date: 2009-12-17

    IPC classification: G06F11/00

    Abstract: A system and method for fault management in a computer-based system are disclosed herein. A system includes a plurality of field replaceable units (“FRUs”) and fault management logic. The fault management logic is configured to collect error information from a plurality of components of the system. The logic stores, for each component identified as a possible cause of a detected fault, a record assigning one of two different component failure probability indications. The logic identifies a single of the plurality of FRUs that has failed based on the stored probability indications.
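
    A Python sketch of picking a single failed FRU from two-level probability records. The sketch assumes the two indications are "likely" and "possible". It also assumes that FRUs are ranked by their count of "likely" records, with "possible" records and then the FRU name breaking ties. The patent abstract does not give a ranking rule, so this scoring is an illustrative choice, and all names are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Indication(Enum):
    """The two failure-probability indications (labels are assumptions)."""
    LIKELY = "likely"      # component is a probable cause of the fault
    POSSIBLE = "possible"  # component may have contributed to the fault


def determine_failed_fru(records, component_to_fru):
    """Return the single FRU judged to have failed, or None if nothing is suspected.

    records: iterable of (component, Indication) pairs, one per suspected component.
    component_to_fru: maps each component name to the FRU that contains it.
    """
    scores = defaultdict(lambda: [0, 0])  # FRU -> [LIKELY count, POSSIBLE count]
    for component, indication in records:
        fru = component_to_fru[component]
        if indication is Indication.LIKELY:
            scores[fru][0] += 1
        else:
            scores[fru][1] += 1
    if not scores:
        return None
    # Most LIKELY records wins; POSSIBLE records break ties; the FRU name keeps
    # the result deterministic when both counts are equal.
    return min(scores, key=lambda fru: (-scores[fru][0], -scores[fru][1], fru))
```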

    SYNCHRONIZE ERROR HANDLING FOR A PLURALITY OF PARTITIONS
    3.
    Invention patent application
    Status: In force

    Publication number: US20110154128A1

    Publication date: 2011-06-23

    Application number: US12640971

    Filing date: 2009-12-17

    IPC classification: G06F11/07

    CPC classification: G06F11/0793 G06F11/0709

    Abstract: In accordance with at least some embodiments, a system comprises a plurality of partitions, each partition having its own error handler. The system further comprises a plurality of resources assignable to the plurality of partitions. The system further comprises management logic coupled to the plurality of partitions and the plurality of resources. The management logic comprises an error management tool that synchronizes operation of the error handlers in response to an error.

    Error log consolidation
    4.
    Granted invention patent
    Status: In force

    Publication number: US08122290B2

    Publication date: 2012-02-21

    Application number: US12641103

    Filing date: 2009-12-17

    IPC classification: G06F11/00

    Abstract: A system for error log consolidation is disclosed herein. A server computer includes a plurality of system processors and error log consolidation logic. The system processors are configurable to form isolated execution partitions. The error log consolidation logic is configured to, based on detection of a fault in the server, retrieve error logs from the system processors, and to consolidate the retrieved logs with server computer information not available to the system processors to generate a consolidated error log. The consolidated error log includes a comprehensive set of server information relevant to identifying a cause of the detected fault.
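
    A Python sketch of the consolidation step. The sketch runs once a fault has been detected. It pulls each system processor's error log through a caller-supplied retrieval function and appends server-level records that the processors cannot see, such as chassis or power events. It then returns a single time-ordered log. It assumes that all entries share a common clock. `LogEntry`, `consolidate_error_logs`, and `retrieve_log` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class LogEntry:
    timestamp: float   # seconds; assumed to come from a common server clock
    source: str        # e.g. a system processor id or "chassis"
    message: str


def consolidate_error_logs(
    processor_ids: Iterable[str],
    retrieve_log: Callable[[str], List[LogEntry]],
    server_info: List[LogEntry],
) -> List[LogEntry]:
    """Called after a fault is detected: gather per-processor logs and merge them
    with server-level records that the system processors cannot see."""
    merged: List[LogEntry] = []
    for pid in processor_ids:
        merged.extend(retrieve_log(pid))
    merged.extend(server_info)
    # One time-ordered view; the source name breaks timestamp ties deterministically.
    merged.sort(key=lambda e: (e.timestamp, e.source))
    return merged
```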

    FIELD REPLACEABLE UNIT FAILURE DETERMINATION
    5.
    Invention patent application
    Status: In force

    Publication number: US20110154097A1

    Publication date: 2011-06-23

    Application number: US12641072

    Filing date: 2009-12-17

    IPC classification: G06F11/20 G06F11/07

    Abstract: A system and method for fault management in a computer-based system are disclosed herein. A system includes a plurality of field replaceable units (“FRUs”) and fault management logic. The fault management logic is configured to collect error information from a plurality of components of the system. The logic stores, for each component identified as a possible cause of a detected fault, a record assigning one of two different component failure probability indications. The logic identifies a single of the plurality of FRUs that has failed based on the stored probability indications.

    System and method for using information relating to a detected loss of lockstep for determining a responsive action
    6.
    Granted invention patent
    Status: Lapsed

    Publication number: US07516359B2

    Publication date: 2009-04-07

    Application number: US10972835

    Filing date: 2004-10-25

    IPC classification: G06F11/00

    Abstract: According to one embodiment, a method comprises detecting a loss of lockstep (LOL) for a processor module. The method further comprises determining a type of LOL that is detected, and, based at least in part on the determined type of LOL, determining a responsive action to take for the LOL. According to one embodiment, a method comprises detecting a loss of lockstep (LOL) for a processor module. The method further comprises using information identifying at least one of type of the detected LOL and source of the detected LOL to determine a responsive action to take for the LOL.
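
    A Python sketch of choosing a responsive action from the LOL type and source. It assumes a two-way type classification, "correctable" versus "uncorrectable", derived from a hypothetical `uncorrectable` flag in the error record. It uses "is the source the boot processor?" as the source information. The categories and action names in the policy table are illustrative placeholders, not the patent's actual set.

```python
from enum import Enum


class LolType(Enum):
    """Hypothetical LOL classification; the patent's actual categories may differ."""
    CORRECTABLE = "correctable"      # divergence traced to a recoverable condition
    UNCORRECTABLE = "uncorrectable"  # divergence with no safe recovery path


# Hypothetical policy table: (LOL type, source is the boot processor?) -> action.
POLICY = {
    (LolType.CORRECTABLE, False): "idle_module_and_resync",
    (LolType.CORRECTABLE, True): "move_boot_role_then_resync",
    (LolType.UNCORRECTABLE, False): "reset_partition",
    (LolType.UNCORRECTABLE, True): "reset_partition",
}


def classify_lol(error_record: dict) -> LolType:
    """Assumes firmware reports an 'uncorrectable' flag with the LOL event."""
    if error_record.get("uncorrectable", False):
        return LolType.UNCORRECTABLE
    return LolType.CORRECTABLE


def responsive_action(error_record: dict, is_boot_processor: bool) -> str:
    return POLICY[(classify_lol(error_record), is_boot_processor)]
```

    A lookup table keeps the policy separate from detection, so the type-only embodiment and the type-plus-source embodiment differ only in which key columns are used.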

    Containing machine check events in a virtual partition
    7.
    Granted invention patent
    Status: In force

    Publication number: US07657776B2

    Publication date: 2010-02-02

    Application number: US11523892

    Filing date: 2006-09-20

    IPC classification: G06F11/00

    Abstract: Embodiments include methods, apparatus, and systems for containing machine check events in a virtual partition. One embodiment is a method of software execution. The method divides a hard partition into first and second virtual partitions and attempts to correct an error in a firmware layer of the first virtual partition. If the error is not correctable, then the method reboots the first virtual partition without disrupting hardware resources in the second virtual partition.
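
    A Python sketch of the containment flow in the abstract. It splits a hard partition's CPUs into two virtual partitions and tries a firmware-level correction first. If that fails, it reboots only the faulting virtual partition. The callable `firmware_correct` stands in for the firmware layer's recovery attempt, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class VirtualPartition:
    name: str
    cpus: tuple
    boot_count: int = 0

    def reboot(self) -> None:
        # Only this vPar's OS instance restarts; sibling vPars keep their CPUs.
        self.boot_count += 1


def split_hard_partition(cpus: Sequence[int], n_first: int) -> Dict[str, VirtualPartition]:
    """Divide one hard partition's CPUs into two virtual partitions."""
    return {
        "vpar1": VirtualPartition("vpar1", tuple(cpus[:n_first])),
        "vpar2": VirtualPartition("vpar2", tuple(cpus[n_first:])),
    }


def handle_machine_check(
    vpars: Dict[str, VirtualPartition],
    faulting: str,
    firmware_correct: Callable[[VirtualPartition], bool],
) -> str:
    """firmware_correct stands in for the vPar firmware layer's recovery attempt."""
    vpar = vpars[faulting]
    if firmware_correct(vpar):
        return "corrected"
    vpar.reboot()  # contain the uncorrectable error to the faulting vPar
    return "rebooted"
```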

    System and method for switching the role of boot processor to a spare processor responsive to detection of loss of lockstep in a boot processor
    8.
    Granted invention patent
    Status: In force

    Publication number: US07624302B2

    Publication date: 2009-11-24

    Application number: US10972588

    Filing date: 2004-10-25

    IPC classification: G06F11/00

    Abstract: According to one embodiment, a method comprises detecting loss of lockstep (LOL) for a processor in a multi-processor system. The method further comprises determining that the processor for which the LOL is detected is assigned the role of boot processor, and switching the role of boot processor to a spare processor without shutting down the system's operating system. In another embodiment, a method comprises system firmware determining that an LOL is detected for a lockstep pair of processors that are assigned the role of boot processor in a system. The method further comprises determining one of the lockstep pair of processors that is not the cause of the LOL, and copying the state of the determined one of the lockstep pair of processors that is not the cause of the LOL to a spare processor. The method further comprises switching the role of boot processor to the spare processor.
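
    A Python sketch of the second embodiment. It finds the half of the boot lockstep pair that did not cause the LOL, copies that half's state to a spare, and reassigns the boot role to the spare. The operating system is left running throughout. Processor state is modeled as a plain dict, and `is_cause_of_lol` stands in for firmware's diagnosis. Both are assumptions, as are all the class and function names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Processor:
    pid: int
    state: dict = field(default_factory=dict)  # stand-in for architectural state


@dataclass
class System:
    processors: Dict[int, Processor]
    boot_role: Tuple[int, ...]   # processor(s) currently holding the boot role
    spares: List[int]            # idle spare processors
    os_running: bool = True


def switch_boot_role(system: System, is_cause_of_lol: Callable[[int], bool]) -> int:
    """Move the boot role off a lockstep pair that lost lockstep, without stopping the OS."""
    healthy = [pid for pid in system.boot_role if not is_cause_of_lol(pid)]
    if not healthy or not system.spares:
        raise RuntimeError("no healthy source processor or no spare available")
    spare = system.spares.pop(0)
    # Copy state from the half of the pair that did not cause the LOL.
    system.processors[spare].state = dict(system.processors[healthy[0]].state)
    system.boot_role = (spare,)
    # os_running is never cleared in this flow.
    return spare
```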

    System and method for configuring lockstep mode of a processor module
    9.
    Granted invention patent
    Status: In force

    Publication number: US07308566B2

    Publication date: 2007-12-11

    Application number: US10973004

    Filing date: 2004-10-25

    IPC classification: G06F9/00 G06F15/177 G06F1/24

    Abstract: A system comprises a processor module that supports lockstep mode of operation. The system further comprises non-volatile data storage having stored thereto configuration information specifying whether the processor module is desired to operate in lockstep mode. A method comprises storing configuration information to non-volatile data storage of a system, wherein the configuration information specifies whether lockstep mode of operation is desired to be enabled or disabled for a processor module of the system. The method further comprises causing, by the system, the processor module to have its lockstep mode enabled or disabled as specified by the configuration information.
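
    A Python sketch of storing a per-module lockstep setting and applying it at boot. A JSON file stands in for the non-volatile data storage. The sketch assumes that module IDs are strings, so they survive the JSON round trip, and that a module with no stored entry defaults to lockstep enabled. Both assumptions, and all names, are hypothetical.

```python
import json
from pathlib import Path


class ProcessorModule:
    """Hypothetical processor module that supports lockstep mode."""

    def __init__(self, module_id: str):
        self.module_id = module_id
        self.lockstep_enabled = False


def _load(nvram: Path) -> dict:
    return json.loads(nvram.read_text()) if nvram.exists() else {}


def store_lockstep_config(nvram: Path, module_id: str, enabled: bool) -> None:
    """Persist the desired lockstep setting; a JSON file stands in for NVRAM."""
    config = _load(nvram)
    config[module_id] = bool(enabled)
    nvram.write_text(json.dumps(config))


def apply_lockstep_config(nvram: Path, module: ProcessorModule) -> bool:
    """At boot, enable or disable lockstep exactly as the stored configuration says."""
    # Assumption: a module with no stored entry defaults to lockstep enabled.
    module.lockstep_enabled = bool(_load(nvram).get(module.module_id, True))
    return module.lockstep_enabled
```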

    System and method for system firmware causing an operating system to idle a processor
    10.
    Invention patent application
    Status: In force

    Publication number: US20060107115A1

    Publication date: 2006-05-18

    Application number: US10972888

    Filing date: 2004-10-25

    IPC classification: G06F11/00

    Abstract: According to one embodiment, a method comprises system firmware instructing a system's operating system to idle a processor, and responsive to the instructing, the operating system idling the processor and returning control over the processor to the system firmware. According to one embodiment, a method comprises detecting loss of lockstep (LOL) for a processor module in a system, and responsive to the detecting LOL for the processor module, system firmware instructing an operating system to idle the processor module.
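
    A Python sketch of the firmware/OS hand-off in the abstract. On a loss of lockstep, firmware asks the OS to idle the processor. The OS stops scheduling work on it and then hands control back to firmware. Both classes are hypothetical in-memory models, not a real firmware or OS interface.

```python
class SystemFirmware:
    """Hypothetical firmware model that tracks which CPUs it currently controls."""

    def __init__(self):
        self.owned_cpus = set()

    def take_control(self, cpu: int) -> None:
        self.owned_cpus.add(cpu)

    def on_loss_of_lockstep(self, os_: "OperatingSystem", cpu: int) -> None:
        # Per the abstract, firmware instructs the OS to idle the module.
        os_.idle_processor(cpu, self)


class OperatingSystem:
    """Hypothetical OS model that schedules work on a set of CPUs."""

    def __init__(self, cpus):
        self.scheduled_cpus = set(cpus)

    def idle_processor(self, cpu: int, firmware: SystemFirmware) -> None:
        # Stop scheduling work on the CPU, then return control of it to firmware.
        self.scheduled_cpus.discard(cpu)
        firmware.take_control(cpu)
```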