Method and system of error logging
    1.
    Invention patent (granted)
    Legal status: in force

    Publication No.: US08122291B2

    Publication date: 2012-02-21

    Application No.: US12691512

    Filing date: 2010-01-21

    IPC classification: G06F11/00

    Abstract: Method and system of error logging. At least some of the illustrative embodiments are methods including: detecting, by a reset circuit, assertion of an error pin by a processor system (the processor system comprising at least a main processor and a chipset, and assertion of the error pin being an indication to reboot the processor system); notifying, by the reset circuit, a management processor (distinct from the main processor) that the error pin is asserted; writing, by the management processor, to a plurality of registers in the chipset; de-asserting a reset pin of the main processor; and then executing, by the main processor, error-handling code to generate an error log.
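    The sequence in this abstract (reset circuit, then management processor, then chipset registers, then release of the main processor's reset, then error-handling code) can be modeled as a short chain of calls. This is a minimal simulation under stated assumptions, not the patented implementation: the class names, the `boot_mode` and `preserve_error_state` registers, and the `mca_bank0` error source are all hypothetical, because the abstract names only "a plurality of registers".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chipset:
    # Hypothetical register file; the abstract only says "a plurality of registers".
    registers: dict = field(default_factory=dict)
    # Error state that survives the reboot (hypothetical representation).
    error_status: dict = field(default_factory=dict)


@dataclass
class MainProcessor:
    in_reset: bool = True  # held in reset while the error pin is being handled

    def run_error_handler(self, chipset: Chipset) -> List[str]:
        # Error-handling code can only run once the reset pin is de-asserted.
        if self.in_reset:
            raise RuntimeError("main processor is still held in reset")
        # Turn the preserved chipset error state into an error log.
        return [f"{name}={value:#x}" for name, value in sorted(chipset.error_status.items())]


class ManagementProcessor:
    def __init__(self, chipset: Chipset, cpu: MainProcessor):
        self.chipset = chipset
        self.cpu = cpu

    def on_error_pin(self) -> List[str]:
        # Hypothetical register writes that steer the restart into error handling.
        self.chipset.registers["boot_mode"] = "error_handler"
        self.chipset.registers["preserve_error_state"] = 1
        self.cpu.in_reset = False  # de-assert the main processor's reset pin
        return self.cpu.run_error_handler(self.chipset)


class ResetCircuit:
    def __init__(self, mp: ManagementProcessor):
        self.mp = mp

    def sample(self, error_pin_asserted: bool) -> Optional[List[str]]:
        # On assertion of the error pin, notify the management processor.
        if error_pin_asserted:
            return self.mp.on_error_pin()
        return None


chipset = Chipset(error_status={"mca_bank0": 0xBEEF})
cpu = MainProcessor()
log = ResetCircuit(ManagementProcessor(chipset, cpu)).sample(True)
print(log)  # ['mca_bank0=0xbeef']
```

    The main point the sketch captures is ordering: the main processor cannot run its error handler until the management processor has prepared the chipset and released the reset pin.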

    METHOD AND SYSTEM OF ERROR LOGGING
    2.
    Invention application (published)
    Legal status: in force

    Publication No.: US20110179314A1

    Publication date: 2011-07-21

    Application No.: US12691512

    Filing date: 2010-01-21

    IPC classification: G06F11/07

    Abstract: Method and system of error logging. At least some of the illustrative embodiments are methods including: detecting, by a reset circuit, assertion of an error pin by a processor system (the processor system comprising at least a main processor and a chipset, and assertion of the error pin being an indication to reboot the processor system); notifying, by the reset circuit, a management processor (distinct from the main processor) that the error pin is asserted; writing, by the management processor, to a plurality of registers in the chipset; de-asserting a reset pin of the main processor; and then executing, by the main processor, error-handling code to generate an error log.

    Handling errors in a data processing system
    3.
    Invention patent (granted)
    Legal status: in force

    Publication No.: US08713350B2

    Publication date: 2014-04-29

    Application No.: US12633648

    Filing date: 2009-12-08

    IPC classification: G06F11/00

    Abstract: A method of managing errors in a data processing system may involve at least one computer system. Each computer system may include firmware, system memory storing instructions for an operating system, and a processor that executes the operating system. A firmware error handler resident in the firmware may identify an error occurring in the computer system and determine whether the operating system is required to take an action in response to the error. If the operating system is not required to act, the firmware error handler may create an error log, accessible to the operating system, that is appropriate for causing the operating system to take no action.
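    The decision described here (does the OS need to act, and if not, write a log that leads the OS to do nothing) can be sketched as a small classifier plus an OS-side consumer. This is a hypothetical illustration only: the abstract defines no error classes, so the `Severity` values and the rule that only corrected errors need no OS action are assumptions, as are all function and field names.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical error classes; the abstract does not define any.
    CORRECTED = "corrected"      # hardware already recovered
    RECOVERABLE = "recoverable"  # OS must act, e.g. retire a memory page
    FATAL = "fatal"


@dataclass
class ErrorRecord:
    source: str
    severity: Severity


def os_action_required(err: ErrorRecord) -> bool:
    # Hypothetical policy: only errors the firmware could not fully handle need the OS.
    return err.severity is not Severity.CORRECTED


def firmware_error_handler(err: ErrorRecord) -> dict:
    """Create an OS-visible error log entry for an identified error."""
    action = "handle" if os_action_required(err) else "none"
    return {"source": err.source, "severity": err.severity.value, "os_action": action}


def os_consume(log: dict) -> str:
    # The OS reads every log but acts only when the firmware asks it to.
    return "ignored" if log["os_action"] == "none" else f"handling {log['source']}"


print(os_consume(firmware_error_handler(ErrorRecord("dimm3", Severity.CORRECTED))))    # ignored
print(os_consume(firmware_error_handler(ErrorRecord("dimm3", Severity.RECOVERABLE))))  # handling dimm3
```

    The log is still created in the no-action case, so the error remains visible to the OS for record-keeping even though the OS takes no corrective step.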

    Method and apparatus for processing unit synchronization for scalable parallel processing

    Publication No.: US07103639B2

    Publication date: 2006-09-05

    Application No.: US09730221

    Filing date: 2000-12-05

    IPC classification: G06F15/167

    Abstract: The present invention flexibly manages the formation of a partition from a plurality of independently executing cells (discrete hardware entities comprising system resources) in preparation for instantiating an operating system instance on the partition. Specifically, the invention manages the configuration activities involved in the transition from individual cells acting independently, through the cells' rendezvous, to the cells becoming interdependent so that they can continue operating as a partition. The invention manages the partition-forming process so that no single point of failure disrupts it. Instead, the invention is implemented as a distributed application in which individual cells independently execute instructions based on their respective copies of the complex profile (a "map" of the complex configuration). The invention also adapts to the delay associated with certain cells becoming ready to join the formation or rendezvous process, and it can cope with missing, unavailable, or otherwise malfunctioning cells. Additionally, the invention analyzes the cells that are present to determine their compatibility and rejects incompatible cells.
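    One way to picture the distributed rendezvous is a pure function that every cell runs on its own copy of the complex profile: because the inputs are the same, each cell reaches the same membership decision without a coordinator. This is a sketch under assumptions, not the patented algorithm: the `Cell` fields, the fixed `deadline` standing in for tolerated delay, and "same CPU type and firmware revision" as the compatibility test are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    cell_id: int
    cpu_type: str
    fw_rev: int
    ready_at: Optional[float]  # seconds after power-on; None = missing/unavailable


def form_partition(profile: Dict[str, object], cells: List[Cell], deadline: float) -> List[int]:
    """Decide partition membership from one cell's copy of the complex profile.

    Every cell runs this same function on its own profile copy, so there is no
    single coordinator whose failure stops the process.
    """
    members = []
    for cell in cells:
        if cell.cell_id not in profile["members"]:
            continue  # not assigned to this partition by the profile
        if cell.ready_at is None or cell.ready_at > deadline:
            continue  # missing, or too late for the rendezvous
        if (cell.cpu_type, cell.fw_rev) != (profile["cpu_type"], profile["fw_rev"]):
            continue  # incompatible with the partition: reject
        members.append(cell.cell_id)
    return sorted(members)


profile = {"members": {0, 1, 2, 3}, "cpu_type": "cpu-x", "fw_rev": 7}
cells = [
    Cell(0, "cpu-x", 7, 1.0),
    Cell(1, "cpu-x", 7, 4.5),   # slow, but arrives before the deadline
    Cell(2, "cpu-x", 6, 0.5),   # wrong firmware revision
    Cell(3, "cpu-x", 7, None),  # never became ready
]
print(form_partition(profile, cells, deadline=5.0))  # [0, 1]
```

    Shortening the deadline trades tolerance of slow cells for a faster partition boot; with `deadline=3.0` only cell 0 would join.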

    Analysis result stored on a field replaceable unit
    6.
    Invention patent (granted)
    Legal status: in force

    Publication No.: US08161324B2

    Publication date: 2012-04-17

    Application No.: US12641091

    Filing date: 2009-12-17

    IPC classification: G06F11/00

    CPC classification: G06F11/0751 G06F11/0727

    Abstract: A system and method for recording fault information in an electronic system are disclosed herein. A system includes fault analysis logic and a plurality of field replaceable units ("FRUs"). The fault analysis logic is configured to analyze system error information and, based on that analysis, to identify at least one of the FRUs in the system as a possible cause of a detected fault. Each FRU includes writable non-volatile storage with storage locations reserved for information that includes a result of the analysis. The result of the analysis indicates why the fault analysis logic determined the FRU storing the information to be a possible cause of the fault.
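    The idea that the diagnosis travels with the part can be illustrated by writing a small fixed-format record into a reserved region of each suspect FRU's non-volatile storage. This sketch models that storage as a `bytearray`; the reserved offset, the 20-byte record layout (fault ID plus a 16-byte reason string), and the blame-every-named-suspect analysis are hypothetical choices, since the abstract does not specify a format.

```python
import struct
from dataclasses import dataclass, field
from typing import Tuple

RESERVED_OFFSET = 0x100          # hypothetical start of the reserved analysis region
RECORD = struct.Struct("<I16s")  # hypothetical record: fault id + short reason text


@dataclass
class FRU:
    name: str
    # Stand-in for the FRU's writable non-volatile storage.
    nvram: bytearray = field(default_factory=lambda: bytearray(512))

    def store_analysis(self, fault_id: int, reason: str) -> None:
        # Write the analysis result into the reserved area (reason truncated to 16 bytes).
        RECORD.pack_into(self.nvram, RESERVED_OFFSET, fault_id, reason.encode("ascii")[:16])

    def read_analysis(self) -> Tuple[int, str]:
        fault_id, raw = RECORD.unpack_from(self.nvram, RESERVED_OFFSET)
        return fault_id, raw.rstrip(b"\0").decode("ascii")


def analyze(error_info: dict, frus: dict) -> list:
    """Hypothetical fault analysis: record a reason on every FRU named as a suspect."""
    suspects = []
    for fru_name, reason in error_info["suspects"].items():
        frus[fru_name].store_analysis(error_info["fault_id"], reason)
        suspects.append(fru_name)
    return suspects


frus = {"cpu0": FRU("cpu0"), "dimm2": FRU("dimm2")}
suspects = analyze({"fault_id": 42, "suspects": {"dimm2": "ECC multi-bit"}}, frus)
print(suspects, frus["dimm2"].read_analysis())  # ['dimm2'] (42, 'ECC multi-bit')
```

    Because the record lives on the FRU itself, a repair depot can read `read_analysis()` from a returned part without access to the original system's logs.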

    Error log consolidation
    7.
    Invention patent (granted)
    Legal status: in force

    Publication No.: US08122290B2

    Publication date: 2012-02-21

    Application No.: US12641103

    Filing date: 2009-12-17

    IPC classification: G06F11/00

    Abstract: A system for error log consolidation is disclosed herein. A server computer includes a plurality of system processors and error log consolidation logic. The system processors are configurable to form isolated execution partitions. Upon detection of a fault in the server, the error log consolidation logic is configured to retrieve error logs from the system processors and to consolidate the retrieved logs with server computer information not available to the system processors, generating a consolidated error log. The consolidated error log includes a comprehensive set of server information relevant to identifying the cause of the detected fault.
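    Consolidation here amounts to a fault-triggered merge of per-processor logs with server-level data the partitions cannot see, ordered so the combined timeline can be read as one record. The sketch below assumes timestamped entries and uses a time sort as the merge; the `LogEntry` type, the `on_fault` trigger, and the example sources (`partition0`, `chassis`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class LogEntry:
    timestamp: float
    source: str
    message: str


def consolidate(processor_logs: Dict[str, List[LogEntry]],
                server_info: List[LogEntry]) -> List[LogEntry]:
    """Merge logs retrieved from each system processor with server-level
    information the processors cannot see, ordered by time."""
    merged = list(server_info)
    for proc, entries in processor_logs.items():
        # Tag each entry with the processor/partition it came from.
        merged.extend(LogEntry(e.timestamp, f"{proc}:{e.source}", e.message) for e in entries)
    return sorted(merged, key=lambda e: e.timestamp)


def on_fault(fault_detected: bool,
             retrieve_logs: Callable[[], Dict[str, List[LogEntry]]],
             server_info: List[LogEntry]) -> List[LogEntry]:
    # Logs are retrieved and consolidated only when a fault has been detected.
    return consolidate(retrieve_logs(), server_info) if fault_detected else []


logs = {
    "partition0": [LogEntry(10.0, "mca", "bank 4 uncorrected error")],
    "partition1": [LogEntry(12.5, "os", "machine check exception")],
}
chassis = [LogEntry(11.0, "chassis", "fan 3 degraded")]
for entry in on_fault(True, lambda: logs, chassis):
    print(entry.timestamp, entry.source, entry.message)
```

    Interleaving the chassis entry between the two partition entries is what makes the consolidated view useful: a correlation such as "fan degraded just before the machine check" is invisible to either partition alone.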

    FIELD REPLACEABLE UNIT FAILURE DETERMINATION
    8.
    Invention application (published)
    Legal status: in force

    Publication No.: US20110154097A1

    Publication date: 2011-06-23

    Application No.: US12641072

    Filing date: 2009-12-17

    IPC classification: G06F11/20 G06F11/07

    Abstract: A system and method for fault management in a computer-based system are disclosed herein. A system includes a plurality of field replaceable units ("FRUs") and fault management logic. The fault management logic is configured to collect error information from a plurality of components of the system. For each component identified as a possible cause of a detected fault, the logic stores a record assigning that component one of two different component failure probability indications. Based on the stored probability indications, the logic identifies the single FRU, among the plurality of FRUs, that has failed.
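    A two-level indication per suspect component, followed by a reduction to exactly one FRU, can be sketched as counting indications per FRU and taking the top-ranked one. The abstract does not say how the indications are combined, so the names `HIGH`/`LOW`, the component-to-FRU map, and the ranking rule (most `HIGH` indications first, then most `LOW`, ties broken alphabetically) are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict

HIGH, LOW = "high", "low"  # hypothetical names for the two probability indications


def record(records: Dict[str, str], component: str, indication: str) -> None:
    """Store one failure-probability indication per suspect component."""
    if indication not in (HIGH, LOW):
        raise ValueError(f"unknown indication: {indication}")
    records[component] = indication


def failed_fru(records: Dict[str, str], component_to_fru: Dict[str, str]) -> str:
    # Hypothetical rule: rank FRUs by HIGH indications first, then by LOW ones.
    high = Counter(component_to_fru[c] for c, p in records.items() if p == HIGH)
    low = Counter(component_to_fru[c] for c, p in records.items() if p == LOW)
    candidates = sorted(set(high) | set(low))  # sorted so ties resolve deterministically
    return max(candidates, key=lambda fru: (high[fru], low[fru]))


records: Dict[str, str] = {}
record(records, "dimm_a", HIGH)
record(records, "mem_ctrl", LOW)
record(records, "qpi_link", LOW)
component_to_fru = {"dimm_a": "DIMM 3", "mem_ctrl": "CPU board 1", "qpi_link": "CPU board 1"}
print(failed_fru(records, component_to_fru))  # DIMM 3
```

    Under this rule a single high-probability indication outweighs any number of low-probability ones, which is one plausible reading of why only two indication levels would be enough to single out one FRU.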

    Managing errors in a data processing system
    9.
    Invention patent (granted)
    Legal status: in force

    Publication No.: US08839032B2

    Publication date: 2014-09-16

    Application No.: US13258392

    Filing date: 2009-12-08

    IPC classification: G06F11/07

    Abstract: A method of managing errors in a data processing system (10) may involve at least one computer system (14). Each computer system (14) may include a plurality of hardware components (18), including a processor (20) for executing a respective operating system and a memory (22) for storing instructions for the respective operating system (24), and firmware (28) including a firmware error handler (30). For each computer system (14), the firmware error handler (30) may identify an error occurring in one of the hardware components (18). Each respective firmware error handler (30) may communicate error information about the identified error to an error manager (32) external to the computer system (14). The error manager (32) may compile the error information communicated from each respective firmware error handler (30).
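    The division of labor in this abstract, with per-system firmware handlers that only identify and forward errors and a single external manager that compiles them, maps onto a simple reporter/collector pair. The sketch is hypothetical: the class and method names, the system IDs, and compiling as "count errors per system and component" are illustrative assumptions, not the claimed method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ErrorInfo:
    component: str
    description: str


class ErrorManager:
    """Hypothetical error manager running outside the computer systems."""

    def __init__(self) -> None:
        self._reports: Dict[str, List[ErrorInfo]] = defaultdict(list)

    def receive(self, system_id: str, info: ErrorInfo) -> None:
        self._reports[system_id].append(info)

    def compile(self) -> Dict[Tuple[str, str], int]:
        # Count errors per (system, component) across every reporting system.
        summary: Dict[Tuple[str, str], int] = defaultdict(int)
        for system_id, infos in self._reports.items():
            for info in infos:
                summary[(system_id, info.component)] += 1
        return dict(summary)


class FirmwareErrorHandler:
    def __init__(self, system_id: str, manager: ErrorManager) -> None:
        self.system_id = system_id
        self.manager = manager

    def on_hardware_error(self, component: str, description: str) -> None:
        # Identify the error locally, then forward it to the external manager.
        self.manager.receive(self.system_id, ErrorInfo(component, description))


mgr = ErrorManager()
sys_a = FirmwareErrorHandler("sys_a", mgr)
sys_b = FirmwareErrorHandler("sys_b", mgr)
sys_a.on_hardware_error("memory", "correctable ECC")
sys_a.on_hardware_error("memory", "correctable ECC")
sys_b.on_hardware_error("processor", "cache parity")
print(mgr.compile())
```

    Keeping the manager outside the computer systems means the compiled view survives the failure of any one system, which is the apparent motivation for placing it externally.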

    摘要翻译: 管理数据处理系统(10)中的错误的方法可以包括至少一个计算机系统(14)。 每个计算机系统(14)可以包括多个硬件组件(18),包括用于执行相应操作系统的处理器(20)和用于存储相应操作系统(24)的指令的存储器(22)和固件 28),其包括固件错误处理程序(30)。 对于每个计算机系统(14),固件错误处理器(30)可以识别在硬件组件(18)之一上发生的错误。 每个相应的固件错误处理器(30)可以将关于所识别的错误的错误信息传送到计算机系统(14)的外部的错误管理器(32)。 错误管理器(14)可以编译从每个相应的固件错误处理器(30)传送的错误信息。

    RESOURCE FAULT MANAGEMENT FOR PARTITIONS
    10.
    Invention application (published)
    Legal status: in force

    Publication No.: US20110154349A1

    Publication date: 2011-06-23

    Application No.: US12641001

    Filing date: 2009-12-17

    IPC classification: G06F9/50

    CPC classification: G06F9/5061 G06F9/22 G06F9/44

    Abstract: In accordance with at least some embodiments, a system includes a plurality of partitions, each partition having its own operating system (OS) and workload. The system also includes a plurality of resources assignable to the plurality of partitions, and management logic coupled to the plurality of partitions and the plurality of resources. The management logic is configured to set priority rules for each of the plurality of partitions based on user input, and it performs automated resource fault management for the resources assigned to the plurality of partitions based on those priority rules.
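    Priority-driven fault handling can be illustrated by replacing a failed resource first from a spare pool and, failing that, by taking a resource from a lower-priority partition. This is only one possible policy, assumed for illustration: the numeric priorities (higher is more important), the spare pool, the "borrow from the lowest-priority donor" rule, and every name in the code are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class System:
    priority: Dict[str, int]          # user-set rule: higher number = more important
    assignment: Dict[str, List[str]]  # partition -> assigned resources
    spares: List[str] = field(default_factory=list)

    def handle_fault(self, failed: str) -> Optional[str]:
        """Replace a failed resource according to the partition priority rules."""
        # Find the partition that owns the failed resource (StopIteration if none does).
        owner = next(p for p, res in self.assignment.items() if failed in res)
        self.assignment[owner].remove(failed)
        if self.spares:
            replacement = self.spares.pop(0)
        else:
            # Borrow from the lowest-priority partition that is less important than the owner.
            donors = [p for p in self.assignment
                      if p != owner and self.priority[p] < self.priority[owner]
                      and self.assignment[p]]
            if not donors:
                return None  # no eligible donor: the owner keeps running degraded
            donor = min(donors, key=self.priority.get)
            replacement = self.assignment[donor].pop()
        self.assignment[owner].append(replacement)
        return replacement


s = System(priority={"db": 10, "batch": 1},
           assignment={"db": ["cpu0", "cpu1"], "batch": ["cpu2", "cpu3"]})
print(s.handle_fault("cpu1"))  # cpu3
print(s.assignment)            # {'db': ['cpu0', 'cpu3'], 'batch': ['cpu2']}
```

    The asymmetry is the point of the priority rules: a fault in the high-priority `db` partition is absorbed by `batch`, while a later fault in `batch` leaves it degraded rather than taking resources from `db`.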
