Failure data management for a distributed computer system
    1.
    Granted patent
    Legal status: In force

    Publication No.: US08812916B2

    Publication date: 2014-08-19

    Application No.: US13151985

    Filing date: 2011-06-02

    IPC classification: G06F11/00

    Abstract: Various systems, processes, products, and techniques may be used to manage failure data for a distributed computer system. In particular implementations, a system and process for managing distributed data for a distributed computer system may include the ability to determine at a service processor of a first node in a distributed computer system that comprises a plurality of nodes whether a failure has occurred in the first node and identify a service processor of a second node in the distributed computer system in which to store failure data if a failure has occurred. The system and process may also include the ability to store at least part of the failure data in the identified service processor and determine whether there is more failure data to store than the identified service processor can store.
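
    The storage step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `ServiceProcessor` class, its byte-count `capacity` field, and the choice to spill leftover data to the next peer are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class ServiceProcessor:
    """Hypothetical stand-in for a node's service processor and its free storage."""
    node_id: str
    capacity: int          # bytes still available for failure data (assumed unit)
    stored: bytes = b""


def store_failure_data(data: bytes, peers: list[ServiceProcessor]) -> bytes:
    """Spread failure data over peer service processors, in the order given.

    Returns the bytes that could not be stored anywhere (empty on success).
    """
    remaining = data
    for sp in peers:
        if not remaining:
            break
        chunk = remaining[:sp.capacity]      # store at least part of the data
        sp.stored += chunk
        sp.capacity -= len(chunk)
        remaining = remaining[len(chunk):]   # anything left exceeds this peer's space
    return remaining
```

    A non-empty return value corresponds to the case where there is more failure data than the identified service processors can hold.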

    Logical Partition Defragmentation with a Data Processing System
    2.
    Patent application
    Legal status: In force

    Publication No.: US20120284484A1

    Publication date: 2012-11-08

    Application No.: US13459812

    Filing date: 2012-04-30

    IPC classification: G06F12/02

    Abstract: A mechanism, in a data processing system, is provided for logical partition defragmentation. The mechanism gathers resource requirements for a plurality of logical partitions running in a plurality of power domains within one or more servers. The mechanism determines optimal hardware utilization for the plurality of logical partitions. The mechanism migrates one or more of the plurality of logical partitions to run in a subset of the plurality of power domains such that at least one power domain within the plurality of power domains is unused. The mechanism puts the at least one unused power domain in a low power state.
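
    One way to picture the consolidation step is as a bin-packing problem. The sketch below uses a first-fit-decreasing heuristic, which is a swapped-in technique and not necessarily what the patent means by "optimal hardware utilization". The single scalar resource unit and all names are assumptions made for illustration.

```python
def defragment(partitions: dict[str, int], domains: dict[str, int]):
    """Pack partitions into as few power domains as first-fit decreasing finds.

    partitions: partition name -> required resource units (hypothetical unit)
    domains:    power domain name -> capacity in the same units
    Returns (placement, unused_domains).
    """
    free = dict(domains)
    placement: dict[str, str] = {}
    # Place the largest partitions first; try domains in their given order.
    for name, need in sorted(partitions.items(), key=lambda kv: -kv[1]):
        for dom, avail in free.items():
            if avail >= need:
                placement[name] = dom
                free[dom] = avail - need
                break
        else:
            raise ValueError(f"no power domain can host {name}")
    used = set(placement.values())
    # Domains left without any partition are candidates for a low power state.
    unused = [d for d in domains if d not in used]
    return placement, unused
```

    The returned `unused` list plays the role of the power domains that the mechanism would put into a low power state after migration.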

    INTERRUPT-DRIVEN LINK STATUS FEEDBACK MECHANISM FOR EMBEDDED SWITCHES
    3.
    Patent application
    Legal status: In force

    Publication No.: US20100250810A1

    Publication date: 2010-09-30

    Application No.: US12412502

    Filing date: 2009-03-27

    IPC classification: G06F13/24

    CPC classification: H04L69/40

    Abstract: A computer implemented method, a tangible computer readable medium, and a data processing system intelligently propagate link status information received by a blade server to the various ports of an embedded multi-port switch. The link status of a switch port in an external switch module can be communicated to the operating systems of individual blade servers that are affected by that link status. When an external switch module is unplugged from a server blade chassis, the bus controller broadcasts a link down event, such as a link down interrupt, to the individual server blades where it is received by the embedded multi-port switch for those server blades. The embedded multi-port switch translates the link down interrupt into a hardware link down event, and forwards the hardware link down event to the other elements connected to the embedded multi-port switch.
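
    The propagation path in this abstract (bus controller, then embedded switch, then attached elements) can be modelled loosely as a callback fan-out. The class, the event dictionary, and the function names below are hypothetical and only illustrate the flow; real hardware would use interrupts and port state rather than Python callbacks.

```python
class EmbeddedSwitch:
    """Hypothetical model of a blade's embedded multi-port switch."""

    def __init__(self):
        self.listeners = []   # callbacks for elements attached to the switch

    def attach(self, callback):
        self.listeners.append(callback)

    def on_link_down_interrupt(self, external_port: int):
        # Translate the interrupt into a link-down event and fan it out.
        event = {"type": "hw_link_down", "external_port": external_port}
        for notify in self.listeners:
            notify(event)


def broadcast_link_down(blades, external_port):
    """Stand-in for the bus controller: deliver the interrupt to every blade's switch."""
    for switch in blades:
        switch.on_link_down_interrupt(external_port)
```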

    METHOD AND SYSTEM FOR REMOTE NODE DEBUGGING USING AN EMBEDDED NODE CONTROLLER
    4.
    Patent application
    Legal status: Lapsed

    Publication No.: US20100180021A1

    Publication date: 2010-07-15

    Application No.: US12353287

    Filing date: 2009-01-14

    IPC classification: G06F15/173

    Abstract: A method, system and computer program product for remotely debugging a malfunctioning node controller of a node in a distributed node network through a functioning node controller of the same node. The method comprises establishing a serial link between the malfunctioning node controller and a functioning node controller and configuring the functioning node controller as a virtual console by the remotely-located central data processing system (DPS). The method further includes receiving, via an internal Fru Support Interface (FSI) link, serial data from the malfunctioning node controller through the virtual console, and debugging, by the DPS, a failure condition of the malfunctioning node controller, in response to receipt of the serial data through the virtual console.

    Logical partition defragmentation within a data processing system
    6.
    Granted patent
    Legal status: In force

    Publication No.: US08819691B2

    Publication date: 2014-08-26

    Application No.: US13100358

    Filing date: 2011-05-04

    IPC classification: G06F9/46, G06F9/455

    Abstract: A mechanism, in a data processing system, is provided for logical partition defragmentation. The mechanism gathers resource requirements for a plurality of logical partitions running in a plurality of power domains within one or more servers. The mechanism determines optimal hardware utilization for the plurality of logical partitions. The mechanism migrates one or more of the plurality of logical partitions to run in a subset of the plurality of power domains such that at least one power domain within the plurality of power domains is unused. The mechanism puts the at least one unused power domain in a low power state.

    Logical Partition Defragmentation Within a Data Processing System
    7.
    Patent application
    Legal status: In force

    Publication No.: US20120284549A1

    Publication date: 2012-11-08

    Application No.: US13100358

    Filing date: 2011-05-04

    IPC classification: G06F1/32

    Abstract: A mechanism, in a data processing system, is provided for logical partition defragmentation. The mechanism gathers resource requirements for a plurality of logical partitions running in a plurality of power domains within one or more servers. The mechanism determines optimal hardware utilization for the plurality of logical partitions. The mechanism migrates one or more of the plurality of logical partitions to run in a subset of the plurality of power domains such that at least one power domain within the plurality of power domains is unused. The mechanism puts the at least one unused power domain in a low power state.

    Wake on Lan for blade server
    8.
    Granted patent
    Legal status: Lapsed

    Publication No.: US08140871B2

    Publication date: 2012-03-20

    Application No.: US12412402

    Filing date: 2009-03-27

    IPC classification: G06F1/26

    Abstract: A computer implemented method, a tangible computer medium, and a data processing system are provided for waking a blade server from an operational state of reduced power. When the server blade enters the state of reduced power, a service firmware configures a multi-port blade switch of the server blade to direct incoming packets to the service firmware. The service firmware then polls for receipt of a Wake-on-Lan magic packet. When the Wake-on-Lan magic packet is received by the service firmware, the service firmware reconfigures the multi-port blade switch to direct incoming packets to a network interface card of the server blade. The service firmware then initiates a reboot of the server blade.
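
    The polling step depends on recognising a Wake-on-LAN magic packet, whose standard payload is six 0xFF bytes followed by sixteen repetitions of the target's 6-byte MAC address. The sketch below shows that check plus a minimal polling loop. The function names and the idea of feeding payloads in as an iterable are assumptions made for illustration, not the firmware interface described in the patent.

```python
def is_magic_packet(payload: bytes, mac: bytes) -> bool:
    """Return True if payload contains a Wake-on-LAN magic sequence for mac."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # Six 0xFF bytes, then the MAC repeated sixteen times, anywhere in the payload.
    return (b"\xff" * 6 + mac * 16) in payload


def wait_for_wake(packets, mac: bytes) -> bool:
    """Poll an iterable of received payloads; True on the first magic packet."""
    for payload in packets:
        if is_magic_packet(payload, mac):
            return True
    return False
```

    In the flow described above, a `True` result would be the point at which the service firmware redirects packets back to the network interface card and initiates the reboot.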

    NODE CONTROLLER FIRST FAILURE ERROR MANAGEMENT FOR A DISTRIBUTED SYSTEM
    9.
    Patent application
    Legal status: Lapsed

    Publication No.: US20110276822A1

    Publication date: 2011-11-10

    Application No.: US12775195

    Filing date: 2010-05-06

    IPC classification: G06F11/20, G06F11/00

    Abstract: A distributed system provides error handling wherein the system includes multiple nodes, each node being coupled to multiple node controllers for control redundancy. Multiple system controllers couple to the node controllers via a network bus. A particular node controller may detect an error of that particular node controller. The particular node controller may store error information relating to the detected error in respective nonvolatile memory stores in the system controllers and node controllers according to a particular priority order. In accordance with the particular priority order, for example, the particular node controller may first attempt to store the error information to a primary system controller memory store, then to a secondary system controller memory store, and then to sibling and non-sibling node controller memory stores. The primary system controller organizes available error information for use by system administrators and other resources of the distributed system.
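
    The priority order in this abstract can be read as a fallback chain. The sketch below assumes that reading: it tries each memory store in turn and stops at the first one that accepts the data. The store names, the callable interface, and the use of `OSError` for an unreachable store are all hypothetical.

```python
from typing import Callable, Optional

# Priority order taken from the abstract; the store callables are hypothetical.
PRIORITY = [
    "primary_system_controller",
    "secondary_system_controller",
    "sibling_node_controller",
    "non_sibling_node_controller",
]


def store_error(info: bytes,
                stores: dict[str, Callable[[bytes], bool]]) -> Optional[str]:
    """Try each memory store in priority order.

    Returns the name of the first store that accepts the data, or None.
    """
    for name in PRIORITY:
        write = stores.get(name)
        if write is None:
            continue
        try:
            if write(info):
                return name
        except OSError:
            pass  # treat an unreachable store as a failed attempt
    return None
```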

    SELECTION OF A REDUNDANT CONTROLLER BASED ON RESOURCE VIEW
    10.
    Patent application
    Legal status: In force

    Publication No.: US20100153679A1

    Publication date: 2010-06-17

    Application No.: US12335690

    Filing date: 2008-12-16

    IPC classification: G06F12/00

    CPC classification: G06F9/5011

    Abstract: A method, a system and a computer program product for selecting a primary controller for a server system based on the services offered by each controller. A primary controller designator (PCD) utility determines the relative importance of a controller based upon the services provided by the controller and the weighted importance assigned to these services. The PCD utility classifies the services provided by a system-controller according to the following: (1) the number of OS partitions a system-controller is able to communicate with; and (2) the number of hardware devices that a controller has access to. The importance of the services is determined by the host OS partition information and the degree of importance of a partition that utilizes/requires the particular service(s). The PCD utility designates a controller as a “Primary” if the designated “Primary” is capable of providing services that are required for the most important OS partitions, according to the classification of controller services.
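
    A weighted score is one plausible way to combine the two classification criteria named in this abstract. The sketch below sums per-partition importance weights and adds a flat weight per reachable hardware device. The scoring formula, the data layout, and every name in it are assumptions for illustration and are not taken from the patent.

```python
def pick_primary(controllers: dict[str, dict],
                 partition_weight: dict[str, float],
                 device_weight: float = 1.0) -> str:
    """Return the controller name with the highest weighted resource view.

    controllers: name -> {"partitions": set of reachable OS partitions,
                          "devices": set of accessible hardware devices}
    partition_weight: OS partition -> importance (hypothetical scale)
    """
    def score(view: dict) -> float:
        part_score = sum(partition_weight.get(p, 0.0) for p in view["partitions"])
        return part_score + device_weight * len(view["devices"])

    return max(controllers, key=lambda name: score(controllers[name]))
```

    With a high weight on an important partition, a controller that reaches it can win even if a rival sees more devices, which matches the abstract's emphasis on the most important OS partitions.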
