Abstract:
Various systems, processes, products, and techniques may be used to manage failure data for a distributed computer system. In particular implementations, a system and process for managing failure data for a distributed computer system that comprises a plurality of nodes may include the ability to determine, at a service processor of a first node, whether a failure has occurred in the first node and, if a failure has occurred, to identify a service processor of a second node in which to store failure data. The system and process may also include the ability to store at least part of the failure data in the identified service processor and to determine whether there is more failure data to store than the identified service processor can store.
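The store-and-check-capacity step can be illustrated with a minimal sketch. The abstract does not say what happens to data that does not fit, so the example assumes the remainder spills to the next peer service processor in a list; the `ServiceProcessor` class, its byte-based capacity, and the `spill_failure_data` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceProcessor:
    """Hypothetical service processor with a fixed storage capacity in bytes."""
    name: str
    capacity: int
    stored: bytearray = field(default_factory=bytearray)

    def free(self) -> int:
        """Bytes of storage still available."""
        return self.capacity - len(self.stored)

    def store(self, data: bytes) -> int:
        """Store as much of `data` as fits and return the number of bytes stored."""
        count = min(self.free(), len(data))
        self.stored += data[:count]
        return count


def spill_failure_data(data: bytes, peers: List[ServiceProcessor]) -> bytes:
    """Store failure data on peers in order; return any data that could not be stored.

    Assumption: overflow moves on to the next peer. The abstract only states
    that the overflow condition is detected.
    """
    remaining = data
    for peer in peers:
        if not remaining:
            break
        written = peer.store(remaining)
        remaining = remaining[written:]
    return remaining
```

Calling `spill_failure_data(dump, peers)` with peers on other nodes returns an empty bytes object when everything fit, and the unsaved tail otherwise.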
Abstract:
A mechanism, in a data processing system, is provided for logical partition defragmentation. The mechanism gathers resource requirements for a plurality of logical partitions running in a plurality of power domains within one or more servers. The mechanism determines optimal hardware utilization for the plurality of logical partitions. The mechanism migrates one or more of the plurality of logical partitions to run in a subset of the plurality of power domains such that at least one power domain within the plurality of power domains is unused. The mechanism puts the at least one unused power domain in a low power state.
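One way to picture the migration planning is as a packing problem. The sketch below is a first-fit-decreasing heuristic, not the mechanism described in the abstract and not guaranteed to be optimal; it assumes each logical partition needs a single scalar amount of resource and each power domain has a single scalar capacity. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_defragmentation(
    requirements: Dict[str, int],
    capacities: Dict[str, int],
) -> Tuple[Dict[str, str], List[str]]:
    """Pack partitions into as few power domains as this heuristic manages.

    requirements: partition name -> resource units needed (assumed scalar).
    capacities:   power domain name -> resource units available.
    Returns (placement, unused_domains); unused domains are candidates
    for a low power state.
    """
    free = dict(capacities)  # remaining room per domain, in insertion order
    placement: Dict[str, str] = {}
    # Place the largest partitions first, each in the first domain with room.
    for lpar, need in sorted(requirements.items(), key=lambda kv: kv[1], reverse=True):
        for domain, room in free.items():
            if room >= need:
                placement[lpar] = domain
                free[domain] = room - need
                break
        else:
            raise ValueError(f"no power domain can host {lpar}")
    used = set(placement.values())
    unused = [domain for domain in capacities if domain not in used]
    return placement, unused
```

For three partitions needing 4, 2, and 2 units and three domains of 8 units each, the plan places everything in the first domain and reports the other two as unused.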
Abstract:
A computer implemented method, a tangible computer readable medium, and a data processing system intelligently propagate link status information received by a blade server to the various ports of an embedded multi-port switch. The link status of a switch port in an external switch module can be communicated to the operating systems of individual blade servers that are affected by that link status. When an external switch module is unplugged from a server blade chassis, the bus controller broadcasts a link down event, such as a link down interrupt, to the individual server blades where it is received by the embedded multi-port switch for those server blades. The embedded multi-port switch translates the link down interrupt into a hardware link down event, and forwards the hardware link down event to the other elements connected to the embedded multi-port switch.
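The propagation path (bus controller, then embedded switch, then connected elements) can be modeled as a simple chain of callbacks. This is an illustrative sketch only: the class names, the `"LINK_DOWN_IRQ"` string, and the event dictionary are assumptions, not interfaces taken from the abstract.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class EmbeddedSwitch:
    """Hypothetical embedded multi-port switch on one server blade."""

    def __init__(self, blade: str) -> None:
        self.blade = blade
        self.listeners: List[Callable[[Event], None]] = []

    def attach(self, listener: Callable[[Event], None]) -> None:
        """Register an element (for example a NIC driver) connected to the switch."""
        self.listeners.append(listener)

    def on_interrupt(self, interrupt: str) -> None:
        """Translate a link down interrupt into a hardware link down event."""
        if interrupt != "LINK_DOWN_IRQ":
            return
        event = {"type": "hw_link_down", "blade": self.blade}
        for listener in self.listeners:
            listener(event)


class BusController:
    """Hypothetical chassis bus controller that broadcasts to every blade."""

    def __init__(self, switches: List[EmbeddedSwitch]) -> None:
        self.switches = switches

    def module_unplugged(self) -> None:
        """Broadcast a link down interrupt when an external switch module is removed."""
        for switch in self.switches:
            switch.on_interrupt("LINK_DOWN_IRQ")
```

Attaching a listener to each `EmbeddedSwitch` and calling `module_unplugged()` delivers one `hw_link_down` event per blade.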
Abstract:
A method, system, and computer program product are provided for remotely debugging a malfunctioning node controller of a node in a distributed node network through a functioning node controller of the same node. The method comprises establishing a serial link between the malfunctioning node controller and the functioning node controller, and configuring, by a remotely located central data processing system (DPS), the functioning node controller as a virtual console. The method further includes receiving, via an internal FRU Support Interface (FSI) link, serial data from the malfunctioning node controller through the virtual console, and debugging, by the DPS, a failure condition of the malfunctioning node controller in response to receipt of the serial data through the virtual console.
Abstract:
A computer-implemented method, a tangible computer-readable medium, and a data processing system are provided for waking a blade server from an operational state of reduced power. When the server blade enters the state of reduced power, service firmware configures a multi-port blade switch of the server blade to direct incoming packets to the service firmware. The service firmware then polls for receipt of a Wake-on-LAN magic packet. When the service firmware receives the Wake-on-LAN magic packet, it reconfigures the multi-port blade switch to direct incoming packets to a network interface card of the server blade. The service firmware then initiates a reboot of the server blade.
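The polling step depends on recognizing a magic packet, whose payload is six bytes of 0xFF followed by sixteen repetitions of the target's 6-byte MAC address. The sketch below shows that check together with a toy model of the routing switch-over; the `ServiceFirmware` class, its `route` values, and the `reboot_requested` flag are hypothetical and stand in for whatever the real firmware does.

```python
def build_magic_packet(mac: str) -> bytes:
    """Return the Wake-on-LAN magic payload for a MAC such as '00:11:22:33:44:55'."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # Six 0xFF bytes, then the MAC address repeated sixteen times.
    return b"\xff" * 6 + mac_bytes * 16


def is_magic_packet(payload: bytes, mac: str) -> bool:
    """True if the magic sequence for this MAC appears anywhere in the payload."""
    return build_magic_packet(mac) in payload


class ServiceFirmware:
    """Hypothetical model of the firmware's packet routing decisions."""

    def __init__(self, mac: str) -> None:
        self.mac = mac
        self.route = "nic"  # where the blade switch sends incoming packets
        self.reboot_requested = False

    def enter_reduced_power(self) -> None:
        """Direct incoming packets to the service firmware."""
        self.route = "service_firmware"

    def poll(self, payload: bytes) -> None:
        """Inspect one received payload while in the reduced power state."""
        if self.route == "service_firmware" and is_magic_packet(payload, self.mac):
            self.route = "nic"  # hand traffic back to the network interface card
            self.reboot_requested = True
```

Searching the whole payload rather than a fixed offset reflects that the magic sequence may be carried inside different frame or datagram types.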
Abstract:
A distributed system provides error handling. The system includes multiple nodes, each coupled to multiple node controllers for control redundancy. Multiple system controllers couple to the node controllers via a network bus. A particular node controller may detect an error of its own and may store error information relating to the detected error in respective nonvolatile memory stores in the system controllers and node controllers according to a particular priority order. In accordance with that priority order, for example, the particular node controller may first attempt to store the error information to a primary system controller memory store, then to a secondary system controller memory store, and then to sibling and non-sibling node controller memory stores. The primary system controller organizes available error information for use by system administrators and other resources of the distributed system.
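The priority order can be sketched as a fallback loop. The example assumes the node controller stops at the first store that accepts the write, which is one reading of "first attempt ... then"; the abstract could also be read as writing to several stores. `MemoryStore`, its `available` flag, and `store_error` are hypothetical names.

```python
from typing import List, Optional


class MemoryStore:
    """Hypothetical nonvolatile memory store on a system or node controller."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available
        self.records: List[str] = []

    def write(self, error_info: str) -> None:
        """Persist the error information, or raise OSError if unreachable."""
        if not self.available:
            raise OSError(f"{self.name} is unreachable")
        self.records.append(error_info)


def store_error(error_info: str, stores: List[MemoryStore]) -> Optional[str]:
    """Try each store in priority order; return the name of the one that accepted.

    Assumption: the first successful write ends the attempt sequence.
    Returns None if every store failed.
    """
    for store in stores:
        try:
            store.write(error_info)
            return store.name
        except OSError:
            continue  # fall through to the next store in the priority order
    return None
```

A caller would pass the stores ordered as primary system controller, secondary system controller, sibling node controller, then non-sibling node controllers.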
Abstract:
A method, a system, and a computer program product are provided for selecting a primary controller for a server system based on the services offered by each controller. A primary controller designator (PCD) utility determines the relative importance of a controller based upon the services the controller provides and the weighted importance assigned to those services. The PCD utility classifies the services provided by a system controller according to: (1) the number of OS partitions the controller is able to communicate with; and (2) the number of hardware devices the controller has access to. The importance of a service is determined by the host OS partition information and the degree of importance of each partition that uses or requires that service. The PCD utility designates a controller as “Primary” if that controller is capable of providing the services required by the most important OS partitions, according to the classification of controller services.
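A weighted score is one simple way to express this selection. The formula below is an assumption made for illustration, not the PCD utility's actual rule: it sums the importance of every OS partition a controller can reach and adds a small per-device term. The function names, the tuple layout, and the `device_weight` default are hypothetical.

```python
from typing import Dict, List, Tuple

# Controller description: (reachable OS partitions, accessible hardware device count)
Controller = Tuple[List[str], int]


def controller_score(
    controller: Controller,
    partition_importance: Dict[str, float],
    device_weight: float = 0.1,
) -> float:
    """Score a controller by the partitions it can serve and the devices it can reach."""
    partitions, device_count = controller
    partition_term = sum(partition_importance.get(p, 0.0) for p in partitions)
    return partition_term + device_weight * device_count


def select_primary(
    controllers: Dict[str, Controller],
    partition_importance: Dict[str, float],
) -> str:
    """Return the name of the highest-scoring controller (assumed tie-break: first seen)."""
    return max(
        controllers,
        key=lambda name: controller_score(controllers[name], partition_importance),
    )
```

With importance weights of 10 for a production partition, 2 for test, and 1 for development, a controller that reaches production and test outscores one that reaches only test and development even if the latter sees more devices.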