摘要:
A method is described in which, in response to notice of a configuration event yet to happen within a network that is part of a link-based computing system, a component within said link based computing system: a) identifies networking configuration information changes to be made by components within the link-based computing system; and, b) sends instances of program code to each one of the components. Each instance of program code is to be executed by a specific component that it was sent to. Each instance of program code is customized to implement the particular one or more networking configuration information changes to be made at the specific component it was sent to.
摘要:
Embodiments include systems and methods for processing Reliability, Availability and Serviceability (RAS) events in a computer system. Embodiments comprise processing critical events in a first portion of a Management Interrupt (MI) period. The MI period is chosen to be not greater than a maximum tolerable Operating System (OS) latency period. If time remains in a current MI period after processing critical events, the system then processes non-critical events during the time remaining in the current MI period. If at the end of the current MI period, some non-critical events remain to be processed, a subsequent MI period is scheduled to process the remaining non-critical events.
摘要:
In some embodiments a boot progress of a System Boot Strap Processor in a multi-processor system is monitored and a boot processor failure is detected using an Application Processor. If the boot processor failure is detected at least a portion of the system is reinitialized (and/or the system is rebooted). Other embodiments are described and claimed.
摘要:
Embodiments include systems and methods for processing Reliability, Availability and Serviceability (RAS) events in a computer system. Embodiments comprise processing critical events in a first portion of a Management Interrupt (MI) period. The MI period is chosen to be not greater than a maximum tolerable Operating System (OS) latency period. If time remains in a current MI period after processing critical events, the system then processes non-critical events during the time remaining in the current MI period. If at the end of the current MI period, some non-critical events remain to be processed, a subsequent MI period is scheduled to process the remaining non-critical events.
摘要:
Various embodiments are described herein. Some embodiments include an Operating System and a platform. The platform includes a processor having an error register. The Operating System can write to the error register only via the platform in a secure manner (for example, using platform firmware). Other embodiments are described and claimed.
摘要:
A microcode (uCode) hot-upgrade method for bare metal cloud deployment and associated apparatus. Under the uCode hot-upgrade method, a uCode path is received at an out-of-band controller (e.g., baseboard management controller (BMC)) and buffered in a memory buffer in the out-of-band controller. The out-of-band controller exposes the memory buffer as a Memory-Mapped Input-Output (MMIO) range to a host CPU. A uCode upgrade interrupt service is triggered to upgrade uCode for one or more CPUs in a bare-metal cloud platform during runtime of a tenant host operating system (OS) using an out-of-bound process. This innovation enables cloud service providers to deploy uCode hot-patches to bare metal servers for live-patch without touching the tenant operating system environment.
摘要:
A microcode (uCode) hot-upgrade method for bare metal cloud deployment and associated apparatus. The uCode hot-upgrade method applies a uCode patch to a firmware storage device (e.g., BIOS SPI flash) through an out-of-band controller (e.g., baseboard management controller (BMC)). In conjunction with receiving a uCode patch, a uCode upgrade interrupt service is triggered to upgrade uCode for one or more CPUs in a bare-metal cloud platform during runtime of a tenant host operating system (OS) using an out-of-bound process. This innovation enables cloud service providers to deploy uCode hot-patches to bare metal servers for persistent storage and live-patch without touching the tenant operating system environment.
摘要:
An apparatus and method are described for detecting and correcting data fetch errors within a processor core. For example, one embodiment of an instruction processing apparatus for detecting and recovering from data fetch errors comprises: at least one processor core having a plurality of instruction processing stages including a data fetch stage and a retirement stage; and error processing logic in communication with the processing stages to perform the operations of: detecting an error associated with data in response to a data fetch operation performed by the data fetch stage; and responsively performing one or more operations to ensure that the error does not corrupt an architectural state of the processor core within the retirement stage.
摘要:
A selectively upgradeable disaggregated server is generally described herein. An example modular server unit, the modular server unit includes a processor module coupled to an input/output (I/O) module via a connector. The processor module to communicate with the I/O module via the connector to store and retrieve data. The processor module is a separate hardware unit from the I/O module.
摘要:
Methods and apparatus for highly available rack management in Rack Scale environment. Rack Management Modules (RMMs) are configured to manage power and thermal zones in a rack including a plurality of pooled system drawers, wherein each pooled system drawer is associated with a respective power zone including power sensors and power control devices and a respective thermal zone including thermal sensors and thermal devices. During operation, one of the RMMs is implemented as a master RMM, and the other is implemented as a slave RMM. The master RMM is used to monitor the power and thermal zones. State information is periodically synchronized between the master RMM and the slave RMM. The RMMs are further configured to perform a fail-over operation in connection with a failed or failing RMM, where after the fail-over operation the slave becomes the new master RMM and the previous master RMM becomes the new slave.