Abstract:
Systems and methods of selectively managing errors in memory modules. In an exemplary implementation, a method may include monitoring for persistent errors in the memory modules. The method may also include mapping at least a portion of the memory modules to a spare memory cache only to obviate persistent errors. The method may also include initiating memory erasure on at least a portion of the memory modules only if insufficient cache lines are available in the spare memory cache.
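The decision order described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the `SpareCache` class, the `handle_error` function, and the error-count threshold used to call an error "persistent" are all hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class SpareCache:
    """Hypothetical spare memory cache with a fixed number of cache lines."""
    capacity: int
    mapped: set = field(default_factory=set)

    def try_map(self, address: int) -> bool:
        """Map an address to a spare line; return False if no line is free."""
        if address in self.mapped:
            return True
        if len(self.mapped) >= self.capacity:
            return False
        self.mapped.add(address)
        return True


def handle_error(cache: SpareCache, address: int, error_count: int,
                 persistent_threshold: int = 3) -> str:
    """Choose an action for an error observed at `address`.

    Assumption: an error is treated as persistent once it has been seen
    `persistent_threshold` times. The abstract does not define this.
    """
    if error_count < persistent_threshold:
        return "ignore"            # not persistent: leave the spare cache alone
    if cache.try_map(address):
        return "mapped_to_spare"   # persistent: obviated by the spare cache
    return "erasure"               # no spare lines left: fall back to erasure
```

With a one-line cache, the first persistent error is mapped to the spare cache and a second persistent error at a different address falls through to erasure.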
Abstract:
Systems and methods for implementing a stride value for memory are provided. One embodiment includes a system comprising a plurality of memory modules configured to store interleaved data in a plurality of memory storage units according to a predetermined interleave. The plurality of memory storage units can be defined by a memory range of consecutive addresses. The system also comprises a memory test device configured to access a portion of the plurality of memory storage units in a sequence that repeats according to a programmable stride value.
Abstract:
Systems and methods for implementing a stride value for memory are provided. One embodiment relates to a system that includes a plurality of memory modules configured to store interleaved data in a plurality of memory storage units according to a predetermined interleave. A memory test device is configured to perform a memory test that accesses a portion of the plurality of memory storage units in a sequence according to a programmable stride value. The memory test device performs the memory test by writing test data to each of the memory storage units in the portion of the plurality of memory storage units and reading the test data from each of the memory storage units in the portion of the plurality of memory storage units.
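A strided write-then-read test of this kind can be sketched as below. The sketch assumes a simple round-robin interleave in which consecutive addresses rotate across modules (module index = address modulo module count); the function names, the `bytearray` stand-in for memory, and the test pattern are hypothetical and not taken from the abstract.

```python
def strided_addresses(start: int, end: int, stride: int) -> range:
    """Addresses visited by the test: start, start + stride, ... below end."""
    return range(start, end, stride)


def run_memory_test(memory: bytearray, start: int, end: int, stride: int,
                    pattern: int = 0xA5) -> list:
    """Write `pattern` to every strided address, then read each one back.

    Returns the list of addresses whose read-back value did not match.
    """
    addresses = strided_addresses(start, end, stride)
    for address in addresses:
        memory[address] = pattern
    return [address for address in addresses if memory[address] != pattern]


# Assumed layout: 4 modules, round-robin interleave over 16 storage units.
NUM_MODULES = 4
memory = bytearray(16)
failures = run_memory_test(memory, start=1, end=16, stride=NUM_MODULES)
touched_modules = {a % NUM_MODULES for a in strided_addresses(1, 16, NUM_MODULES)}
```

Under the assumed interleave, setting the stride equal to the module count confines the test to a single module, which is one reason a programmable stride can be useful.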
Abstract:
Systems and methods for implementing a stride value for memory are provided. One embodiment includes a system comprising a plurality of memory modules configured to store interleaved data in a plurality of memory storage units according to a predetermined interleave. The plurality of memory storage units can be defined by a memory range of consecutive addresses. The system also comprises a memory test device configured to access a portion of the plurality of memory storage units in a sequence that repeats according to a programmable stride value.
Abstract:
Method and system of error logging. At least some of the illustrative embodiments are methods including: detecting, by a reset circuit, assertion of an error pin by a processor system (the processor system comprising at least a main processor and a chipset, and the assertion of the error pin being an indication to reboot the processor system); notifying, by the reset circuit, a management processor distinct from the main processor that the error pin is asserted; writing, by the management processor, to a plurality of registers in the chipset; de-asserting a reset pin of the main processor; and then executing, by the main processor, error-handling code to generate an error log.
Abstract:
A method of managing errors in a data processing system may involve at least one computer system. Each computer system may include a processor that executes an operating system, firmware, and system memory storing instructions for the operating system. A firmware error handler resident in the firmware may identify an error occurring in the computer system. The firmware error handler may determine whether the operating system is required to take an action in response to the error. If the operating system is not required to take an action in response to the error, the firmware error handler may create an error log accessible to the operating system appropriate to cause the operating system to take no action.
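The handler's decision can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Severity` levels, the `ErrorLog` fields, and the idea that logging the error as already corrected is what leads the operating system to take no action. The abstract does not specify the log format or how the firmware decides.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CORRECTED = "corrected"
    RECOVERABLE = "recoverable"
    FATAL = "fatal"


@dataclass
class ErrorLog:
    """Hypothetical log record that the operating system reads."""
    source: str
    severity: Severity
    os_action_required: bool


def firmware_handle_error(source: str, severity: Severity,
                          handled_by_firmware: bool) -> ErrorLog:
    """Build the log entry that the firmware exposes to the OS.

    Assumption: if the firmware fully handled the error, the OS needs to do
    nothing, so the entry is recorded as corrected with no action required.
    """
    if handled_by_firmware:
        return ErrorLog(source, Severity.CORRECTED, os_action_required=False)
    return ErrorLog(source, severity, os_action_required=True)
```

A recoverable memory error that the firmware has already handled would then appear to the operating system as a corrected error requiring no response.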
Abstract:
The present invention flexibly manages the formation of a partition from a plurality of independently executing cells (discrete hardware entities comprising system resources) in preparation for the instantiation of an operating system instance upon the partition. Specifically, the invention manages configuration activities that occur to transition from having individual cells acting independently, and having cells rendezvous, to having cells become interdependent to continue operations as a partition. The invention manages the partition-forming process such that no single point of failure disrupts the process. Instead, the invention is implemented as a distributed application wherein individual cells independently execute instructions based upon respective copies of the complex profile (a “map” of the complex configuration). Also, the invention adapts to a degree of delay associated with certain cells becoming ready to join the formation or rendezvous process. The invention is able to cope with missing, unavailable, or otherwise malfunctioning cells. Additionally, the invention analyzes present cells to determine their compatibility and rejects cells that are not compatible.
Abstract:
In accordance with at least some embodiments, a system includes a plurality of partitions, each partition having its own operating system (OS) and workload. The system also includes a plurality of resources assignable to the plurality of partitions. The system also includes management logic coupled to the plurality of partitions and the plurality of resources. The management logic is configured to set priority rules for each of the plurality of partitions based on user input. The management logic performs automated resource fault management for the resources assigned to the plurality of partitions based on the priority rules.
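One way priority rules could drive automated fault handling is sketched below. The policy shown (take a replacement from a free pool first, otherwise from the lowest-priority partition that ranks below the affected one) is an assumption chosen for illustration, as are the function name and data layout; the abstract does not state which policy the management logic applies.

```python
from typing import Dict, List, Optional


def reassign_after_fault(partitions: Dict[str, List[str]],
                         priorities: Dict[str, int],
                         failed_partition: str,
                         free_pool: List[str]) -> Optional[str]:
    """Find a replacement resource for a partition that lost one to a fault.

    `priorities` maps partition name to a user-set rank (higher = more
    important). Returns the replacement resource id, or None if none exists.
    """
    if free_pool:
        replacement = free_pool.pop()
    else:
        # Only partitions ranked strictly lower may donate a resource.
        donors = [name for name, resources in partitions.items()
                  if name != failed_partition
                  and resources
                  and priorities[name] < priorities[failed_partition]]
        if not donors:
            return None
        donor = min(donors, key=lambda name: priorities[name])
        replacement = partitions[donor].pop()
    partitions[failed_partition].append(replacement)
    return replacement
```

For example, if a high-priority partition loses its only CPU and the free pool is empty, the sketch moves a CPU from the lowest-priority partition; a fault in that low-priority partition would not take resources from a higher-priority one.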