Abstract:
An apparatus and method for supporting communication during error handling in a computing system are disclosed. A computing system includes a first partition and a second partition, each capable of performing error management based on a respective machine check architecture (MCA). The first partition includes a host processor that executes an exception handler for managing reported errors. A message converter unit of the second partition assists in generating messages based on errors detected in the second partition. The message converter unit receives error-handling requests from hardware components of the second partition and translates MCA addresses between the first partition and the second partition. To support the message converter unit, during an earlier bootup operation, the second partition communicates its hardware topology to the host processor, and the host processor sends address translation information in return.
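
The boot-time exchange and the later address translation can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not specify the table format, the address values, or names such as `boot_handshake` and `MessageConverter`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorMessage:
    """Message sent toward the host partition (hypothetical layout)."""
    host_mca_address: int
    status: int


def boot_handshake(topology, host_base=0xC000_2000, stride=0x10):
    """Host side: given the local MCA addresses reported by the second
    partition, return translation information (local -> host address).
    The base and stride values are illustrative assumptions."""
    return {local: host_base + i * stride for i, local in enumerate(topology)}


class MessageConverter:
    """Second-partition unit that translates MCA addresses both ways."""

    def __init__(self, translation):
        self._to_host = dict(translation)
        self._to_local = {host: local for local, host in self._to_host.items()}

    def handle_request(self, local_mca_address, status):
        # A hardware component reports an error against its local address.
        if local_mca_address not in self._to_host:
            raise LookupError(f"no translation for {local_mca_address:#x}")
        return ErrorMessage(self._to_host[local_mca_address], status)

    def to_local(self, host_mca_address):
        # Reverse direction, for accesses arriving from the host.
        return self._to_local[host_mca_address]


# Bootup: the second partition reports three local MCA bank addresses.
table = boot_handshake([0x100, 0x110, 0x120])
converter = MessageConverter(table)
message = converter.handle_request(0x110, status=0xDEAD)
```

In this sketch the converter holds only the table received at boot, so translating at error time is a single lookup in either direction.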
Abstract:
An apparatus and method for supporting communication during error handling in a computing system are disclosed. A computing system includes a first partition and a second partition, each capable of performing error management based on a respective machine check architecture (MCA). When a host processor in the first partition detects an error that requires information from processor cores of the second partition, the host processor generates an access request with a target address pointing to a storage location in a memory of the second partition, not the first partition. When the host processor receives the requested error log information from the second partition, the host processor completes processing of the error. To support the host processor in generating the target address for the access request, during an earlier bootup operation, the second partition communicates its hardware topology to the host processor.
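
One way to read the target-address generation is sketched below. The topology record fields (`core_id`, `log_base`, `num_banks`), the fixed `ENTRY_SIZE`, and the function names are hypothetical; the abstract states only that the topology reported at bootup lets the host form an address in the second partition's memory.

```python
ENTRY_SIZE = 64  # assumed size in bytes of one error log entry


def build_topology_map(reported):
    """Index the topology records reported at bootup by core id."""
    return {record["core_id"]: record for record in reported}


def target_address(topology, core_id, bank):
    """Compute the address, in the second partition's memory, of the
    error log entry for one bank of one processor core."""
    core = topology[core_id]
    if not 0 <= bank < core["num_banks"]:
        raise ValueError(f"core {core_id} has no bank {bank}")
    return core["log_base"] + bank * ENTRY_SIZE


# Topology as the second partition might report it (illustrative values).
reported = [
    {"core_id": 0, "log_base": 0x8000_0000, "num_banks": 4},
    {"core_id": 1, "log_base": 0x8000_0100, "num_banks": 4},
]
topology = build_topology_map(reported)
address = target_address(topology, core_id=1, bank=2)
```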
Abstract:
Systems, apparatuses, and methods for implementing a hardware enforcement mechanism to enable platform-specific firmware visibility into an error state ahead of the operating system are disclosed. A system includes one or more processor cores, control logic, a plurality of registers, platform-specific firmware, and an operating system (OS). The control logic allows the platform-specific firmware to decide if and when the error state is visible to the OS. In some cases, the platform-specific firmware blocks the OS from accessing the error state. In other cases, the platform-specific firmware allows the OS to access the error state, such as when the OS needs to unmap a page. The control logic enables the platform-specific firmware, rather than the OS, to make decisions about the replacement of faulty components in the system.
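
The gating behavior can be modeled in a few lines. This is a simplified software sketch, not the disclosed hardware: the class name, the choice to return zero for a blocked OS read, and the rule that each new error starts out hidden are all assumptions.

```python
class ErrorStateGate:
    """Toy model of control logic placed in front of error registers."""

    def __init__(self):
        self._error_state = 0
        self._os_visible = False

    def log_error(self, status):
        # Assumption: a newly logged error is hidden from the OS until
        # firmware has inspected it and decided otherwise.
        self._error_state = status
        self._os_visible = False

    def firmware_read(self):
        # Firmware always sees the true error state.
        return self._error_state

    def firmware_set_visibility(self, visible):
        self._os_visible = bool(visible)

    def os_read(self):
        # Assumption: a blocked read returns zero (no error reported).
        return self._error_state if self._os_visible else 0


gate = ErrorStateGate()
gate.log_error(0xBEEF)
hidden = gate.os_read()             # firmware has not yet exposed the error
gate.firmware_set_visibility(True)  # e.g. the OS needs to unmap a page
exposed = gate.os_read()
```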
Abstract:
A method of partitioning a data cache comprising a plurality of sets, the plurality of sets comprising a plurality of ways, is provided. Responsive to a stack data request, the method stores a cache line associated with the stack data in one of a plurality of designated ways of the data cache, wherein the plurality of designated ways is configured to store all requested stack data.
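
A minimal sketch of way partitioning follows, assuming a round-robin replacement policy and assuming that non-stack data is confined to the remaining ways; the abstract specifies neither. `PartitionedCache` and its parameters are illustrative names.

```python
class PartitionedCache:
    """Set-associative cache model whose designated ways hold stack data."""

    def __init__(self, num_sets, num_ways, stack_ways, line_size=64):
        self.num_sets = num_sets
        self.line_size = line_size
        self.stack_ways = tuple(stack_ways)
        self.other_ways = tuple(
            w for w in range(num_ways) if w not in self.stack_ways
        )
        # lines[set][way] holds the tag stored there, or None if empty.
        self.lines = [[None] * num_ways for _ in range(num_sets)]
        self._next_victim = {}

    def install(self, address, is_stack):
        """Store the line for `address` and return the way that holds it."""
        line = address // self.line_size
        set_index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.stack_ways if is_stack else self.other_ways
        row = self.lines[set_index]
        for way in ways:              # already cached in this partition
            if row[way] == tag:
                return way
        for way in ways:              # use an empty way if one exists
            if row[way] is None:
                row[way] = tag
                return way
        key = (set_index, is_stack)   # otherwise evict round-robin
        count = self._next_victim.get(key, 0)
        way = ways[count % len(ways)]
        self._next_victim[key] = count + 1
        row[way] = tag
        return way


cache = PartitionedCache(num_sets=4, num_ways=4, stack_ways=(0, 1))
# Three stack lines that all map to set 0 stay within ways 0 and 1.
stack_placements = [cache.install(a, is_stack=True) for a in (0, 256, 512)]
heap_placement = cache.install(1024, is_stack=False)
```

Because every stack request is steered to the designated ways, a stack line can only evict another stack line in this model.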
Abstract:
Hard errors in a memory array can be detected and corrected in real time using reusable entries in an error status buffer. Data may be rewritten to a portion of a memory array and a register in response to a first error in data read from the portion of the memory array. The rewritten data may then be written from the register to an entry of an error status buffer in response to the rewritten data read from the register differing from the rewritten data read from the portion of the memory array.
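
The rewrite-and-compare step can be sketched as follows. The model assumes the corrected data is already available (for example from ECC, which the abstract does not mention) and represents a hard fault as bits stuck at zero; `MemoryArray`, `handle_read_error`, and the dictionary used as the error status buffer are illustrative only, and entry reuse is not modeled.

```python
class MemoryArray:
    """Memory model in which some cells have bits stuck at zero."""

    def __init__(self, size, stuck_at_zero=None):
        self.cells = [0] * size
        self.stuck = dict(stuck_at_zero or {})  # index -> mask of stuck bits

    def write(self, index, value):
        # Stuck bits silently refuse to take the value 1.
        self.cells[index] = value & ~self.stuck.get(index, 0)

    def read(self, index):
        return self.cells[index]


def handle_read_error(array, index, corrected, error_status_buffer):
    """After a first read error, rewrite the data to both the array and a
    register, read both back, and compare. A mismatch means the array
    cannot hold the data, so the register copy goes into the buffer."""
    array.write(index, corrected)
    register = corrected
    if array.read(index) != register:
        error_status_buffer[index] = register
        return "hard"
    return "soft"


array = MemoryArray(size=4, stuck_at_zero={2: 0b0100})
buffer = {}
faulty = handle_read_error(array, 2, 0b0111, buffer)
healthy = handle_read_error(array, 1, 0b0111, buffer)
```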
Abstract:
A method of managing memory includes installing a first cacheline at a first location in a cache memory and receiving a write request. In response to the write request, the first cacheline is modified in accordance with the write request and marked as dirty. Also in response to the write request, a second cacheline is installed that duplicates the first cacheline, as modified in accordance with the write request, at a second location in the cache memory.
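
The install, modify, and duplicate steps can be sketched with a toy cache. The slot layout, the first-free placement policy, and the handling of repeated writes (all existing copies are updated) are assumptions beyond what the abstract states.

```python
class DuplicatingCache:
    """Toy cache that keeps a second copy of every dirty cacheline."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots

    def _free_slot(self):
        for index, slot in enumerate(self.slots):
            if slot is None:
                return index
        raise RuntimeError("no free location for the duplicate")

    def install(self, address, data):
        """Install a clean cacheline and return its location."""
        index = self._free_slot()
        self.slots[index] = {"address": address, "data": data, "dirty": False}
        return index

    def write(self, address, data):
        """Modify the line, mark it dirty, and ensure a duplicate exists.
        Returns the locations that now hold the modified line."""
        copies = [
            i for i, s in enumerate(self.slots)
            if s is not None and s["address"] == address
        ]
        if not copies:
            raise KeyError(f"address {address:#x} is not cached")
        for i in copies:
            self.slots[i] = {"address": address, "data": data, "dirty": True}
        if len(copies) == 1:
            duplicate = self._free_slot()
            self.slots[duplicate] = dict(self.slots[copies[0]])
            copies.append(duplicate)
        return copies


cache = DuplicatingCache(num_slots=4)
first_location = cache.install(0x40, b"old")
locations = cache.write(0x40, b"new")
```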
Abstract:
A method and apparatus for predicting and managing a fault in memory includes detecting an error in data. The error is compared to one or more stored errors in a filter, and based upon the comparison, the error is predicted as a transient error or a permanent error for further action.
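
A minimal sketch of such a filter is shown below, under one assumed prediction rule: an error whose address matches a stored error is predicted permanent, and any other error is predicted transient and then stored. The abstract does not state the comparison criterion, the filter capacity, or the eviction policy, so `FaultFilter` and its least-recently-seen eviction are illustrative.

```python
from collections import OrderedDict


class FaultFilter:
    """Small filter of recently seen error addresses."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._seen = OrderedDict()

    def classify(self, address):
        if address in self._seen:
            # Repeat at the same address: predict a permanent fault.
            self._seen.move_to_end(address)
            return "permanent"
        # First sighting: predict transient and remember the address.
        self._seen[address] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest entry
        return "transient"


fault_filter = FaultFilter(capacity=2)
predictions = [fault_filter.classify(a) for a in (0x10, 0x10, 0x20, 0x30, 0x10)]
```

In this example the last error at `0x10` is predicted transient again because its entry was evicted when the two-entry filter filled up, which shows how capacity limits what the filter can remember.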
Abstract:
A method and system for memory attack mitigation in a memory device includes receiving, at a memory controller, an allocation of a page in memory. One or more device controllers detect an aggressor-victim set within the memory. Based upon the detection, an address of the allocated page is identified for further action.