Remote scalable machine check architecture

    Publication No.: US12111719B2

    Publication date: 2024-10-08

    Application No.: US17854788

    Application date: 2022-06-30

    CPC classification number: G06F11/0787 G06F11/0709 G06F11/0721

    Abstract: An apparatus and method for supporting communication during error handling in a computing system. A computing system includes a first partition and a second partition, each capable of performing error management based on a respective machine check architecture (MCA). The first partition includes a host processor that executes an exception handler for managing reported errors. A message converter unit of the second partition assists in generating messages based on detected errors in the second partition. The message converter unit receives requests from hardware components of the second partition for handling errors and translates MCA addresses between the first partition and the second partition. To support the message converter unit, during an earlier bootup operation, the second partition communicates the hardware topology of the second partition to the host processor, and the host processor sends address translation information.
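
The address translation described in this abstract can be illustrated with a small model. This is only a sketch under stated assumptions: the `McaWindow` layout, the class and method names, the message fields, and the example addresses are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McaWindow:
    """One translation entry the host sends at boot (hypothetical layout)."""
    local_base: int  # start of an MCA register range as seen inside partition 2
    host_base: int   # start of the same range as seen by the host in partition 1
    size: int        # length of the range in bytes


class MessageConverter:
    """Toy model of the second partition's message converter unit."""

    def __init__(self, windows):
        self.windows = list(windows)

    def to_host(self, local_addr):
        """Translate a partition-2 MCA address into the host's address space."""
        for w in self.windows:
            if w.local_base <= local_addr < w.local_base + w.size:
                return w.host_base + (local_addr - w.local_base)
        raise LookupError(f"no translation for local address {local_addr:#x}")

    def to_local(self, host_addr):
        """Translate a host MCA address back into partition 2's address space."""
        for w in self.windows:
            if w.host_base <= host_addr < w.host_base + w.size:
                return w.local_base + (host_addr - w.host_base)
        raise LookupError(f"no translation for host address {host_addr:#x}")

    def build_error_message(self, component, local_addr):
        """Turn a component's error request into a message for the host."""
        return {"source": component, "mca_addr": self.to_host(local_addr)}


# Boot: the host answers the topology report with translation entries (made-up values).
converter = MessageConverter([McaWindow(0x1000, 0x80001000, 0x100)])
message = converter.build_error_message("core_0", 0x1010)
```

A range-based table is used here because it keeps the lookup symmetric in both directions; the patent abstract does not say how the translation information is actually encoded.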

    REMOTE SCALABLE MACHINE CHECK ARCHITECTURE
    Invention publication

    Publication No.: US20240004750A1

    Publication date: 2024-01-04

    Application No.: US17854788

    Application date: 2022-06-30

    CPC classification number: G06F11/0793 G06F11/0709 G06F11/0721

    Abstract: An apparatus and method for supporting communication during error handling in a computing system. A computing system includes a first partition and a second partition, each capable of performing error management based on a respective machine check architecture (MCA). The first partition includes a host processor that executes an exception handler for managing reported errors. A message converter unit of the second partition assists in generating messages based on detected errors in the second partition. The message converter unit receives requests from hardware components of the second partition for handling errors and translates MCA addresses between the first partition and the second partition. To support the message converter unit, during an earlier bootup operation, the second partition communicates the hardware topology of the second partition to the host processor, and the host processor sends address translation information.

    SCALABLE MACHINE CHECK ARCHITECTURE
    Invention publication

    Publication No.: US20240004744A1

    Publication date: 2024-01-04

    Application No.: US17854710

    Application date: 2022-06-30

    CPC classification number: G06F11/0772 G06F11/0787 G06F11/1405 G06F12/0292

    Abstract: An apparatus and method for supporting communication during error handling in a computing system. A computing system includes a first partition and a second partition, each capable of performing error management based on a respective machine check architecture (MCA). When a host processor in the first partition detects an error that requires information from processor cores of the second partition, the host processor generates an access request with a target address pointing to a storage location in a memory of the second partition, not the first partition. When the host processor receives the requested error log information from the second partition, the host processor completes processing of the error. To support the host processor in generating the target address for the access request, during an earlier bootup operation, the second partition communicates the hardware topology of the second partition to the host processor.
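
The flow in this abstract (topology reported at boot, then a host access request aimed at second-partition memory) might look roughly like the following. All class names, the request format, the core identifiers, and the addresses are illustrative assumptions, not details taken from the patent.

```python
class SecondPartition:
    """Toy second partition with per-core error logs in its own memory."""

    def __init__(self, log_addresses):
        # Hypothetical topology: core id -> address of that core's error log.
        self.log_addresses = dict(log_addresses)
        self.memory = {addr: 0 for addr in self.log_addresses.values()}

    def report_topology(self):
        """Boot-time step: tell the host which cores exist and where their logs live."""
        return dict(self.log_addresses)

    def read(self, addr):
        return self.memory[addr]


class HostProcessor:
    """Toy host processor in the first partition."""

    def __init__(self):
        self.topology = {}

    def boot(self, partition):
        self.topology = partition.report_topology()

    def build_access_request(self, core_id):
        """The target address points into partition 2's memory, not the host's."""
        return {"op": "read", "target_addr": self.topology[core_id]}

    def handle_error(self, partition, core_id):
        request = self.build_access_request(core_id)
        log = partition.read(request["target_addr"])
        # Processing completes only once the remote error log has arrived.
        return {"core": core_id, "log": log, "handled": True}


partition = SecondPartition({"core_0": 0x4000, "core_1": 0x4040})
partition.memory[0x4040] = 0xBAD  # pretend core_1 logged an error
host = HostProcessor()
host.boot(partition)
result = host.handle_error(partition, "core_1")
```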

    PLATFORM FIRST ERROR HANDLING
    Invention application

    Publication No.: US20190303230A1

    Publication date: 2019-10-03

    Application No.: US15940693

    Application date: 2018-03-29

    Abstract: Systems, apparatuses, and methods for implementing a hardware enforcement mechanism to enable platform-specific firmware visibility into an error state ahead of the operating system are disclosed. A system includes at least one or more processor cores, control logic, a plurality of registers, platform-specific firmware, and an operating system (OS). The control logic allows the platform-specific firmware to decide if and when the error state is visible to the OS. In some cases, the platform-specific firmware blocks the OS from accessing the error state. In other cases, the platform-specific firmware allows the OS to access the error state such as when the OS needs to unmap a page. The control logic enables the platform-specific firmware, rather than the OS, to make decisions about the replacement of faulty components in the system.
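
A minimal sketch of the gating idea in this abstract follows. It assumes, purely for illustration, that a blocked OS read returns zero and that visibility is a single flag; the abstract does not specify either, and every name here is hypothetical.

```python
class ErrorStateControl:
    """Toy control logic sitting between the error registers and the OS."""

    def __init__(self):
        self._status = 0          # stands in for the error-state registers
        self._os_visible = False  # visibility decision owned by firmware

    def log_error(self, status):
        """Hardware records an error; it starts out hidden from the OS."""
        self._status = status
        self._os_visible = False

    def firmware_read(self):
        """Platform firmware always sees the error state first."""
        return self._status

    def firmware_set_os_visibility(self, visible):
        """Firmware decides if and when the OS may see the error."""
        self._os_visible = bool(visible)

    def os_read(self):
        # Assumption: a blocked read returns zero, as if nothing were logged.
        return self._status if self._os_visible else 0


control = ErrorStateControl()
control.log_error(0x9C)                   # made-up status value
blocked = control.os_read()               # firmware has not exposed it yet
control.firmware_set_os_visibility(True)  # e.g. the OS needs to unmap a page
exposed = control.os_read()
```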

    Detecting and correcting hard errors in a memory array
    Invention grant (in force)

    Publication No.: US09189326B2

    Publication date: 2015-11-17

    Application No.: US14048830

    Application date: 2013-10-08

    Abstract: Hard errors in the memory array can be detected and corrected in real-time using reusable entries in an error status buffer. Data may be rewritten to a portion of a memory array and a register in response to a first error in data read from the portion of the memory array. The rewritten data may then be written from the register to an entry of an error status buffer in response to the rewritten data read from the register differing from the rewritten data read from the portion of the memory array.
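
The rewrite-and-compare step in this abstract can be sketched as below. The stuck-at-zero fault model, the dictionary used as the error status buffer, and the `read_with_buffer` helper are assumptions made for illustration; in real hardware the corrected data would typically come from an error-correcting code rather than being passed in.

```python
class MemoryArray:
    """Toy memory array where some cells have bits stuck at zero (hard faults)."""

    def __init__(self, size, stuck_at_zero=None):
        self.cells = [0] * size
        # Hypothetical fault map: cell index -> mask of bits that cannot be set.
        self.stuck_at_zero = dict(stuck_at_zero or {})

    def write(self, index, value):
        self.cells[index] = value & ~self.stuck_at_zero.get(index, 0)

    def read(self, index):
        return self.cells[index]


def handle_read_error(array, index, corrected, error_status_buffer):
    """On a first error, rewrite the data to the array and to a register.

    If the array read-back differs from the register, the fault survived the
    rewrite, so the register value is saved in an error status buffer entry.
    """
    array.write(index, corrected)
    register = corrected
    if array.read(index) != register:
        error_status_buffer[index] = register
        return "hard"
    return "soft"


def read_with_buffer(array, index, error_status_buffer):
    """Assumed read path: prefer the buffer entry when one exists for the cell."""
    return error_status_buffer.get(index, array.read(index))


array = MemoryArray(4, stuck_at_zero={3: 0x01})
buffer = {}
kind_hard = handle_read_error(array, 3, 0xFF, buffer)  # bit 0 cannot be rewritten
kind_soft = handle_read_error(array, 2, 0xFF, buffer)  # rewrite sticks
```

Deleting a key from `buffer` would model reusing an entry, though the abstract does not say when entries are released.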

    DIRTY CACHELINE DUPLICATION
    Invention application

    Publication No.: US20140173379A1

    Publication date: 2014-06-19

    Application No.: US13720536

    Application date: 2012-12-19

    CPC classification number: G06F11/1064 G06F12/0893

    Abstract: A method of managing memory includes installing a first cacheline at a first location in a cache memory and receiving a write request. In response to the write request, the first cacheline is modified in accordance with the write request and marked as dirty. Also in response to the write request, a second cacheline is installed that duplicates the first cacheline, as modified in accordance with the write request, at a second location in the cache memory.
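
The write path in this abstract could be modelled as follows. The `Cache` class, the dictionary used for a line, and the idea that the caller picks the second location are assumptions for the sake of a short example; the abstract does not say how the duplicate's location is chosen.

```python
class Cache:
    """Toy cache that keeps a second copy of every dirty line."""

    def __init__(self, num_lines):
        self.lines = [None] * num_lines

    def install(self, location, tag, data):
        """Install a clean cacheline at the given location."""
        self.lines[location] = {"tag": tag, "data": data, "dirty": False}

    def write(self, location, duplicate_location, data):
        """Modify the line, mark it dirty, and install a duplicate elsewhere."""
        line = self.lines[location]
        line["data"] = data
        line["dirty"] = True
        # The duplicate is an independent copy of the modified line.
        self.lines[duplicate_location] = dict(line)


cache = Cache(4)
cache.install(0, tag=0x2A, data=b"old")
cache.write(0, 2, b"new")
```

Because a dirty line is the only up-to-date copy of its data, keeping a duplicate means an error in one copy can be recovered from the other.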

    Scalable machine check architecture

    Publication No.: US12072756B2

    Publication date: 2024-08-27

    Application No.: US17854710

    Application date: 2022-06-30

    CPC classification number: G06F11/0772 G06F11/0787 G06F11/1405 G06F12/0292

    Abstract: An apparatus and method for supporting communication during error handling in a computing system. A computing system includes a first partition and a second partition, each capable of performing error management based on a respective machine check architecture (MCA). When a host processor in the first partition detects an error that requires information from processor cores of the second partition, the host processor generates an access request with a target address pointing to a storage location in a memory of the second partition, not the first partition. When the host processor receives the requested error log information from the second partition, the host processor completes processing of the error. To support the host processor in generating the target address for the access request, during an earlier bootup operation, the second partition communicates the hardware topology of the second partition to the host processor.
