-
Publication No.: US12253921B2
Publication date: 2025-03-18
Application No.: US18300642
Filing date: 2023-04-14
Applicant: Huawei Technologies Co., Ltd.
Inventor: Da Qi Ren, Liang Peng
Abstract: A lockstep controller operates a lockstep system of three or more CPU-GPU pairs, comparing the outputs from the CPU-GPU pairs and, by way of a majority vote, provides the output for the lockstep system. Based on comparing the outputs, if one of the CPU-GPU pairs provides outputs that disagree with the majority outputs, it can be switched out of the lockstep system. The removed CPU is replaced by a backup CPU. So that the backup CPU can be part of a CPU-GPU pair, a portion of the address space from the GPU of one of the other CPU-GPU pairs is assigned to the backup CPU to operate as a replacement CPU-GPU pair, while the CPU already associated with this GPU retains another portion of the GPU's address space to continue operating as a CPU-GPU pair.
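The voting step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented controller: the function name `majority_vote`, the use of hashable values as stand-ins for CPU-GPU pair outputs, and the strict-majority rule are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (majority_output, dissenting_indices) for a list of hashable outputs.

    Hypothetical helper: each list element stands in for the output of one
    CPU-GPU pair. Raises ValueError when no value is produced by a strict
    majority of the pairs.
    """
    if len(outputs) < 3:
        raise ValueError("lockstep voting needs at least three outputs")
    # most_common(1) yields the single most frequent value and its count.
    winner, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no strict majority among outputs")
    # Pairs whose output differs from the majority are candidates for removal.
    dissenters = [i for i, out in enumerate(outputs) if out != winner]
    return winner, dissenters
```

In this sketch, a pair whose index appears in the second return value would be the one switched out and replaced by the backup CPU. How the GPU address space is split between the backup CPU and the existing CPU is not modeled here.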
-
Publication No.: US12197290B2
Publication date: 2025-01-14
Application No.: US18071459
Filing date: 2022-11-29
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Da Qi Ren, Liang Peng
Abstract: A fault tolerant processing environment wherein multiple processors are configured as worker nodes and redundant nodes, with a failed worker node replaced programmatically by a manager node. Each of the processing nodes may include a processor and memory associated with the processor and communicate with other processing nodes using a network. A manager node creates a message passing interface (MPI) communication group having worker nodes and redundant nodes, instructs the worker nodes to perform lockstep processing of tasks for an application, and monitors execution of the tasks. If a node fails, the manager node creates a replacement worker node from one of the redundant processing nodes and creates a new communications group. It then instructs those nodes in the new communications group to resume processing based on the application state and checkpoint backup data.
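The replacement step in this abstract can be sketched in plain Python. This is a simplified model under stated assumptions, not the patented method and not an MPI program: node identifiers are plain strings, `rebuild_group` is a hypothetical helper, and the first available redundant node is always chosen.

```python
def rebuild_group(workers, spares, failed):
    """Return (new_workers, remaining_spares) after replacing `failed` with a spare.

    Hypothetical model of the manager node's bookkeeping: the failed worker's
    slot is taken over by the first redundant node, and the returned worker
    list is the membership of the new communication group.
    """
    if failed not in workers:
        raise ValueError(f"{failed!r} is not an active worker")
    if not spares:
        raise RuntimeError("no redundant node available")
    replacement, remaining = spares[0], spares[1:]
    # Keep every healthy worker in place; swap only the failed slot.
    new_workers = [replacement if w == failed else w for w in workers]
    return new_workers, remaining
```

In a real deployment the new membership would be used to create a new MPI communicator, and the nodes would then resume from the application state and checkpoint backup data. Both steps are outside this sketch.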
-
Publication No.: US20230251941A1
Publication date: 2023-08-10
Application No.: US18300642
Filing date: 2023-04-14
Applicant: Huawei Technologies Co., Ltd.
Inventor: Da Qi Ren, Liang Peng
CPC classification number: G06F11/184, G06F11/2033
Abstract: A lockstep controller operates a lockstep system of three or more CPU-GPU pairs, comparing the outputs from the CPU-GPU pairs and, by way of a majority vote, provides the output for the lockstep system. Based on comparing the outputs, if one of the CPU-GPU pairs provides outputs that disagree with the majority outputs, it can be switched out of the lockstep system. The removed CPU is replaced by a backup CPU. So that the backup CPU can be part of a CPU-GPU pair, a portion of the address space from the GPU of one of the other CPU-GPU pairs is assigned to the backup CPU to operate as a replacement CPU-GPU pair, while the CPU already associated with this GPU retains another portion of the GPU's address space to continue operating as a CPU-GPU pair.
-
Publication No.: US20230092343A1
Publication date: 2023-03-23
Application No.: US18071459
Filing date: 2022-11-29
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Da Qi Ren, Liang Peng
IPC: G06F11/14
Abstract: A fault tolerant processing environment wherein multiple processors are configured as worker nodes and redundant nodes, with a failed worker node replaced programmatically by a manager node. Each of the processing nodes may include a processor and memory associated with the processor and communicate with other processing nodes using a network. A manager node creates a message passing interface (MPI) communication group having worker nodes and redundant nodes, instructs the worker nodes to perform lockstep processing of tasks for an application, and monitors execution of the tasks. If a node fails, the manager node creates a replacement worker node from one of the redundant processing nodes and creates a new communications group. It then instructs those nodes in the new communications group to resume processing based on the application state and checkpoint backup data.
-
Publication No.: US12282429B2
Publication date: 2025-04-22
Application No.: US17944031
Filing date: 2022-09-13
Applicant: Huawei Technologies Co., Ltd.
Inventor: Elnaz Ebrahimi, Ehsan Khish Ardestani Zadeh, Wei-Yu Chen, Liang Peng
IPC: G06F12/0862, G06F12/0811
Abstract: An apparatus includes a processor core and a memory hierarchy. The memory hierarchy includes main memory and one or more caches between the main memory and the processor core. A plurality of hardware pre-fetchers are coupled to the memory hierarchy and a pre-fetch control circuit is coupled to the plurality of hardware pre-fetchers. The pre-fetch control circuit is configured to compare changes in one or more cache performance metrics over two or more sampling intervals and control operation of the plurality of hardware pre-fetchers in response to a change in one or more performance metrics between at least a first sampling interval and a second sampling interval.
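The idea of steering pre-fetchers by the change in a cache metric between two sampling intervals can be sketched as follows. The abstract does not specify a policy, so everything here is an assumption for illustration: the metric is a cache miss rate, the function name `next_enabled_count` and the `tolerance` parameter are hypothetical, and the controller only adjusts how many pre-fetchers stay enabled.

```python
def next_enabled_count(prev_miss_rate, curr_miss_rate, enabled, total, tolerance=0.01):
    """Pick how many pre-fetchers to keep enabled for the next sampling interval.

    Assumed policy (not taken from the patent): compare the miss rate of the
    current interval with that of the previous one and step the number of
    enabled pre-fetchers up or down by one.
    """
    delta = curr_miss_rate - prev_miss_rate
    if delta > tolerance:
        # Miss rate got worse since the last interval: back off by one.
        return max(0, enabled - 1)
    if delta < -tolerance:
        # Miss rate improved: try enabling one more pre-fetcher.
        return min(total, enabled + 1)
    # Change is within tolerance: leave the configuration unchanged.
    return enabled
```

A hardware control circuit would apply such a decision per interval using performance counters. The software function above only mirrors the comparison logic.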
-
Publication No.: US20230022190A1
Publication date: 2023-01-26
Application No.: US17944031
Filing date: 2022-09-13
Applicant: Huawei Technologies Co., Ltd.
Inventor: Elnaz Ebrahimi, Ehsan Khish Ardestani Zadeh, Wei-Yu Chen, Liang Peng
IPC: G06F12/0862, G06F12/0811
Abstract: An apparatus includes a processor core and a memory hierarchy. The memory hierarchy includes main memory and one or more caches between the main memory and the processor core. A plurality of hardware pre-fetchers are coupled to the memory hierarchy and a pre-fetch control circuit is coupled to the plurality of hardware pre-fetchers. The pre-fetch control circuit is configured to compare changes in one or more cache performance metrics over two or more sampling intervals and control operation of the plurality of hardware pre-fetchers in response to a change in one or more performance metrics between at least a first sampling interval and a second sampling interval.