METHOD AND APPARATUS TO PROACTIVELY SCREEN HARDWARE ERRORS OF A COMPUTER PROCESSING SYSTEM

    Publication No.: US20250004896A1

    Publication Date: 2025-01-02

    Application No.: US18217245

    Filing Date: 2023-06-30

    Abstract: Methods and apparatus to implement proactive hardware error screening are disclosed. In one embodiment, a computer processing system includes a plurality of computational units to execute tasks for one or more applications; a plurality of sensors to collect measurement data of the plurality of computational units; and a storage to store a data structure indicating hardware health statuses of the plurality of computational units, determined based on the measurement data. The plurality of computational units is scheduled to perform task execution on the computer processing system for the one or more applications based on the hardware health statuses indicated in the data structure, wherein a first computational unit is excluded from the task execution when a corresponding first hardware health status of the first computational unit indicates an impending hardware failure.
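
    The scheduling behavior described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation. The names HealthTable, Health, schedule, and MAX_CORRECTED_ERRORS are hypothetical, and both the corrected-error-count threshold and the round-robin assignment are assumptions made for illustration; the abstract does not say how a health status is derived from measurement data or how tasks are distributed among healthy units.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import cycle


class Health(Enum):
    HEALTHY = "healthy"
    IMPENDING_FAILURE = "impending_failure"


# Hypothetical threshold: the source does not say how a status is derived.
MAX_CORRECTED_ERRORS = 10


@dataclass
class HealthTable:
    """Data structure holding one health status per computational unit."""
    statuses: dict = field(default_factory=dict)

    def update(self, unit_id, corrected_errors):
        # Assumed rule: too many corrected errors signals impending failure.
        if corrected_errors > MAX_CORRECTED_ERRORS:
            self.statuses[unit_id] = Health.IMPENDING_FAILURE
        else:
            self.statuses[unit_id] = Health.HEALTHY

    def schedulable_units(self):
        # Only units not flagged for impending failure may receive tasks.
        return [u for u, s in self.statuses.items() if s is Health.HEALTHY]


def schedule(tasks, table):
    """Assign tasks round-robin to units not flagged for impending failure."""
    units = table.schedulable_units()
    if not units:
        raise RuntimeError("no healthy computational unit available")
    return {task: unit for task, unit in zip(tasks, cycle(units))}


# Simulated sensor readings (corrected-error counts) for three units.
table = HealthTable()
for unit_id, errors in {"core0": 0, "core1": 42, "core2": 3}.items():
    table.update(unit_id, errors)

assignment = schedule(["t0", "t1", "t2"], table)
print(assignment)  # {'t0': 'core0', 't1': 'core2', 't2': 'core0'}
```

    Here core1 exceeds the assumed threshold, so it is marked as facing an impending failure and receives no tasks, while the remaining units share the work.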
