Method and apparatus for selective and power-aware memory error protection and memory management

    Publication No.: US10141955B2

    Publication date: 2018-11-27

    Application No.: US14684368

    Filing date: 2015-04-11

    IPC classification: H03M13/35 G06F11/10 H03M13/51

    Abstract: A method for providing selective memory error protection responsive to a predictable failure notification associated with at least one portion of a memory in a computing system includes: obtaining an active error correcting code (ECC) configuration corresponding to the portion of the memory; determining whether the active ECC configuration is sufficient to correct at least one error in the portion of the memory affected by the predictable failure notification; when the active ECC configuration is insufficient to correct the error, determining whether data corruption can be tolerated by an application running on the computing system; when data corruption cannot be tolerated by the application, determining whether a stronger ECC level is available and, if a stronger ECC level is available, increasing a strength of the active ECC configuration; and when data corruption can be tolerated, performing page reassignment and aggregation of non-critical data.
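
    The decision flow in this abstract can be sketched as a small function. This is only an illustration: the names `EccConfig`, `Action`, and `handle_failure_notification`, the idea of measuring ECC strength in correctable bits, and the outcome when no stronger ECC level exists are all assumptions, not details taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NO_CHANGE = "keep the active ECC configuration"
    STRENGTHEN_ECC = "increase the strength of the active ECC configuration"
    REASSIGN_PAGES = "reassign pages and aggregate non-critical data"
    NO_STRONGER_ECC = "no stronger ECC level available (outcome not given in the abstract)"


@dataclass
class EccConfig:
    correctable_bits: int      # assumed measure of the active ECC strength
    max_correctable_bits: int  # assumed strongest level the hardware offers


def handle_failure_notification(ecc, predicted_error_bits, app_tolerates_corruption):
    """Pick an action for a memory region named in a predictable failure notification."""
    # Step 1: is the active ECC configuration already sufficient?
    if ecc.correctable_bits >= predicted_error_bits:
        return Action.NO_CHANGE
    # Step 2: if the application tolerates corruption, move data instead.
    if app_tolerates_corruption:
        return Action.REASSIGN_PAGES
    # Step 3: otherwise strengthen ECC when a stronger level exists.
    if ecc.max_correctable_bits > ecc.correctable_bits:
        return Action.STRENGTHEN_ECC
    return Action.NO_STRONGER_ECC
```

    For example, `handle_failure_notification(EccConfig(1, 2), 2, False)` selects `Action.STRENGTHEN_ECC`, because a one-bit code cannot cover a predicted two-bit error and the application cannot tolerate corruption.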

    Checkpoint triggering in a computer system

    Publication No.: US10089181B2

    Publication date: 2018-10-02

    Application No.: US15194884

    Filing date: 2016-06-28

    Inventor: Chen-Yong Cher

    IPC classification: G06F11/14 G06F11/30 G06F11/34

    Abstract: According to an aspect, a method for triggering creation of a checkpoint in a computer system includes executing a task in a processing node and determining whether it is time to read a monitor associated with a metric of the task. The monitor is read to determine a value of the metric based on determining that it is time to read the monitor. A threshold for triggering creation of the checkpoint is determined based on the metric. A monitoring block size is determined for the checkpoint. A checkpoint interval is determined based on the monitoring block size, a checkpoint bandwidth, and a failure rate of the computer system. Based on determining that the value of the metric has crossed the threshold and the checkpoint interval has elapsed, the checkpoint including state data of the task is created to enable restarting execution of the task upon a restart operation.
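
    The abstract does not state how the interval is computed from the block size, bandwidth, and failure rate. As a hedged illustration, the sketch below substitutes Young's first-order approximation for the checkpoint interval, which is an assumption and not necessarily the patented method. The function names and the choice to treat "crossed the threshold" as "reached or exceeded" are likewise hypothetical.

```python
import math


def checkpoint_interval(block_size_bytes, bandwidth_bytes_per_s, failures_per_s):
    """Interval in seconds, using Young's approximation sqrt(2 * cost / failure_rate).

    The formula is an assumed stand-in; the abstract only names the three inputs.
    """
    # Time needed to write one checkpoint of the monitored block.
    cost_s = block_size_bytes / bandwidth_bytes_per_s
    return math.sqrt(2.0 * cost_s / failures_per_s)


def should_checkpoint(metric_value, threshold, elapsed_s, interval_s):
    """Both conditions from the abstract must hold before a checkpoint is created."""
    # Assumption: the metric "crosses" the threshold by reaching or exceeding it.
    return metric_value >= threshold and elapsed_s >= interval_s
```

    Requiring both conditions means a checkpoint is skipped when the metric is quiet, and is also rate-limited when the metric stays above the threshold for a long time.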

    OPTIMIZATION OF APPLICATION WORKFLOW IN MOBILE EMBEDDED DEVICES

    Type: Patent application

    Status: Under examination (published)

    Publication No.: US20160378550A1

    Publication date: 2016-12-29

    Application No.: US14950934

    Filing date: 2015-11-24

    IPC classification: G06F9/48 G06F9/54

    Abstract: An aspect includes optimizing an application workflow. The optimizing includes characterizing the application workflow by determining at least one baseline metric related to an operational control knob of an embedded system processor. The application workflow performs a real-time computational task encountered by at least one mobile embedded system of a wirelessly connected cluster of systems supported by a server system. The optimizing of the application workflow further includes performing an optimization operation on the at least one baseline metric of the application workflow while satisfying at least one runtime constraint. An annotated workflow that is the result of performing the optimization operation is output.
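
    One way to picture "optimizing a baseline metric under a runtime constraint" is the toy search below. Everything specific in it is assumed for illustration: the control knob is taken to be a clock-frequency setting, the baseline metrics are per-stage latency and energy, the runtime constraint is a total deadline, and the annotated workflow is a mapping from stage name to chosen setting. The abstract names none of these.

```python
from itertools import product


def optimize_workflow(stages, deadline_ms):
    """Return {stage: knob_setting} minimizing energy within the deadline, or None.

    stages: list of (name, {knob_setting: (latency_ms, energy_mj)}) pairs,
    where the measurements play the role of the baseline characterization.
    """
    names = [name for name, _ in stages]
    best = None
    # Exhaustively try every combination of knob settings (fine for small workflows).
    for choice in product(*(options.items() for _, options in stages)):
        latency = sum(metrics[0] for _, metrics in choice)
        energy = sum(metrics[1] for _, metrics in choice)
        if latency <= deadline_ms and (best is None or energy < best[0]):
            annotated = {n: knob for n, (knob, _) in zip(names, choice)}
            best = (energy, annotated)
    return None if best is None else best[1]
```

    Exhaustive search is used only to keep the sketch short; its cost grows exponentially with the number of stages, so a real optimizer would need a different strategy.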

    SILENT STORE DETECTION AND RECORDING IN MEMORY STORAGE

    Type: Patent application

    Status: Granted

    Publication No.: US20160378367A1

    Publication date: 2016-12-29

    Application No.: US14749680

    Filing date: 2015-06-25

    IPC classification: G06F3/06

    Abstract: An aspect includes receiving a write request that includes a memory address and write data. Stored data is read from a memory location at the memory address. Based on determining that the memory location was not previously modified, the stored data is compared to the write data. Based on the stored data matching the write data, the write request is completed without writing the write data to the memory, and a corresponding silent store bit in a silent store bitmap is set. Based on the stored data not matching the write data, the write data is written to the memory location, the silent store bit is reset, and a corresponding modified bit is set. At least one of an application and an operating system is provided access to the silent store bitmap.
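
    The write path described here can be modeled in a few lines. This is a software toy, not the hardware design: the class `SilentStoreMemory`, its per-address Boolean lists standing in for the bitmaps, and the `writes_performed` counter are hypothetical. The abstract also does not say what happens when the location was already modified, so the sketch assumes the write is simply performed.

```python
class SilentStoreMemory:
    """Toy memory that records silent stores (writes that would not change the data)."""

    def __init__(self, size):
        self.data = [0] * size
        self.modified = [False] * size  # one "modified" bit per location
        self.silent = [False] * size    # the silent store bitmap
        self.writes_performed = 0       # counts writes that actually reach memory

    def write(self, addr, value):
        stored = self.data[addr]  # read the stored data first
        if not self.modified[addr] and stored == value:
            # Silent store: complete the request without writing, set the silent bit.
            self.silent[addr] = True
            return
        # Non-matching data (or, by assumption, an already modified location):
        # perform the write, reset the silent bit, set the modified bit.
        self.data[addr] = value
        self.writes_performed += 1
        self.silent[addr] = False
        self.modified[addr] = True
```

    Software with access to `silent` could, for instance, skip unmodified locations that only saw silent stores when deciding what to copy or flush.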

    CHECKPOINTING FOR A HYBRID COMPUTING NODE

    Type: Patent application

    Status: Granted

    Publication No.: US20150363225A1

    Publication date: 2015-12-17

    Application No.: US14302921

    Filing date: 2014-06-12

    Inventor: Chen-Yong Cher

    IPC classification: G06F9/48

    Abstract: According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.
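
    The key point of this abstract is that the transfer to the main processor overlaps with continued execution on the accelerator. The simulation below imitates that ordering with a background thread. The `Accelerator` class, the dictionary used as host storage, and the step counts are all hypothetical stand-ins; no real accelerator API is used.

```python
import copy
import threading


class Accelerator:
    """Stand-in for a processing accelerator with its own local memory."""

    def __init__(self):
        self.state = {"step": 0}
        self.local_checkpoint = None

    def run_steps(self, n):
        for _ in range(n):
            self.state["step"] += 1

    def create_checkpoint(self):
        # Snapshot the task state into accelerator-local memory.
        self.local_checkpoint = copy.deepcopy(self.state)


def transfer_to_host(acc, host_store):
    # Copy the local snapshot to the main processor's storage.
    host_store["checkpoint"] = copy.deepcopy(acc.local_checkpoint)


def run_demo():
    acc, host_store = Accelerator(), {}
    acc.run_steps(10)
    acc.create_checkpoint()
    # Start the transfer in the background, then resume the task immediately.
    worker = threading.Thread(target=transfer_to_host, args=(acc, host_store))
    worker.start()
    acc.run_steps(5)
    worker.join()
    return host_store, acc
```

    Because the transfer reads only the local snapshot, the task can keep mutating its live state without corrupting the checkpoint being copied.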

    Malicious activity detection of a functional unit

    Publication No.: US09172714B2

    Publication date: 2015-10-27

    Application No.: US14012237

    Filing date: 2013-08-28

    Abstract: A mechanism is provided for detecting malicious activity in a functional unit of a data processing system. A set of activity values associated with a set of functional units and a set of thermal levels associated with the set of functional units are monitored. For a current activity value associated with the functional unit in the set of functional units, a determination is made as to whether a thermal level associated with the functional unit differs from a verified thermal level beyond a predetermined threshold. Responsive to the thermal level associated with the functional unit differing from the verified thermal level beyond the predetermined threshold, an indication of suspected abnormal activity associated with the functional unit is sent.
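
    The comparison at the heart of this mechanism can be sketched as follows. The data layout is assumed: activity is given as an integer percentage, and the verified thermal level is looked up in a table keyed by unit and activity value. The function name `suspected_units` and the returned list standing in for "sending an indication" are hypothetical.

```python
def suspected_units(readings, verified_thermal, threshold):
    """Return the functional units whose temperature is inconsistent with their activity.

    readings: {unit: (activity_percent, thermal_level)} from the monitors.
    verified_thermal: {(unit, activity_percent): expected_thermal_level} (assumed table).
    """
    flagged = []
    for unit, (activity, thermal) in readings.items():
        expected = verified_thermal[(unit, activity)]
        # Flag the unit when the measured level deviates beyond the threshold.
        if abs(thermal - expected) > threshold:
            flagged.append(unit)
    return flagged
```

    A unit that reports low activity but runs hot is the case of interest, since hidden work would raise temperature without showing up in the reported activity value.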

    METHODS, APPARATUS AND SYSTEM FOR SELECTIVE DUPLICATION OF SUBTASKS

    Type: Patent application

    Status: Granted

    Publication No.: US20150227426A1

    Publication date: 2015-08-13

    Application No.: US14176083

    Filing date: 2014-02-08

    IPC classification: G06F11/14 G06F11/00

    Abstract: A method for selective duplication of subtasks in a high-performance computing system includes: monitoring a health status of one or more nodes in a high-performance computing system, where one or more subtasks of a parallel task execute on the one or more nodes; identifying one or more nodes as having a likelihood of failure which exceeds a first prescribed threshold; selectively duplicating the one or more subtasks that execute on the one or more nodes having a likelihood of failure which exceeds the first prescribed threshold; and notifying a messaging library that one or more subtasks were duplicated.
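
    The selection step can be illustrated with plain dictionaries. The inputs here are assumptions: a per-node failure likelihood already derived from health monitoring, a placement map from subtask to node, and a `notify` callback standing in for the messaging library. Actually launching the duplicate subtasks is outside this sketch.

```python
def select_duplicates(failure_likelihood, placement, threshold):
    """Return the subtasks running on nodes whose failure likelihood exceeds the threshold.

    failure_likelihood: {node: probability}; placement: {subtask: node}.
    """
    at_risk = {node for node, p in failure_likelihood.items() if p > threshold}
    return sorted(task for task, node in placement.items() if node in at_risk)


def duplicate_and_notify(failure_likelihood, placement, threshold, notify):
    """Pick the subtasks to duplicate and tell the messaging layer about them."""
    duplicated = select_duplicates(failure_likelihood, placement, threshold)
    if duplicated:
        notify(duplicated)  # hypothetical hook into the messaging library
    return duplicated
```

    Duplicating only the subtasks on at-risk nodes keeps the redundancy overhead proportional to the number of unhealthy nodes rather than to the size of the whole job.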
