Transient fault detection by integrating an SRMT code and a non SRMT code in a single application

    Publication No.: US07937620B2

    Publication date: 2011-05-03

    Application No.: US11745403

    Filing date: 2007-05-07

    Applicant: Cheng Wang, Youfeng Wu

    Inventors: Cheng Wang, Youfeng Wu

    IPC classification: G06F11/00 G06F11/14

    CPC classification: G06F11/1497 G06F8/457

    Abstract: Disclosed is a method for running, at runtime, a first code generated by a Software-based Redundant Multi-Threading (SRMT) compiler along with a second code generated by a normal compiler, the first code including a first function and a second function, and the second code including a third function. The method comprises running the first function in a leading thread and a tailing thread (104); running the third function in a single thread (106), wherein the leading thread calls the third function; and running the second function in the leading thread and the tailing thread (108), wherein the third function calls the second function. The present disclosure provides a mechanism for handling function calls in which SRMT functions and binary functions can call each other irrespective of whether the callee is an SRMT function or a binary function, and thereby dynamically adjusts the reliability and performance tradeoff based on run-time information and user-selectable policies.
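The call pattern in this abstract can be illustrated with a minimal Python sketch. It is not the patented mechanism: the helper names `srmt_call` and `binary_call`, the use of a queue as the leading-to-tailing channel, and the choice to spawn a fresh thread pair for each SRMT call are all assumptions made for illustration. SRMT functions run in both a leading and a tailing thread and their results are compared; a non-SRMT ("binary") function runs once, in the leading thread only, and its result is forwarded to the tailing thread.

```python
import queue
import threading

# Hypothetical per-thread state: the thread's role and its channel.
_role = threading.local()

def srmt_call(fn, *args):
    """Run fn redundantly in a leading and a tailing thread and compare results."""
    channel = queue.Queue()  # leading -> tailing: results of single-thread calls
    results = {}

    def worker(role):
        _role.name, _role.channel = role, channel
        results[role] = fn(*args)

    threads = [threading.Thread(target=worker, args=(r,))
               for r in ("leading", "tailing")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if results["leading"] != results["tailing"]:
        raise RuntimeError("transient fault detected: results differ")
    return results["leading"]

def binary_call(fn, *args):
    """Call non-SRMT code once: the leading thread runs it, the tailing thread
    only receives the forwarded result."""
    if _role.name == "leading":
        value = fn(*args)
        _role.channel.put(value)
        return value
    return _role.channel.get()

def second(x):   # SRMT function
    return x * 2

def third(x):    # non-SRMT function that calls back into SRMT code
    return srmt_call(second, x) + 1

def first(x):    # SRMT function that calls the non-SRMT function
    return binary_call(third, x) + 10

print(srmt_call(first, 5))  # prints 21
```

Here `first` and `second` are checked redundantly, while `third` executes only once. That is the reliability versus performance tradeoff the abstract refers to.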

    Methods and apparatus to form a transactional objective instruction construct from lock-based critical sections
    12.
    Granted patent (status: in force)

    Publication No.: US07844946B2

    Publication date: 2010-11-30

    Application No.: US11535205

    Filing date: 2006-09-26

    Applicant: Youfeng Wu, Cheng Wang

    Inventors: Youfeng Wu, Cheng Wang

    IPC classification: G06F9/44 G06F9/46

    CPC classification: G06F9/466 G06F9/524

    Abstract: Methods and an apparatus for forming a transactional objective instruction construct are provided. An example method translates a source instruction construct to form a transactional objective instruction construct, executes the transactional objective instruction construct, intercepts an aborted transaction associated with the transactional objective instruction construct during execution, maintains a graph of nodes and edges associated with the executed transactional objective instruction construct to predict a deadlock situation, and resolves the deadlock situation associated with the transactional objective instruction construct based on the graph.
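The abstract does not say what the nodes and edges of the graph represent. As a rough illustration only, the sketch below assumes a wait-for graph (an edge from A to B means that A waits for something B holds), treats a cycle as a predicted deadlock, and resolves it by removing one arbitrarily chosen victim per cycle. The class name `WaitForGraph` and the victim policy are hypothetical.

```python
from collections import defaultdict

class WaitForGraph:
    """Hypothetical wait-for graph: edge waiter -> holder."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_wait(self, waiter, holder):
        self.edges[waiter].add(holder)

    def remove_node(self, node):
        self.edges.pop(node, None)
        for holders in self.edges.values():
            holders.discard(node)

    def find_cycle(self):
        """Return the nodes of one cycle found by depth-first search, or None."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = defaultdict(int)
        stack = []

        def visit(node):
            color[node] = GREY
            stack.append(node)
            for nxt in self.edges.get(node, ()):
                if color[nxt] == GREY:            # back edge closes a cycle
                    return stack[stack.index(nxt):]
                if color[nxt] == WHITE:
                    cycle = visit(nxt)
                    if cycle:
                        return cycle
            stack.pop()
            color[node] = BLACK
            return None

        for node in list(self.edges):
            if color[node] == WHITE:
                cycle = visit(node)
                if cycle:
                    return cycle
        return None

    def resolve(self):
        """Break every cycle by removing one victim (arbitrary policy: max name)."""
        victims = []
        cycle = self.find_cycle()
        while cycle is not None:
            victim = max(cycle)
            self.remove_node(victim)
            victims.append(victim)
            cycle = self.find_cycle()
        return victims

graph = WaitForGraph()
graph.add_wait("T1", "T2")
graph.add_wait("T2", "T3")
graph.add_wait("T3", "T1")
print(sorted(graph.find_cycle()))  # prints ['T1', 'T2', 'T3']
print(graph.resolve())             # prints ['T3']
```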

    COMPACT TRACE TREES FOR DYNAMIC BINARY PARALLELIZATION
    14.
    Patent application (status: in force)

    Publication No.: US20100083236A1

    Publication date: 2010-04-01

    Application No.: US12242371

    Filing date: 2008-09-30

    IPC classification: G06F9/44

    CPC classification: G06F9/45516

    Abstract: Methods and apparatus relating to compact trace trees for dynamic binary parallelization are described. In one embodiment, a compact trace tree (CTT) is generated to improve the effectiveness of dynamic binary parallelization. CTT may be used to determine which traces are to be duplicated and specialized for execution on separate processing elements. Other embodiments are also described and claimed.

    Energy/performance with optimal communication in dynamic parallelization of single threaded programs

    Publication No.: US09715376B2

    Publication date: 2017-07-25

    Application No.: US12344721

    Filing date: 2008-12-29

    Applicant: Cheng Wang, Youfeng Wu

    Inventors: Cheng Wang, Youfeng Wu

    IPC classification: G06F9/44 G06F9/45

    Abstract: A method and apparatus for optimizing parallelized single-threaded programs is described herein. Code regions, such as dependency chains, are replicated utilizing any known method, such as dynamic code replication. A flow network associated with a replicated code region is built and a minimum cut algorithm is applied to determine duplicated nodes, which may include a single instruction or a group of instructions, to be removed. The dependency of removed nodes is fulfilled with inserted communication to ensure proper data consistency of the original single-threaded program. As a result, both performance and power consumption are optimized for parallel code sections through removal of expensive workload nodes and replacement with communication between other replicated code regions to be executed in parallel.
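The abstract names a minimum cut over a flow network but does not give the network's construction. The sketch below is therefore only a generic minimum-cut computation (Edmonds-Karp maximum flow, then the set of nodes still reachable from the source in the residual graph). The node names and capacities are made up, and reading the crossing edges as the places where communication would be inserted is an assumption.

```python
from collections import deque

def min_cut(capacity, source, sink):
    """Return (cut value, source-side node set) using Edmonds-Karp max flow."""
    # Residual capacities, with zero-capacity reverse edges added.
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def bfs():
        """Breadth-first search over edges with remaining capacity."""
        parent = {source: None}
        pending = deque([source])
        while pending:
            u = pending.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    pending.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs()
        if sink not in parent:
            # No augmenting path left: reachable nodes form the source side.
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical network: capacities stand in for per-edge costs.
capacity = {"s": {"a": 4, "b": 4}, "a": {"t": 1, "b": 2}, "b": {"t": 5}}
value, side = min_cut(capacity, "s", "t")
crossing = sorted((u, v) for u in side for v in capacity.get(u, {}) if v not in side)
print(value, sorted(side), crossing)
# prints 6 ['a', 'b', 's'] [('a', 't'), ('b', 't')]
```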

    Bi-directional copying of register content into shadow registers
    17.
    Granted patent (status: in force)

    Publication No.: US09292221B2

    Publication date: 2016-03-22

    Application No.: US13995943

    Filing date: 2011-09-29

    IPC classification: G06F3/06 G06F9/30 G06F9/38

    Abstract: Embodiments of the present disclosure describe a processor, which may include copy circuitry coupled to a shadow register file and a control register. The copy circuitry may be configured to copy content from a range of a number of registers to a shadow range of the shadow register file in a forward or backward direction. The forward or backward direction may be based at least in part on a value stored in the control register.

    OVERLAPPING ATOMIC REGIONS IN A PROCESSOR
    18.
    Patent application (status: in force)

    Publication No.: US20140122845A1

    Publication date: 2014-05-01

    Application No.: US13993364

    Filing date: 2011-12-30

    IPC classification: G06F9/38

    Abstract: In one embodiment, the present invention includes a processor having a core to execute instructions. This core can include various structures and logic that enable instructions of different atomic regions to be executed in an overlapping manner. To this end, the core can include a register file having registers to store data for use in execution of the instructions, and multiple shadow register files each to store a register checkpoint on initiation of a given atomic region. In this way, overlapping execution of atomic regions identified by a programmer or compiler can occur. Other embodiments are described and claimed.
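A small software model may help picture the checkpointing described here. It is a sketch under assumptions, not the claimed hardware: the class `Core`, its method names, and the rule that an abort restores the checkpoint taken when that region began are all hypothetical. Each atomic region takes a register checkpoint into a free shadow register file when it starts, so a second region can start before the first has committed.

```python
class Core:
    """Toy model: one register file plus a pool of shadow register files."""

    def __init__(self, num_registers=4, num_shadow_files=2):
        self.registers = [0] * num_registers
        self.free_shadow = list(range(num_shadow_files))
        self.checkpoints = {}  # region id -> (shadow file index, saved registers)

    def begin_region(self, region_id):
        """Checkpoint the registers into a free shadow file."""
        if not self.free_shadow:
            raise RuntimeError("no free shadow register file")
        slot = self.free_shadow.pop(0)
        self.checkpoints[region_id] = (slot, list(self.registers))

    def commit_region(self, region_id):
        """Discard the checkpoint and release its shadow file."""
        slot, _ = self.checkpoints.pop(region_id)
        self.free_shadow.append(slot)

    def abort_region(self, region_id):
        """Assumed behavior: restore the registers saved at region start."""
        slot, saved = self.checkpoints.pop(region_id)
        self.registers = list(saved)
        self.free_shadow.append(slot)

core = Core()
core.begin_region("A")
core.registers[0] = 1
core.begin_region("B")   # B starts while A is still open (overlap)
core.registers[1] = 2
core.commit_region("A")
core.abort_region("B")   # rolls back to the state at B's start
print(core.registers)    # prints [1, 0, 0, 0]
```

The model ignores what should happen to a younger region when an older, overlapping region aborts; the abstract does not cover that case.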

    FLEXIBLE ACCELERATION OF CODE EXECUTION
    19.
    Patent application (status: in force)

    Publication No.: US20140096132A1

    Publication date: 2014-04-03

    Application No.: US13631408

    Filing date: 2012-09-28

    Applicant: Cheng Wang, Youfeng Wu

    Inventors: Cheng Wang, Youfeng Wu

    IPC classification: G06F9/455 G06F9/00

    Abstract: Technologies for performing flexible code acceleration on a computing device include initializing an accelerator virtual device on the computing device. The computing device allocates memory-mapped input and output (I/O) for the accelerator virtual device and also allocates an accelerator virtual device context for a code to be accelerated. The computing device accesses a bytecode of the code to be accelerated and determines whether the bytecode is an operating system-dependent bytecode. If not, the computing device performs hardware acceleration of the bytecode via the memory-mapped I/O using an internal binary translation module. However, if the bytecode is operating system-dependent, the computing device performs software acceleration of the bytecode.
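The decision flow in this abstract can be outlined as a short sketch. Everything concrete in it is an assumption made for illustration: the class `AcceleratorVirtualDevice`, the list standing in for memory-mapped I/O, the opcode names, and the set `OS_DEPENDENT` used to classify bytecodes. It shows only the routing of each bytecode to a hardware or software path, not real acceleration.

```python
from dataclasses import dataclass, field

@dataclass
class AcceleratorVirtualDevice:
    """Hypothetical stand-in for the accelerator virtual device."""
    mmio: list = field(default_factory=list)      # models writes to memory-mapped I/O
    contexts: dict = field(default_factory=dict)  # one context per accelerated code

# Hypothetical opcode names treated as operating system-dependent.
OS_DEPENDENT = {"open_file", "spawn_thread"}

def accelerate(code_id, bytecodes, device):
    """Route each bytecode to the software or hardware path and log the choice."""
    device.contexts[code_id] = {"executed": []}   # allocate a context for this code
    log = device.contexts[code_id]["executed"]
    for op in bytecodes:
        if op in OS_DEPENDENT:
            log.append((op, "software"))          # software acceleration path
        else:
            device.mmio.append(op)                # submit via memory-mapped I/O
            log.append((op, "hardware"))
    return log

device = AcceleratorVirtualDevice()
print(accelerate("demo", ["add", "open_file", "mul"], device))
# prints [('add', 'hardware'), ('open_file', 'software'), ('mul', 'hardware')]
```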

    Compact trace trees for dynamic binary parallelization
    20.
    Granted patent (status: in force)

    Publication No.: US08332558B2

    Publication date: 2012-12-11

    Application No.: US12242371

    Filing date: 2008-09-30

    IPC classification: G06F9/44 G06F9/00

    CPC classification: G06F9/45516

    Abstract: Methods and apparatus relating to compact trace trees for dynamic binary parallelization are described. In one embodiment, a compact trace tree (CTT) is generated to improve the effectiveness of dynamic binary parallelization. CTT may be used to determine which traces are to be duplicated and specialized for execution on separate processing elements. Other embodiments are also described and claimed.
