Abstract:
There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on a SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of the lane with the largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC.
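The branch and selection steps described above can be sketched roughly as follows. This is an illustrative model only, not the patented implementation: the `Lane` class, the `on_branch` and `select_lanes` functions, the dictionary of per-lane branch targets, and the tie-breaking rule (first lane with the largest depth wins) are all assumptions introduced here. The decrement step on reaching "a first instruction" is omitted.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    pc: int            # lane-PC
    depth: int = 0     # lane depth counter
    active: bool = True


def on_branch(lanes, targets):
    """Thread-PC reached a branch: bump depth and retarget every active lane.

    `targets` maps a lane index to that lane's branch target (hypothetical layout).
    """
    for i, lane in enumerate(lanes):
        if lane.active:
            lane.depth += 1
            lane.pc = targets[i]


def select_lanes(lanes):
    """Set the thread-PC from the deepest lane and activate the matching lanes."""
    leader = max(lanes, key=lambda lane: lane.depth)  # ties: first lane wins (assumption)
    thread_pc = leader.pc
    for lane in lanes:
        lane.active = (lane.pc == thread_pc)
    return thread_pc
```

In this sketch, lanes that took the other side of a branch simply stay inactive until a later selection picks their lane-PC.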
Abstract:
A system and method for reducing the number of aborts caused by a runtime helper being called during the execution of a transaction block. When a runtime helper is called during the execution of a transaction block while a program using hardware transactional memory is running, the runtime helper passes ID information indicating the type of runtime helper to an abort handler. When there is an abort caused by a call to a runtime helper, the abort handler responds by acquiring the ID information of the runtime helper that caused the abort, disabling the transaction block with respect to that specific type of runtime helper, executing the non-transactional path corresponding to the transaction block, and re-enabling the transaction block when predetermined conditions are satisfied.
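A software-only sketch of the abort-handling flow might look like the following. Everything here is a stand-in: `HelperAbort`, `TransactionBlock`, and `reenable_after` are hypothetical names, a Python exception plays the role of a hardware abort, and the "predetermined condition" for re-enabling is assumed to be a fixed number of non-transactional executions, which the abstract does not specify.

```python
class HelperAbort(Exception):
    """Stand-in for a hardware abort; carries the runtime helper's ID."""

    def __init__(self, helper_id):
        super().__init__(helper_id)
        self.helper_id = helper_id


class TransactionBlock:
    def __init__(self, reenable_after=2):
        # Assumed re-enable condition: this many fallback runs while disabled.
        self.reenable_after = reenable_after
        self.disabled = {}  # helper_id -> fallback runs since disabling

    def execute(self, transactional_path, fallback_path):
        if self.disabled:
            # Block is disabled: take the non-transactional path directly.
            result = fallback_path()
            for helper_id in list(self.disabled):
                self.disabled[helper_id] += 1
                if self.disabled[helper_id] >= self.reenable_after:
                    del self.disabled[helper_id]  # condition met: re-enable
            return result
        try:
            return transactional_path()
        except HelperAbort as abort:
            # Abort handler: record which helper type caused the abort,
            # disable the block for it, then run the fallback path.
            self.disabled[abort.helper_id] = 0
            return fallback_path()
```

With `reenable_after=2`, a block whose transactional path always calls the offending helper attempts the transaction once, runs the fallback on the next two executions without attempting it, and then tries the transaction again.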
Abstract:
In an embodiment, if a self thread has more than one conflict, a transaction of the self thread is aborted and restarted. If the self thread has only one conflict and an enemy thread of the self thread has more than one conflict, the transaction of the self thread is committed. If the self thread only conflicts with the enemy thread, the enemy thread only conflicts with the self thread, and the self thread has a key that has a higher priority than a key of the enemy thread, the transaction of the self thread is committed. If the self thread only conflicts with the enemy thread, the enemy thread only conflicts with the self thread, and the self thread has a key that has a lower priority than the key of the enemy thread, the transaction of the self thread is aborted.
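The four cases above amount to a small decision function, sketched below. The names `Decision` and `resolve` are hypothetical, and the encoding of priority (a numerically smaller key means higher priority) is an assumption made for illustration; the abstract does not say how keys are compared. Cases the abstract does not cover (no conflict, or equal keys) raise an error rather than guess.

```python
from enum import Enum


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def resolve(self_conflicts, enemy_conflicts, self_key, enemy_key):
    """Decide whether the self thread's transaction commits or aborts.

    Assumption: a smaller key value means higher priority.
    """
    if self_conflicts < 1 or enemy_conflicts < 1:
        raise ValueError("case not covered by the abstract: no conflict")
    if self_conflicts > 1:
        return Decision.ABORT          # aborted and restarted
    if enemy_conflicts > 1:
        return Decision.COMMIT         # self has one conflict, enemy has several
    # One-on-one conflict: the key priority breaks the tie.
    if self_key == enemy_key:
        raise ValueError("case not covered by the abstract: equal keys")
    return Decision.COMMIT if self_key < enemy_key else Decision.ABORT
```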
Abstract:
There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running Single Program Multiple Data code on a Single Instruction Multiple Data machine. The machine runs an instruction stream over input data streams, increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation, and updates the lane-PC of each active lane according to targets of the branch operation. An instruction of the instruction stream includes a barrier indicating a convergence point for all lanes to join. In response to a lane reaching a barrier, the machine evaluates whether all lane-PCs are set to the same thread-PC. If the lane-PCs are not set to the same thread-PC, the machine selects an active lane from the plurality of lanes; otherwise, it increments the lane-PCs of all the lanes and then selects an active lane from the plurality of lanes.
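The barrier handling can be illustrated with a minimal sketch. The function name `on_barrier` and the `select` callback are hypothetical; the selection policy itself is left to the caller because the abstract does not define it here.

```python
def on_barrier(lane_pcs, thread_pc, select):
    """Handle a lane reaching a barrier.

    lane_pcs:  list of lane-PCs, one per lane
    thread_pc: current thread-PC (the barrier's address)
    select:    caller-supplied lane selection policy (an assumption);
               takes the lane-PCs and returns the next thread-PC
    """
    if all(pc == thread_pc for pc in lane_pcs):
        # Every lane has converged: step all lanes past the barrier.
        lane_pcs = [pc + 1 for pc in lane_pcs]
    # In both cases, pick the next active lane(s) afterwards.
    return lane_pcs, select(lane_pcs)
```

For example, with `min` standing in as a placeholder policy, `on_barrier([7, 7, 7], 7, min)` advances every lane, while `on_barrier([7, 3, 7], 7, min)` leaves the lane-PCs unchanged and resumes the lane that has not yet reached the barrier.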
Abstract:
Techniques for switching between two (thread and lane) modes of execution in a dual execution mode processor are provided. In one aspect, a method for executing a single instruction stream having alternating serial regions and parallel regions in a same processor is provided. The method includes the steps of: creating a processor architecture having, for each architected thread of the single instruction stream, one set of thread registers, and N sets of lane registers across N lanes; executing instructions in the serial regions of the single instruction stream in a thread mode against the thread registers; executing instructions in the parallel regions of the single instruction stream in a lane mode against the lane registers; and transitioning execution of the single instruction stream from the thread mode to the lane mode or from the lane mode to the thread mode.
Abstract:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. The node design integrates a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that also improves the soft error rate, and it supports DMA functionality allowing for parallel-processing message passing.