Abstract:
A dynamic runtime scheduling system includes task manager circuitry capable of detecting a correspondence in at least a portion of the output arguments from one or more first tasks with at least a portion of the input arguments to one or more second tasks. Upon detecting the output arguments from the first task represents a superset of the second task input arguments, the task manager circuitry apportions the first task into a plurality of new subtasks. At least one of the new subtasks includes output arguments having a 1:1 correspondence to the second task input arguments. Upon detecting the output arguments from an first task represents a subset of the second task input arguments, the task manager circuitry may autonomously apportion the second task into a plurality of new subtasks. At least one of the new subtasks may include input arguments having a 1:1 correspondence to first task output arguments.
Abstract:
Techniques for improved transactional memory management are described. In one embodiment, for example, an apparatus may comprise a processor element, an execution component for execution by the processor element to concurrently execute a software transaction and a hardware transaction according to a transactional memory process, a tracking component for execution by the processor element to activate a global lock to indicate that the software transaction is undergoing execution, and a finalization component for execution by the processor element to commit the software transaction and deactivate the global lock when execution of the software transaction completes, the finalization component to abort the hardware transaction when the global lock is active when execution of the hardware transaction completes. Other embodiments are described and claimed.
Abstract:
In an embodiment of a transactional memory system, an apparatus includes a processor and an execution logic to enable concurrent execution of at least one first software transaction of a first software transaction mode and a second software transaction of a second software transaction mode and at least one hardware transaction of a first hardware transaction mode and at least one second hardware transaction of a second hardware transaction mode. In one example, the execution logic may be implemented within the processor. Other embodiments are described and claimed.
Abstract:
A processor is described comprising memory access conflict detection circuitry to identify a conflict pertaining to a transaction being executed by a thread that believes it has locked information within a memory. The processor also includes logging circuitry to construct and report out a packet if the memory access conflict detection circuitry identifies a conflict that causes the transaction to be aborted.
Abstract:
A system, processor, and method to record the interleavings of shared memory accesses in the presence of complex multi-operation instructions. An extension to instruction atomicity (IA) is disclosed that makes it possible for software to infer partial information about a multi-operation execution if the hardware has recorded a dependency due to an instruction atomicity violation (IAV). By monitoring the progress of a multi-operation instruction, the need for complex multi-operation emulation is unnecessary.
Abstract:
A system is disclosed that includes a processor and a dynamic random access memory (DRAM). The processor includes a hybrid transactional memory (HyTM) that includes hardware transactional memory (HTM), and a program debugger to replay a program that includes an HTM instruction and that has been executed has been executed using the HyTM. The program debugger includes a software emulator that is to replay the HTM instruction by emulation of the HTM. Other embodiments are disclosed and claimed.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to manage concurrent predicate expressions. An example method discloses inserting a first condition hook into a first thread, the first condition hook associated with a first condition, inserting a second condition hook into a second thread, the second condition hook associated with a second condition, preventing the second thread from executing until the first condition is satisfied, and identifying a concurrency violation when the second condition is satisfied.
Abstract:
A system, processor, and method to record the interleavings of shared memory accesses in the presence of complex multi-operation instructions. An extension to instruction atomicity (IA) is disclosed that makes it possible for software to infer partial information about a multi-operation execution if the hardware has recorded a dependency due to an instruction atomicity violation (IAV). By monitoring the progress of a multi-operation instruction, the need for complex multi-operation emulation is unnecessary.
Abstract:
Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
Abstract:
Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.