Abstract:
Embodiments may provide a method for performing a replay of a previous execution of a program. The method includes generating an order of recorded chunks of instructions across a plurality of recorded threads based, at least in part, on log files generated from the previous execution of the program. The method includes initiating execution of the program, the executing program having a plurality of threads, each thread having a number of chunks of instructions. The method includes intercepting, by a virtual machine unit executing on a processor, an instruction of a chunk before the instruction is executed. The method includes determining, by a replay module executing on the processor, that the chunk is an active chunk if the chunk is currently in line for execution according to the order of recorded chunks, and responsive to a determination that the chunk is the active chunk, executing the instruction.
Abstract:
In accordance with the present description, cache operations for a memory-sided cache in front of a backing memory such as a byte-addressable non-volatile memory, include combining at least two of a first operation, a second operation and a third operation, wherein the first operation includes evicting victim cache entries from the cache memory in accordance with a replacement policy which is biased to evict cache entries having clean cache lines over evicting cache entries having dirty cache lines. The second operation includes evicting victim cache entries from the primary cache memory to a victim cache memory of the cache memory, and the third operation includes translating memory location addresses to shuffle and spread the memory location addresses within an address range of the backing memory. It is believed that various combinations of these operations may provide improved operation of a memory. Other aspects are described herein.
Abstract:
A processor includes a first core to execute a first software thread, a second core to execute a second software thread, and shared memory access monitoring and recording logic. The logic includes memory access monitor logic to monitor accesses to memory by the first thread, record memory addresses of the monitored accesses, and detect data races involving the recorded memory addresses with other threads. The logic includes chunk generation logic is to generate chunks to represent committed execution of the first thread. Each of the chunks is to include a number of instructions of the first thread executed and committed and a time stamp. The chunk generation logic is to stop generation of a current chunk in response to detection of a data race by the memory access monitor logic. A chunk buffer is to temporarily store chunks until the chunks are transferred out of the processor.
Abstract:
One or more embodiments may provide a method for performing a replay. The method includes initiating execution of a program, the program having a plurality of sets of instructions, and each set of instructions has a number of chunks of instructions. The method also includes intercepting, by a virtual machine unit executing on a processor, an instruction of a chunk of the number of chunks before execution. The method further includes determining, by a replay module executing on the processor, whether the chunk is an active chunk, and responsive to the chunk being the active chunk, executing the instruction.
Abstract:
One or more embodiments may provide a method for performing a replay. The method includes initiating execution of a program, the program having a plurality of sets of instructions, and each set of instructions has a number of chunks of instructions. The method also includes intercepting, by a virtual machine unit executing on a processor, an instruction of a chunk of the number of chunks before execution. The method further includes determining, by a replay module executing on the processor, whether the chunk is an active chunk, and responsive to the chunk being the active chunk, executing the instruction.
Abstract:
A system, processor, and method to record the interleavings of shared memory accesses in the presence of complex multi-operation instructions. An extension to instruction atomicity (IA) is disclosed that makes it possible for software to infer partial information about a multi-operation execution if the hardware has recorded a dependency due to an instruction atomicity violation (IAV). By monitoring the progress of a multi-operation instruction, the need for complex multi-operation emulation is unnecessary.
Abstract:
A system, processor, and method to record the interleavings of shared memory accesses in the presence of complex multi-operation instructions. An extension to instruction atomicity (IA) is disclosed that makes it possible for software to infer partial information about a multi-operation execution if the hardware has recorded a dependency due to an instruction atomicity violation (IAV). By monitoring the progress of a multi-operation instruction, the need for complex multi-operation emulation is unnecessary.