Abstract:
Apparatus and methods are disclosed for nullifying memory store instructions and one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions and a register identification of at least one of a plurality of registers, based on a first and second target fields of the nullification instruction. The at least one register and the memory access instruction associated with the instruction identification are nullified. Based on the nullified memory access instruction, a subsequent memory access instruction is executed.
Abstract:
Apparatus and methods are disclosed for nullifying memory store instructions and one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions and a register identification of at least one of a plurality of registers, based on a first and second target fields of the nullification instruction. The at least one register and the memory access instruction associated with the instruction identification are nullified. Based on the nullified memory access instruction, a subsequent memory access instruction is executed.
Abstract:
Apparatus and methods are disclosed for nullifying memory store instructions and one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions and a register identification of at least one of a plurality of registers, based on a first and second target fields of the nullification instruction. The at least one register and the memory access instruction associated with the instruction identification are nullified. Based on the nullified memory access instruction, a subsequent memory access instruction is executed.
Abstract:
Apparatus and methods are disclosed for nullifying memory store instructions and one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions and a register identification of at least one of a plurality of registers, based on a first and second target fields of the nullification instruction. The at least one register and the memory access instruction associated with the instruction identification are nullified. Based on the nullified memory access instruction, a subsequent memory access instruction is executed.
Abstract:
Systems and methods for identifying candidate load instructions for prefetch operations based on at least instruction encoding of the load instructions, include an identifier based on a function of at least one or more fields of a load instruction and optionally, a subset of bits of the PC value of the load instruction, wherein the one or more fields exclude a full address or program counter (PC) value of the load instruction. Prefetch mechanisms, including a prefetch table indexed by the identifier, can determine whether the load instruction is a candidate load instruction for prefetching load data, based on the identifier. The function may be a hash, a concatenation, or a combination thereof, of one or more bits of the one or more fields. The fields include one or more of a base register, a destination register, an immediate offset, an offset register, or other bits of instruction encoding of the load instruction.
Abstract:
A microprocessor includes a cache memory and a data prefetcher. The data prefetcher detects a pattern of memory accesses within a first memory block and prefetch into the cache memory cache lines from the first memory block based on the pattern. The data prefetcher also observes a new memory access request to a second memory block. The data prefetcher also determines that the first memory block is virtually adjacent to the second memory block and that the pattern, when continued from the first memory block to the second memory block, predicts an access to a cache line implicated by the new request within the second memory block. The data prefetcher also responsively prefetches into the cache memory cache lines from the second memory block based on the pattern.
Abstract:
Systems and methods for performing on-the-fly format conversion on data vectors during load/store operations are described herein. In one embodiment, a method for loading a data vector from a memory into a vector unit comprises reading a plurality of samples from the memory, wherein the plurality of samples are packed in the memory. The method also comprises unpacking the samples to obtain a plurality of unpacked samples, performing format conversion on the unpacked samples in parallel, and sending at feast a portion of the format-converted samples to the vector unit.
Abstract:
A processing device implementing store address prediction for memory disambiguation in a processing device is disclosed. A processing device of the disclosure includes a store address predictor to predict an address for store operations that store data to a memory hierarchy. The processing device further includes a store buffer for buffering the store operations prior to completion, the store buffer to comprise the predicted address for each of the store operations. The processing device further includes a load buffer to buffer a load operation, the load operation to reference the store buffer to, based on the predicted addresses, determine whether to speculatively execute ahead of each store operation and to determine whether to speculatively forward data from one of the store operations.