Abstract:
A processor includes N-bit registers and a decode unit to receive a multiple register memory access instruction. The multiple register memory access instruction is to indicate a memory location and a register. The processor includes a memory access unit coupled with the decode unit and with the N-bit registers. The memory access unit is to perform a multiple register memory access operation in response to the multiple register memory access instruction. The operation is to involve N-bit data, in each of the N-bit registers comprising the indicated register. The operation is also to involve different corresponding N-bit portions of an M×N-bit line of memory corresponding to the indicated memory location. A total number of bits of the N-bit data in the N-bit registers to be involved in the multiple register memory access operation is to amount to at least half of the M×N-bits of the line of memory.
Abstract:
A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.
Abstract:
A processor includes a cache hierarchy and an execution unit. The cache hierarchy includes a lower level cache and a higher level cache. The execution unit includes logic to issue a memory operation to access the cache hierarchy. The lower level cache includes logic to determine that a requested cache line of the memory operation is unavailable in the lower level cache, determine that a line fill buffer of the lower level cache is full, and initiate prefetching of the requested cache line from the higher level cache based upon the determination that the line fill buffer of the lower level cache is full. The line fill buffer is to forward miss requests to the higher level cache.
Abstract:
A processor includes one or more execution units to execute instructions, each having one or more elements in different element sizes using one or more registers in different register sizes. The processor further includes a counter configured to count a number of instructions performing predetermined types of operations executed by the one or more execution units. The processor further includes one or more registers to allow an external component to configure the counter to count a number of instructions associated with a combination of a register size and a element size (register/element size) and to retrieve a counter value produced by the counter.
Abstract:
Embodiments of an invention for monitoring the operation of a processor are disclosed. In one embodiment, a system includes a processor and a hardware agent external to the processor. The processor includes virtualization logic to provide for the processor to operate in a root mode and in a non-root mode. The hardware agent is to verify operation of the processor in the non-root mode based on tracing information to be collected by a software agent to be executed by the processor in the root mode.
Abstract:
A processor includes N-bit registers and a decode unit to receive a multiple register memory access instruction. The multiple register memory access instruction is to indicate a memory location and a register. The processor includes a memory access unit coupled with the decode unit and with the N-bit registers. The memory access unit is to perform a multiple register memory access operation in response to the multiple register memory access instruction. The operation is to involve N-bit data, in each of the N-bit registers comprising the indicated register. The operation is also to involve different corresponding N-bit portions of an M×N-bit line of memory corresponding to the indicated memory location. A total number of bits of the N-bit data in the N-bit registers to be involved in the multiple register memory access operation is to amount to at least half of the M×N-bits of the line of memory.
Abstract:
In one embodiment, a processor comprises a branch predictor to generate, in association with a program loop, a frozen history vector comprising a snapshot of a branch history vector; track a current iteration of the program loop; and provide a prediction for a branch instruction associated with the program loop, the prediction based on the frozen history vector and the current iteration of the program loop.
Abstract:
A processor includes N-bit registers and a decode unit to receive a multiple register memory access instruction. The multiple register memory access instruction is to indicate a memory location and a register. The processor includes a memory access unit coupled with the decode unit and with the N-bit registers. The memory access unit is to perform a multiple register memory access operation in response to the multiple register memory access instruction. The operation is to involve N-bit data, in each of the N-bit registers comprising the indicated register. The operation is also to involve different corresponding N-bit portions of an M×N-bit line of memory corresponding to the indicated memory location. A total number of bits of the N-bit data in the N-bit registers to be involved in the multiple register memory access operation is to amount to at least half of the M×N-bits of the line of memory.
Abstract:
Embodiments of apparatuses, methods, and systems for misprediction-triggered local history-based branch prediction are described. In one embodiments, an apparatus includes a current pattern table and a local pattern table. The current pattern table has a plurality of entries, each entry in which to store a plurality of pattern lengths of a current pattern of one of a plurality of branch instructions. The local pattern table is to provide a first branch prediction based on the current pattern.
Abstract:
Systems and methods for cache allocation with code and data prioritization. An example system may comprise: a cache; a processing core, operatively coupled to the cache; and a cache control logic, responsive to receiving a cache fill request comprising an identifier of a request type and an identifier of a class of service, to identify a subset of the cache corresponding to a capacity bit mask associated with the request type and the class of service.