Abstract:
According to some embodiments, an apparatus having corresponding methods includes a storage module configured to store data and instructions; a first processor pipeline configured to process the data and instructions when the first processor pipeline is selected; a second processor pipeline configured to process the data and instructions when the second processor pipeline is selected; and a selection module configured to select either the first processor pipeline or the second processor pipeline.
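As a rough illustration of the described structure, the following C++ sketch models a selection module steering instructions from a shared storage module to one of two pipelines. The class names and the integer selection interface are illustrative assumptions; the abstract does not specify how or why a pipeline is selected.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

struct Instruction { uint32_t opcode; };

class Pipeline {
 public:
    explicit Pipeline(const char* name) : name_(name) {}
    void process(const Instruction& insn) {
        std::cout << name_ << " processing opcode " << insn.opcode << '\n';
    }
 private:
    const char* name_;
};

class SelectionModule {
 public:
    void select(int which) { selected_ = which; }  // selection policy unspecified
    Pipeline& active(Pipeline& p0, Pipeline& p1) {
        return selected_ == 0 ? p0 : p1;
    }
 private:
    int selected_ = 0;
};

int main() {
    Pipeline p0("pipeline0"), p1("pipeline1");
    SelectionModule sel;
    std::vector<Instruction> storage = {{1}, {2}, {3}};  // the storage module
    sel.select(1);  // e.g., switch processing to the second pipeline
    for (const Instruction& insn : storage) sel.active(p0, p1).process(insn);
}
```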
Abstract:
A processor for processing loop instructions can include an instruction reorder structure and a loop processing controller. The instruction reorder structure is configured to store decoded instructions according to program order and issue the decoded instructions for execution out of program order. The loop processing controller is configured to detect a loop in the decoded instructions stored in the instruction reorder structure and cause the instruction reorder structure to reissue the decoded instructions that form the loop for re-execution.
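A behavioral sketch of the idea, assuming the loop is recognized by a backward branch whose target is still buffered (the abstract does not specify the detection mechanism): when the loop controller sees such a branch, it rewinds the issue pointer so the already-decoded body is reissued straight from the reorder structure, with no re-fetch or re-decode.

```cpp
#include <cstdint>
#include <cstddef>
#include <iostream>
#include <vector>

struct Decoded { uint64_t pc; bool is_branch; uint64_t target; };

int main() {
    // Decoded instructions held in program order in the reorder structure.
    std::vector<Decoded> reorder = {
        {0x100, false, 0}, {0x104, false, 0}, {0x108, true, 0x100},
    };
    int remaining_trips = 3;  // stand-in for the real loop-exit condition
    size_t i = 0;
    while (i < reorder.size()) {
        const Decoded& d = reorder[i];
        std::cout << "issue pc=0x" << std::hex << d.pc << std::dec << '\n';
        bool redirected = false;
        if (d.is_branch && remaining_trips-- > 0) {
            // Loop detected: the target is still buffered, so rewind the
            // issue pointer and reissue the decoded body (no re-decode).
            for (size_t j = 0; j < reorder.size(); ++j)
                if (reorder[j].pc == d.target) { i = j; redirected = true; break; }
        }
        if (!redirected) ++i;
    }
}
```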
Abstract:
Aspects of the disclosure provide methods for cache efficiency. A method for cache efficiency can include storing data in a buffer entry associated with a cache array in response to a first store instruction that hits the cache array, before the first store instruction is committed. Further, when a first dependent load instruction follows the first store instruction, the method can include providing the data from the buffer entry in response to the first dependent load instruction. When a second store instruction overlaps an address of the first store instruction, the method can include coalescing data of the second store instruction into the buffer entry before the second store instruction is committed. When the second store instruction is followed by a second dependent load instruction, the method can include providing the coalesced data from the buffer entry in response to the second dependent load instruction.
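The following sketch models one plausible reading of the buffer entry's behavior: bytes from uncommitted stores are merged under a valid mask, and a dependent load is served from the entry when all of its bytes are present. The field names and the 8-byte entry width are assumptions, not taken from the disclosure.

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>

struct BufferEntry {
    uint64_t base = 0;
    bool in_use = false;
    uint8_t data[8] = {};
    uint8_t valid_mask = 0;  // which bytes hold uncommitted store data

    void store(uint64_t addr, const uint8_t* src, unsigned len) {
        if (!in_use) { base = addr & ~7ull; in_use = true; }
        unsigned off = unsigned(addr - base);
        std::memcpy(data + off, src, len);  // coalesce overlapping bytes
        valid_mask |= uint8_t(((1u << len) - 1) << off);
    }
    bool load(uint64_t addr, uint8_t* dst, unsigned len) const {
        unsigned off = unsigned(addr - base);
        uint8_t need = uint8_t(((1u << len) - 1) << off);
        if (!in_use || (valid_mask & need) != need) return false;  // read cache
        std::memcpy(dst, data + off, len);  // forward from the buffer entry
        return true;
    }
};

int main() {
    BufferEntry e;
    uint8_t w1[] = {0xAA, 0xBB};
    e.store(0x1000, w1, 2);      // first store hits cache; data parked here
    uint8_t w2[] = {0xCC};
    e.store(0x1001, w2, 1);      // overlapping second store coalesced pre-commit
    uint8_t out[2];
    if (e.load(0x1000, out, 2))  // dependent load sees the coalesced data
        std::cout << std::hex << int(out[0]) << ' ' << int(out[1]) << '\n';
}
```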
Abstract:
A processor that executes instructions out of program order is described. In some implementations, a processor detects whether a second memory operation is dependent on a first memory operation prior to memory address calculation. If the processor detects that the second memory operation is not dependent on the first memory operation, the processor is configured to allow the second memory operation to be scheduled. If the processor detects that the second memory operation is dependent on the first memory operation, the processor is configured to prevent the second memory operation from being scheduled until the first memory operation has been scheduled to reduce the likelihood of having to reexecute the second memory operation.
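A minimal scheduling sketch under the stated rule: an operation predicted dependent is held back until its producer has been scheduled, while independent operations schedule freely. The `depends_on` field stands in for whatever dependence predictor the processor actually uses before address calculation.

```cpp
#include <iostream>
#include <vector>

struct MemOp {
    int id;
    int depends_on;   // -1 if predicted independent of earlier memory ops
    bool scheduled = false;
};

int main() {
    std::vector<MemOp> ops = {{0, -1}, {1, 0}, {2, -1}};  // op1 depends on op0
    bool progress = true;
    while (progress) {
        progress = false;
        for (MemOp& op : ops) {
            if (op.scheduled) continue;
            // Dependent ops wait until their producer is scheduled,
            // avoiding a likely re-execution; independent ops go at once.
            if (op.depends_on >= 0 && !ops[op.depends_on].scheduled) continue;
            op.scheduled = true;
            progress = true;
            std::cout << "scheduled op " << op.id << '\n';
        }
    }
}
```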
Abstract:
A programmable cache and a cache access protocol are described that can be dynamically optimized for either power consumption or performance, based on the monitored performance of the cache. A monitoring unit monitors cache misses, load-use penalty, and/or other performance parameters, and compares the monitored values against a set of one or more predetermined thresholds. Based on the comparison results, a cache controller configures the programmable cache to operate in a parallel mode, to increase cache performance at the cost of greater power consumption, or in a serial mode, to conserve power at the cost of reduced performance. A banked cache memory that supports aligned and unaligned instruction fetches using a banked access strategy, and a cache access controller that includes a prefetch capability, are also described.
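A sketch of the monitor/controller loop, assuming a single miss-rate threshold for brevity (the abstract allows several parameters and thresholds):

```cpp
#include <iostream>

enum class Mode { Parallel, Serial };

struct CacheMonitor {
    unsigned accesses = 0, misses = 0;
    void record(bool hit) { ++accesses; if (!hit) ++misses; }
    double miss_rate() const {
        return accesses ? double(misses) / accesses : 0.0;
    }
};

Mode reconfigure(const CacheMonitor& m, double threshold) {
    // High miss rate: pay extra power for parallel tag/data access.
    if (m.miss_rate() > threshold) return Mode::Parallel;
    return Mode::Serial;  // otherwise access serially to save power
}

int main() {
    CacheMonitor mon;
    for (int i = 0; i < 100; ++i) mon.record(i % 5 != 0);  // 20% miss rate
    Mode m = reconfigure(mon, 0.10);
    std::cout << (m == Mode::Parallel ? "parallel\n" : "serial\n");
}
```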
Abstract:
A processor includes a first level of cache memory and a first set of instructions configured to implement a first cache coherency protocol. The processor also includes a second set of instructions configured to implement a second cache coherency protocol and a cache coherency protocol selector having at least two choice-states. The processor further includes a cache coherency implementer configured to implement the first cache coherency protocol or the second cache coherency protocol with respect to the first level of cache memory, based on a selected choice-state of the cache coherency protocol selector.
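A sketch of the selector and implementer, using virtual dispatch as the modeling device; the two concrete protocols and their actions are placeholders, since the abstract does not name them.

```cpp
#include <iostream>

enum class ChoiceState { First, Second };  // the selector's choice-states

struct CoherencyProtocol {
    virtual void on_remote_read(int line) = 0;
    virtual ~CoherencyProtocol() = default;
};
struct FirstProtocol : CoherencyProtocol {
    void on_remote_read(int line) override {
        std::cout << "protocol 1: downgrade line " << line << " to shared\n";
    }
};
struct SecondProtocol : CoherencyProtocol {
    void on_remote_read(int line) override {
        std::cout << "protocol 2: invalidate line " << line << '\n';
    }
};

int main() {
    ChoiceState sel = ChoiceState::Second;  // selected choice-state
    FirstProtocol p1; SecondProtocol p2;
    // The implementer applies whichever protocol the choice-state selects
    // to the first-level cache's coherency events.
    CoherencyProtocol* impl =
        (sel == ChoiceState::First) ? static_cast<CoherencyProtocol*>(&p1) : &p2;
    impl->on_remote_read(42);
}
```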
Abstract:
Devices, systems, methods, and other embodiments associated with a cache memory are described. In one embodiment, a cache memory includes a cache tag array divided into tag banks, as well as a bank selector configured to receive an address and to apply a hash function that maps the address to one of the tag banks.
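A sketch of the bank selector, assuming four banks, a 64-byte cache line, and an XOR-fold hash; all three are illustrative choices, not taken from the patent.

```cpp
#include <cstdint>
#include <iostream>

constexpr unsigned kNumBanks = 4;  // assumed power of two

unsigned select_bank(uint64_t addr) {
    uint64_t line = addr >> 6;  // drop the 64-byte line-offset bits
    // XOR-fold a few shifted copies so nearby addresses spread across banks.
    uint64_t h = line ^ (line >> 2) ^ (line >> 7);
    return static_cast<unsigned>(h & (kNumBanks - 1));
}

int main() {
    for (uint64_t a : {0x1000ull, 0x1040ull, 0x2000ull, 0x2040ull})
        std::cout << std::hex << "addr 0x" << a << " -> bank "
                  << select_bank(a) << '\n';
}
```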
Abstract:
Set-associative caches having corresponding methods and computer programs comprise: a data cache to provide a plurality of cache lines based on a set index of a virtual address, wherein each of the cache lines corresponds to one of a plurality of ways of the set-associative cache; a translation lookaside buffer to provide one of a plurality of way selections based on the set index of the virtual address and a virtual tag of the virtual address, wherein each of the way selections corresponds to one of the ways of the set-associative cache; and a way multiplexer to select one of the cache lines provided by the data cache based on the one of the plurality of way selections.
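A sketch of the lookup path, assuming the translation lookaside buffer can be modeled as a map from (set index, virtual tag) to a way selection; the way multiplexer then picks among the lines the data cache read out for that set. The hash and key types are implementation scaffolding, not part of the claim.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct TlbKey { uint32_t set_index; uint32_t vtag; };
inline bool operator==(const TlbKey& a, const TlbKey& b) {
    return a.set_index == b.set_index && a.vtag == b.vtag;
}
struct KeyHash {
    size_t operator()(const TlbKey& k) const {
        return std::hash<uint64_t>()((uint64_t(k.vtag) << 32) | k.set_index);
    }
};

int main() {
    // TLB maps (set index, virtual tag) -> way selection.
    std::unordered_map<TlbKey, unsigned, KeyHash> tlb = {{{5, 0xABC}, 2}};
    // Lines the data cache provided for set 5, one per way.
    std::vector<std::string> lines = {"way0", "way1", "way2", "way3"};
    TlbKey key{5, 0xABC};
    auto it = tlb.find(key);
    if (it != tlb.end())  // way multiplexer selects the TLB-indicated way
        std::cout << "way mux selects: " << lines[it->second] << '\n';
}
```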
Abstract:
A method and apparatus for power-optimized replay. In one embodiment, the method includes issuing an instruction selected from a queue. Once issued, the instruction may be enqueued in a recirculation queue if its completion is blocked by a detected blocking condition. Instructions held in the recirculation queue are then reissued once their blocking conditions are satisfied. Accordingly, the power-optimized replay scheme described herein improves power efficiency while retaining the performance advantages of selectively replaying blocked instructions.
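A sketch of the recirculation queue, assuming blocking conditions can be named by small integer IDs (e.g., an outstanding-miss entry); the abstract does not specify how conditions are tracked.

```cpp
#include <deque>
#include <iostream>
#include <string>

struct Insn { std::string name; int blocked_on; };  // -1 = not blocked

int main() {
    std::deque<Insn> issue_q = {{"load A", 7}, {"add", -1}};
    std::deque<Insn> recirc_q;
    // Issue pass: a blocked instruction is parked in the recirculation
    // queue rather than burning power on repeated blind replays.
    for (const Insn& in : issue_q) {
        if (in.blocked_on >= 0) recirc_q.push_back(in);
        else std::cout << "issued " << in.name << '\n';
    }
    int resolved = 7;  // e.g., the cache miss behind condition 7 returns
    for (auto it = recirc_q.begin(); it != recirc_q.end();) {
        if (it->blocked_on == resolved) {
            std::cout << "reissued " << it->name << '\n';  // condition met
            it = recirc_q.erase(it);
        } else ++it;
    }
}
```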
Abstract:
A method and apparatus for supporting heterogeneous agents on on-chip buses. In one embodiment, the method includes detecting a bus arbitration event between at least a first bus agent and a second bus agent. In one embodiment, a bus arbitration event is detected when at least the first bus agent and the second bus agent assert their respective bus request signals in a single clock cycle. Once a bus arbitration event is detected, bus ownership may be granted to both the first bus agent and the second bus agent when the two agents have different grant-to-valid latencies. In this embodiment, heterogeneous bus agents may coexist on a bus without wasted or unused bus cycles following establishment of bus ownership. Other embodiments are described and claimed.
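A sketch of the timing argument only, assuming concrete grant-to-valid latencies of 3 and 1 cycles: staggering the two grants makes the second agent's data-valid cycle immediately follow the first's, leaving no idle bus cycle between the transfers.

```cpp
#include <iostream>

struct Agent { const char* name; int grant_to_valid; };

int main() {
    Agent first{"agent0", 3};   // slower agent, granted at cycle 0
    Agent second{"agent1", 1};  // faster agent, granted later
    int first_valid = 0 + first.grant_to_valid;                  // cycle 3
    // Choose the second grant so its valid cycle lands right after:
    int second_grant = first_valid + 1 - second.grant_to_valid;  // cycle 3
    std::cout << first.name << ": grant@0 valid@" << first_valid << '\n'
              << second.name << ": grant@" << second_grant << " valid@"
              << second_grant + second.grant_to_valid << '\n';   // cycle 4
}
```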