Abstract:
In one embodiment, the present invention includes a cache memory, which may be a sequential cache, having multiple banks. Each of the banks includes a data array, a decoder coupled to the data array to select a set of the data array, and a sense amplifier. Only a bank to be accessed may be powered, and in some embodiments early way information may be used to maintain remaining banks in a power reduced state. In some embodiments, clock gating may be used to maintain various components of the cache memory in a power reduced state. Other embodiments are described and claimed.
Abstract:
A high-speed memory management technique minimizes clobber in sequentially accessed memory such as, but not limited to, a trace cache. The method includes selecting a victim set from a sequentially accessed memory; selecting a victim way for the selected victim set; reading a next way pointer from a trace line of a trace currently stored in the selected victim way, if the selected victim way has the next way pointer; and writing a next line of the new trace into the selected victim way over the trace line of the currently stored trace. The method also includes forcing a replacement algorithm of the next set to select a victim way of the next set using the next way pointer, if the trace line of the currently stored trace is not an active trace tail line.
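The victim-selection steps in this abstract can be illustrated with a minimal Python sketch. Everything named here (`TraceLine`, `write_trace`, the `pick_way` callback, and the single `is_tail` flag standing in for "active trace tail line") is a hypothetical modeling choice, not something the abstract specifies. The cache is modeled as a list of sets, each a list of ways.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TraceLine:
    trace_id: str
    next_way: Optional[int]  # way holding this trace's next line in the next set
    is_tail: bool            # simplification of "active trace tail line"


Cache = List[List[Optional[TraceLine]]]


def write_trace(cache: Cache, start_set: int, new_id: str, length: int,
                pick_way: Callable[[Cache, int], int]) -> List[Tuple[int, int]]:
    """Write `length` lines of a new trace into consecutive sets.

    When the overwritten line belongs to an older trace that continues,
    the next victim way is forced to follow that trace's next-way pointer,
    so the new trace overwrites one old trace instead of several.
    """
    forced_way: Optional[int] = None
    placements: List[Tuple[int, int]] = []
    for i in range(length):
        s = (start_set + i) % len(cache)
        # Use the forced way if one was read from the previous victim line;
        # otherwise fall back to the normal replacement policy.
        way = forced_way if forced_way is not None else pick_way(cache, s)
        old = cache[s][way]
        # Read the old line's pointer before it is overwritten.
        if old is not None and old.next_way is not None and not old.is_tail:
            forced_way = old.next_way
        else:
            forced_way = None
        cache[s][way] = TraceLine(new_id, None, is_tail=(i == length - 1))
        placements.append((s, way))
    # Link the new trace's own next-way pointers.
    for (s, w), (_, next_w) in zip(placements, placements[1:]):
        cache[s][w].next_way = next_w
    return placements
```

In this model, if an old trace occupies way 1 of set 0 and way 0 of set 1, a new trace that starts by evicting set 0, way 1 is steered to way 0 of set 1 even when `pick_way` would have chosen another way. A line from an unrelated trace in set 1, way 1 is left intact.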
Abstract:
The number of ways in an N-way set associative sequential cache is modulated to trade power against performance. During allocation, way selection is restricted based on address so that only a subset of the N ways is used for a given range of addresses, allowing the ways that are not in use to be powered off.
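A rough Python sketch of address-based way restriction follows. The address range, the number of restricted ways, and the function names `allowed_ways` and `ways_to_power_off` are all illustrative assumptions; the abstract does not say how ranges map to way subsets.

```python
from typing import Iterable, List, Tuple

NUM_WAYS = 8                          # assumed N
LOW_POWER_RANGE = (0x0000, 0x8000)    # assumed address range, [start, end)
RESTRICTED_WAYS = 2                   # assumed subset size for that range


def allowed_ways(address: int,
                 num_ways: int = NUM_WAYS,
                 low_power_range: Tuple[int, int] = LOW_POWER_RANGE,
                 restricted: int = RESTRICTED_WAYS) -> List[int]:
    """Return the way indices that allocation may choose for this address."""
    start, end = low_power_range
    if start <= address < end:
        # Addresses in the restricted range may only allocate into a subset.
        return list(range(restricted))
    return list(range(num_ways))


def ways_to_power_off(active_addresses: Iterable[int],
                      num_ways: int = NUM_WAYS) -> List[int]:
    """Ways that no currently active address can allocate into."""
    in_use = set()
    for address in active_addresses:
        in_use.update(allowed_ways(address, num_ways))
    return sorted(set(range(num_ways)) - in_use)
```

With these assumed parameters, a workload confined to the restricted range uses only ways 0 and 1, so ways 2 through 7 become candidates for power-down.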
Abstract:
In a processor cache, cache circuits are mapped into one or more logical modules. Each module may be powered down independently of other modules in response to microinstructions processed by the cache. Power control may be applied on a microinstruction-by-microinstruction basis. Because the microinstructions determine which modules are used, power savings may be achieved by powering down those modules that are not used. A cache layout organization may be modified to distribute a limited number of ways across addressable cache banks. By associating fewer than the total number of ways with a bank (for example, one or two ways), the size of memory clusters within the bank may be reduced. This reduction in memory cluster size reduces the power needed for an address decoder to address sets within the bank.
Abstract:
A method and apparatus for a loop predictor for predicting the end of a loop is disclosed. In one embodiment, the loop predictor may have a predict counter to hold a predict count representing the expected number of times that a predictor stew value will repeat during the execution of a given loop. The loop predictor may also have one or more running counters to hold a count of the times that the stew value has repeated during the execution of the present loop. When the counter values match, the predictor may issue a prediction that the loop will end.
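The counter comparison described here can be sketched in Python as follows. The class name `LoopPredictor`, the `observe` method, and the choice to count the first occurrence of a stew value as zero repeats are assumptions made for illustration. How the predict count is trained is outside the abstract and is simply passed in.

```python
from typing import Optional


class LoopPredictor:
    """Toy model with one predict counter and one running counter."""

    def __init__(self, predict_count: int) -> None:
        # Expected number of times the stew value repeats in the loop.
        self.predict_count = predict_count
        # Number of repeats seen so far in the present loop.
        self.running_count = 0
        self.last_stew: Optional[int] = None

    def observe(self, stew: int) -> bool:
        """Feed one stew value; return True when the loop is predicted to end."""
        if stew == self.last_stew:
            self.running_count += 1
        else:
            # A new stew value starts a fresh count (assumed behavior).
            self.last_stew = stew
            self.running_count = 0
        return self.running_count == self.predict_count
```

For example, with `LoopPredictor(3)` the first three observations of the same stew value return `False`, and the fourth, which is the third repeat, returns `True`.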
Abstract:
A trace management architecture enables the reuse of uops within one or more repeated traces. More particularly, embodiments of the invention relate to a technique that prevents multiple accesses to various functional units within a trace management architecture by reusing traces or sequences of traces that are repeated during a period of operation of the microprocessor, avoiding performance gaps due to multiple trace cache accesses and increasing the rate at which uops can be executed within a processor.
Abstract:
A method and apparatus for a trace end predictor for a trace cache is disclosed. In one embodiment, the trace end predictor may have one or more buffers to contain a head address for a subsequent trace. The head address may include the way number and set number of the next head, along with partial stew data to support additional execution predictors. The buffers may also include tag data of the current trace's tail address, and may additionally include control bits for determining whether to replace the buffer's contents with information from another trace's tail. Reading the next head address from the trace end predictor, as opposed to reading it from the trace cache array, may reduce certain execution time delays.
Abstract:
Systems and methods may provide a graphics processor that may identify operating conditions under which certain floating point instructions may supply power to fewer hardware resources than when the instructions execute under other operating conditions. The operating conditions may be determined by examining operands used in a given instruction, including the relative magnitudes of the operands and whether the operands may be taken as equal to certain defined values. The floating point instructions may include instructions for an addition operation, a multiplication operation, a compare operation, and/or a fused multiply-add operation.
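As a loose illustration of operand examination, the Python sketch below decides which hardware units an instruction would need. The unit names, the defined values 0 and 1, the 24-bit significand, and round-to-nearest are all assumptions; the abstract names none of them. Operands are assumed to be finite and not NaN.

```python
import math
from typing import Set

SIGNIFICAND_BITS = 24  # assumed: IEEE 754 single precision


def units_to_power(op: str, a: float, b: float) -> Set[str]:
    """Return the hypothetical hardware units needed for `a op b`.

    Assumes finite, non-NaN operands and round-to-nearest.
    """
    if op == "mul":
        # Multiplying by 0 or 1 needs no multiplier array: the result is
        # a (signed) zero or the other operand, which can be forwarded.
        if 0.0 in (a, b) or 1.0 in (a, b):
            return {"bypass"}
        return {"multiplier", "rounder"}
    if op == "add":
        # Adding zero forwards the other operand.
        if 0.0 in (a, b):
            return {"bypass"}
        # Compare relative magnitudes through the binary exponents.
        exp_gap = abs(math.frexp(a)[1] - math.frexp(b)[1])
        if exp_gap > SIGNIFICAND_BITS + 1:
            # The smaller operand lies below half an ulp of the larger,
            # so the rounded sum equals the larger operand.
            return {"bypass"}
        return {"aligner", "adder", "rounder"}
    raise ValueError(f"unsupported operation: {op}")
```

Under these assumptions, `units_to_power("add", 1.0, 2.0 ** -40)` reports only the bypass path, while `units_to_power("add", 1.0, 0.5)` requires the full adder datapath.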