Abstract:
A technique to increase memory bandwidth for throughput applications. In one embodiment, memory bandwidth can be increased, particularly for throughput applications, without increasing interconnect trace or pin count by pipelining pages between one or more memory storage areas on half cycles of a memory access clock.
Abstract:
A system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.
Abstract:
A method and apparatus for efficiently caching streaming and non-streaming data is described herein. Software, such as a compiler, identifies last use streaming instructions/operations that are the last instruction/operation to access streaming data for a number of instructions or an amount of time. As a result of performing an access to a cache line for a last use instruction/operation, the cache line is updated to a streaming data no longer needed (SDN) state. When control logic is to determine a cache line to be replaced, a modified Least Recently Used (LRU) algorithm is biased to select SDN state lines first to replace no longer needed streaming data.
Abstract:
A system and method for generating a single compressed vector including two or more predetermined attribute values. For each of a plurality of data points such as pixels, if a first and a second attribute values of the data point are equal to a first and a second, respectively, of the two or more predetermined attribute values, the compressed vector is used to operate on the data point. Other embodiments are described and claimed.
Abstract:
A method and apparatus to detect and filter out redundant cache line addresses in a prefetch input queue, and to adjust the detector window size dynamically according to the number of detector entries in the queue for the cache-to-memory controller bus. Detectors correspond to cache line addresses that may represent cache misses in various levels of cache memory.
Abstract:
A system and method for a processor to determine a memory page management implementation used by a memory controller without necessarily having direct access to the circuits or registers of the memory controller is disclosed. In one embodiment, a matrix of counters correspond to potential page management implementations and numbers of pages per block. The counters may be incremented or decremented depending upon whether the corresponding page management implementations and numbers of pages predict a page boundary whenever a long access latency is observed. The counter with the largest value after a period of time may correspond to the actual page management implementation and number of pages per block.
Abstract:
Techniques to control power and processing among a plurality of asymmetric processing elements are disclosed. In one embodiment, one or more asymmetric processing elements are power managed to migrate processes or threads among a plurality of processing elements according to the performance and power needs of the system.
Abstract:
A technique to share execution resources. In one embodiment, a CPU and a GPU share resources according to workload, power considerations, or available resources by scheduling or transferring instructions and information between the CPU and GPU.
Abstract:
A technique to share execution resources. In one embodiment, a CPU and a GPU share resources according to workload, power considerations, or available resources by scheduling or transferring instructions and information between the CPU and GPU.
Abstract:
Techniques to control power and processing among a plurality of asymmetric cores. In one embodiment, one or more asymmetric cores are power managed to migrate processes or threads among a plurality of cores according to the performance and power needs of the system.