Abstract:
Cache coherency rules for a multi-processor computing system that is capable of working with compressed cache lines' worth of information are described. A multi-processor computing system that is capable of working with compressed cache lines' worth of information is also described. The multi-processor computing system includes a plurality of hubs for communicating with various computing system components and for compressing/decompressing cache lines' worth of information. A processor that is capable of labeling cache lines' worth of information in accordance with the cache coherency rules is described. A processor that includes a hub as described above is also described.
Abstract:
A memory controller is described that comprises a compression map cache. The compression map cache is to store information that identifies a cache line's worth of information that has been compressed with another cache line's worth of information. A processor and a memory controller integrated on a same semiconductor die is also described. The memory controller comprises a compression map cache. The compression map cache is to store information that identifies a cache line's worth of information that has been compressed with another cache line's worth of information.
Abstract:
According to one embodiment a computer system is disclosed. The computer system includes a central processing unit (CPU), a cache memory coupled to the CPU and a cache controller coupled to the cache memory. The cache memory includes a plurality of compressible cache lines to store additional data. The cache controller includes compression logic to compress one or more of the plurality of cache lines into compressed cache lines, and hint logic to store hint information in unused space within the compressed cache lines.
Abstract:
Methods and apparatus to optimize the parallel execution of software processes are disclosed. An example method includes receiving a first software process that processes a set of data, locating a first primitive in the first software process, and decomposing the first primitive into a first set of one or more sub-primitives. The example methods and apparatus additionally perform static fusion and dynamic fusion to optimize software processes for execution in parallel processing systems.
Abstract:
Methods and apparatus to optimize the parallel execution of software processes are disclosed. An example method includes receiving a first software process that processes a set of data, locating a first primitive in the first software process, and decomposing the first primitive into a first set of one or more sub-primitives. The example methods and apparatus additionally perform static fusion and dynamic fusion to optimize software processes for execution in parallel processing systems.
Abstract:
According to one embodiment a computer system is disclosed. The computer system includes a central processing unit (CPU), a cache memory coupled to the CPU and a cache controller, coupled to the cache memory. The cache memory includes a plurality of compressible cache lines to store additional data. The cache controller reorders a cache line after each access to the cache line prior to the compression of the cache line into a compressed cache line.
Abstract:
A computer system includes a central processing unit (CPU) and a cache memory coupled to the CPU. The cache memory includes a plurality of compressible cache lines to store additional data.