Abstract:
Aspects include computing devices, apparatus, and methods implemented by the apparatus for implementing asynchronous cache maintenance operations on a computing device, including activating a first asynchronous cache maintenance operation, determining whether an active address of a memory access request to a cache is in a first range of addresses of the first asynchronous cache maintenance operation, and queuing the first asynchronous cache maintenance operation in a fixup queue in response to determining that the active address is in the first range of addresses.
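A minimal C sketch of the collision check and fixup-queue step described above; the types, the ring-buffer queue, and all names are illustrative assumptions rather than details taken from the abstract.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint64_t start;   /* first address covered by the maintenance op */
        uint64_t end;     /* last address covered by the maintenance op */
        bool     active;
    } cache_maint_op;

    #define FIXUP_QUEUE_DEPTH 16

    typedef struct {
        cache_maint_op *ops[FIXUP_QUEUE_DEPTH];
        unsigned head, tail;
    } fixup_queue;

    static bool addr_in_range(const cache_maint_op *op, uint64_t addr)
    {
        return addr >= op->start && addr <= op->end;
    }

    /* Called on each memory access: if the access address collides with the
     * active maintenance operation's range, defer the op to the fixup queue. */
    static void on_memory_access(fixup_queue *q, cache_maint_op *op, uint64_t addr)
    {
        if (op->active && addr_in_range(op, addr)) {
            q->ops[q->tail % FIXUP_QUEUE_DEPTH] = op;
            q->tail++;
            op->active = false;   /* re-run later from the fixup queue */
        }
    }

    int main(void)
    {
        fixup_queue q = {0};
        cache_maint_op op = { .start = 0x1000, .end = 0x1FFF, .active = true };
        on_memory_access(&q, &op, 0x1440);   /* collides: op is queued */
        printf("queued ops: %u\n", q.tail - q.head);
        return 0;
    }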
Abstract:
Aspects include computing devices, systems, and methods for selecting an available shared cache memory as a victim cache. The computing device may identify a remote shared cache memory with available shared cache memory space for use as the victim cache. To select the appropriate available shared cache memory, the computing device may retrieve data relating to a metric, such as performance speed, efficiency, or effective victim cache size, for the identified remote shared cache memory or for a processor cluster associated with it. Using the retrieved data, the computing device may determine which identified remote shared cache memory to use as the victim cache and select it.
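As a rough illustration of the metric-driven selection, the following C sketch scores candidate remote shared caches by available space per unit of access latency; the scoring formula and all names are assumptions made for demonstration.

    #include <stdio.h>

    typedef struct {
        const char *name;
        unsigned avail_kb;     /* available shared cache space */
        unsigned latency_ns;   /* access latency from the requesting cluster */
    } remote_cache;

    /* Score a candidate: here, simply available space per unit latency. */
    static double score(const remote_cache *c)
    {
        return (double)c->avail_kb / (double)c->latency_ns;
    }

    static const remote_cache *select_victim_cache(const remote_cache *cands, int n)
    {
        const remote_cache *best = NULL;
        for (int i = 0; i < n; i++) {
            if (cands[i].avail_kb == 0)
                continue;   /* no space available, skip */
            if (!best || score(&cands[i]) > score(best))
                best = &cands[i];
        }
        return best;
    }

    int main(void)
    {
        remote_cache cands[] = {
            { "cluster0-L2", 256, 40 },
            { "cluster1-L2", 512, 90 },
        };
        const remote_cache *v = select_victim_cache(cands, 2);
        if (v) printf("victim cache: %s\n", v->name);
        return 0;
    }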
Abstract:
In one aspect, space in a tile-unaware cache associated with an address aperture may be managed in different ways depending on whether a processing component initiating an access request through the aperture to a tile-based memory is tile-unaware or tile-aware. Upon a full-tile read by a tile-aware process, data may be evicted from the cache, or space may not be allocated. Upon a full-tile write by a tile-aware process, data may be evicted from the cache. In another aspect, a tile-unaware process may be supplemented with tile-aware features by generating a full tile of addresses in response to a partial-tile access. Upon a partial-tile read by the tile-unaware process, the generated addresses may be used to pre-fetch data. Upon a partial-tile write, the addresses may be used to evict data. Upon a bit block transfer, the addresses may be used in dividing the bit block transfer into units of tiles.
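The address-generation step lends itself to a short sketch. The C code below expands one accessed address into the full tile of cache-line addresses that could drive prefetch (on a partial-tile read) or eviction (on a partial-tile write); the tile geometry and the linear-to-tile mapping are assumed for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define TILE_W_LINES 4          /* cache lines per tile row (assumed) */
    #define TILE_H_LINES 4          /* tile rows (assumed) */
    #define LINE_BYTES   64

    /* Given one accessed address, emit every line address in its tile so a
     * tile-unaware client's partial access can drive prefetch (reads) or
     * eviction (writes) of the whole tile. */
    static int full_tile_addresses(uint64_t addr,
                                   uint64_t out[TILE_W_LINES * TILE_H_LINES])
    {
        uint64_t tile_bytes = (uint64_t)TILE_W_LINES * TILE_H_LINES * LINE_BYTES;
        uint64_t tile_base  = addr - (addr % tile_bytes);
        int n = 0;
        for (int row = 0; row < TILE_H_LINES; row++)
            for (int col = 0; col < TILE_W_LINES; col++)
                out[n++] = tile_base + ((uint64_t)row * TILE_W_LINES + col) * LINE_BYTES;
        return n;
    }

    int main(void)
    {
        uint64_t lines[TILE_W_LINES * TILE_H_LINES];
        int n = full_tile_addresses(0x10234, lines);
        printf("tile covers %d lines starting at 0x%llx\n",
               n, (unsigned long long)lines[0]);
        return 0;
    }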
Abstract:
An intelligent tile-based prefetching solution executed by a compression address aperture services linearly addressed data requests from a processor for data stored in a memory component having a tile-based address structure. The aperture monitors tile reads and seeks to match the tile read pattern to a predefined pattern. If a match is determined, the aperture executes a prefetching algorithm uniquely and optimally associated with the predefined tile read pattern. In this way, tile overfetch is mitigated while the latency on first line data reads is reduced.
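A simplified C sketch of the pattern-matching idea: the aperture keeps a short history of tile IDs and, when the history matches a fixed-stride pattern (raster or column order here, both assumed examples), prefetches the next tile that pattern predicts; on no match it prefetches nothing, avoiding overfetch.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define HISTORY 4

    /* True if the last HISTORY tile IDs advance by a fixed stride, e.g.
     * stride 1 for raster order, stride == tiles_per_row for column order. */
    static bool matches_stride(const uint32_t h[HISTORY], uint32_t stride)
    {
        for (int i = 1; i < HISTORY; i++)
            if (h[i] - h[i - 1] != stride)
                return false;
        return true;
    }

    static void prefetch_tile(uint32_t tile_id)
    {
        printf("prefetching tile %u\n", tile_id);   /* stand-in for real fetch */
    }

    static void on_tile_read(const uint32_t history[HISTORY], uint32_t tiles_per_row)
    {
        uint32_t last = history[HISTORY - 1];
        if (matches_stride(history, 1))
            prefetch_tile(last + 1);                /* raster-order pattern */
        else if (matches_stride(history, tiles_per_row))
            prefetch_tile(last + tiles_per_row);    /* column-order pattern */
        /* no match: prefetch nothing, avoiding tile overfetch */
    }

    int main(void)
    {
        uint32_t history[HISTORY] = { 7, 8, 9, 10 };
        on_tile_read(history, 16);
        return 0;
    }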
Abstract:
Methods, devices, and non-transitory processor-readable storage media for compacting data within cache lines of a cache. An aspect method may include identifying, by a processor of a computing device, a base address (e.g., a physical or virtual cache address) for a first data segment, identifying a data size (e.g., based on a compression ratio) for the first data segment, obtaining a base offset based on the identified data size and the base address of the first data segment, and calculating an offset address by offsetting the base address with the obtained base offset, wherein the calculated offset address is associated with a second data segment. In some aspects, the method may include identifying a parity value for the first data segment based on the base address and obtaining the base offset by performing a lookup on a stored table using the identified data size and the identified parity value.
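The offset-address computation can be illustrated directly in C. In this sketch the base offset is looked up in a table keyed by size class and the parity of the base address; the table contents and the parity definition are placeholder assumptions.

    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BYTES 64

    /* offset_table[size_class][parity] -> base offset in bytes (illustrative) */
    static const int64_t offset_table[2][2] = {
        /* half-line segments    */ { +32, -32 },
        /* quarter-line segments */ { +16, -16 },
    };

    static unsigned parity_of(uint64_t base_addr)
    {
        /* parity of the line index: even or odd line within the cache */
        return (unsigned)((base_addr / LINE_BYTES) & 1u);
    }

    /* size_class: 0 = half line (2:1 compression), 1 = quarter line (4:1). */
    static uint64_t offset_address(uint64_t base_addr, unsigned size_class)
    {
        unsigned p = parity_of(base_addr);
        return (uint64_t)((int64_t)base_addr + offset_table[size_class][p]);
    }

    int main(void)
    {
        uint64_t base = 0x4000;                 /* first compressed segment */
        uint64_t second = offset_address(base, 0);
        printf("second segment packs at 0x%llx\n", (unsigned long long)second);
        return 0;
    }

With these placeholder values, two 2:1-compressed segments share the single cache line at 0x4000, which is the compaction the method is after.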
Abstract:
Aspects include computing devices, apparatus, and methods implemented by the apparatus for implementing a hybrid input/output (I/O) coherent write request on a computing device, including receiving an I/O coherent write request, generating a first hybrid I/O coherent write request and a second hybrid I/O coherent write request from the I/O coherent write request, sending the first hybrid I/O coherent write request and I/O coherent write data of the I/O coherent write request to a shared memory, and sending the second hybrid I/O coherent write request without the I/O coherent write data of the I/O coherent write request to a coherency domain.
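A C sketch of the request split: one data-carrying write goes to shared memory while a data-less notification goes to the coherency domain. The two interface functions are illustrative stand-ins for interconnect operations.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint64_t    addr;
        const void *data;
        size_t      len;
    } io_write_req;

    static void shared_memory_write(uint64_t addr, const void *data, size_t len)
    {
        (void)data;   /* stand-in for the actual data transfer */
        printf("mem: write %zu bytes at 0x%llx\n", len, (unsigned long long)addr);
    }

    static void coherency_domain_notify(uint64_t addr, size_t len)
    {
        /* invalidate/snoop cached copies; no write data travels this path */
        printf("coherency: snoop 0x%llx..0x%llx\n",
               (unsigned long long)addr, (unsigned long long)(addr + len - 1));
    }

    static void hybrid_io_coherent_write(const io_write_req *req)
    {
        shared_memory_write(req->addr, req->data, req->len);  /* first request */
        coherency_domain_notify(req->addr, req->len);         /* second request */
    }

    int main(void)
    {
        uint8_t payload[64] = {0};
        io_write_req req = { 0x8000, payload, sizeof payload };
        hybrid_io_coherent_write(&req);
        return 0;
    }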
Abstract:
Aspects include computing devices, apparatus, and methods implemented by the apparatus for implementing dynamic input/output (I/O) coherent workload processing on a computing device. Aspect methods may include offloading, by a processing device, a workload to a hardware accelerator for execution using an I/O coherent mode, detecting a dynamic trigger for switching from the I/O coherent mode to a non-I/O coherent mode while the workload is executed by the hardware accelerator, and switching from the I/O coherent mode to the non-I/O coherent mode while the workload is executed by the hardware accelerator.
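The mode switch might be sketched as below, where the offload starts I/O coherent and flips to non-I/O coherent when a trigger fires mid-execution; the elapsed-time trigger is an assumed example of a dynamic trigger.

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { MODE_IO_COHERENT, MODE_NON_IO_COHERENT } coherency_mode;

    static bool accelerator_done(int elapsed_ms) { return elapsed_ms >= 50; }
    static bool trigger_fired(int elapsed_ms)    { return elapsed_ms >= 20; }

    static void run_offload(void)
    {
        coherency_mode mode = MODE_IO_COHERENT;      /* offload starts coherent */
        for (int t = 0; !accelerator_done(t); t += 5) {
            if (mode == MODE_IO_COHERENT && trigger_fired(t)) {
                mode = MODE_NON_IO_COHERENT;         /* switch mid-execution */
                printf("t=%dms: switched to non-I/O coherent mode\n", t);
            }
        }
        if (mode == MODE_NON_IO_COHERENT)
            printf("workload done: cache maintenance now required\n");
    }

    int main(void) { run_offload(); return 0; }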
Abstract:
Aspects include computing devices, systems, and methods for implementing cache memory access requests for data smaller than a cache line and eliminating overfetching from a main memory by combining the data with padding data whose size is the difference between the cache line size and the data size. A processor may determine whether the data, uncompressed or compressed, is smaller than a cache line using a size of the data or a compression ratio of the data. The processor may generate the padding data using constant data values or a pattern of data values. The processor may send a write cache memory access request for the combined data to a cache memory controller, which may write the combined data to a cache memory. The cache memory controller may send a write memory access request to a memory controller, which may write the combined data to a memory.
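A C sketch of the padding path: if the (possibly compressed) data is smaller than a cache line, it is combined with generated padding and written as a full line, so neither controller needs to fetch the remainder of the line; the constant pad byte is an illustrative choice.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define LINE_BYTES 64
    #define PAD_BYTE   0x00

    static void cache_write_line(uint64_t addr, const uint8_t line[LINE_BYTES])
    {
        (void)line;   /* stand-in for the actual line transfer */
        printf("cache: full-line write at 0x%llx\n", (unsigned long long)addr);
    }

    /* If the (possibly compressed) data is smaller than a line, combine it
     * with padding data and issue a full-line write; otherwise write as-is. */
    static void write_no_overfetch(uint64_t addr, const uint8_t *data, size_t len)
    {
        uint8_t line[LINE_BYTES];
        if (len < LINE_BYTES) {
            memcpy(line, data, len);
            memset(line + len, PAD_BYTE, LINE_BYTES - len);  /* padding data */
            cache_write_line(addr, line);
        } else {
            cache_write_line(addr, data);
        }
    }

    int main(void)
    {
        uint8_t compressed[16] = {0};   /* e.g. a 4:1 compressed line */
        write_no_overfetch(0xC000, compressed, sizeof compressed);
        return 0;
    }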
Abstract:
Various embodiments of methods and systems for managing write transaction volume from a master component to a long-term memory component in a system on a chip ("SoC") are disclosed. Because power consumption and bus bandwidth are unnecessarily consumed when ephemeral data is written back to long-term memory (such as a double data rate "DDR" memory) from a closely coupled memory component (such as a low-level cache "LLC" memory) of a data-generating master component, embodiments of the solutions seek to identify write transactions that contain ephemeral data and prevent the ephemeral data from being written to DDR.
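A C sketch of the eviction filter this implies: a line tagged as ephemeral is dropped at eviction rather than written back to DDR. How data gets tagged as ephemeral is an assumption of the sketch, not a detail from the abstract.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint64_t addr;
        bool     dirty;
        bool     ephemeral;   /* set by the data-generating master (assumed) */
    } llc_line;

    static void ddr_writeback(const llc_line *l)
    {
        printf("DDR: write back line 0x%llx\n", (unsigned long long)l->addr);
    }

    static void evict_line(llc_line *l)
    {
        if (l->dirty && !l->ephemeral)
            ddr_writeback(l);       /* normal dirty eviction */
        else
            printf("drop line 0x%llx (clean or ephemeral)\n",
                   (unsigned long long)l->addr);
        l->dirty = false;
    }

    int main(void)
    {
        llc_line scratch = { 0xE000, true, true };   /* ephemeral dirty data */
        evict_line(&scratch);                        /* no DDR traffic */
        return 0;
    }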