Abstract:
Aspects include computing devices, systems, and methods for dynamically partitioning a system cache by sets and ways into component caches. A system cache memory controller may manage the component caches and access to them. The system cache memory controller may receive system cache access requests and reserve locations in the system cache corresponding to the component caches correlated with the component cache identifiers of the requests. Reserving locations in the system cache may activate those locations for use by a requesting client and prevent other clients from using them. Releasing the locations in the system cache may deactivate them and allow other clients to use them. A client reserving locations in the system cache may change the number of locations it has reserved within its component cache.
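A minimal sketch of such a reserve/release interface, assuming an illustrative cache geometry and a hypothetical API (cc_reserve, cc_release, cc_resize, and all sizes below are assumptions, not taken from the abstract):

    /* Hypothetical set/way component-cache manager: a per-set bitmap of
     * reserved ways records which locations each component cache owns. */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_SETS       1024
    #define MAX_COMPONENTS 8

    typedef struct {
        uint16_t way_mask;             /* ways this component cache owns */
        int      set_start, set_count; /* contiguous set range it owns   */
    } component_cache_t;

    static component_cache_t components[MAX_COMPONENTS];
    static uint16_t ways_in_use[NUM_SETS]; /* per-set bitmap of reserved ways */

    /* Reserve locations: activate them for the requesting client and
     * prevent other clients from using them. */
    int cc_reserve(int id, uint16_t way_mask, int set_start, int set_count)
    {
        for (int s = set_start; s < set_start + set_count; s++)
            if (ways_in_use[s] & way_mask)
                return -1;  /* another client already holds these ways */
        for (int s = set_start; s < set_start + set_count; s++)
            ways_in_use[s] |= way_mask;
        components[id] = (component_cache_t){ way_mask, set_start, set_count };
        return 0;
    }

    /* Release locations: deactivate them so other clients may use them. */
    void cc_release(int id)
    {
        component_cache_t *c = &components[id];
        for (int s = c->set_start; s < c->set_start + c->set_count; s++)
            ways_in_use[s] &= ~c->way_mask;
        c->way_mask = 0;
    }

    /* A client changes the number of locations in its component cache. */
    int cc_resize(int id, uint16_t way_mask, int set_start, int set_count)
    {
        cc_release(id);
        return cc_reserve(id, way_mask, set_start, set_count);
    }

    int main(void)
    {
        cc_reserve(0, 0x000F, 0, 256);          /* 4 ways of sets 0..255  */
        if (cc_reserve(1, 0x000F, 0, 256) != 0) /* overlap: rejected      */
            printf("component 1 blocked by existing reservation\n");
        cc_resize(0, 0x00FF, 0, 128);           /* grow ways, shrink sets */
        cc_release(0);
        return 0;
    }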
Abstract:
This disclosure provides systems, methods, and devices for image signal processing that support efficient cache usage in an image processing system. In a first aspect, a method of image processing includes storing, by an image processor, a first portion of a first stripe of a first frame in a cache, the first portion of the first stripe overlapping a second portion of a second stripe of the first frame, reading, by the image processor, a third portion of the second stripe from a memory, reading, by the image processor, the first portion of the first stripe from the cache, and processing, by the image processor, the second stripe using the first portion of the first stripe and the third portion of the second stripe. Other aspects and features are also claimed and described.
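A minimal sketch of this overlap reuse, assuming a fixed stripe height, an overlap of a few rows, and a cache just large enough to hold the overlapping portion (all layout constants below are hypothetical). Each stripe re-reads only its own rows from memory; the rows it shares with the previous stripe come from the cache:

    #include <string.h>

    #define STRIPE_H 64   /* rows per stripe                      */
    #define OVERLAP  8    /* rows shared between adjacent stripes */
    #define WIDTH    256
    #define STRIPES  4

    static unsigned char frame[STRIPES * STRIPE_H][WIDTH];  /* in "memory"   */
    static unsigned char cache[OVERLAP][WIDTH];             /* on-chip cache */

    static void process_stripe(int s)
    {
        unsigned char work[STRIPE_H + OVERLAP][WIDTH];
        int top = (s > 0) ? OVERLAP : 0;

        /* the overlapping portion of the previous stripe: read from the
         * cache instead of being fetched from memory a second time */
        if (s > 0)
            memcpy(work, cache, sizeof cache);

        /* the remaining portion of this stripe: read from memory */
        memcpy(work[top], frame[s * STRIPE_H], STRIPE_H * WIDTH);

        /* ... filter the (top + STRIPE_H) rows of `work` here ... */

        /* store this stripe's trailing rows in the cache: they overlap
         * the next stripe and will be needed again */
        if (s + 1 < STRIPES)
            memcpy(cache, frame[(s + 1) * STRIPE_H - OVERLAP],
                   OVERLAP * WIDTH);
    }

    int main(void)
    {
        for (int s = 0; s < STRIPES; s++)
            process_stripe(s);
        return 0;
    }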
Abstract:
In one aspect, space in a tile-unaware cache associated with an address aperture may be managed in different ways depending on whether a processing component initiating an access request through the aperture to a tile-based memory is tile-unaware or tile-aware. Upon a full-tile read by a tile-aware process, data may be evicted from the cache, or space may not be allocated. Upon a full-tile write by a tile-aware process, data may be evicted from the cache. In another aspect, a tile-unaware process may be supplemented with tile-aware features by generating a full tile of addresses in response to a partial-tile access. Upon a partial-tile read by the tile-unaware process, the generated addresses may be used to pre-fetch data. Upon a partial-tile write, the addresses may be used to evict data. Upon a bit block transfer, the addresses may be used in dividing the bit block transfer into units of tiles.
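A minimal sketch of the address-generation step, assuming a simple linear view of the surface with illustrative tile dimensions (TILE_W, TILE_H, and SURFACE_W are hypothetical). Given the address touched by a partial-tile access, it produces the addresses covering the full enclosing tile, which could then drive pre-fetching on reads, eviction on writes, or tile-by-tile division of a bit block transfer:

    #include <stdint.h>
    #include <stdio.h>

    #define TILE_W    64    /* bytes per tile row          */
    #define TILE_H    4     /* rows per tile               */
    #define SURFACE_W 1024  /* bytes per surface scan line */

    /* Fill out[] with the start address of each row segment of the tile
     * enclosing a (possibly partial-tile) access at addr. */
    static void tile_addresses(uint64_t addr, uint64_t out[TILE_H])
    {
        uint64_t x  = addr % SURFACE_W, y = addr / SURFACE_W;
        uint64_t tx = (x / TILE_W) * TILE_W;  /* tile's left edge */
        uint64_t ty = (y / TILE_H) * TILE_H;  /* tile's top row   */
        for (int r = 0; r < TILE_H; r++)
            out[r] = (ty + r) * SURFACE_W + tx;
    }

    int main(void)
    {
        uint64_t rows[TILE_H];
        tile_addresses(5000, rows);  /* a partial-tile access */
        for (int r = 0; r < TILE_H; r++)
            printf("tile row %d at 0x%llx\n", r,
                   (unsigned long long)rows[r]);
        return 0;
    }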
Abstract:
An intelligent tile-based prefetching solution executed by a compression address aperture services linearly addressed data requests from a processor for data stored in a memory component having a tile-based address structure. The aperture monitors tile reads and seeks to match the tile read pattern to a predefined pattern. If a match is found, the aperture executes a prefetching algorithm uniquely and optimally associated with that predefined tile read pattern. In this way, tile overfetch is mitigated while the latency of first-line data reads is reduced.
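A minimal sketch of the monitoring-and-matching step, assuming a small history window and two predefined patterns, a raster walk (tile stride 1) and a column walk (stride of one row of tiles); all names and constants are illustrative, not the patent's actual algorithm:

    #include <stdint.h>
    #include <stdio.h>

    #define HISTORY       4
    #define TILES_PER_ROW 16

    static uint32_t history[HISTORY];
    static int n_reads;

    static void prefetch_tile(uint32_t t) { printf("prefetch tile %u\n", t); }

    /* Called by the aperture on every tile read it services. */
    static void on_tile_read(uint32_t tile)
    {
        history[n_reads++ % HISTORY] = tile;
        if (n_reads < HISTORY)
            return;

        /* strides between the last HISTORY reads, oldest to newest */
        int64_t d[HISTORY - 1];
        for (int i = 0; i < HISTORY - 1; i++) {
            uint32_t a = history[(n_reads - HISTORY + i) % HISTORY];
            uint32_t b = history[(n_reads - HISTORY + i + 1) % HISTORY];
            d[i] = (int64_t)b - (int64_t)a;
        }

        /* predefined patterns: raster walk and column walk */
        static const int64_t patterns[] = { 1, TILES_PER_ROW };
        for (int p = 0; p < 2; p++) {
            int match = 1;
            for (int i = 0; i < HISTORY - 1; i++)
                if (d[i] != patterns[p]) { match = 0; break; }
            if (match) {  /* fetch the next tile along the detected walk */
                prefetch_tile((uint32_t)(tile + patterns[p]));
                return;
            }
        }
    }

    int main(void)
    {
        for (uint32_t t = 0; t < 6; t++)
            on_tile_read(t);   /* raster reads: prefetches tiles 4, 5, 6 */
        return 0;
    }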
Abstract:
Methods, devices, and non-transitory process-readable storage media for compacting data within cache lines of a cache. An aspect method may include identifying, by a processor of the computing device, a base address (e.g., a physical or virtual cache address) for a first data segment, identifying a data size (e.g., based on a compression ratio) for the first data segment, obtaining a base offset based on the identified data size and the base address of the first data segment, and calculating an offset address by offsetting the base address with the obtained base offset, wherein the calculated offset address is associated with a second data segment. In some aspects, the method may include identifying a parity value for the first data segment based on the base address and obtaining the base offset by performing a lookup on a stored table using the identified data size and identified parity value.
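A minimal sketch of the lookup and offset calculation, assuming an illustrative 128-byte line, two size classes, and hypothetical table contents: segments whose base falls in an even-indexed line keep their base address, while segments in odd-indexed lines are offset backward so that two sub-line segments compact into a single cache line:

    #include <stdint.h>
    #include <stdio.h>

    #define LINE_SIZE 128

    /* base_offset_table[size_class][parity]: size class 0 = half-line
     * segments, 1 = quarter-line; parity is bit 0 of the line index. */
    static const int64_t base_offset_table[2][2] = {
        { 0, -LINE_SIZE / 2 },        /* 64-byte segments */
        { 0, -(3 * LINE_SIZE) / 4 },  /* 32-byte segments */
    };

    static uint64_t offset_address(uint64_t base, int size_class)
    {
        int parity = (int)((base / LINE_SIZE) & 1); /* from the base address */
        int64_t base_offset = base_offset_table[size_class][parity];
        return base + (uint64_t)base_offset;        /* offset the base */
    }

    int main(void)
    {
        /* two half-line segments from adjacent lines compact into one: */
        printf("0x%llx -> 0x%llx\n", 0x1000ULL,
               (unsigned long long)offset_address(0x1000, 0));
        printf("0x%llx -> 0x%llx\n", 0x1080ULL,
               (unsigned long long)offset_address(0x1080, 0));
        return 0;
    }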
Abstract:
Aspects include computing devices, systems, and methods for implementing cache memory access requests for data smaller than a cache line and eliminating overfetching from a main memory by combining the data with padding data sized to the difference between the cache line size and the data size. A processor may determine whether the data, uncompressed or compressed, is smaller than a cache line using the size of the data or a compression ratio of the data. The processor may generate the padding data using constant data values or a pattern of data values. The processor may send a write cache memory access request for the combined data to a cache memory controller, which may write the combined data to a cache memory. The cache memory controller may send a write memory access request to a memory controller, which may write the combined data to a memory.
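A minimal sketch of the padding step, assuming an illustrative 64-byte line and constant-value padding (build_padded_line is a hypothetical helper, not from the abstract):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define LINE_SIZE 64

    /* Combine sub-line data with padding so the write covers a full line
     * and the memory system never has to fetch the missing bytes. */
    static void build_padded_line(uint8_t line[LINE_SIZE],
                                  const uint8_t *data, size_t data_len,
                                  uint8_t pad_value)
    {
        memcpy(line, data, data_len);
        memset(line + data_len, pad_value, LINE_SIZE - data_len);
    }

    int main(void)
    {
        uint8_t compressed[24] = { 0xAB }; /* e.g. a 24-byte compressed block */
        uint8_t line[LINE_SIZE];

        build_padded_line(line, compressed, sizeof compressed, 0x00);
        /* the resulting full-line write can go to the cache memory
         * controller (and onward to memory) with no overfetch */
        printf("data ends at %u; boundary bytes: %02x %02x\n",
               (unsigned)sizeof compressed, line[23], line[24]);
        return 0;
    }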
Abstract:
Various embodiments of methods and systems for managing write transaction volume from a master component to a long term memory component in a system on a chip (“SoC”) are disclosed. Because power consumption and bus bandwidth are unnecessarily consumed when ephemeral data is written back to long term memory (such as a double data rate “DDR” memory) from a closely coupled memory component (such as a low level cache “LLC” memory) of a data generating master component, embodiments of the solutions seek to identify write transactions that contain ephemeral data and prevent the ephemeral data from being written to DDR.
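A minimal sketch of the idea at LLC eviction time, assuming dirty lines carry a hypothetical ephemeral tag set by the producing master component (the tagging mechanism itself is not shown and is an assumption):

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        unsigned long addr;
        bool dirty;
        bool ephemeral;  /* producer marked the data as transient */
    } llc_line_t;

    static void write_to_ddr(unsigned long addr)
    {
        printf("DDR write at %lx\n", addr);
    }

    static void evict(llc_line_t *line)
    {
        if (line->dirty && !line->ephemeral)
            write_to_ddr(line->addr);  /* only durable data reaches DDR */
        /* ephemeral dirty data is simply invalidated: no bus bandwidth
         * or DDR power spent on data no one will read back */
        line->dirty = false;
    }

    int main(void)
    {
        llc_line_t durable = { 0x1000, true, false };
        llc_line_t scratch = { 0x2000, true, true  };
        evict(&durable);  /* written back */
        evict(&scratch);  /* dropped      */
        return 0;
    }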
Abstract:
Aspects include computing devices, systems, and methods for scheduling an execution process to an execution processor cluster to take advantage of reduced latency with a victim cache. The computing device may determine a first processor cluster with a first remote shared cache memory having available shared cache memory space. To properly schedule the execution process, the computing device may determine a second processor cluster with a lower latency to the first remote shared cache memory than the execution processor cluster currently scheduled with the execution process. The execution process may be scheduled to the second processor cluster, which thus becomes the execution processor cluster, based on the size of the available shared cache memory space and the latency of the second processor cluster to the first remote shared cache memory. The available shared cache memory space may be used as the victim cache for the execution process.
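A minimal sketch of the scheduling decision, assuming a hypothetical cluster-to-cluster latency matrix and a minimum victim-cache size worth rescheduling for (both are illustrative assumptions):

    #include <stdio.h>

    #define NUM_CLUSTERS  4
    #define MIN_VICTIM_KB 256  /* space worth migrating for */

    /* latency[i][j]: cycles from cluster i to cluster j's shared cache */
    static const int latency[NUM_CLUSTERS][NUM_CLUSTERS] = {
        { 10, 40, 60, 60 },
        { 40, 10, 60, 60 },
        { 60, 60, 10, 40 },
        { 60, 60, 40, 10 },
    };

    /* Pick the cluster with the lowest latency to the donor cluster's
     * shared cache, provided the offered victim space is big enough. */
    static int pick_cluster(int donor, int donor_free_kb, int current)
    {
        if (donor_free_kb < MIN_VICTIM_KB)
            return current;  /* too little space: not worth rescheduling */
        int best = current;
        for (int c = 0; c < NUM_CLUSTERS; c++)
            if (c != donor && latency[c][donor] < latency[best][donor])
                best = c;    /* lower latency to the victim cache */
        return best;
    }

    int main(void)
    {
        /* cluster 2 offers 512 KB; the process now runs on cluster 0 */
        printf("schedule on cluster %d\n", pick_cluster(2, 512, 0));
        return 0;
    }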
Abstract:
A method includes generating a plurality of future reference picture lists associated with a plurality of future pictures of a set of pictures, wherein the set of pictures includes a current picture and the plurality of future pictures, and the plurality of future pictures follow the current picture in coding order; determining, based on information derived from the plurality of future reference picture lists associated with the plurality of future pictures, whether to write the current picture in a dedicated chip memory or in a non-dedicated system memory; and writing the current picture in the dedicated chip memory or the non-dedicated system memory based on that determination.
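A minimal sketch of the decision step, assuming the information derived from the future reference picture lists is simply a count of how often the current picture will be referenced (the threshold and list layout are illustrative assumptions):

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_REFS 8

    typedef struct {
        int refs[MAX_REFS];  /* picture order counts referenced */
        int num_refs;
    } ref_pic_list_t;

    /* Scan the lists of the future pictures: a frequently referenced
     * current picture earns a slot in the dedicated chip memory. */
    static bool use_dedicated_memory(int current_poc,
                                     const ref_pic_list_t *future, int n,
                                     int threshold)
    {
        int uses = 0;
        for (int i = 0; i < n; i++)
            for (int r = 0; r < future[i].num_refs; r++)
                if (future[i].refs[r] == current_poc)
                    uses++;
        return uses >= threshold;
    }

    int main(void)
    {
        ref_pic_list_t future[3] = {
            { { 4, 0 }, 2 },  /* future picture referencing pictures 4, 0 */
            { { 4, 2 }, 2 },
            { { 2, 0 }, 2 },
        };
        printf("picture 4 -> %s memory\n",
               use_dedicated_memory(4, future, 3, 2) ? "dedicated chip"
                                                     : "non-dedicated system");
        return 0;
    }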