Abstract:
In one aspect, space in a tile-unaware cache associated with an address aperture may be managed in different ways depending on whether a processing component initiating an access request through the aperture to a tile-based memory is tile-unaware or tile-aware. Upon a full-tile read by a tile-aware process, data may be evicted from the cache, or space may not be allocated. Upon a full-tile write by a tile-aware process, data may be evicted from the cache. In another aspect, a tile-unaware process may be supplemented with tile-aware features by generating a full tile of addresses in response to a partial-tile access. Upon a partial-tile read by the tile-unaware process, the generated addresses may be used to pre-fetch data. Upon a partial-tile write, the addresses may be used to evict data. Upon a bit block transfer, the addresses may be used in dividing the bit block transfer into units of tiles.
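The full-tile address generation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the aperture presents a linear, row-major view with a fixed `pitch` in bytes, and that a tile is `tile_h` rows of `tile_w` bytes each. The function name and all parameter values are hypothetical.

```python
def full_tile_addresses(addr, pitch, tile_w=64, tile_h=4):
    """Return the start address of every row segment in the tile containing addr.

    Assumed layout (not taken from the abstract): row-major linear view,
    `pitch` bytes per row, tiles of tile_w bytes by tile_h rows.
    """
    y, x = divmod(addr, pitch)          # row and byte offset within the row
    tile_x = x - x % tile_w             # left edge of the enclosing tile
    tile_y = y - y % tile_h             # top row of the enclosing tile
    return [(tile_y + r) * pitch + tile_x for r in range(tile_h)]


# A partial-tile access at row 5, byte 70 expands to the four row segments
# of its tile; these could then drive a pre-fetch (read) or an eviction (write).
addrs = full_tile_addresses(5 * 1024 + 70, pitch=1024)
```

The same list of addresses could serve as the unit boundary when a bit block transfer is divided into tiles.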
Abstract:
Methods and systems are disclosed for full-hardware management of power and clock domains related to a distributed virtual memory (DVM) network. An aspect includes transmitting, from a DVM initiator to a DVM network, a DVM operation, broadcasting, by the DVM network to a plurality of DVM targets, the DVM operation, and, based on the DVM operation being broadcast to the plurality of DVM targets by the DVM network, performing one or more hardware optimizations comprising: turning on a clock domain coupled to the DVM network or a DVM target of the plurality of DVM targets that is a target of the DVM operation, increasing a frequency of the clock domain, turning on a power domain coupled to the DVM target based on the power domain being turned off, or terminating the DVM operation to the DVM target based on the DVM target being turned off.
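The per-target decisions listed above might be modeled in software as below. This is only a behavioral sketch of logic the abstract places in hardware. The `Target` class, the `wake_on_dvm` policy flag that chooses between powering a target on and terminating the operation, and the action names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    powered: bool
    clocked: bool
    wake_on_dvm: bool  # hypothetical policy: wake the target or terminate the op


def broadcast(op, targets):
    """Return the ordered list of (target, action) pairs for one DVM operation."""
    actions = []
    for t in targets:
        if not t.powered:
            if not t.wake_on_dvm:
                # Target is off and stays off: terminate the operation for it.
                actions.append((t.name, "terminate"))
                continue
            actions.append((t.name, "power_on"))
            t.powered = True
        if not t.clocked:
            actions.append((t.name, "clock_on"))
            t.clocked = True
        actions.append((t.name, "deliver:" + op))
    return actions


targets = [
    Target("gpu", powered=True, clocked=False, wake_on_dvm=False),
    Target("dsp", powered=False, clocked=False, wake_on_dvm=False),
    Target("npu", powered=False, clocked=False, wake_on_dvm=True),
]
log = broadcast("tlbi", targets)
```

Raising the clock frequency is omitted here; it would be one more action appended before delivery.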
Abstract:
Various embodiments include methods and devices for managing optional commands. Some embodiments may include receiving an optional command from an optional command request device, determining whether the optional command can be implemented, and transmitting, to the optional command request device, an optional command no data response in response to determining that the optional command cannot be implemented.
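A minimal sketch of the receive, determine, and respond flow described above. The status strings, the `busy` condition, and the handler table are hypothetical; the abstract does not say how a device decides that an optional command cannot be implemented.

```python
def handle_optional_command(cmd, handlers, busy=False):
    """Run an optional command if possible, else answer with a no-data response."""
    can_implement = cmd in handlers and not busy   # assumed criteria
    if not can_implement:
        return {"cmd": cmd, "status": "OPTIONAL_NO_DATA"}
    return {"cmd": cmd, "status": "OK", "data": handlers[cmd]()}


handlers = {"prefetch_hint": lambda: "line-0x40"}
ok = handle_optional_command("prefetch_hint", handlers)
declined = handle_optional_command("prefetch_hint", handlers, busy=True)
unknown = handle_optional_command("stash", handlers)
```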
Abstract:
In some aspects, the present disclosure provides a method for managing a command queue in a universal flash storage (UFS) host device. The method includes determining to power on a first subsystem of a system-on-a-chip (SoC), wherein the determination to power on the first subsystem is made by a second subsystem of the SoC based on detection of user identity data contained in a first image frame during an initial biometric detection process. In certain aspects, the second subsystem is configured to operate independently of the first subsystem and to control power to the first subsystem. In certain aspects, the second subsystem includes a second optical sensor, a set of ambient sensors, and a second processor configured to detect, via the set of ambient sensors, an event comprising one or more of an environmental event outside of the device or a motion event of the device.
Abstract:
Certain aspects of the present disclosure provide techniques for improved hardware utilization. An input data tensor is divided into a first plurality of sub-tensors, and a plurality of logical sub-arrays in a physical multiply-and-accumulate (MAC) array is identified. For each respective sub-tensor of the first plurality of sub-tensors, the respective sub-tensor is mapped to a respective logical sub-array of the plurality of logical sub-arrays, and the respective sub-tensor is processed using the respective logical sub-array.
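The divide, map, and process steps above can be sketched with plain Python lists. This illustration assumes the input tensor is a 2-D matrix split along rows and that each logical sub-array performs an ordinary matrix multiply against shared weights. The function names and the one-to-one mapping policy are assumptions, not details from the abstract.

```python
def matmul(a, b):
    """Plain multiply-and-accumulate of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_rows(tensor, parts):
    """Divide a non-empty tensor into at most `parts` row blocks."""
    size = -(-len(tensor) // parts)  # ceiling division
    return [tensor[i:i + size] for i in range(0, len(tensor), size)]


def run_on_subarrays(x, w, num_subarrays):
    """Map each sub-tensor to one logical sub-array and process it there."""
    mapping = dict(enumerate(split_rows(x, num_subarrays)))  # sub-array id -> sub-tensor
    out = []
    for sub_id in sorted(mapping):
        out.extend(matmul(mapping[sub_id], w))  # each sub-array works independently
    return out, mapping


x = [[1, 2], [3, 4], [5, 6], [7, 8]]
w = [[1, 1], [0, 2]]
result, mapping = run_on_subarrays(x, w, num_subarrays=2)
```

In hardware the sub-arrays would run concurrently; the sequential loop here only shows that the partitioned result matches the unpartitioned one.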
Abstract:
Systems and methods for forecasting behavior of caches include a hypothetical cache. The hypothetical cache is configured to emulate cache behavior, and performance metrics for the hypothetical cache are determined, where the performance metrics may be based on cache hits/misses. Performance metrics for a real cache of a processor core of a processing system may also be similarly determined. Behavior of the real cache is forecast based, at least, on performance metrics of the hypothetical cache, and in some cases, also on performance metrics of the real cache (e.g., based on a comparison of the performance metrics). Actions may be recommended and/or performed based on the forecast, where the actions include modifying the real cache size, associativity, or allocation for processor cores, migrating a task running in one processor cluster to another processor cluster, or collecting data for the real cache for offline analysis.
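One way to picture the comparison described above is to replay the same address trace through two emulated caches and compare hit rates. The sketch below assumes a fully associative LRU cache and a fixed decision threshold. Neither the replacement policy nor the threshold comes from the abstract, and the function names are hypothetical.

```python
from collections import OrderedDict


def hit_rate(trace, capacity):
    """Emulate a fully associative LRU cache and return its hit rate on trace."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[addr] = True
    return hits / len(trace)


def recommend(trace, real_capacity, hypo_capacity, threshold=0.1):
    """Suggest resizing when the hypothetical cache clearly outperforms the real one."""
    gain = hit_rate(trace, hypo_capacity) - hit_rate(trace, real_capacity)
    return "resize" if gain > threshold else "keep"


trace = [0, 1, 2, 3] * 4   # cyclic working set of four lines
```

With this trace, a two-line cache thrashes while a four-line cache hits on every access after warm-up, so the comparison favors a larger cache.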
Abstract:
Systems and methods pertain to a multiprocessor system comprising disunited cache structures. A first private-information cache is coupled to a first processor of the multiprocessor system. The first private-information cache is configured to store information that is private to the first processor. A first shared-information cache which is disunited from the first private-information cache is also coupled to the first processor. The first shared-information cache is configured to store information that is shared/shareable between the first processor and one or more other processors of the multiprocessor system.
Abstract:
Ephemeral data stored in a cache is read when needed but is not written to system memory so as to save power and bandwidth. In an embodiment, a no-writeback bit associated with the ephemeral data is set in response to a read-no-writeback instruction. Data in a cache line for which its no-writeback bit has been set is not written back into system memory. Accordingly, when evicting cache lines, if a cache line has a no-writeback bit set, then the data in that cache line is discarded without being written back to system memory.
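The eviction rule above can be sketched with a toy cache model. The class, its method names, and the dictionary-backed "system memory" are illustrative assumptions; only the behavior of the no-writeback bit follows the abstract.

```python
class Cache:
    """Toy write-back cache whose lines carry a no-writeback bit."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for system memory
        self.lines = {}        # addr -> [data, dirty, no_writeback]

    def write(self, addr, data):
        self.lines[addr] = [data, True, False]

    def read_no_writeback(self, addr):
        """Model of the read-no-writeback instruction: read and set the bit."""
        line = self.lines[addr]
        line[2] = True
        return line[0]

    def evict(self, addr):
        data, dirty, no_writeback = self.lines.pop(addr)
        if dirty and not no_writeback:
            self.memory[addr] = data   # normal write-back path
        # Otherwise the data is discarded, saving the memory write.


memory = {}
cache = Cache(memory)
cache.write(0x10, "ephemeral")
cache.write(0x20, "durable")
value = cache.read_no_writeback(0x10)
cache.evict(0x10)
cache.evict(0x20)
```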
Abstract:
Aspects include computing devices, apparatus, and methods implemented by the apparatus for input/output-coherent look-ahead cache access on a computing device. The aspects may include intercepting, at a look-ahead device, a look-ahead request for data in a cache of a first input/output (I/O) device from a second I/O device, determining, by the look-ahead device, whether the data requested by the look-ahead request is stored in the cache, retrieving, by the look-ahead device, the data requested by the look-ahead request from the cache in response to determining that the data requested by the look-ahead request is stored in the cache, marking the data requested by the look-ahead request as invalid in the cache, and storing, by the look-ahead device, the retrieved data to a look-ahead buffer.
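The intercept, check, retrieve, invalidate, and store sequence above can be sketched as a single function. The dictionary-based cache and buffer, the `valid` field, and the boolean return value are assumptions made for illustration.

```python
def look_ahead(cache, buffer, addr):
    """Serve a look-ahead request: move a valid line from the cache to the buffer.

    Returns True if the data was found and moved, False otherwise.
    """
    line = cache.get(addr)
    if line is None or not line["valid"]:
        return False                 # not stored in the first device's cache
    buffer[addr] = line["data"]      # store retrieved data in the look-ahead buffer
    line["valid"] = False            # mark the cached copy invalid
    return True


cache = {0x40: {"data": b"xy", "valid": True}}
buffer = {}
first = look_ahead(cache, buffer, 0x40)
second = look_ahead(cache, buffer, 0x40)   # line already invalidated
absent = look_ahead(cache, buffer, 0x80)   # never cached
```

Invalidating the cached line once the data is in the buffer leaves a single valid copy, which is one plausible reading of how the scheme keeps the access I/O-coherent.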