Abstract:
Aspects include computing devices and methods implemented by the computing devices for automatic cache coherency for page table data on a computing device. Some aspects may include modifying, by a first processing device, page table data stored in a first cache associated with the first processing device, receiving, at a page table coherency unit, a page table cache invalidate signal from the first processing device, issuing, by the page table coherency unit, a cache maintenance operation command to the first processing device, and writing, by the first processing device, the modified page table data stored in the first cache to a shared memory accessible by the first processing device and a second processing device associated with a second cache storing the page table data.
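A minimal sketch of this flow in C, with every structure and function name hypothetical (the abstract does not name them): the first device dirties its cached page table data, and the coherency unit responds to the invalidate signal by cleaning the writer's cache to shared memory and invalidating the second device's stale copy.

```c
/* Illustrative sketch only; all types and names are assumptions. */
#include <stdio.h>
#include <stdint.h>

#define PT_ENTRIES 4

static uint64_t shared_memory[PT_ENTRIES];     /* shared page table copy */

typedef struct {
    uint64_t entries[PT_ENTRIES];              /* locally cached PT data */
    int      dirty;
} pt_cache_t;

/* Step 1: first processing device modifies page table data in its cache. */
static void modify_page_table(pt_cache_t *c, int idx, uint64_t pte) {
    c->entries[idx] = pte;
    c->dirty = 1;
}

/* Steps 2-4: the page table coherency unit receives the invalidate signal,
 * issues a clean (write-back) cache maintenance operation to the first
 * device, which flushes the modified entries to shared memory; the second
 * device's stale cached copy is then dropped. */
static void coherency_unit_on_invalidate(pt_cache_t *writer, pt_cache_t *reader) {
    if (writer->dirty) {                       /* CMO: clean to shared memory */
        for (int i = 0; i < PT_ENTRIES; i++)
            shared_memory[i] = writer->entries[i];
        writer->dirty = 0;
    }
    for (int i = 0; i < PT_ENTRIES; i++)       /* CMO: invalidate stale copy */
        reader->entries[i] = 0;
}

int main(void) {
    pt_cache_t dev1 = {{0}, 0}, dev2 = {{0}, 0};
    modify_page_table(&dev1, 0, 0xABCD000ULL | 0x3);  /* new mapping + flags */
    coherency_unit_on_invalidate(&dev1, &dev2);       /* on invalidate signal */
    printf("shared PTE[0] = 0x%llx\n", (unsigned long long)shared_memory[0]);
    return 0;
}
```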
Abstract:
Various aspects are described herein. In some aspects, the present disclosure provides a method of communicating data between an electronic unit of a system-on-chip (SoC) and a dynamic random access memory (DRAM). The method includes initiating a memory transaction corresponding to first data. The method includes determining a non-unique first signature and a unique second signature associated with the first data based on content of the first data. The method includes determining if the non-unique first signature is stored in at least one of a local buffer on the SoC separate from the DRAM or the DRAM. The method includes determining if the unique second signature is stored in at least one of the local buffer or the DRAM based on determining the non-unique first signature is stored. The method includes eliminating the memory transaction with respect to the DRAM based on determining the unique second signature is stored.
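The two-signature filter can be sketched as follows; the XOR fold and FNV-1a hash stand in for the unspecified non-unique and unique signature functions, and the table layout is an assumption.

```c
/* Illustrative sketch; signature functions and table are assumptions. */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE 256

typedef struct {
    uint8_t  first_sig;     /* non-unique, cheap to compare     */
    uint64_t second_sig;    /* assumed-unique content signature */
    int      valid;
} sig_entry_t;

static sig_entry_t local_buffer[TABLE_SIZE];  /* on-SoC, separate from DRAM */

static uint8_t first_signature(const void *data, size_t len) {
    const uint8_t *p = data; uint8_t s = 0;
    while (len--) s ^= *p++;                  /* non-unique XOR fold */
    return s;
}

static uint64_t second_signature(const void *data, size_t len) {
    const uint8_t *p = data; uint64_t h = 1469598103934665603ULL;
    while (len--) { h ^= *p++; h *= 1099511628211ULL; }   /* FNV-1a */
    return h;
}

/* Returns 1 if the DRAM transaction can be eliminated (content known). */
static int filter_transaction(const void *data, size_t len) {
    uint8_t  s1 = first_signature(data, len);
    uint64_t s2 = second_signature(data, len);
    sig_entry_t *e = &local_buffer[s1];       /* index by cheap signature */
    if (e->valid && e->first_sig == s1 && e->second_sig == s2)
        return 1;                             /* match: skip the DRAM access */
    *e = (sig_entry_t){ s1, s2, 1 };          /* remember for next time */
    return 0;                                 /* forward transaction to DRAM */
}

int main(void) {
    const char buf[] = "example cache line";
    printf("first write eliminated? %d\n", filter_transaction(buf, sizeof buf));
    printf("repeat write eliminated? %d\n", filter_transaction(buf, sizeof buf));
    return 0;
}
```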
Abstract:
A method of interleaving a memory by mapping address bits of the memory to a number N of memory channels iteratively in successive rounds. In each round except the last round: selecting a unique subset of address bits; determining the maximum number (L) of unique combinations possible based on the selected subset of address bits; mapping combinations to the N memory channels the maximum number of times (F) possible such that each of the N memory channels gets mapped to an equal number of combinations; and, if and when a number of combinations (K, which is less than N) remain that cannot be mapped one to each of the N memory channels, entering a next round. In the last round, mapping remaining most significant address bits, not used in the subsets in prior rounds, to each of the N memory channels.
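The round structure can be illustrated numerically; the channel count and per-round bit-subset sizes below are made up, and K counts the leftover combinations carried into each following round.

```c
/* Sketch of the round arithmetic under stated assumptions. */
#include <stdio.h>

int main(void) {
    int N = 6;                        /* number of memory channels           */
    int round_bits[] = {3, 3, 2};     /* address-bit subset size per round   */
    int K = 1;                        /* combinations carried into the round */

    for (int r = 0; r < 3 && K > 0; r++) {
        int L = K << round_bits[r];   /* unique combinations this round      */
        int F = L / N;                /* full, even passes over the channels */
        K = L % N;                    /* leftovers (< N) defer to next round */
        printf("round %d: L=%d F=%d K=%d\n", r, L, F, K);
    }
    /* In the last round, the remaining most significant address bits, not
     * used in earlier subsets, are mapped one combination per channel. */
    return 0;
}
```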
Abstract:
A processor system, comprising: a first memory comprising a memory protection table including memory access permission information associated with a set of one or more worlds; and a processor comprising: an execution core configured to run a first world; and a table-based memory protection (TMP) unit configured to: receive a first request to access memory content at a first target address from the first world; access the memory access permission information from the memory protection table based on the first target address; and determine whether the first world is allowed to access the memory content at the first target address based on the accessed memory access permission information.
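A sketch of the permission check under assumed structures (fixed region granularity, one permission bit per world); the actual table format is not specified in the abstract.

```c
/* Hypothetical layout; granularity, encoding, and world IDs assumed. */
#include <stdio.h>
#include <stdint.h>

#define REGION_SHIFT 12          /* assume 4 KiB protection granules */
#define NUM_REGIONS  16
#define MAX_WORLDS   4

/* Memory protection table: one permission bitmap per region, one bit
 * per world (bit set => that world may access the region). */
static uint8_t mpt[NUM_REGIONS];

/* TMP unit check: look up the entry for the target address and test the
 * requesting world's permission bit. */
static int tmp_allows(uint64_t target_addr, unsigned world_id) {
    unsigned region = (unsigned)(target_addr >> REGION_SHIFT) % NUM_REGIONS;
    return (mpt[region] >> world_id) & 1u;
}

int main(void) {
    mpt[1] = 0x1;                                      /* region 1: world 0 only */
    printf("world 0 -> %d\n", tmp_allows(0x1000, 0));  /* allowed */
    printf("world 1 -> %d\n", tmp_allows(0x1000, 1));  /* denied  */
    return 0;
}
```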
Abstract:
An apparatus, including: a memory; a matrix multiplier engine, comprising: an array of multiplier-accumulate units (MAUs) comprising: a set of multipliers; a first set of accumulators; and a second set of accumulators; and a controller configured to concurrently: cause a first set of resultant values in the first set of accumulators to be transferred to the memory pursuant to a first set of store instructions, wherein the first set of resultant values was generated pursuant to a first set of multiply-accumulate (MAC) operations performed by the set of multipliers and the first set of accumulators; and cause the set of multipliers and the second set of accumulators to perform a second set of MAC operations.
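The ping-pong use of the two accumulator sets can be modeled as below; a sequential C program can only emulate the concurrency of the hardware controller, and all names are illustrative.

```c
/* Sketch of the ping-pong accumulator idea; in hardware the store of one
 * accumulator bank and the MACs into the other proceed concurrently. */
#include <stdio.h>

#define MAUS 4

static int acc[2][MAUS];          /* first and second sets of accumulators */
static int memory[MAUS];          /* destination of the store instructions */

static void mac_pass(int bank, const int *a, const int *b) {
    for (int i = 0; i < MAUS; i++)
        acc[bank][i] += a[i] * b[i];          /* multiply-accumulate */
}

static void store_pass(int bank) {
    for (int i = 0; i < MAUS; i++) {
        memory[i] = acc[bank][i];             /* drain resultant values */
        acc[bank][i] = 0;                     /* ready for reuse        */
    }
}

int main(void) {
    int a[MAUS] = {1, 2, 3, 4}, b[MAUS] = {5, 6, 7, 8};
    mac_pass(0, a, b);            /* first set of MAC ops into bank 0 */
    /* The controller overlaps these two: store bank 0 while bank 1 computes. */
    store_pass(0);
    mac_pass(1, a, b);
    printf("memory[0]=%d acc[1][0]=%d\n", memory[0], acc[1][0]);
    return 0;
}
```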
Abstract:
The energy consumed by data transfer in a computing device may be reduced by transferring data that has been encoded in a manner that reduces the number of one “1” data values, the number of signal level transitions, or both. A data destination component of the computing device may receive data encoded in such a manner from a data source component of the computing device over a data communication interconnect, such as an off-chip interconnect. The data may be encoded using minimum Hamming weight encoding, which reduces the number of one “1” data values. The received data may be decoded using minimum Hamming weight decoding. For other computing devices, the data may be encoded using maximum Hamming weight encoding, which increases the number of one “1” data values while reducing the number of zero “0” values, if reducing the number of zero values reduces energy consumption.
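One way minimum Hamming weight encoding can be realized is a bus-inversion style code, shown in this sketch; the abstract does not fix a particular code, so the one-flag-bit scheme here is an assumption.

```c
/* Sketch of a minimum-Hamming-weight code via conditional inversion;
 * one extra flag bit accompanies each byte on the interconnect. */
#include <stdio.h>
#include <stdint.h>

/* Popcount via a GCC/Clang builtin. */
static int ones(uint8_t v) { return __builtin_popcount(v); }

/* Encode: invert the byte when that lowers the number of 1 bits; the
 * flag records whether inversion happened. */
static uint8_t mhw_encode(uint8_t data, int *flag) {
    *flag = ones(data) > 4;                /* more than half are 1s */
    return *flag ? (uint8_t)~data : data;  /* send the lighter form */
}

/* Decode: undo the inversion indicated by the flag. */
static uint8_t mhw_decode(uint8_t wire, int flag) {
    return flag ? (uint8_t)~wire : wire;
}

int main(void) {
    int flag;
    uint8_t wire = mhw_encode(0xFB, &flag);     /* 7 ones -> inverted */
    printf("wire=0x%02X flag=%d ones=%d\n", wire, flag, ones(wire));
    printf("decoded=0x%02X\n", mhw_decode(wire, flag));
    return 0;
}
```

For maximum Hamming weight encoding, the comparison simply flips: invert when the count of 1 bits is below half, so zeros rather than ones are minimized.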
Abstract:
In one aspect, space in a tile-unaware cache associated with an address aperture may be managed in different ways depending on whether a processing component initiating an access request through the aperture to a tile-based memory is tile-unaware or tile-aware. Upon a full-tile read by a tile-aware process, data may be evicted from the cache, or space may not be allocated. Upon a full-tile write by a tile-aware process, data may be evicted from the cache. In another aspect, a tile-unaware process may be supplemented with tile-aware features by generating a full tile of addresses in response to a partial-tile access. Upon a partial-tile read by the tile-unaware process, the generated addresses may be used to pre-fetch data. Upon a partial-tile write, the addresses may be used to evict data. Upon a bit block transfer, the addresses may be used in dividing the bit block transfer into units of tiles.
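Generating the full tile of addresses from a single partial-tile address might look like the following; the tile geometry and linear surface layout are assumed for illustration.

```c
/* Sketch of full-tile address generation; geometry values are assumed. */
#include <stdio.h>
#include <stdint.h>

#define TILE_W      4                 /* pixels per tile row (assumed) */
#define TILE_H      4                 /* rows per tile (assumed)       */
#define SURF_W     64                 /* surface width in pixels       */
#define PIX_BYTES   4                 /* bytes per pixel               */

/* From any address inside a tile, emit the addresses of the whole tile;
 * the caller pre-fetches (partial-tile read), evicts (partial-tile
 * write), or splits a bit block transfer into units of tiles. */
static void full_tile_addresses(uint64_t addr, void (*emit)(uint64_t)) {
    uint64_t pixel = addr / PIX_BYTES;
    uint64_t x0 = (pixel % SURF_W) / TILE_W * TILE_W;   /* tile origin */
    uint64_t y0 = (pixel / SURF_W) / TILE_H * TILE_H;
    for (uint64_t y = y0; y < y0 + TILE_H; y++)
        for (uint64_t x = x0; x < x0 + TILE_W; x++)
            emit((y * SURF_W + x) * PIX_BYTES);
}

static void prefetch(uint64_t a) { printf("prefetch 0x%llx\n", (unsigned long long)a); }

int main(void) {
    full_tile_addresses(0x104, prefetch);   /* partial-tile read address */
    return 0;
}
```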
Abstract:
Various embodiments of methods and systems for cache-level memory management in a system on a chip (“SoC”) are disclosed. Memory utilization is optimized in certain embodiments through application of customized hashing algorithms at the lower-level caches of individual application clients. Advantageously, transactions from application clients that do not require or benefit from hashing of transaction traffic are not subjected to hashing. Each application client that does benefit from hashing transaction traffic, in order to minimize page conflicts at a double data rate (“DDR”) memory device, further benefits from a customized, and thus optimized, hashing algorithm. Because transaction streams arrive at the memory controller already hashed, or purposefully unhashed, the need for validating clients during a development phase is minimized.
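A per-client hash can be sketched as an address-bit swizzle applied, or deliberately skipped, before transactions reach the memory controller; the XOR fold and per-client masks below are illustrative, not the disclosed algorithms.

```c
/* Sketch of client-specific address hashing; all parameters assumed. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    int      hash_enabled;   /* purposefully unhashed clients skip this */
    uint64_t fold_mask;      /* which upper bits fold into bank bits    */
    int      fold_shift;
} client_cfg_t;

#define BANK_SHIFT 13
#define BANK_BITS  3

/* Applied at the client's lower-level cache: swizzle the DDR bank bits
 * with higher address bits so this client's stride pattern spreads
 * across banks instead of repeatedly conflicting on one page. */
static uint64_t client_hash(const client_cfg_t *c, uint64_t addr) {
    if (!c->hash_enabled)
        return addr;                          /* arrives unhashed by design */
    uint64_t fold = (addr & c->fold_mask) >> c->fold_shift;
    uint64_t bank = (addr >> BANK_SHIFT) & ((1u << BANK_BITS) - 1);
    bank ^= fold & ((1u << BANK_BITS) - 1);   /* customized swizzle */
    addr &= ~(((uint64_t)((1u << BANK_BITS) - 1)) << BANK_SHIFT);
    return addr | (bank << BANK_SHIFT);
}

int main(void) {
    client_cfg_t gpu = {1, 0xE0000, 17};      /* hashing client        */
    client_cfg_t dsp = {0, 0, 0};             /* purposefully unhashed */
    printf("gpu: 0x%llx\n", (unsigned long long)client_hash(&gpu, 0x2A000));
    printf("dsp: 0x%llx\n", (unsigned long long)client_hash(&dsp, 0x2A000));
    return 0;
}
```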
Abstract:
Various embodiments of methods and systems for bit flip identification for debugging and/or power management in a system on a chip (“SoC”) are disclosed. Exemplary embodiments seek to identify bit flips near in time to their occurrence by checking parity values of data blocks as the data blocks are written into a memory component. In this way, bit flips occurring in association with a write transaction may be differentiated from bit flips occurring in association with a read transaction. The distinction may be useful, when taken in conjunction with various parameter levels identified at the time a bit flip is recognized, to debug a memory component or, in a runtime environment, to adjust thermal and power policies that may be contributing to bit flip occurrences.
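The write-side parity check might be sketched as follows, with the data path and fault hook assumed; a mismatch detected at this point implicates the write transaction rather than a later read.

```c
/* Sketch of parity checking at write time; names and hooks are assumed. */
#include <stdio.h>
#include <stdint.h>

/* Parity via a GCC/Clang builtin. */
static int parity64(uint64_t v) { return __builtin_parityll(v); }

/* Write path: parity computed at the source travels with the data; the
 * memory component recomputes it on arrival. A mismatch here implicates
 * the write transaction, not a later read. */
static int checked_write(uint64_t *dest, uint64_t data, int src_parity) {
    if (parity64(data) != src_parity) {
        /* Bit flip during the write: here one would record the current
         * parameter levels (temperature, voltage, clock) for debugging
         * or for adjusting thermal and power policies at runtime. */
        return -1;
    }
    *dest = data;
    return 0;
}

int main(void) {
    uint64_t cell;
    uint64_t data = 0x12345678ULL;
    int p = parity64(data);
    data ^= 1ULL << 7;                         /* simulate a flip in flight */
    printf("write result: %d\n", checked_write(&cell, data, p));
    return 0;
}
```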
Abstract:
Methods and systems for pre-fetching address translations in a memory management unit (MMU) of a device are disclosed. In an embodiment, the MMU receives a pre-fetch command from an upstream component of the device, the pre-fetch command including an address of an instruction, pre-fetches a translation of the instruction from a translation table in a memory of the device, and stores the translation of the instruction in a translation cache associated with the MMU.
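The pre-fetch flow can be sketched with an assumed one-level translation table and a small translation cache; structure names and sizes are illustrative.

```c
/* Sketch of MMU translation pre-fetch; table and cache shapes assumed. */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define TT_ENTRIES 256
#define TLB_SLOTS  8

static uint64_t translation_table[TT_ENTRIES]; /* in device memory */

typedef struct { uint64_t vpage, ppage; int valid; } tlb_entry_t;
static tlb_entry_t tlb[TLB_SLOTS];             /* MMU translation cache */

/* Pre-fetch command from an upstream component: translate the given
 * instruction address now and cache the result before it is needed. */
static void mmu_prefetch(uint64_t instr_addr) {
    uint64_t vpage = instr_addr >> PAGE_SHIFT;
    uint64_t ppage = translation_table[vpage % TT_ENTRIES]; /* table walk */
    tlb[vpage % TLB_SLOTS] = (tlb_entry_t){ vpage, ppage, 1 };
}

int main(void) {
    translation_table[4] = 0x80004;            /* example mapping */
    mmu_prefetch(0x4000);                      /* upstream pre-fetch command */
    printf("tlb vpage=0x%llx -> ppage=0x%llx\n",
           (unsigned long long)tlb[4].vpage, (unsigned long long)tlb[4].ppage);
    return 0;
}
```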