Abstract:
Methods and systems for storing and accessing training example data for a machine learning procedure. The systems and methods described pre-process data to store it in a non-transient memory in a random order. During training, a set of the data is retrieved and stored in a random access memory. One or more subsets of the data may then be retrieved from the random access memory and used to train a machine learning model.
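A minimal sketch of that flow in Python, assuming a JSON-lines file as the non-transient store and a plain list as the RAM-resident set (file layout, sizes, and the training step are illustrative, not from the abstract):

```python
import json
import random
import tempfile

# Pre-processing: shuffle the training examples and write them to
# non-transient storage (here, a JSON-lines file) in that random order.
examples = [{"x": i, "y": i % 2} for i in range(1000)]
random.shuffle(examples)

with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    path = f.name
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Training time: retrieve a set of the pre-shuffled data into RAM...
with open(path) as f:
    in_ram = [json.loads(line) for line in f][:256]

# ...then draw one or more subsets (mini-batches) from RAM to feed
# a (hypothetical) training step.
batch_size = 32
for step in range(3):
    batch = in_ram[step * batch_size:(step + 1) * batch_size]
    print(f"step {step}: batch of {len(batch)} examples")
```

Because the data is already in random order on disk, a contiguous read suffices at training time; no further shuffling pass over the full dataset is needed.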
Abstract:
A method and apparatus for transmitting data include determining whether to apply a mask to a cache line that includes a first type of data and a second type of data for transmission, based upon a first criterion. The second type of data is filtered from the cache line, and the first type of data is transmitted along with an identifier of the applied mask. The first type of data and the identifier are received, and the second type of data is combined with the first type of data, based upon the received identifier, to recreate the cache line.
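One way to picture the mask scheme, as a Python sketch: it assumes an 8-byte line, a mask table shared by sender and receiver, and second-type data that can be regenerated as a known filler pattern; none of those specifics come from the abstract.

```python
# Hypothetical table of known masks shared by sender and receiver;
# each mask marks which byte positions in an 8-byte line hold first-type data.
MASKS = {
    0: 0b11110000,  # first-type data in the upper four bytes
    1: 0b10101010,  # first-type data in alternating bytes
}

FILLER = 0x00  # assumed: second-type data is reconstructible as a known pattern

def transmit(line: bytes, mask_id: int):
    """Filter out second-type bytes; send first-type bytes plus the mask id."""
    mask = MASKS[mask_id]
    first_type = bytes(b for i, b in enumerate(line) if mask & (1 << i))
    return mask_id, first_type

def receive(mask_id: int, first_type: bytes) -> bytes:
    """Recreate the cache line by merging filler back in per the mask id."""
    mask = MASKS[mask_id]
    it = iter(first_type)
    return bytes(next(it) if mask & (1 << i) else FILLER for i in range(8))

line = bytes([0, 0, 0, 0, 1, 2, 3, 4])
mask_id, payload = transmit(line, 0)
assert receive(mask_id, payload) == line  # only 4 bytes + an id crossed the link
```

The point of transmitting only the mask identifier, rather than the mask itself, is that the receiver can rebuild the full line from a small table lookup.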
Abstract:
Providing memory bandwidth compression using adaptive compression in central processing unit (CPU)-based systems is disclosed. In one aspect, a compressed memory controller (CMC) is configured to implement two compression mechanisms: a first compression mechanism for compressing small amounts of data (e.g., a single memory line), and a second compression mechanism for compressing large amounts of data (e.g., multiple associated memory lines). When performing a memory write operation using write data that includes multiple associated memory lines, the CMC compresses each of the memory lines separately using the first compression mechanism, and also compresses the memory lines together using the second compression mechanism. If the result of the second compression is smaller than the result of the first compression, the CMC stores the second compression result in the system memory. Otherwise, the first compression result is stored.
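A toy illustration of the adaptive choice in Python, using zlib as a stand-in for both hardware compression mechanisms (a real CMC would use fixed-function compressors, and the line sizes here are assumed):

```python
import zlib

def adaptive_compress(memory_lines: list) -> tuple:
    """Compress multiple associated memory lines two ways; keep the smaller.

    First mechanism: each line compressed separately (small amounts of data).
    Second mechanism: all lines compressed together (exploits cross-line
    redundancy in large amounts of data).
    """
    per_line = [zlib.compress(line) for line in memory_lines]  # first mechanism
    joint = zlib.compress(b"".join(memory_lines))              # second mechanism
    if len(joint) < sum(len(c) for c in per_line):
        return "joint", joint        # store the second compression result
    return "per_line", per_line      # otherwise store the first

# Four associated 64-byte memory lines with heavy cross-line redundancy.
lines = [bytes([i % 4] * 64) for i in range(4)]
kind, stored = adaptive_compress(lines)
size = len(stored) if kind == "joint" else sum(len(c) for c in stored)
print(kind, size)
```

On redundant data like this, the joint result usually wins; on unrelated lines, the per-line results win and also preserve the ability to decompress a single line on its own.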
Abstract:
Examples include techniques for issuing write commands to one or more storage devices coupled with a host computing platform. In some examples, the write commands may be responsive to write requests from applications hosted or supported by the host computing platform. A tracking table is utilized by elements of the host computing platform and the one or more storage devices such that the write commands are completed by the one or more storage devices without the need for an interrupt response to elements of the host computing platform.
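A rough sketch of the interrupt-free completion path in Python, with a dict standing in for the tracking table and a thread standing in for the storage device; names and timings are invented for illustration:

```python
import threading
import time

# Hypothetical tracking table shared by the host and the storage device:
# the host marks a write command pending, the device marks it complete,
# and the host polls the entry instead of waiting for an interrupt.
tracking_table = {}
lock = threading.Lock()

def host_submit_write(cmd_id: int, data: bytes):
    with lock:
        tracking_table[cmd_id] = "pending"
    threading.Thread(target=device_handle_write, args=(cmd_id, data)).start()

def device_handle_write(cmd_id: int, data: bytes):
    time.sleep(0.01)  # stand-in for the actual media write
    with lock:
        tracking_table[cmd_id] = "complete"  # no interrupt raised

def host_poll_completion(cmd_id: int):
    while True:
        with lock:
            if tracking_table.get(cmd_id) == "complete":
                return
        time.sleep(0.001)  # host checks the table on its own schedule

host_submit_write(42, b"payload")
host_poll_completion(42)
print("write 42 completed without an interrupt")
```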
Abstract:
Some aspects of the disclosure relate to a pre-fetch mechanism for a cache line compression system that increases effective RAM capacity and optimizes overflow area reads. For example, the pre-fetch mechanism may allow the memory controller to pipeline reads from an area with fixed-size slots (the main compressed area) with reads from an overflow area. The overflow area is arranged so that the cache line most likely to contain the overflow data for a particular line can be calculated by a decompression engine. In this manner, the cache line decompression engine may fetch the overflow area in advance, before the actual location of the overflow data is known.
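A small Python sketch of the pipelined read, under an assumed placement rule that makes the candidate overflow cache line computable from the line index (the slot size, line size, and "full slot means overflow" header convention are all assumptions):

```python
SLOT_SIZE = 32          # fixed-size slot in the main compressed area (bytes)
OVERFLOW_LINE = 64      # size of one overflow-area cache line (bytes)

main_area = {}      # line index -> up to SLOT_SIZE bytes of compressed data
overflow_area = {}  # overflow cache-line index -> spilled bytes

def likely_overflow_line(line_index: int) -> int:
    # Assumed placement rule: overflow for line N most likely lands in
    # overflow cache line N * SLOT_SIZE // OVERFLOW_LINE, so the engine can
    # compute a candidate address before parsing the slot contents.
    return line_index * SLOT_SIZE // OVERFLOW_LINE

def read_line(line_index: int) -> bytes:
    # Pipelined reads: the overflow fetch is issued alongside the slot read,
    # before the slot confirms whether (and where) the line overflowed.
    slot = main_area[line_index]
    prefetched = overflow_area.get(likely_overflow_line(line_index), b"")
    overflowed = len(slot) == SLOT_SIZE  # toy header: a full slot overflowed
    return slot + (prefetched if overflowed else b"")

main_area[3] = b"A" * SLOT_SIZE
overflow_area[likely_overflow_line(3)] = b"B" * 8
print(read_line(3))
```

The win is latency: in the common case the prefetched overflow line is the right one, so the two reads overlap instead of being serialized.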
Abstract:
A multiprocessor data processing system includes multiple vertical cache hierarchies supporting a plurality of processor cores, a system memory, and a system interconnect. In response to a load-and-reserve request from a first processor core, a first cache memory supporting the first processor core issues on the system interconnect a memory access request for a target cache line of the load-and-reserve request. Responsive to the memory access request and prior to receiving a system-wide coherence response for the memory access request, the first cache memory receives, from a second cache memory in a second vertical cache hierarchy by cache-to-cache intervention, the target cache line and an early indication of the system-wide coherence response for the memory access request. In response to the early indication and prior to receiving the system-wide coherence response, the first cache memory initiates processing to update the target cache line in the first cache memory.
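Hardware coherence flows don't reduce naturally to code, but a toy event-order model in Python can show the sequence the abstract describes: the data and the early indication arrive by intervention first, and the update begins before the final system-wide response. Everything here is illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class CacheMemory:
    name: str
    lines: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

def load_and_reserve(requester: CacheMemory, source: CacheMemory, addr: int):
    requester.log.append("issue memory access request on system interconnect")
    # Cache-to-cache intervention from the second cache: the target line and
    # an early indication of the coherence outcome arrive together, ahead of
    # the final system-wide response.
    data, early_ok = source.lines[addr], True
    if early_ok:
        requester.lines[addr] = data
        requester.log.append("early indication: begin updating target line")
    requester.log.append("system-wide coherence response received (confirms)")

l2_a = CacheMemory("L2-A")
l2_b = CacheMemory("L2-B", lines={0x80: b"old"})
load_and_reserve(l2_a, l2_b, 0x80)
print("\n".join(l2_a.log))
```

The log order is the point: the update starts at the second entry, not the third, which is what shaves latency off the load-and-reserve.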
Abstract:
An apparatus includes processing circuitry to process instructions, some of which may require addresses to be translated. The apparatus also includes address translation circuitry to translate addresses in response to instructions processed by the processing circuitry. Furthermore, the apparatus includes translation latency measuring circuitry to measure the latency of at least part of an address translation process performed by the address translation circuitry in response to a given instruction.
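In software terms, the measurement amounts to timestamping around the translation step. A Python sketch with a simulated TLB and page-table walk (the delays and identity mapping are invented):

```python
import random
import time

# Toy stand-in for the address translation process: a TLB lookup that may
# miss and trigger a (simulated) page-table walk.
tlb = {}

def translate(vaddr: int) -> int:
    page = vaddr >> 12
    if page not in tlb:
        time.sleep(random.uniform(0.0005, 0.002))  # simulated page-table walk
        tlb[page] = page  # identity mapping, purely for the sketch
    return (tlb[page] << 12) | (vaddr & 0xFFF)

def translate_with_latency(vaddr: int):
    # "Translation latency measuring circuitry": timestamps around the
    # translation performed for a given instruction.
    start = time.perf_counter()
    paddr = translate(vaddr)
    latency_ns = (time.perf_counter() - start) * 1e9
    return paddr, latency_ns

for va in (0x1000, 0x1008, 0x2000):
    pa, lat = translate_with_latency(va)
    print(f"va={va:#x} -> pa={pa:#x}, latency={lat:.0f} ns")
```

The second access to the same page comes back orders of magnitude faster, which is exactly the kind of distinction such latency measurements expose.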
Abstract:
Systems, apparatuses and methods may provide for detecting an issued request in a queue that is shared by a plurality of domains in a memory architecture, wherein the plurality of domains are associated with non-uniform access latencies. Additionally, a destination domain associated with the issued request may be determined. Moreover, a first set of additional requests may be prevented from being issued to the queue if the issued request satisfies an overrepresentation condition with respect to the destination domain and the first set of additional requests are associated with the destination domain. In one example, a second set of additional requests are permitted to be issued to the queue while the first set of additional requests are prevented from being issued to the queue, wherein the second set of additional requests are associated with one or more remaining domains in the plurality of domains.
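A compact Python sketch of the throttling rule, assuming a simple share-of-queue threshold as the overrepresentation condition (the actual condition, threshold, and queue depth are not specified in the abstract):

```python
from collections import Counter, deque

MAX_SHARE = 0.5   # assumed overrepresentation threshold per destination domain
QUEUE_DEPTH = 8

queue = deque()
in_flight = Counter()  # issued requests per destination domain

def try_issue(request_id: int, domain: str) -> bool:
    total = sum(in_flight.values())
    # Overrepresentation condition: this destination domain already holds too
    # large a share of the shared queue, so hold back its further requests.
    if total >= 2 and in_flight[domain] / total >= MAX_SHARE:
        return False  # blocked; requests for other domains may still issue
    if len(queue) >= QUEUE_DEPTH:
        return False
    queue.append((request_id, domain))
    in_flight[domain] += 1
    return True

for i, dom in enumerate(["near", "near", "near", "far", "near"]):
    print(dom, "issued" if try_issue(i, dom) else "held back")
```

Note that the "far" request still issues while "near" requests are held back, matching the abstract's point that only the overrepresented domain is throttled.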
Abstract:
Systems, methods, and computer programs are disclosed for scheduling memory transactions. An embodiment of a method comprises determining future memory state data of a dynamic random access memory (DRAM) for a predetermined number of future clock cycles. The DRAM is electrically coupled to a system on chip (SoC). Based on the future memory state data, one of a plurality of pending memory transactions is selected that speculatively optimizes DRAM efficiency. The selected memory transaction is sent to a shared cache controller. If the selected memory transaction results in a cache miss, the selected memory transaction is sent to a DRAM controller.
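A toy Python model of the selection step, where "future memory state data" is reduced to per-bank ready cycles over a lookahead window; the efficiency heuristic and all numbers are assumptions:

```python
# Toy future-state model: per DRAM bank, the clock cycle at which the bank is
# next ready (e.g., after precharge/activate timing), over a small window.
LOOKAHEAD = 16
bank_ready_cycle = {0: 3, 1: 12, 2: 0}

pending = [  # (transaction id, target bank)
    ("t0", 1),
    ("t1", 0),
    ("t2", 2),
]

def pick_transaction(now: int):
    # Select the pending transaction whose bank is ready soonest within the
    # lookahead window; a stand-in for "speculatively optimizes DRAM
    # efficiency" based on the predicted future state.
    ready = [(max(bank_ready_cycle[b], now), tid, b) for tid, b in pending]
    cycle, tid, bank = min(ready)
    return tid if cycle < now + LOOKAHEAD else None

tid = pick_transaction(now=1)
print("send to shared cache controller:", tid)
# Per the abstract, only on a cache miss would the transaction then be
# forwarded to the DRAM controller.
```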
Abstract:
In an example, an apparatus is described that includes a memory array. The memory array includes a volatile memory, a first non-volatile memory, and a second non-volatile memory. The memory array further includes a cache manager that controls access by a computer system to the memory array. For instance, the cache manager may carry out memory operations, including read operations, write operations, and cache evictions, in conjunction with at least one of the volatile memory, the first non-volatile memory, or the second non-volatile memory.
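A sketch of such a cache manager in Python, with an LRU-style volatile tier in front of two non-volatile tiers; the capacities, eviction policy, and placement rule between the two non-volatile memories are all invented for illustration:

```python
from collections import OrderedDict

VOLATILE_CAPACITY = 2

volatile = OrderedDict()   # small, fast, loses contents on power-off
nvm_fast = {}              # first non-volatile memory (assumed faster/smaller)
nvm_bulk = {}              # second non-volatile memory (assumed slower/larger)

def write(addr: int, value: bytes):
    """Write operation: land in the volatile tier, evicting LRU data if full."""
    volatile[addr] = value
    volatile.move_to_end(addr)
    if len(volatile) > VOLATILE_CAPACITY:               # cache eviction
        old_addr, old_val = volatile.popitem(last=False)
        # Assumed placement rule: low addresses spill to the fast NVM tier.
        tier = nvm_fast if old_addr < 0x1000 else nvm_bulk
        tier[old_addr] = old_val

def read(addr: int) -> bytes:
    """Read operation: nearest tier holding the address wins."""
    for tier in (volatile, nvm_fast, nvm_bulk):
        if addr in tier:
            return tier[addr]
    raise KeyError(addr)

write(0x10, b"a")
write(0x2000, b"b")
write(0x20, b"c")  # exceeds capacity, so 0x10 is evicted into nvm_fast
print(read(0x10), read(0x2000), read(0x20))
```

The cache manager's job, as in the abstract, is that callers see one memory array; which of the three memories actually services a read, write, or eviction is decided internally.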