Abstract:
Techniques described herein generally include methods and systems related to designing and operating a DRAM device that has significantly reduced refresh energy use. A method for designing a DRAM optimizes or otherwise improves the DRAM for energy efficiency based on a measured or predicted failure probability of memory cells in the DRAM. The DRAM may be configured to operate at an increased refresh interval, thereby reducing DRAM refresh energy but causing a predictable portion of the memory cells in the DRAM to leak electrical energy too quickly to retain data. The DRAM is further configured with a selected number of spare memory cells for replacing the “leaky” memory cells, so that operation of the DRAM at the increased refresh interval may result in little or no reduction in capacity of the DRAM.
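The sizing step described above can be illustrated with a short sketch. It is not the method of the abstract: it assumes that cells fail independently with a known probability `p_fail` at the chosen refresh interval, and it sizes the spare pool as the expected number of leaky cells plus a safety margin of `z` standard deviations (a normal approximation). The function name and the margin `z` are hypothetical.

```python
import math

def spare_cells_needed(num_cells: int, p_fail: float, z: float = 6.0) -> int:
    """Estimate a spare-cell count for a given per-cell failure probability.

    Assumption (not from the abstract): failures are independent, so the
    number of leaky cells is binomial with mean n*p and variance n*p*(1-p).
    """
    mean = num_cells * p_fail
    sigma = math.sqrt(num_cells * p_fail * (1.0 - p_fail))
    # Expected leaky cells plus a z-sigma safety margin, rounded up.
    return math.ceil(mean + z * sigma)

# Example: one million cells, one in ten thousand expected to leak
# at the longer refresh interval.
spares = spare_cells_needed(1_000_000, 1e-4)
```

A longer refresh interval would raise `p_fail` and therefore the spare count, which is the trade-off between refresh energy and spare capacity that the abstract describes.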
Abstract:
Methods and systems to assign threads in a multi-core processor are disclosed. A method to assign threads in a multi-core processor may include determining data relating to memory controllers fetching data in response to cache misses experienced by a first core and a second core. Threads may be assigned to cores based on the number of cache misses processed by respective memory controllers. Methods may further include determining that a thread is latency-bound or bandwidth-bound. Threads may be assigned to cores based on the determination of the thread as latency-bound or bandwidth-bound. In response to the assignment of the threads to the cores, data for the thread may be stored in the assigned cores.
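One possible reading of the miss-count criterion is sketched below. The policy shown (place the thread on the core whose memory controller has handled the fewest cache misses) is an assumption chosen for illustration, as are the names `core_to_controller` and `misses_by_controller`. The abstract does not say how latency-bound and bandwidth-bound threads are treated differently, so that step is left out.

```python
def pick_core(core_to_controller: dict, misses_by_controller: dict):
    """Return the core whose memory controller has processed the fewest misses.

    Hypothetical policy: a lightly loaded controller is assumed to be the
    better home for a new thread.
    """
    return min(
        core_to_controller,
        key=lambda core: misses_by_controller[core_to_controller[core]],
    )

# Two cores, each served by its own memory controller.
core_to_controller = {0: "mc0", 1: "mc1"}
misses_by_controller = {"mc0": 120, "mc1": 45}
chosen = pick_core(core_to_controller, misses_by_controller)
```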
Abstract:
Technologies are generally described for methods and systems effective to maintain coherence in a multi-core processor on a die. In an example, a method for processing a request for a particular block in a particular region may include analyzing, by a first processor, a first cache to determine whether there is a block indicator in the first cache associated with the particular block. When the first processor determines that the block indicator is not present in the first cache, the method may further include analyzing, by the first processor, the first cache to determine whether there is a region indicator associated with the particular region. When the first processor determines that the region indicator is not present in the first cache, the method may further include sending, by the first processor, the request to the directory in the tile.
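The lookup order in this abstract (block indicator first, then region indicator, then the directory) can be sketched as follows. The set-based cache model and the returned labels are hypothetical. The abstract does not state what happens when an indicator is found, so the sketch only reports which step resolved the request.

```python
def route_request(block, region, block_indicators: set, region_indicators: set) -> str:
    """Report where a request for `block` in `region` is resolved.

    The two sets stand in for the indicators held in the first cache.
    """
    if block in block_indicators:
        return "block_indicator_found"
    if region in region_indicators:
        return "region_indicator_found"
    # Neither indicator is present: the request goes to the directory.
    return "send_to_directory"

block_indicators = {0x1000}
region_indicators = {0x2}
outcome = route_request(0x3040, 0x3, block_indicators, region_indicators)
```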
Abstract:
Technologies are generally described herein to detect unidirectional resistance drift errors in a multilevel cell of a phase change memory. The resistance levels of the multilevel cell of the phase change memory may be encoded to detect unidirectional resistance drift errors. In some examples, Berger Code-compatible encoding may be used. When a word is written to the multilevel cell, a write check code may be generated. The write check code may be a binary representation of the number of zeroes contained in the word as written. When the word is read from the multilevel cell, a read check code may be generated. The read check code may be a binary representation of the number of zeroes contained in the word as read. An error can be detected if a comparison indicates that the write check code and the read check code are different.
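The check-code comparison can be shown with a minimal sketch. It assumes the word is available as a string of `0` and `1` characters; the mapping from resistance levels of a multilevel cell to bits is not modeled, and the function names are hypothetical.

```python
def berger_check_code(word: str) -> str:
    """Return the number of '0' bits in `word` as a fixed-width binary string."""
    # Width large enough to hold any count from 0 to len(word).
    width = max(1, len(word).bit_length())
    return format(word.count("0"), f"0{width}b")

def drift_error_detected(write_check: str, word_as_read: str) -> bool:
    """Compare the stored write check code with one recomputed at read time."""
    return berger_check_code(word_as_read) != write_check

written = "10110010"
write_check = berger_check_code(written)   # "0100": four zeroes
# A drift that flips one 1 to 0 changes the zero count, so it is detected.
detected = drift_error_detected(write_check, "10110000")
```

Because unidirectional errors only ever flip bits in one direction, they can only raise or only lower the zero count, so the two check codes cannot match after such an error.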
Abstract:
Techniques are generally described for cache management in a processor with a cache. In response to receiving a bulk memory modification instruction, data blocks of the cache associated with the bulk memory modification instruction may be identified. A cache coherence state of the identified data blocks may also be identified and then updated. The updated cache coherence state may be indicative of a zero value of the data blocks, and the update may be made without modification to a cache data array.
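The idea of zeroing blocks through the coherence state alone can be sketched with a toy cache model. The state name `"Z"`, the `Line` and `Cache` classes, and the read behavior are assumptions made for illustration and are not taken from the abstract.

```python
from dataclasses import dataclass

@dataclass
class Line:
    state: str    # coherence state, e.g. "M", "S", or the hypothetical "Z"
    data: bytes   # stand-in for the entry in the cache data array

class Cache:
    def __init__(self):
        self.lines = {}

    def bulk_zero(self, addresses):
        """Mark cached blocks as zero-valued by changing only their state."""
        for addr in addresses:
            if addr in self.lines:
                self.lines[addr].state = "Z"   # data array is left untouched

    def read(self, addr) -> bytes:
        line = self.lines[addr]
        # A block in state "Z" reads as zeroes regardless of the stored data.
        return bytes(len(line.data)) if line.state == "Z" else line.data

cache = Cache()
cache.lines[0x40] = Line(state="M", data=b"\xde\xad\xbe\xef")
cache.bulk_zero([0x40, 0x80])
```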
Abstract:
Technologies are generally described to manage MRAM cache writes in processors. In some examples, when a write request is received with data to be stored in an MRAM cache, the data may be evaluated to determine whether the data is to be further processed. In response to a determination that the data is to be further processed, the data may be stored in a write cache associated with the MRAM cache. In response to a determination that the data is not to be further processed, the data may be stored in the MRAM cache.
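The routing decision can be sketched as below. The abstract does not say how data is judged "to be further processed", so the sketch takes that test as a caller-supplied predicate. The dictionaries standing in for the two caches and all names are hypothetical.

```python
def handle_write(addr, data, will_be_processed_further, write_cache: dict, mram_cache: dict) -> str:
    """Store `data` in the write cache or the MRAM cache and report which one."""
    if will_be_processed_further(data):
        # Data expected to change again is kept out of the MRAM cache for now.
        write_cache[addr] = data
        return "write_cache"
    mram_cache[addr] = data
    return "mram_cache"

write_cache, mram_cache = {}, {}
# Placeholder predicate for illustration only: treat odd values as still being worked on.
is_intermediate = lambda value: value % 2 == 1
first = handle_write(0x10, 7, is_intermediate, write_cache, mram_cache)
second = handle_write(0x20, 8, is_intermediate, write_cache, mram_cache)
```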
Abstract:
Technologies are generally described for a cache coherence directory in multi-processor architectures. In an example, a directory in a die may receive a request for a particular data block. The directory may determine a block aging threshold relating to a likelihood that data blocks, including the particular data block, are stored in one or more caches in the die. The directory may further analyze a memory to identify a particular cache indicated as storing the particular data block and identify a number of cache misses for the particular cache. The directory may identify a time when an event occurred for the particular data block and determine whether to send the request for the particular data block to the particular cache based on the block aging threshold, the time of the event, and the number of cache misses.
Abstract:
Technologies generally described herein relate to waved time multiplexing. In some examples, a command flit can be transmitted from a sender node of a network-on-chip (“NOC”) to a destination node of the NOC via an intermediate node along a circuit-switched path. The command flit can include an interval period and a release duration. When the command flit has been transmitted, one or more data flits can be transmitted from the sender node to the destination node via the intermediate node along the circuit-switched path. The sender node, the destination node, and the intermediate node can be configured to reserve router resources of the sender node, the destination node, and the intermediate node respectively for circuit-switched traffic during a use duration of the interval period and to release the router resources for packet-switched traffic during the release duration in a waved time multiplex arrangement.
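The reserve-and-release schedule can be sketched per router as a function of the cycle number. Several points here are assumptions rather than statements from the abstract: the use duration is taken to be the interval period minus the release duration, the use window is placed first in each interval, and the "waved" arrangement is modeled as each hop starting its window `hop_latency` cycles after the previous hop.

```python
def router_mode(cycle: int, hop_index: int, interval_period: int,
                release_duration: int, hop_latency: int = 1) -> str:
    """Return 'circuit' or 'packet' for a router at `hop_index` on the path."""
    use_duration = interval_period - release_duration
    # Shift the window by the hop's distance from the sender (assumed offset).
    phase = (cycle - hop_index * hop_latency) % interval_period
    return "circuit" if phase < use_duration else "packet"

# Interval of 8 cycles with the last 3 released to packet-switched traffic.
modes_at_cycle_5 = [router_mode(5, hop, 8, 3) for hop in range(3)]
```

In this model the sender has already released its resources at cycle 5 while the downstream hops are still reserved, which is one way to picture the window moving along the path.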
Abstract:
Technologies are generally described for methods, systems and processors effective to migrate a thread. The thread may be migrated from a first core to a second core. The first core and the second core may be configured in communication with a first cache. The first core may generate a request for a first data block from the first cache. In response to a cache miss in the first cache for the first data block, the first core may generate a request for the first data block from a memory. The first core may coordinate with a second cache to store the first data block in the second cache. The thread may be migrated from the second core to a third core. The second core and the third core may be configured in communication with the second cache.