Abstract:
A computing system to compress an array using hardware-based compression and to perform various instructions on the compressed array is generally described. The computing system may receive an instruction adapted to access an address in an array. The computing system may determine whether the address is compressible. If the address is compressible, then the computing system may determine a compressed address of a compressed array based on the address. The compressed array may represent a compressed layout of the array, where a reduced size of each compressed element in the compressed array is smaller than an original size of each element in the array. The computing system may access the compressed array at the compressed address in accordance with the instruction.
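As a rough illustration of the address translation described above (not part of the abstract itself), the sketch below maps an element's address in the original array to its position in the compressed layout. The base addresses, element sizes, and array length are assumptions chosen for the example.

```python
# Hypothetical sketch of compressed-address translation, assuming a fixed
# original element size and a fixed reduced element size; all names and
# constants are illustrative, not taken from the abstract.

ARRAY_BASE = 0x1000          # base address of the uncompressed array (assumed)
COMPRESSED_BASE = 0x8000     # base address of the compressed layout (assumed)
ORIGINAL_SIZE = 8            # bytes per element in the original array
REDUCED_SIZE = 2             # bytes per compressed element

def is_compressible(address: int) -> bool:
    """Stand-in compressibility check: here, any address inside the array."""
    index = (address - ARRAY_BASE) // ORIGINAL_SIZE
    return 0 <= index < 1024  # assumed array length

def compressed_address(address: int) -> int:
    """Map an element's address in the array to its compressed address."""
    index = (address - ARRAY_BASE) // ORIGINAL_SIZE
    return COMPRESSED_BASE + index * REDUCED_SIZE

addr = ARRAY_BASE + 5 * ORIGINAL_SIZE   # address of element 5
if is_compressible(addr):
    print(hex(compressed_address(addr)))  # 0x800a
```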
Abstract:
Technologies are generally described for methods and systems effective to implement a memory allocation accelerator. A processor may generate a request for allocation of a requested chunk of memory. The request may be received by a memory allocation accelerator configured to be in communication with the processor. The memory allocation accelerator may process the request to identify an address for a particular chunk of memory corresponding to the request and may return the address to the processor.
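A minimal software model of such an accelerator might resolve requests from pre-populated per-size free lists, as sketched below; the size classes, addresses, and fallback behavior are invented for illustration.

```python
# Illustrative model of a memory allocation accelerator as a table of
# per-size-class free lists; the size classes and addresses are assumptions.

from collections import deque

class AllocationAccelerator:
    def __init__(self):
        # Pre-populated free chunks, keyed by chunk size in bytes.
        self.free_lists = {
            16: deque([0x2000, 0x2010]),
            64: deque([0x3000]),
        }

    def allocate(self, requested_size: int) -> int | None:
        """Return the address of a chunk matching the request, if any."""
        for size in sorted(self.free_lists):
            if size >= requested_size and self.free_lists[size]:
                return self.free_lists[size].popleft()
        return None  # fall back to a software allocator (not modeled)

acc = AllocationAccelerator()
print(hex(acc.allocate(10)))  # 0x2000: smallest free chunk that fits
```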
Abstract:
A cache coherence mechanism may comprise a bit-to-cache map for processor cores operable up to a maximum frequency for cores of a multicore processor. Entries in a cache coherence directory may include a bit field identifying cores operable at or near the maximum frequency that share a memory block corresponding to the entry. An additional field may indicate sharing by cores operating at lower frequencies. The additional field may indicate that the bit field corresponds to a bit-to-cache map representative of cores other than those operating at or near the maximum frequency.
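One way to picture such a directory entry is sketched below: an exact sharer bit per high-frequency core, plus a single extra flag covering the slower cores. The core-to-bit mapping, core count, and invalidation policy are assumptions for the example.

```python
# Hypothetical directory entry: one bit per core operating at or near the
# maximum frequency, plus an extra field flagging sharers among slower cores.

FAST_CORES = [0, 1, 2, 3]   # assumed: cores 0-3 run at/near max frequency

class DirectoryEntry:
    def __init__(self):
        self.fast_sharers = 0       # bit i set => fast core i shares the block
        self.slow_sharer = False    # set when any lower-frequency core shares

    def add_sharer(self, core_id: int):
        if core_id in FAST_CORES:
            self.fast_sharers |= 1 << FAST_CORES.index(core_id)
        else:
            self.slow_sharer = True

    def sharers_to_invalidate(self, all_cores):
        """Fast sharers are known exactly; slow sharers need a broadcast."""
        exact = [c for i, c in enumerate(FAST_CORES)
                 if self.fast_sharers >> i & 1]
        broadcast = ([c for c in all_cores if c not in FAST_CORES]
                     if self.slow_sharer else [])
        return exact + broadcast

entry = DirectoryEntry()
entry.add_sharer(1)
entry.add_sharer(6)   # a lower-frequency core
print(entry.sharers_to_invalidate(range(8)))  # [1, 4, 5, 6, 7]
```

In this model the lower-frequency cores cost a broadcast on invalidation, trading some coherence traffic for a smaller directory entry.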
Abstract:
A processor may comprise a plurality of cores operating at heterogeneous frequencies communicatively coupled by a network of routers also operating at heterogeneous frequencies. A core may be prioritized for thread execution based on operating frequencies of routers on a path from the core to a memory controller. Relatively higher priority may be assigned to cores having a path comprising only routers operating at a relatively higher frequency. A combined priority for thread execution may be based on core frequency, router frequency, and the frequency of routers on a path from the core to a memory controller. A core may be selected based primarily on core operating frequency when cache misses fall below a threshold value.
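A toy version of the combined priority might weight core frequency by the slowest router on the path, falling back to core frequency alone when cache misses are rare. The topology, frequencies, threshold, and weighting below are all invented for illustration.

```python
# Illustrative priority computation: higher priority for cores whose path to
# the memory controller contains only high-frequency routers.

MAX_ROUTER_FREQ = 2.0  # GHz, assumed network maximum

cores = {
    # core_id: (core_freq_ghz, [router frequencies on path to mem controller])
    0: (3.0, [2.0, 2.0]),
    1: (3.2, [2.0, 1.0]),   # one slow router on the path
    2: (2.8, [2.0, 2.0]),
}

def priority(core_id: int, cache_miss_rate: float,
             miss_threshold: float = 0.05) -> float:
    core_freq, path = cores[core_id]
    if cache_miss_rate < miss_threshold:
        # Few misses: the network barely matters, rank by core frequency alone.
        return core_freq
    # Otherwise combine core frequency with the slowest router on the path.
    return core_freq * (min(path) / MAX_ROUTER_FREQ)

for cid in cores:
    print(cid, priority(cid, cache_miss_rate=0.10))
# Core 0 (3.0) outranks core 1 (1.6) despite core 1's faster clock.
```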
Abstract:
Technologies are generally described herein to detect unidirectional resistance drift errors in a multilevel cell of a phase change memory. The resistance levels of the multilevel cell of the phase change memory may be encoded to detect unidirectional resistance drift errors. In some examples, Berger code-compatible encoding may be used. When a word is written to the multilevel cell, a write check code may be generated. The write check code may be a binary representation of the number of zeroes contained in the word as written. When the word is read from the multilevel cell, a read check code may be generated. The read check code may be a binary representation of the number of zeroes contained in the word as read. An error can be detected if a comparison indicates that the write check code and the read check code are different.
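A minimal sketch of the check-code comparison follows; the word width and the simulated drift are assumptions for the example.

```python
# The check code is the count of zero bits in the word (a Berger-style check
# symbol), so any unidirectional drift changes the count and is caught by
# comparing the write-time and read-time codes.

WORD_BITS = 16

def check_code(word: int) -> int:
    """Binary count of zero bits in the word."""
    return WORD_BITS - bin(word & ((1 << WORD_BITS) - 1)).count("1")

written = 0b1010_1100_0011_0101
write_check = check_code(written)       # stored alongside the word

# Simulate unidirectional resistance drift flipping a 1 down to a 0.
read = written & ~0b0000_0100_0000_0000
read_check = check_code(read)

print("error detected:", write_check != read_check)  # True
```

Because drift is unidirectional, every affected bit moves the same way, so the zero count strictly rises or falls and can never return to its written value; flips in both directions, which could cancel in the count, are not the failure mode this code targets.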
Abstract:
Technologies are generally described for systems, devices and methods effective to dynamically select at least one power supply rail for a router. In some examples, a power control unit may be configured to determine a buffer occupancy level of one or more buffers of the router. In some further examples, the buffer occupancy level may be compared to a threshold value. In various other examples, the power supply rail of the router may be switched from a first power rail to a second power rail based on a result of the comparison.
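The comparison reduces to a small control rule, sketched below; the rail voltages, threshold, and occupancy values are assumptions chosen for illustration.

```python
# Hypothetical dual-rail selection: pick the supply rail by comparing average
# buffer occupancy against a threshold. All constants are illustrative.

HIGH_RAIL_V = 1.0   # supply used under heavy traffic (assumed)
LOW_RAIL_V  = 0.7   # lower-power supply for light traffic (assumed)
THRESHOLD   = 0.5   # switch when average buffer occupancy exceeds 50%

def select_rail(buffer_occupancies: list[float]) -> float:
    """Return the supply voltage the router should use this interval."""
    occupancy = sum(buffer_occupancies) / len(buffer_occupancies)
    return HIGH_RAIL_V if occupancy > THRESHOLD else LOW_RAIL_V

print(select_rail([0.2, 0.3, 0.1]))  # 0.7: light load, low-power rail
print(select_rail([0.8, 0.9, 0.7]))  # 1.0: heavy load, high rail
```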
Abstract:
Techniques described herein generally include methods and systems related to designing and operating a DRAM device that has significantly reduced refresh energy use. A method for designing a DRAM optimizes or otherwise improves the DRAM for energy efficiency based on a measured or predicted failure probability of memory cells in the DRAM. The DRAM may be configured to operate at an increased refresh interval, thereby reducing DRAM refresh energy but causing a predictable portion of the memory cells in the DRAM to leak electrical energy too quickly to retain data. The DRAM may be further configured with a selected number of spare memory cells for replacing the “leaky” memory cells, so that operation of the DRAM at the increased refresh interval may result in little or no reduction in capacity of the DRAM.
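A back-of-the-envelope sizing of the spare pool follows: treating cell failures as independent, the expected number of leaky cells plus a guard band of several standard deviations sets the spare count. The array size, failure probability, and guard band are assumed values.

```python
# Spare-cell sizing under an extended refresh interval, assuming independent
# per-cell retention failures (binomial model). Constants are illustrative.

import math

CELLS = 1_000_000   # cells in the array (assumed)
P_LEAKY = 1e-5      # per-cell probability of failing retention at the
                    # extended refresh interval (assumed measured/predicted)

expected_leaky = CELLS * P_LEAKY
sigma = math.sqrt(CELLS * P_LEAKY * (1 - P_LEAKY))   # binomial std deviation

# Provision spares for the mean plus a guard band of several sigma so that
# exhausting the spares (and losing capacity) is very unlikely.
spares = math.ceil(expected_leaky + 6 * sigma)
print(expected_leaky, round(sigma, 2), spares)   # 10.0 3.16 29
```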
Abstract:
Technologies are generally described for methods, systems, and devices effective to implement one-cacheable multi-core architectures. In one example, a multi-core processor that includes first and second tiles may be configured to implement a one-cacheable architecture. The second tile may be configured to generate a request for a data block. The first tile may be configured to receive the request for the data block and determine that the requested data block is part of a group of data blocks identified as one-cacheable. The first tile may further determine that the requested data block is stored in a first cache in the first tile. The first tile may send the data block from the first cache in the first tile to the second tile, and invalidate the data blocks of the group of data blocks in the first cache in the first tile.
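A toy model of that transfer is sketched below: when a tile requests a block belonging to a one-cacheable group, the holder sends the block and invalidates the entire group locally, so the group resides in at most one cache at a time. The tile names, addresses, and data are invented for illustration.

```python
# Toy model of the one-cacheable transfer described above.

class Tile:
    def __init__(self, name: str):
        self.name = name
        self.cache = {}                 # block_address -> data
        self.one_cacheable_groups = []  # list of sets of block addresses

    def request(self, holder: "Tile", block_addr: int):
        """Ask another tile for a block; receive it and cache it."""
        self.cache[block_addr] = holder.serve(block_addr)

    def serve(self, block_addr: int):
        data = self.cache[block_addr]
        for group in self.one_cacheable_groups:
            if block_addr in group:
                # Invalidate every block of the group in this cache.
                for addr in group:
                    self.cache.pop(addr, None)
        return data

t1, t2 = Tile("tile1"), Tile("tile2")
t1.cache = {0x100: "A", 0x140: "B"}
t1.one_cacheable_groups = [{0x100, 0x140}]
t2.request(t1, 0x100)
print(t2.cache, t1.cache)  # {256: 'A'} {} : group now cached only in tile2
```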
Abstract:
Technologies described herein generally relate to aggregation of cache eviction notifications to a directory. Some example technologies may be utilized to update an aggregation table to reflect evictions of a plurality of blocks from a plurality of block addresses of at least one cache memory. An aggregate message can be generated, where the message specifies the evictions of the plurality of blocks as reflected in the aggregation table. The aggregate message can be sent to the directory. The directory can parse the aggregate message and update a plurality of directory entries to reflect the evictions from the cache memory as specified in the aggregate message.
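The flow can be sketched as below: evictions accumulate in a local table and are flushed to the directory as a single message instead of one notification per block. The message format, flush threshold, and directory layout are assumptions for the example.

```python
# Sketch of eviction aggregation: batch evictions, then send one message.

class Directory:
    def __init__(self):
        self.entries = {}  # block_addr -> set of sharer ids

    def receive(self, aggregate_message):
        # Parse the aggregate message and update each directory entry.
        for addr in aggregate_message:
            self.entries.pop(addr, None)

class EvictionAggregator:
    def __init__(self, flush_threshold: int = 4):
        self.table = set()  # block addresses evicted since the last flush
        self.flush_threshold = flush_threshold

    def record_eviction(self, block_addr: int, directory: Directory):
        self.table.add(block_addr)
        if len(self.table) >= self.flush_threshold:
            directory.receive(sorted(self.table))  # one aggregate message
            self.table.clear()

d = Directory()
d.entries = {a: {0} for a in (0x100, 0x140, 0x180, 0x1c0)}
agg = EvictionAggregator()
for a in (0x100, 0x140, 0x180, 0x1c0):
    agg.record_eviction(a, d)
print(d.entries)  # {}: all four evictions delivered in a single message
```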
Abstract:
Techniques described herein generally include methods and systems related to cache partitioning in a chip multiprocessor. Cache partitioning for a single thread or application between multiple data sources improves energy or latency efficiency of a chip multiprocessor by exploiting variations in energy cost and latency cost of the multiple data sources. Partition sizes for each data source may be selected using an optimization algorithm that minimizes or otherwise reduces latencies or energy consumption associated with cache misses.
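One simple form such an optimization could take is a brute-force search over partition splits, as sketched below; the miss-rate curve, per-miss costs, and way count are invented to show how unequal costs shift capacity toward the more expensive source.

```python
# Brute-force partition sizing across two data sources with different
# per-miss costs. The miss-rate model and costs are illustrative assumptions.

TOTAL_WAYS = 8

def miss_rate(ways: int) -> float:
    """Toy miss-rate curve: misses shrink as the partition grows."""
    return 1.0 / (1 + ways)

# Per-miss cost (e.g., energy or latency) differs between the two sources.
COST = {"near_dram": 1.0, "far_dram": 4.0}

best = min(
    ((w, TOTAL_WAYS - w) for w in range(TOTAL_WAYS + 1)),
    key=lambda p: miss_rate(p[0]) * COST["near_dram"]
                + miss_rate(p[1]) * COST["far_dram"],
)
print(best)  # (2, 6): the costlier source gets the larger partition
```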