Abstract:
A system, method, and computer program product are provided for a memory device system. One or more memory dies and at least one logic die are disposed in a package and communicatively coupled. The logic die comprises a processing device configurable to manage virtual memory and operate in an operating mode. The operating mode is selected from a set of operating modes comprising a slave operating mode and a host operating mode.
Abstract:
An integrated circuit (IC) package includes a stacked-die memory device. The stacked-die memory device includes a set of one or more stacked memory dies implementing memory cell circuitry. The stacked-die memory device further includes a set of one or more logic dies electrically coupled to the memory cell circuitry. The set of one or more logic dies includes a query controller and a memory controller. The memory controller is coupleable to at least one device external to the stacked-die memory device. The query controller is to perform a query operation on data stored in the memory cell circuitry responsive to a query command received from the external device.
Abstract:
A system includes a first memory and a device coupleable to the first memory. The device includes a second memory to cache data from the first memory. The second memory includes a plurality of rows, each row including a corresponding set of compressed data blocks of non-uniform sizes and a corresponding set of tag blocks. Each tag block represents a corresponding compressed data block of the row. The device further includes decompression logic to decompress data blocks accessed from the second memory. The device further includes compression logic to compress data blocks to be stored in the second memory.
Abstract:
An integrated circuit (IC) package includes a stacked-die memory device. The stacked-die memory device includes a set of one or more stacked memory dies implementing memory cell circuitry. The stacked-die memory device further includes a set of one or more logic dies electrically coupled to the memory cell circuitry. The set of one or more logic dies includes a query controller and a memory controller. The memory controller is coupleable to at least one device external to the stacked-die memory device. The query controller is to perform a query operation on data stored in the memory cell circuitry responsive to a query command received from the external device.
Abstract:
Methods, media, and computing systems are provided. The method includes, the media are configured for, and the computing system includes a processor with control logic for allocating memory for storing a plurality of local register states for work items to be executed in single instruction multiple data hardware and for repacking wavefronts that include work items associated with a program instruction responsive to a conditional statement. The repacking is configured to create repacked wavefronts that include at least one of a wavefront containing work items that all pass the conditional statement and a wavefront containing work items that all fail the conditional statement.
Abstract:
Methods, and media, and computer systems are provided. The method includes, the media includes control logic for, and the computer system includes a processor with control logic for overriding an execution mask of SIMD hardware to enable at least one of a plurality of lanes of the SIMD hardware. Overriding the execution mask is responsive to a data parallel computation and a diverged control flow of a workgroup.
Abstract:
A method of partitioning a data cache comprising a plurality of sets, the plurality of sets comprising a plurality of ways, is provided. Responsive to a stack data request, the method stores a cache line associated with the stack data in one of a plurality of designated ways of the data cache, wherein the plurality of designated ways is configured to store all requested stack data.
Abstract:
A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs.
Abstract:
A data processor includes a register file divided into at least a first portion and a second portion for storing data. A single instruction, multiple data (SIMD) unit is also divided into at least a first lane and a second lane. The first and second lanes of the SIMD unit correspond respectively to the first and second portions of the register file. Furthermore, each lane of the SIMD unit is capable of data processing. The data processor also includes a realignment element in communication with the register file and the SIMD unit. The realignment element is configured to selectively realign conveyance of data between the first portion of the register file and the first lane of the SIMD unit to the second lane of the SIMD unit.
Abstract:
Some die-stacked memories will contain a logic layer in addition to one or more layers of DRAM (or other memory technology). This logic layer may be a discrete logic die or logic on a silicon interposer associated with a stack of memory dies. Additional circuitry/functionality is placed on the logic layer to implement functionality to perform various data movement and address calculation operations. This functionality would allow compound memory operations—a single request communicated to the memory that characterizes the accesses and movement of many data items. This eliminates the performance and power overheads associated with communicating address and control information on a fine-grain, per-data-item basis from a host processor (or other device) to the memory. This approach also provides better visibility of macro-level memory access patterns to the memory system and may enable additional optimizations in scheduling memory accesses.