摘要:
Systems and methods for dynamic repartitioning of physical memory address mapping involve relocating data stored at one or more physical memory locations of one or more memory devices to another memory device or mass storage device, repartitioning one or more corresponding physical memory maps to include new mappings between physical memory addresses and physical memory locations of the one or more memory devices, then loading the relocated data back onto the one or more memory devices at physical memory locations determined by the new physical address mapping. Such dynamic repartitioning of the physical memory address mapping does not require a processing system to be rebooted and has various applications in connection with interleaving reconfiguration and error correcting code (ECC) reconfiguration of the processing system.
摘要:
A system including a stack of two or more layers of volatile memory, such as layers of a 3D stacked DRAM memory, places data in the stack based on a temperature or a refresh rate. When a threshold is exceeded, data are moved from a first region to a second region in the stack, the second region having one or both of a second temperature lower than a first temperature of the first region or a second refresh rate lower than a first refresh rate of the first region.
摘要:
A method includes storing a first portion of a sparse triangular matrix in a local memory and launching a kernel for executing a set of workgroups. The first portion includes a plurality of row blocks, and each workgroup in the set of workgroups is associated with one of the plurality of row blocks. The method also includes, for each workgroup in the set of workgroups, solving the row block. The row block is solved by, for each row segment of a first subset of row segments in the row block, calculating a partial sum for the row segment based on one or more matrix elements in the row segment, and writing the partial sum to a remote memory of a first remote processing unit prior to terminating the kernel.
摘要:
A method includes storing a first portion of a sparse triangular matrix in a local memory and launching a kernel for executing a set of workgroups. The first portion includes a plurality of row blocks, and each workgroup in the set of workgroups is associated with one of the plurality of row blocks. The method also includes, for each workgroup in the set of workgroups, solving the row block. The row block is solved by, for each row segment of a first subset of row segments in the row block, calculating a partial sum for the row segment based on one or more matrix elements in the row segment, and writing the partial sum to a remote memory of a first remote processing unit prior to terminating the kernel.
摘要:
A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.
摘要:
A method includes emptying a first region of a dynamic random access memory of data by moving data from the first region to a non-volatile memory and reducing a refresh rate of the dynamic random access memory responsive to emptying the first region of data. A system includes a memory controller to refresh a dynamic random access memory based on a configurable refresh rate, the dynamic random access memory having a plurality of regions, each region having an associated minimum refresh rate, and a processing unit to empty a first region of the plurality of regions of the dynamic random access memory by moving data from the first region to a non-volatile memory and to reduce the configurable refresh rate responsive to emptying the first region.
摘要:
The disclosed computer-implemented method can include reaching, by a chiplet involved in carrying out an operation for a process, a synchronization barrier. The method can additionally include receiving, by the chiplet, dedicated control messages pushed to the chiplet by other chiplets involved in carrying out the operation for the process, wherein the dedicated control messages are pushed over a control network by the other chiplets. The method can also include advancing, by the chiplet, the synchronization barrier in response to receipt of the dedicated control messages. Various other methods, systems, and computer-readable media are also disclosed.
摘要:
A system includes a hardware compare and swap (CAS) module communicatively coupled to a bus, the CAS module to perform an atomic operation in response to a first request from a first request agent for the atomic operation to be performed on a data value that is shared among a plurality of request agents and obtain a first result value. The atomic operation includes initiating a CAS command via the bus. The CAS module performs the atomic operation in response to a second request from a second request agent and obtains a second result value. Responsive to determining a failure to successfully process one or more of the first request or the second request, the hardware CAS module repetitively performs the atomic operation, for one or more of the first request or the second request.
摘要:
Systems and methods for dynamic repartitioning of physical memory address mapping involve relocating data stored at one or more physical memory locations of one or more memory devices to another memory device or mass storage device, repartitioning one or more corresponding physical memory maps to include new mappings between physical memory addresses and physical memory locations of the one or more memory devices, then loading the relocated data back onto the one or more memory devices at physical memory locations determined by the new physical address mapping. Such dynamic repartitioning of the physical memory address mapping does not require a processing system to be rebooted and has various applications in connection with interleaving reconfiguration and error correcting code (ECC) reconfiguration of the processing system.
摘要:
A method for hardware management of DMA transfer commands includes accessing, by a first DMA engine, a DMA transfer command and determining a first portion of a data transfer requested by the DMA transfer command. Transfer of a first portion of the data transfer by the first DMA engine is initiated based at least in part on the DMA transfer command. Similarly, a second portion of the data transfer by a second DMA engine is initiated based at least in part on the DMA transfer command. After transferring the first portion and the second portion of the data transfer, an indication is generated that signals completion of the data transfer requested by the DMA transfer command.