Abstract:
One embodiment of the present invention includes techniques for a first processing unit to perform an atomic operation on a memory page shared with a second processing unit. The memory page is associated with a page table entry corresponding to the first processing unit. Before executing the atomic operation, an MMU included in the first processing unit evaluates an atomic permission bit that is included in the page table entry. If the MMU determines that the atomic permission bit is inactive, then the two processing units coordinate to change the permission status of the memory page. As part of the status change, the atomic permission bit in the page table entry is activated. Subsequently, the first processing unit performs the atomic operation uninterrupted by the second processing unit. Advantageously, coordinating the processing unit via the atomic permission bit ensures the proper and efficient execution of the atomic operation.
Abstract:
One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.
Abstract:
One embodiment of the present invention sets forth a computer-implemented method for migrating a memory page from a first memory to a second memory. The method includes determining a first page size supported by the first memory. The method also includes determining a second page size supported by the second memory. The method further includes determining a use history of the memory page based on an entry in a page state directory associated with the memory page. The method also includes migrating the memory page between the first memory and the second memory based on the first page size, the second page size, and the use history.