CPU-to-GPU and GPU-to-GPU atomics
    41.
    Granted invention patent

    Publication number: US09830210B2

    Publication date: 2017-11-28

    Application number: US14011671

    Filing date: 2013-08-27

    CPC classification number: G06F11/073 G06F11/0751 G06T1/20 G06T1/60

    Abstract: One embodiment of the present invention includes techniques for a first processing unit to perform an atomic operation on a memory page shared with a second processing unit. The memory page is associated with a page table entry corresponding to the first processing unit. Before executing the atomic operation, an MMU included in the first processing unit evaluates an atomic permission bit that is included in the page table entry. If the MMU determines that the atomic permission bit is inactive, then the two processing units coordinate to change the permission status of the memory page. As part of the status change, the atomic permission bit in the page table entry is activated. Subsequently, the first processing unit performs the atomic operation uninterrupted by the second processing unit. Advantageously, coordinating the processing units via the atomic permission bit ensures the proper and efficient execution of the atomic operation.
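
The permission check described in this abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the patented hardware design: the names `PageTableEntry`, `ProcessingUnit`, `grant_atomic_permission`, and `atomic_add` are hypothetical, and revoking the other unit's permission is an assumed form of the "coordination" that the abstract does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class PageTableEntry:
    """Hypothetical PTE holding only the bit the abstract mentions."""
    atomic_permitted: bool = False


@dataclass
class ProcessingUnit:
    """Hypothetical CPU or GPU with its own page table (page -> PTE)."""
    name: str
    page_table: dict = field(default_factory=dict)


def grant_atomic_permission(requester, other, page):
    """Assumed coordination step: the other unit gives up atomic access,
    then the requester's atomic permission bit is activated."""
    if page in other.page_table:
        other.page_table[page].atomic_permitted = False
    requester.page_table[page].atomic_permitted = True


def atomic_add(unit, other, memory, page, offset, value):
    """Check the atomic permission bit first, then do the read-modify-write."""
    pte = unit.page_table[page]
    if not pte.atomic_permitted:
        # Bit inactive: coordinate before touching the shared page.
        grant_atomic_permission(unit, other, page)
    old = memory[page][offset]
    memory[page][offset] = old + value
    return old


cpu = ProcessingUnit("cpu", {0: PageTableEntry()})
gpu = ProcessingUnit("gpu", {0: PageTableEntry(atomic_permitted=True)})
memory = {0: [10]}
old = atomic_add(cpu, gpu, memory, 0, 0, 5)
```

In this model the first atomic from the CPU finds its bit inactive, triggers the permission change, and only then updates the shared page; later atomics from the same unit skip the coordination step.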

    Replaying memory transactions while resolving memory access faults
    42.
    Granted invention patent
    Legal status: In force

    Publication number: US09575892B2

    Publication date: 2017-02-21

    Application number: US14109678

    Filing date: 2013-12-17

    Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.
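
The per-SM replay behavior described in this abstract can be sketched as a small simulation. It is a simplified model under stated assumptions, not the patented design: the `ReplayUnit` class and its methods are hypothetical, and each memory transaction is reduced to the page number it touches.

```python
from collections import deque


class ReplayUnit:
    """Hypothetical per-SM replay unit; a transaction is just a page number."""

    def __init__(self):
        self.buffer = deque()   # faulting transactions awaiting replay
        self.stalled = False    # True while this SM may not issue new work

    def on_fault(self, page):
        """Stall only this SM and remember the faulting transaction."""
        self.stalled = True
        self.buffer.append(page)

    def replay(self, resident_pages):
        """Retry buffered transactions; keep only those that still fault.

        Returns True once the buffer is empty and the SM may resume.
        """
        self.buffer = deque(p for p in self.buffer if p not in resident_pages)
        if not self.buffer:
            self.stalled = False
        return not self.stalled


# Two SMs, each with its own replay unit; only SM 0 faults.
units = {0: ReplayUnit(), 1: ReplayUnit()}
units[0].on_fault(7)
units[0].on_fault(9)

resident = {7}                        # page 7 has been resolved so far
first_pass = units[0].replay(resident)
resident.add(9)                       # page 9 resolved later
second_pass = units[0].replay(resident)
```

The point of the per-SM structure is visible in `units[1]`, which never stalls: a fault on one SM does not stop memory transactions on the others.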

