Techniques for efficiently performing data reductions in parallel processing units

    公开(公告)号:US11061741B2

    公开(公告)日:2021-07-13

    申请号:US16513393

    申请日:2019-07-16

    Abstract: Techniques are disclosed for reducing the latency associated with performing data reductions in a multithreaded processor. In response to a single instruction associated with a set of threads executing in the multithreaded processor, a warp reduction unit acquires register values stored in source registers, where each register value is associated with a different thread included in the set of threads. The warp reduction unit performs operation(s) on the register values to compute an aggregate value. The warp reduction unit stores the aggregate value in a destination register that is accessible to at least one of the threads in the set of threads. Because the data reduction is performed via a single instruction using hardware specialized for data reductions, the number of cycles required to perform the data reduction is decreased relative to prior-art techniques that are performed via multiple instructions using hardware that is not specialized for data reductions.

Patent Agency Ranking