Fused convolution and batch normalization for neural networks

    Publication number: US11573765B2

    Publication date: 2023-02-07

    Application number: US16219154

    Filing date: 2018-12-13

    Abstract: A processing unit implements a convolutional neural network (CNN) by fusing at least a portion of a convolution phase of the CNN with at least a portion of a batch normalization phase. The processing unit convolves two input matrices representing inputs and weights of a portion of the CNN to generate an output matrix. The processing unit performs the convolution via a series of multiplication operations, with each multiplication operation generating a corresponding submatrix (or “tile”) of the output matrix at an output register of the processing unit. While an output submatrix is stored at the output register, the processing unit performs a reduction phase and an update phase of the batch normalization phase for the CNN. The processing unit thus fuses at least a portion of the batch normalization phase of the CNN with a portion of the convolution.
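The fusion described in this abstract can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not in the abstract: the convolution is treated as a plain matrix product whose output columns stand for channels, the function names (`matmul_tile`, `conv_with_fused_reduction`) and the tile size are hypothetical, and a local variable stands in for the output register. For simplicity the sketch fuses only the reduction phase (per-channel sum and sum of squares) with tile generation and applies the normalization in a separate pass; it does not attempt to model how the update phase is scheduled while a tile is still held in the register.

```python
def matmul_tile(inputs, weights, r0, r1, c0, c1):
    """Return rows r0..r1-1 and columns c0..c1-1 of inputs @ weights."""
    inner = len(weights)
    return [
        [sum(inputs[i][p] * weights[p][j] for p in range(inner))
         for j in range(c0, c1)]
        for i in range(r0, r1)
    ]


def conv_with_fused_reduction(inputs, weights, tile=2, eps=1e-5):
    """Tiled matrix product that accumulates batch-norm statistics per tile."""
    rows, cols = len(inputs), len(weights[0])
    out = [[0.0] * cols for _ in range(rows)]
    col_sum = [0.0] * cols      # running per-channel sum
    col_sq_sum = [0.0] * cols   # running per-channel sum of squares

    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            # tile_vals plays the role of the output register (assumption).
            tile_vals = matmul_tile(inputs, weights, r0, r1, c0, c1)
            # Reduction is done while the tile is still "in the register",
            # before it is written back to the output matrix.
            for i, row in enumerate(tile_vals):
                for j, v in enumerate(row):
                    col_sum[c0 + j] += v
                    col_sq_sum[c0 + j] += v * v
                    out[r0 + i][c0 + j] = v

    mean = [s / rows for s in col_sum]
    # Clamp at zero to guard against tiny negative values from rounding.
    var = [max(q / rows - m * m, 0.0) for q, m in zip(col_sq_sum, mean)]
    # Separate normalization pass (a simplification of the update phase).
    normalized = [
        [(out[i][j] - mean[j]) / (var[j] + eps) ** 0.5 for j in range(cols)]
        for i in range(rows)
    ]
    return out, mean, var, normalized
```

The point of the sketch is that the statistics are gathered from each tile as it is produced, so the output matrix does not have to be re-read just to compute the mean and variance.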

    Multi-accelerator compute dispatch

    Publication number: US11790590B2

    Publication date: 2023-10-17

    Application number: US17218421

    Filing date: 2021-03-31

    IPC classification: G06T15/00 G06F9/54 G06T15/80

    Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
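The dispatch flow in this abstract can be sketched as a small sequential simulation. Everything concrete here is an assumption made for illustration: the `Chiplet` class, the `dispatch` function, the round-robin assignment policy, and the use of a callback to stand in for the client notification are all hypothetical, and real chiplets would execute concurrently rather than one after another.

```python
class Chiplet:
    """Hypothetical model of one chiplet and its view of peer completion."""

    def __init__(self, cid, num_chiplets):
        self.cid = cid
        self.done_flags = [False] * num_chiplets
        self.executed = []  # (packet_id, workgroup_id) pairs, in order

    def reset(self):
        # Clear completion flags before starting a new dispatch packet.
        self.done_flags = [False] * len(self.done_flags)

    def execute(self, packet_id, workgroups):
        for wg in workgroups:
            self.executed.append((packet_id, wg))

    def receive_completion(self, from_cid):
        self.done_flags[from_cid] = True

    def packet_complete(self):
        return all(self.done_flags)


def dispatch(packets, num_chiplets, notify_client):
    """Process (packet_id, num_workgroups) pairs one packet at a time."""
    chiplets = [Chiplet(c, num_chiplets) for c in range(num_chiplets)]
    for packet_id, num_workgroups in packets:
        for ch in chiplets:
            ch.reset()
        # Round-robin assignment of workgroups to chiplets (assumed policy).
        assignment = [list(range(c, num_workgroups, num_chiplets))
                      for c in range(num_chiplets)]
        for ch in chiplets:
            ch.execute(packet_id, assignment[ch.cid])
            # On finishing its share, the chiplet records its own completion
            # and notifies every other chiplet.
            for peer in chiplets:
                peer.receive_completion(ch.cid)
        if not all(ch.packet_complete() for ch in chiplets):
            raise RuntimeError("packet finished without all completions")
        # Only after every chiplet is done is the client notified, and only
        # then does the loop move on to the next packet.
        notify_client(packet_id)
    return chiplets
```

A short usage example, with a list standing in for the client:

```python
events = []
chiplets = dispatch([(0, 5), (1, 3)], num_chiplets=2, notify_client=events.append)
print(events)                # [0, 1]
print(chiplets[1].executed)  # [(0, 1), (0, 3), (1, 1)]
```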

    MULTI-ACCELERATOR COMPUTE DISPATCH
    Document type: Invention publication (published application)

    Publication number: US20240029336A1

    Publication date: 2024-01-25

    Application number: US18480466

    Filing date: 2023-10-03

    IPC classification: G06T15/00 G06F9/54 G06T15/80

    Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.

    MULTI-ACCELERATOR COMPUTE DISPATCH

    Publication number: US20220319089A1

    Publication date: 2022-10-06

    Application number: US17218421

    Filing date: 2021-03-31

    IPC classification: G06T15/00 G06T15/80 G06F9/54

    Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.