-
Publication No.: US11573765B2
Publication Date: 2023-02-07
Application No.: US16219154
Filing Date: 2018-12-13
Inventors: Milind N. Nemlekar, Prerit Dak
Abstract: A processing unit implements a convolutional neural network (CNN) by fusing at least a portion of a convolution phase of the CNN with at least a portion of a batch normalization phase. The processing unit convolves two input matrices representing inputs and weights of a portion of the CNN to generate an output matrix. The processing unit performs the convolution via a series of multiplication operations, with each multiplication operation generating a corresponding submatrix (or “tile”) of the output matrix at an output register of the processing unit. While an output submatrix is stored at the output register, the processing unit performs a reduction phase and an update phase of the batch normalization phase for the CNN. The processing unit thus fuses at least a portion of the batch normalization phase of the CNN with a portion of the convolution.
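The fusion described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `matmul_with_fused_bn_stats`, the `tile` parameter, the use of a local list as a stand-in for the output register, and the choice of per-column sum and sum of squares as the reduced statistics are all assumptions made for illustration. The sketch covers only a reduction-style accumulation; the abstract's update phase is not modeled.

```python
def matmul_with_fused_bn_stats(a, b, tile=2):
    """Multiply a (M x K) by b (K x N) one output tile at a time.

    While each tile sits in a local buffer (a stand-in for the output
    register), accumulate per-column sum and sum of squares, so the
    statistics needed for batch normalization come out of the same pass.
    Treating output columns as channels is an assumption of this sketch.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    col_sum = [0.0] * n
    col_sq = [0.0] * n

    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            rows = range(i0, min(i0 + tile, m))
            cols = range(j0, min(j0 + tile, n))

            # One multiplication operation yields one output tile.
            tile_buf = [
                [sum(a[i][p] * b[p][j] for p in range(k)) for j in cols]
                for i in rows
            ]

            # Fused reduction: runs before the tile leaves the buffer.
            for tile_row in tile_buf:
                for j, value in zip(cols, tile_row):
                    col_sum[j] += value
                    col_sq[j] += value * value

            # Write the tile back to the full output matrix.
            for i, tile_row in zip(rows, tile_buf):
                for j, value in zip(cols, tile_row):
                    out[i][j] = value

    mean = [s / m for s in col_sum]
    var = [q / m - mu * mu for q, mu in zip(col_sq, mean)]
    return out, mean, var
```

Calling `matmul_with_fused_bn_stats(inputs, weights, tile=2)` returns the output matrix together with the per-column mean and variance. The point of the design is that no second pass over the output matrix is needed to gather the statistics.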
-
Publication No.: US11790590B2
Publication Date: 2023-10-17
Application No.: US17218421
Filing Date: 2021-03-31
Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
CPC Classification: G06T15/005, G06F9/545, G06T15/80
Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
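The dispatch-and-notify flow in this abstract can be sketched as a small sequential simulation. Everything named here is hypothetical: the `Chiplet` class, the `run_packets` function, the round-robin assignment policy, and the event log format are illustrative assumptions, and real chiplets would run concurrently rather than in a loop.

```python
from dataclasses import dataclass, field


@dataclass
class Chiplet:
    ident: int
    # Chiplets known (by this chiplet) to have finished the current packet.
    done_peers: set = field(default_factory=set)


def run_packets(packets, num_chiplets):
    """Simulate chiplets processing kernel dispatch packets in order.

    `packets` is a list of packets, each a list of workgroup ids.
    Returns an event log of ("exec", packet, chiplet, workgroup) and
    ("notify_client", packet) tuples.
    """
    chiplets = [Chiplet(i) for i in range(num_chiplets)]
    log = []

    for pkt_idx, workgroups in enumerate(packets):
        for c in chiplets:
            c.done_peers.clear()

        # Assumed policy: assign workgroups to chiplets round-robin.
        assigned = {c.ident: workgroups[c.ident::num_chiplets] for c in chiplets}

        for c in chiplets:
            for wg in assigned[c.ident]:
                log.append(("exec", pkt_idx, c.ident, wg))
            # This chiplet finished its share: tell every chiplet
            # (itself included, so its own record is complete too).
            for peer in chiplets:
                peer.done_peers.add(c.ident)

        # The packet is complete only when all chiplets have reported.
        if all(len(c.done_peers) == num_chiplets for c in chiplets):
            log.append(("notify_client", pkt_idx))
        # Only now does the loop proceed to the subsequent packet.

    return log
```

In this sketch a chiplet with no assigned workgroups still reports completion, so the client notification for a packet is never blocked by an idle chiplet.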
-
Publication No.: US20240029336A1
Publication Date: 2024-01-25
Application No.: US18480466
Filing Date: 2023-10-03
Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
CPC Classification: G06T15/005, G06F9/545, G06T15/80
Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
-
Publication No.: US20220319089A1
Publication Date: 2022-10-06
Application No.: US17218421
Filing Date: 2021-03-31
Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.