-
Publication No.: US20230116629A1
Publication Date: 2023-04-13
Application No.: US18046256
Filing Date: 2022-10-13
Applicant: Intel Corporation
Inventor: Martin-Thomas Grymel, David Thomas Bernard, Niall Hanrahan
Abstract: A DNN accelerator includes multiple compute tiles for sharing a workload of running a convolution. A halo pipeline in a compute tile can facilitate replications of halo data from the compute tile where the halo data is generated into another compute tile. The halo pipeline may receive a memory transaction for writing a data block. The halo pipeline may determine that the data block falls into a halo region in an input tensor of the convolution. The halo pipeline may generate a remote address for storing the data block in a memory of the other compute tile, e.g., based on a local address of the data block in a memory of the compute tile. The halo pipeline may adjust the remote address, e.g., based on a difference in dimensions of a tensor to be used by the compute tile and a tensor to be used by the other compute tile.
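The address handling described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a tensor split along rows only, a byte-addressed row-major layout, and the same row pitch in both tiles. The names `TileTensor` and `halo_remote_address` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TileTensor:
    """Hypothetical description of the tensor slice held by one compute tile."""
    base: int       # base address of the slice in the tile's local memory
    row_start: int  # global index of the first row held locally
    rows: int       # number of rows held locally (own rows plus halo rows)
    row_bytes: int  # bytes per tensor row


def halo_remote_address(local_addr: int, src: TileTensor,
                        dst: TileTensor) -> Optional[int]:
    """Return the address in dst's memory for a block written at local_addr
    in src's memory, or None if the block is not in dst's halo region."""
    # Assumption of this sketch: both tiles use the same row pitch.
    if src.row_bytes != dst.row_bytes:
        raise ValueError("sketch assumes equal row pitch in both tiles")

    offset = local_addr - src.base
    local_row = offset // src.row_bytes
    if not 0 <= local_row < src.rows:
        raise ValueError("address is outside the source tensor slice")

    # Halo check: does the written row also belong to the slice dst will read?
    global_row = src.row_start + local_row
    if not dst.row_start <= global_row < dst.row_start + dst.rows:
        return None

    # Step 1: derive a remote address from the local address.
    remote = dst.base + offset
    # Step 2: adjust for the different row ranges of the two slices.
    remote += (src.row_start - dst.row_start) * dst.row_bytes
    return remote


if __name__ == "__main__":
    src = TileTensor(base=0x1000, row_start=0, rows=4, row_bytes=16)
    dst = TileTensor(base=0x8000, row_start=3, rows=5, row_bytes=16)
    # Row 3, byte 5 of the source slice is the first row of the destination slice.
    print(hex(halo_remote_address(0x1035, src, dst)))  # 0x8005
    # Row 1 of the source slice is not needed by the destination tile.
    print(halo_remote_address(0x1010, src, dst))       # None
```

In this sketch the halo test and the adjustment both follow from the row ranges of the two slices. Real hardware would also have to handle splits along other dimensions and differing row pitches.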
-
Publication No.: US20230059976A1
Publication Date: 2023-02-23
Application No.: US18047415
Filing Date: 2022-10-18
Applicant: Intel Corporation
Inventor: Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, Martin Power, Umer Iftikhar Cheema, David Thomas Bernard
IPC: G06N3/08
Abstract: A DNN accelerator may include a PE array performing MAC operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zeropoints from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and may be used by a MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied with a quantization scale to produce a floating-point value.
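The arithmetic in this abstract can be sketched in Python as follows. This is a software illustration rather than the hardware design. It assumes one zero point per operand type and a single combined quantization scale. The function name `pe_outputs` is hypothetical.

```python
from typing import List, Sequence


def pe_outputs(q_acts: Sequence[int],
               q_weight_rows: Sequence[Sequence[int]],
               act_zero_point: int,
               wt_zero_point: int,
               scale: float) -> List[float]:
    """Model one PE: subtract zero points, accumulate products, then rescale."""
    # Subtractors outside the MAC unit: intermediate activations are computed
    # once, stored, and reused in every MAC cycle below.
    inter_acts = [a - act_zero_point for a in q_acts]

    outputs = []
    for row in q_weight_rows:  # one weight row per MAC cycle in this sketch
        if len(row) != len(inter_acts):
            raise ValueError("weight row length must match activation count")
        inter_wts = [w - wt_zero_point for w in row]
        # MAC unit: integer multiply-accumulate on the intermediate values.
        acc = 0
        for a, w in zip(inter_acts, inter_wts):
            acc += a * w
        # Multiply the integer result by the scale to get a floating-point value.
        outputs.append(acc * scale)
    return outputs


if __name__ == "__main__":
    result = pe_outputs(
        q_acts=[130, 128, 125],
        q_weight_rows=[[3, 5, 1], [2, 2, 4]],
        act_zero_point=128,
        wt_zero_point=2,
        scale=0.5,
    )
    print(result)  # [2.5, -3.0]
```

Subtracting the zero points before the multiply keeps the MAC loop to plain integer products, which is the motivation the abstract gives for placing the subtractors outside the MAC unit.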
-