-
1.
公开(公告)号:US20230195685A1
公开(公告)日:2023-06-22
申请号:US18170900
申请日:2023-02-17
申请人: Intel Corporation
发明人: Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh
IPC分类号: G06F15/78 , G06F9/30 , G06F12/128 , G06F17/16 , G06F12/0811 , G06F12/02 , G06F12/0866 , G06F7/544 , G06F9/50 , G06F17/18 , G06F9/38 , G06F12/0891 , G06F12/06 , G06F12/0888 , G06F12/0802 , G06T1/60 , G06F12/0871 , G06T1/20 , H03M7/46 , G06F12/0875 , G06F12/0862 , G06F15/80 , G06F12/0897 , G06F12/0893 , G06F12/0804 , G06F12/0882 , G06F7/575 , G06F12/1009 , G06F12/0895 , G06F7/58 , G06T15/06 , G06N3/08
CPC分类号: G06F15/7839 , G06F9/30043 , G06F12/128 , G06F17/16 , G06F12/0811 , G06F12/0238 , G06F12/0866 , G06F9/30014 , G06F7/5443 , G06F9/5077 , G06F12/0246 , G06F17/18 , G06F9/3887 , G06F12/0891 , G06F12/0607 , G06F12/0888 , G06F12/0802 , G06T1/60 , G06F9/30079 , G06F12/0871 , G06F9/30036 , G06T1/20 , H03M7/46 , G06F12/0215 , G06F12/0875 , G06F12/0862 , G06F15/8046 , G06F9/30047 , G06F9/30065 , G06F12/0897 , G06F9/5011 , G06F12/0893 , G06F12/0804 , G06F12/0882 , G06F9/3001 , G06F7/575 , G06F12/1009 , G06F9/3004 , G06F12/0895 , G06F7/588 , G06F2212/401 , G06F2212/1044 , G06F9/3867 , G06F9/3818 , G06F9/3802 , G06F2212/455 , G06F2212/1021 , G06F2212/60 , G06F2212/1008 , G06T15/06 , G06N3/08 , G06F2212/302
摘要: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.
-
2.
公开(公告)号:US11182337B1
公开(公告)日:2021-11-23
申请号:US16900236
申请日:2020-06-12
申请人: Intel Corporation
摘要: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, the systolic array circuit modified to receive inputs from the single source register and route elements of the single source register to multiple channels in the systolic array circuit.
-
公开(公告)号:US20210081201A1
公开(公告)日:2021-03-18
申请号:US17107823
申请日:2020-11-30
申请人: Intel Corporation
发明人: Subramaniam Maiyuran , Jorge Parra , Ashutosh Garg , Chandra Gurram , Chunhui Mei , Durgesh Borkar , Shubra Marwaha , Supratim Pal , Varghese George , Wei Xiong , Yan Li , Yongsheng Liu , Dipankar Das , Sasikanth Avancha , Dharma Teja Vooturi , Naveen K. Mellempudi
摘要: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
-
4.
公开(公告)号:US12093213B2
公开(公告)日:2024-09-17
申请号:US18310129
申请日:2023-05-01
申请人: Intel Corporation
CPC分类号: G06F15/8046 , G06F15/8007 , G06F17/16 , G06N20/00
摘要: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.
-
公开(公告)号:US12007935B2
公开(公告)日:2024-06-11
申请号:US17428523
申请日:2020-03-14
申请人: INTEL CORPORATION
发明人: Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh
IPC分类号: G06F9/30 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/78 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC分类号: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
摘要: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and
a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.-
公开(公告)号:US11954063B2
公开(公告)日:2024-04-09
申请号:US18170900
申请日:2023-02-17
申请人: Intel Corporation
发明人: Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh
IPC分类号: G06T15/06 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/78 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08
CPC分类号: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
摘要: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.
-
公开(公告)号:US11204977B2
公开(公告)日:2021-12-21
申请号:US16913800
申请日:2020-06-26
申请人: Intel Corporation
发明人: Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Ashutosh Garg , Shubra Marwaha , Chandra Gurram , Darin Starkey , Durgesh Borkar , Varghese George
摘要: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.
-
公开(公告)号:US20240320000A1
公开(公告)日:2024-09-26
申请号:US18621539
申请日:2024-03-29
申请人: Intel Corporation
发明人: Subramaniam Maiyuran , Jorge Parra , Ashutosh Garg , Chandra Gurram , Chunhui Mei , Durgesh Borkar , Shubra Marwaha , Supratim Pal , Varghese George , Wei Xiong , Yan Li , Yongsheng Liu , Dipankar Das , Sasikanth Avancha , Dharma Teja Vooturi , Naveen K. Mellempudi
CPC分类号: G06F9/30036 , G06F9/3001 , G06F9/30101 , G06F9/3893 , G06F15/8046
摘要: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
-
公开(公告)号:US11709793B2
公开(公告)日:2023-07-25
申请号:US17827067
申请日:2022-05-27
申请人: Intel Corporation
发明人: Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh
IPC分类号: G06T15/06 , G06F9/30 , G06F15/78 , G06F9/38 , G06F17/18 , G06F12/0802 , G06F7/544 , G06F7/575 , G06F12/02 , G06F12/0866 , G06F12/0875 , G06F12/0895 , G06F12/128 , G06F12/06 , G06F12/1009 , G06T1/20 , G06T1/60 , H03M7/46 , G06F12/0811 , G06F15/80 , G06F17/16 , G06F7/58 , G06F12/0871 , G06F12/0862 , G06F12/0897 , G06F9/50 , G06F12/0804 , G06F12/0882 , G06F12/0891 , G06F12/0893 , G06F12/0888 , G06N3/08
CPC分类号: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/3004 , G06F9/30014 , G06F9/30036 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
摘要: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.
-
10.
公开(公告)号:US11669490B2
公开(公告)日:2023-06-06
申请号:US17518202
申请日:2021-11-03
申请人: Intel Corporation
CPC分类号: G06F15/8046 , G06F15/8007 , G06F17/16 , G06N20/00
摘要: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.
-
-
-
-
-
-
-
-
-