-
公开(公告)号:US20240111826A1
公开(公告)日:2024-04-04
申请号:US17937252
申请日:2022-09-30
申请人: Intel Corporation
发明人: Jiasheng Chen , Kevin Hurd , Changwon Rhee , Jorge Parra , Fangwen Fu , Theo Drane , William Zorn , Peter Caday , Gregory Henry , Guei-Yuan Lueh , Farzad Chehrazi , Amit Karande , Turbo Majumder , Xinmin Tian , Milind Girkar , Hong Jiang
CPC分类号: G06F17/16 , G06F7/5443 , G06T1/20
摘要: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including a multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as final result in the DP floating-point format.
-
公开(公告)号:US20230342111A1
公开(公告)日:2023-10-26
申请号:US18216797
申请日:2023-06-30
申请人: Intel Corporation
发明人: Martin Langhammer , Dongdong Chen , Kevin Hurd
CPC分类号: G06F7/4876 , G06F7/5443 , G06F2207/3824 , G06F2207/382
摘要: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
-
公开(公告)号:US20220012015A1
公开(公告)日:2022-01-13
申请号:US17484845
申请日:2021-09-24
申请人: Intel Corporation
发明人: Martin Langhammer , Dongdong Chen , Kevin Hurd
摘要: An integrated circuit with specialized processing blocks are provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
-
4.
公开(公告)号:US20240103810A1
公开(公告)日:2024-03-28
申请号:US17935787
申请日:2022-09-27
申请人: Intel Corporation
发明人: Jiasheng Chen , Supratim Pal , Changwon Rhee , Hong Jiang , Kevin Hurd , Shuai Mu
CPC分类号: G06F7/5443 , G06F7/57 , G06F17/16
摘要: An apparatus to facilitate supporting vector multiply add with double accumulator access in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprising two source matrices to be multiplied as part of the matrix multiplication operation; and issue a multiply and add vector (MADV) instruction for the multiplication operation utilizing a double accumulator access output, wherein the MADV instruction to multiply two vectors of the two source matrices in a single floating point (FP) pipeline of the processor.
-
公开(公告)号:US20190155574A1
公开(公告)日:2019-05-23
申请号:US16144904
申请日:2018-09-27
申请人: Intel Corporation
发明人: Martin Langhammer , Dongdong Chen , Kevin Hurd
摘要: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
-
公开(公告)号:US11726744B2
公开(公告)日:2023-08-15
申请号:US17214526
申请日:2021-03-26
申请人: Intel Corporation
发明人: Martin Langhammer , Dongdong Chen , Kevin Hurd
CPC分类号: G06F7/4876 , G06F7/5443 , G06F2207/382 , G06F2207/3824
摘要: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
-
公开(公告)号:US20240111825A1
公开(公告)日:2024-04-04
申请号:US17937229
申请日:2022-09-30
申请人: Intel Corporation
发明人: Jiasheng Chen , Changwon Rhee , Kevin Hurd , Gregory Henry , Peter Caday , Kristopher Wong
摘要: An apparatus to facilitate single precision support for systolic pipeline in a graphics environment is disclosed. The apparatus includes a processor comprising systolic array hardware including a plurality of data processing units, wherein the systolic array hardware is to: receive data for performance of a matrix multiplication operation in a first precision format; convert an original value of the data into two split values with a second precision format having a lower precision than the first precision format; perform the matrix multiplication operation using the two split values in the second precision format, the matrix multiplication operation comprising a split-term operation that utilizes two passes through the systolic array hardware with feedback wiring and local reduction; and generate an emulated result for the matrix multiplication operation in the first precision format.
-
公开(公告)号:US20210240440A1
公开(公告)日:2021-08-05
申请号:US17214526
申请日:2021-03-26
申请人: Intel Corporation
发明人: Martin Langhammer , Dongdong Chen , Kevin Hurd
摘要: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
-
公开(公告)号:US10970042B2
公开(公告)日:2021-04-06
申请号:US16144904
申请日:2018-09-27
申请人: Intel Corporation
发明人: Martin Langhammer , Dongdong Chen , Kevin Hurd
摘要: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
-
-
-
-
-
-
-
-