-
Publication No.: US20210397414A1
Publication Date: 2021-12-23
Application No.: US17358868
Filing Date: 2021-06-25
Applicant: Intel Corporation
Inventor: Arnab Raha , Mark A. Anders , Martin Power , Martin Langhammer , Himanshu Kaul , Debabrata Mohapatra , Gautham Chinya , Cormac Brick , Ram Krishnamurthy
Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein each of the arithmetic blocks contains multiple multipliers, and wherein logic is to combine multipliers within each arithmetic block, across multiple arithmetic blocks, or both. In one example, one or more intermediate multipliers are of a size less than the precisions supported by the arithmetic blocks containing them.
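The combining of smaller multipliers into a higher-precision one, as described in the abstract above, can be modeled in software by summing shifted partial products. A minimal Python sketch, assuming 4-bit multipliers combined into an 8x8 multiply (the widths and function names are illustrative, not taken from the patent):

```python
def mul4(a, b):
    # stand-in for one 4-bit hardware multiplier inside an arithmetic block
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b

def mul8_from_mul4(a, b):
    # split each 8-bit operand into high and low nibbles
    ah, al = a >> 4, a & 0xF
    bh, bl = b >> 4, b & 0xF
    # combine four 4-bit partial products with shifts to form the 8x8 product
    return ((mul4(ah, bh) << 8)
            + ((mul4(ah, bl) + mul4(al, bh)) << 4)
            + mul4(al, bl))
```

The same shift-and-add composition generalizes to combining multipliers across blocks for still wider precisions.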
-
Publication No.: US12169643B2
Publication Date: 2024-12-17
Application No.: US18465560
Filing Date: 2023-09-12
Applicant: Intel Corporation
Inventor: Niall Hanrahan , Martin Power , Kevin Brady , Martin-Thomas Grymel , David Bernard , Gary Baugh , Cormac Brick
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
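The reuse pattern described above amounts to holding one context resident in its buffer while a pointer walks every context in the other buffer. A hedged Python sketch of that loop structure (the lists and the multiply stand in for real buffered contexts and MAC hardware):

```python
def mac_with_reuse(first_type_buffer, second_type_buffer):
    """Model of the buffer-reuse control flow: each first-type context
    stays resident while the second buffer's pointer iterates fully."""
    results = []
    for a in first_type_buffer:      # first-type context maintained in buffer
        for b in second_type_buffer:  # pointer advances through second buffer
            results.append(a * b)     # one MAC-style operation per pairing
    return results
```

Each first-type context is fetched once but used against every second-type context, which is the data-reuse gain the abstract describes.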
-
Publication No.: US20240134786A1
Publication Date: 2024-04-25
Application No.: US18539955
Filing Date: 2023-12-14
Applicant: Intel Corporation
Inventor: Martin-Thomas Grymel , David Bernard , Niall Hanrahan , Martin Power , Kevin Brady , Gary Baugh , Cormac Brick
CPC classification number: G06F12/0207 , G06F12/0292 , G06N3/10
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
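The first compression stage described above can be modeled as building a per-element sparsity bitmap and keeping only the nonzero values, with the bitmap making the compression losslessly reversible. A minimal Python sketch (a flat list stands in for a tensor storage element; names are illustrative):

```python
def compress(storage_element):
    # first compression: record a sparsity map and drop the zero points
    sparsity_map = [1 if x != 0 else 0 for x in storage_element]
    nonzeros = [x for x in storage_element if x != 0]
    return sparsity_map, nonzeros

def decompress(sparsity_map, nonzeros):
    # rebuild the original element by re-inserting zeros per the map
    it = iter(nonzeros)
    return [next(it) if bit else 0 for bit in sparsity_map]
```

The second compression stage in the abstract would then pack such compressed elements contiguously in memory rather than at fixed strides.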
-
Publication No.: US20230376274A1
Publication Date: 2023-11-23
Application No.: US18362529
Filing Date: 2023-07-31
Applicant: Intel Corporation
Inventor: Mark Anders , Arnab Raha , Amit Agarwal , Steven Hsu , Deepak Abraham Mathaikutty , Ram K. Krishnamurthy , Martin Power
CPC classification number: G06F7/5443 , G06F7/4876 , G06F7/485 , G06F5/012
Abstract: A fused dot-product multiply-accumulate (MAC) circuit may support variable precisions of floating-point data elements to perform computations (e.g., MAC operations) in deep learning operations. An operation mode of the circuit may be selected based on the precision of an input element. The operation mode may be a FP16 mode or a FP8 mode. In the FP8 mode, product exponents may be computed based on exponents of floating-point input elements. A maximum exponent may be selected from the product exponents. A global maximum exponent may be selected from a plurality of maximum exponents. A product mantissa may be computed and aligned with another product mantissa based on a difference between the global maximum exponent and a corresponding maximum exponent. An adder tree may accumulate the aligned product mantissas and compute a partial sum mantissa. The partial sum mantissa may be normalized using the global maximum exponent.
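The exponent-alignment scheme above can be illustrated with a simplified model in which each element is an (exponent, integer mantissa) pair: product exponents are sums of input exponents, and each product mantissa is right-shifted by its gap to the maximum exponent before the adder tree accumulates. A hedged Python sketch, ignoring signs, normalization, and rounding (all names are assumptions, not from the patent):

```python
def fp_dot(a_elems, b_elems):
    """Toy aligned dot product over (exponent, mantissa) pairs."""
    # product exponent = sum of input exponents; product mantissa = product
    prod_exps = [ea + eb for (ea, _), (eb, _) in zip(a_elems, b_elems)]
    prod_mants = [ma * mb for (_, ma), (_, mb) in zip(a_elems, b_elems)]
    max_exp = max(prod_exps)  # maximum exponent across the products
    # align each product mantissa to max_exp, then accumulate (adder tree)
    partial_sum = sum(m >> (max_exp - e)
                      for e, m in zip(prod_exps, prod_mants))
    return max_exp, partial_sum
```

A real circuit would further select a global maximum across several such maxima and normalize the partial sum mantissa against it, as the abstract describes.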
-
Publication No.: US20230059976A1
Publication Date: 2023-02-23
Application No.: US18047415
Filing Date: 2022-10-18
Applicant: Intel Corporation
Inventor: Deepak Abraham Mathaikutty , Arnab Raha , Raymond Jit-Hung Sung , Martin Power , Umer Iftikhar Cheema , David Thomas Bernard
IPC: G06N3/08
Abstract: A DNN accelerator may include a PE array performing MAC operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zero points from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and may be used by a MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied with a quantization scale to produce a floating-point value.
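Numerically, the scheme above performs zero-point subtraction once, outside the MAC loop, so the multipliers only ever see the intermediate values. A minimal Python sketch of that dataflow (names, zero points, and scale are illustrative assumptions):

```python
def quantized_mac(q_acts, q_wgts, zp_act, zp_wgt, scale):
    # subtractors outside the MAC unit: remove zero points once,
    # producing the intermediate activations and weights
    acts = [a - zp_act for a in q_acts]
    wgts = [w - zp_wgt for w in q_wgts]
    acc = 0
    for a, w in zip(acts, wgts):  # sequential MAC cycles reuse stored values
        acc += a * w
    return acc * scale            # quantization scale yields a float output
```

Hoisting the subtraction out of the inner loop is what lets the stored intermediates be reused across cycles without repeating the zero-point work.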
-
Publication No.: US20210406164A1
Publication Date: 2021-12-30
Application No.: US17359217
Filing Date: 2021-06-25
Applicant: Intel Corporation
Inventor: Martin-Thomas Grymel , David Bernard , Niall Hanrahan , Martin Power , Kevin Brady , Gary Baugh , Cormac Brick
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
-
Publication No.: US20210319317A1
Publication Date: 2021-10-14
Application No.: US17357924
Filing Date: 2021-06-24
Applicant: Intel Corporation
Inventor: Martin Power , Kevin Brady , Niall Hanrahan , Martin-Thomas Grymel , David Bernard , Gary Baugh
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to perform machine-learning model operations on sparse accelerators. An example apparatus includes first circuitry, second circuitry to generate sparsity data based on an acceleration operation, and third circuitry to instruct one or more data buffers to provide at least one of activation data or weight data based on the sparsity data to the first circuitry, the first circuitry to execute the acceleration operation based on the at least one of the activation data or the weight data.
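The control flow described above can be reduced to gating: the buffers supply an activation/weight pair only when the sparsity data says the work is needed. A hedged Python sketch of that idea (the boolean list stands in for generated sparsity data; names are illustrative):

```python
def sparse_dispatch(activations, weights, sparsity):
    """Accumulate only the pairs the sparsity data marks as nonzero work."""
    acc = 0
    for a, w, keep in zip(activations, weights, sparsity):
        if keep:          # control circuitry instructs the buffers to provide data
            acc += a * w  # execution circuitry runs the acceleration operation
    return acc
```

Skipped positions cost neither a buffer read nor a multiply, which is the benefit of driving the dispatch from sparsity data.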
-