-
Publication number: US11922178B2
Publication date: 2024-03-05
Application number: US17359392
Filing date: 2021-06-25
Applicant: Intel Corporation
Inventor: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
CPC classification number: G06F9/445, G06F9/3001, G06F9/5027, G06N20/00, H03K19/177, H03K19/20
Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
-
12.
Publication number: US20240036763A1
Publication date: 2024-02-01
Application number: US18465560
Filing date: 2023-09-12
Applicant: Intel Corporation
Inventor: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
CPC classification number: G06F3/0656, G06F7/5443, G06F3/0625, G06F3/0679, G06F3/0613
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
-
13.
Publication number: US20240022259A1
Publication date: 2024-01-18
Application number: US18465495
Filing date: 2023-09-12
Applicant: Intel Corporation
Inventor: Gautham Chinya, Debabrata Mohapatra, Arnab Raha, Huichu Liu, Cormac Brick
CPC classification number: H03M7/3082, G06F16/2237, G06N3/063, G06N3/08
Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
-
Publication number: US12242861B2
Publication date: 2025-03-04
Application number: US18416303
Filing date: 2024-01-18
Applicant: Intel Corporation
Inventor: Arnab Raha, Deepak Mathaikutty, Debabrata Mohapatra, Sang Kyun Kim, Gautham Chinya, Cormac Brick
Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
-
Publication number: US12169643B2
Publication date: 2024-12-17
Application number: US18465560
Filing date: 2023-09-12
Applicant: Intel Corporation
Inventor: Niall Hanrahan, Martin Power, Kevin Brady, Martin-Thomas Grymel, David Bernard, Gary Baugh, Cormac Brick
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
-
Publication number: US20240134786A1
Publication date: 2024-04-25
Application number: US18539955
Filing date: 2023-12-14
Applicant: Intel Corporation
Inventor: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
CPC classification number: G06F12/0207, G06F12/0292, G06N3/10
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
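As a rough illustration of the two compression steps named in this abstract (a software sketch only, not the claimed circuitry), the Python below builds a sparsity map for a flat tensor, divides the tensor into storage elements, removes the zero points from each element, and then stores the remaining values back to back. The function name `compress_tensor`, the `element_size` parameter, the per-element offset table, and the choice of 1 for "non-zero" in the map are all assumptions made for this example; the abstract does not specify them.

```python
from typing import List, Tuple


def compress_tensor(
    tensor: List[int], element_size: int
) -> Tuple[List[int], List[int], List[int]]:
    """Return (sparsity_map, packed_values, offsets) for a flat tensor.

    Hypothetical helper: names and layout are illustrative assumptions.
    """
    if element_size <= 0:
        raise ValueError("element_size must be positive")

    # Sparsity map: one flag per data point (assumed here: 1 = non-zero, 0 = zero).
    sparsity_map = [1 if value != 0 else 0 for value in tensor]

    packed: List[int] = []   # all surviving values, stored contiguously
    offsets: List[int] = []  # where each storage element starts inside `packed`

    for start in range(0, len(tensor), element_size):
        element = tensor[start:start + element_size]
        # First compression: drop the zero points of this storage element.
        nonzero = [value for value in element if value != 0]
        # Second compression: place the element directly after the previous one.
        offsets.append(len(packed))
        packed.extend(nonzero)

    return sparsity_map, packed, offsets


sparsity_map, packed, offsets = compress_tensor(
    [0, 5, 0, 0, 7, 8, 0, 0], element_size=4
)
print(sparsity_map)  # [0, 1, 0, 0, 1, 1, 0, 0]
print(packed)        # [5, 7, 8]
print(offsets)       # [0, 1]
```

The offset table is one possible way to locate a storage element after contiguous packing; a real design might instead derive the position from the sparsity map itself.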
-
17.
Publication number: US11347828B2
Publication date: 2022-05-31
Application number: US16833210
Filing date: 2020-03-27
Applicant: Intel Corporation
Inventor: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
IPC: G06F17/16
Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
-
Publication number: US20220067524A1
Publication date: 2022-03-03
Application number: US17524333
Filing date: 2021-11-11
Applicant: Intel Corporation
Inventor: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick
Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
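The alignment step in this abstract can be pictured with a small software sketch, offered only as an illustration under stated assumptions rather than as the disclosed hardware. It walks a sparsity bitmap, takes the next compressed value wherever the bitmap marks a non-zero position, and fills in zeros elsewhere. The names `decode` and `distribute`, the convention that a set bit means "non-zero", and the round-robin hand-off to processing elements are hypothetical choices for this example.

```python
from typing import List


def decode(compressed: List[int], bitmap: List[int]) -> List[int]:
    """Align compressed (zero-free) values with a sparsity bitmap.

    Assumption: a truthy bit marks a position holding the next compressed value.
    """
    set_bits = sum(1 for bit in bitmap if bit)
    if set_bits != len(compressed):
        raise ValueError("bitmap and compressed data do not match")
    values = iter(compressed)
    # Consume one compressed value per set bit; emit 0 for every clear bit.
    return [next(values) if bit else 0 for bit in bitmap]


def distribute(decoded: List[int], num_pes: int) -> List[List[int]]:
    """Hand decoded data to processing elements round-robin (illustrative only)."""
    if num_pes <= 0:
        raise ValueError("num_pes must be positive")
    return [decoded[pe::num_pes] for pe in range(num_pes)]


decoded = decode([5, 7, 8], [0, 1, 0, 0, 1, 1, 0, 0])
print(decoded)                 # [0, 5, 0, 0, 7, 8, 0, 0]
print(distribute(decoded, 2))  # [[0, 0, 7, 0], [5, 0, 8, 0]]
```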
-
Publication number: US20210406164A1
Publication date: 2021-12-30
Application number: US17359217
Filing date: 2021-06-25
Applicant: Intel Corporation
Inventor: Martin-Thomas Grymel, David Bernard, Niall Hanrahan, Martin Power, Kevin Brady, Gary Baugh, Cormac Brick
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero, static storage controlling circuitry to divide the tensor into one or more storage elements, and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map and perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
-
Publication number: US20210042617A1
Publication date: 2021-02-11
Application number: US17081509
Filing date: 2020-10-27
Applicant: Intel Corporation
Inventor: Gautham Chinya, Deepak Mathaikutty, Guruguhanathan Venkataramanan, Debabrata Mohapatra, Moongon Jung, Sang Kyun Kim, Arnab Raha, Cormac Brick
Abstract: Systems, apparatuses and methods may provide for technology that identifies an assignment of weights of a workload to a plurality of processing elements, where the workload is to be associated with a neural network. The technology generates a representation that is to represent whether each of the weights is a zero value or a non-zero value. The technology further stores the representation into partitions of a storage structure based on the assignment of the weights, where the partitions are each to be dedicated to a different one of the processing elements.
-