-
Publication number: US11922178B2
Publication date: 2024-03-05
Application number: US17359392
Application date: 2021-06-25
Applicant: Intel Corporation
Inventor: Arnab Raha , Deepak Mathaikutty , Debabrata Mohapatra , Sang Kyun Kim , Gautham Chinya , Cormac Brick
CPC classification number: G06F9/445 , G06F9/3001 , G06F9/5027 , G06N20/00 , H03K19/177 , H03K19/20
Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. Compressed local data re-user circuitry determines whether a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
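The local re-use idea in this abstract is that the engine prefetches more compressed data than one section needs, then consumes any further complete sections already on hand instead of issuing another load. A minimal sketch, assuming sections are fixed-length slices of a flat buffer (the function name, data layout, and callback are illustrative, not taken from the patent):

```python
def process_prefetched(data, section_len, run_op):
    """Run run_op on the first section, then on any additional complete
    sections already present in the prefetched buffer (local re-use),
    avoiding a second load from the outer memory hierarchy."""
    results = []
    offset = 0
    # Consume every complete section present in the prefetched data.
    while offset + section_len <= len(data):
        results.append(run_op(data[offset:offset + section_len]))
        offset += section_len
    return results

# Two complete sections fit in 7 elements of prefetched data;
# the trailing partial element is left for the next load.
out = process_prefetched([1, 2, 3, 4, 5, 6, 7], 3, sum)
```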
-
22.
Publication number: US20240022259A1
Publication date: 2024-01-18
Application number: US18465495
Application date: 2023-09-12
Applicant: Intel Corporation
Inventor: Gautham Chinya , Debabrata Mohapatra , Arnab Raha , Huichu Liu , Cormac Brick
CPC classification number: H03M7/3082 , G06F16/2237 , G06N3/063 , G06N3/08
Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
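Zero-value compression, as referenced here, stores only the nonzero elements of a vector alongside a sparsity bitmap with one bit per original element. A minimal sketch of the decode direction (function name and encoding details are assumptions for illustration, not the patent's signal-level scheme):

```python
def zvc_decompress(nonzero_values, sparsity_bitmap):
    """Expand a zero-value-compressed vector: a 1 bit in the sparsity
    bitmap takes the next stored nonzero value, a 0 bit emits a zero."""
    values = iter(nonzero_values)
    return [next(values) if bit else 0 for bit in sparsity_bitmap]

# Bitmap 1,0,0,1,1 with stored nonzeros [7, 3, 2] expands to [7, 0, 0, 3, 2].
restored = zvc_decompress([7, 3, 2], [1, 0, 0, 1, 1])
```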
-
Publication number: US20220292366A1
Publication date: 2022-09-15
Application number: US17709337
Application date: 2022-03-30
Applicant: Intel Corporation
Inventor: Arnab Raha , Martin Langhammer , Debabrata Mohapatra , Nihat Tunali , Michael Wu
Abstract: Methods, apparatus, systems, and articles of manufacture to perform low overhead sparsity acceleration logic for multi-precision dataflow in deep neural network accelerators are disclosed. An example apparatus includes a first buffer to store data corresponding to a first precision; a second buffer to store data corresponding to a second precision; and hardware control circuitry to: process a first multibit bitmap to determine an activation precision of an activation value, the first multibit bitmap including values corresponding to different precisions; process a second multibit bitmap to determine a weight precision of a weight value, the second multibit bitmap including values corresponding to different precisions; and store the activation value and the weight value in the second buffer when at least one of the activation precision or the weight precision corresponds to the second precision.
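The selection rule in this abstract routes an activation/weight pair to the second (wider) buffer whenever either operand requires the second precision. A minimal sketch of that rule, with precision codes and buffer objects as illustrative assumptions (the patent's bitmaps are multibit and hardware-resident):

```python
LOW, HIGH = 0, 1  # hypothetical precision codes, e.g. int4 vs. int8

def route_pair(act_val, act_prec, wt_val, wt_prec, low_buf, high_buf):
    """Store an activation/weight pair in the higher-precision buffer
    if either operand needs the higher precision; otherwise use the
    lower-precision buffer."""
    target = high_buf if HIGH in (act_prec, wt_prec) else low_buf
    target.append((act_val, wt_val))

low_buf, high_buf = [], []
route_pair(5, HIGH, 2, LOW, low_buf, high_buf)   # mixed pair -> high buffer
route_pair(1, LOW, 4, LOW, low_buf, high_buf)    # both low   -> low buffer
```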
-
Publication number: US20220075659A1
Publication date: 2022-03-10
Application number: US17530156
Application date: 2021-11-18
Applicant: Intel Corporation
Inventor: Debabrata Mohapatra , Arnab Raha , Deepak Abraham Mathaikutty , Raymond Jit-Hung Sung , Cormac Michael Brick
Abstract: There is disclosed a system and method of performing an artificial intelligence (AI) inference, including: programming an AI accelerator circuit to solve an AI problem with a plurality of layer-specific register file (RF) size allocations, wherein the AI accelerator circuit comprises processing elements (PEs) with respective associated RFs, wherein the RFs individually are divided into K sub-banks of size B bytes, wherein B and K are integers, and wherein the RFs include circuitry to individually allocate a sub-bank to one of input feature (IF), output feature (OF), or filter weight (FL), and wherein programming the plurality of layer-specific RF size allocations comprises accounting for sparse data within the layer; and causing the AI accelerator circuit to execute the AI problem, including applying the layer-specific RF size allocations at run-time.
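The per-layer programming step described above partitions each register file's K sub-banks among input features (IF), output features (OF), and filter weights (FL). A minimal sketch of one such allocation, assuming per-layer bank counts have already been chosen (how sparsity shapes those counts is not modeled here):

```python
def allocate_rf(k_subbanks, if_banks, of_banks, fl_banks):
    """Return a per-sub-bank assignment for one layer; the three counts
    must together use all K sub-banks exactly."""
    if if_banks + of_banks + fl_banks != k_subbanks:
        raise ValueError("layer allocation must cover all K sub-banks")
    return ["IF"] * if_banks + ["OF"] * of_banks + ["FL"] * fl_banks

# A layer with K = 8 sub-banks: 4 for input features, 2 for output
# features, 2 for filter weights.
layer_map = allocate_rf(8, 4, 2, 2)
```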
-
Publication number: US11010166B2
Publication date: 2021-05-18
Application number: US15087854
Application date: 2016-03-31
Applicant: Intel Corporation
Inventor: Debabrata Mohapatra , Perry H. Wang , Xiang Zou , Sang Kyun Kim , Deepak A. Mathaikutty , Gautham N. Chinya
Abstract: A processor includes a front end including circuitry to decode a first instruction to set a performance register for an execution unit and a second instruction, and an allocator including circuitry to assign the second instruction to the execution unit to execute the second instruction. The execution unit includes circuitry to select between a normal computation and an accelerated computation based on a mode field of the performance register, perform the selected computation, and select between a normal result associated with the normal computation and an accelerated result associated with the accelerated computation based on the mode field.
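The execution unit described above picks between a normal and an accelerated computation, and between their results, according to a mode field in the performance register. A minimal behavioral sketch (the mode encoding and callable interface are illustrative assumptions; the patent describes hardware selection logic, not software dispatch):

```python
NORMAL, ACCELERATED = 0, 1  # hypothetical mode-field encoding

def execute(mode_field, operand, normal_fn, accel_fn):
    """Select the computation, perform it, and return the matching
    result, as steered by the performance register's mode field."""
    if mode_field == ACCELERATED:
        return accel_fn(operand)
    return normal_fn(operand)

# Same operand, different mode fields select different datapaths.
slow = execute(NORMAL, 3, lambda x: x + 1, lambda x: x * 2)
fast = execute(ACCELERATED, 3, lambda x: x + 1, lambda x: x * 2)
```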
-
Publication number: US20170286117A1
Publication date: 2017-10-05
Application number: US15087854
Application date: 2016-03-31
Applicant: Intel Corporation
Inventor: Debabrata Mohapatra , Perry H. Wang , Xiang Zou , Sang Kyun Kim , Deepak A. Mathaikutty , Gautham N. Chinya
IPC: G06F9/30
CPC classification number: G06F9/30145 , G06F9/30076 , G06F9/30101 , G06F9/30123 , G06F9/30189 , G06F9/3836 , G06F9/3873
Abstract: A processor includes a front end including circuitry to decode a first instruction to set a performance register for an execution unit and a second instruction and an allocator including circuitry to assign the second instruction to the execution unit to execute the second instruction. The execution unit includes circuitry to select between a normal computation and an accelerated computation based on a mode field of the performance register, perform the selected computation, and select between a normal result associated with the normal computation and an accelerated result associated with the accelerated computation based on the mode field.
-