1.
Publication/Announcement No.: US11714998B2
Publication/Announcement Date: 2023-08-01
Application No.: US16909295
Application Date: 2020-06-23
Applicant: Intel Corporation
Inventor: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
CPC classification number: G06N3/063, G06N3/0454, G06N3/088
Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
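The abstract does not spell out the re-encoding scheme. The sketch below assumes one plausible choice: a radix-16 signed-digit split of a signed 8-bit value x into a high digit h and a low digit l, with x = 16h + l and l in [-8, 7], so h lies in [-8, 8]. The names `reencode` and `sparse_mac` are hypothetical and do not come from the patent. Because the low digit is centered on zero, small negative values such as -3 get a zero high digit (a plain two's-complement nibble split would give a high nibble of -1). Any digit product with a zero operand can then be skipped, which illustrates the kind of higher-order-bit sparsity the abstract refers to.

```python
from typing import List, Sequence, Tuple


def reencode(x: int) -> Tuple[int, int]:
    """Split a signed 8-bit value into signed radix-16 digits (high, low).

    Assumed scheme (not taken from the patent): x == 16 * high + low,
    with low in [-8, 7] and therefore high in [-8, 8].
    """
    if not -128 <= x <= 127:
        raise ValueError("expected a signed 8-bit value")
    low = (x + 8) % 16 - 8    # centered remainder in [-8, 7]
    high = (x - low) // 16    # exact: x - low is a multiple of 16
    return high, low


def sparse_mac(acts: Sequence[int], weights: Sequence[int]) -> Tuple[int, int]:
    """Dot product of int8 vectors computed from low-precision digit products.

    Any digit product with a zero operand is skipped, mimicking a sparsity
    circuit. Returns (dot product, number of digit multiplications issued).
    """
    total = 0
    mults = 0
    for a, w in zip(acts, weights):
        ah, al = reencode(a)
        wh, wl = reencode(w)
        # (16*ah + al) * (16*wh + wl) expands into four digit products.
        a_digits: List[Tuple[int, int]] = [(ah, 16), (al, 1)]
        w_digits: List[Tuple[int, int]] = [(wh, 16), (wl, 1)]
        for da, sa in a_digits:
            for dw, sw in w_digits:
                if da == 0 or dw == 0:
                    continue  # zero digit: no multiply needed
                total += (da * dw) * (sa * sw)
                mults += 1
    return total, mults


print(sparse_mac([3, -3, 100], [2, -5, 1]))  # (121, 4)
```

In this example only 4 of the 12 possible digit products are issued, because most operands are small enough that their high digit is zero.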

2.
Publication/Announcement No.: US12112171B2
Publication/Announcement Date: 2024-10-08
Application No.: US17134367
Application Date: 2020-12-26
Applicant: Intel Corporation
Inventor: Anant Nori, Shankar Balachandran, Sreenivas Subramoney, Joydeep Rakshit, Vedvyas Shanbhogue, Avishaii Abuhatzera, Belliappa Kuttanna
CPC classification number: G06F9/30145, G06F9/30065, G06F9/3836, G06F9/4881
Abstract: Techniques for processing loops are described. An exemplary apparatus at least includes decoder circuitry to decode a single instruction, the single instruction to include a field for an opcode, the opcode to indicate execution circuitry is to perform an operation to configure execution of one or more loops, wherein the one or more loops are to include a plurality of configuration instructions and instructions that are to use metadata generated by ones of the plurality of configuration instructions; and execution circuitry to perform the operation as indicated by the opcode.

3.
Publication/Announcement No.: US20200320375A1
Publication/Announcement Date: 2020-10-08
Application No.: US16909295
Application Date: 2020-06-23
Applicant: Intel Corporation
Inventor: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.

4.
Publication/Announcement No.: US20240005135A1
Publication/Announcement Date: 2024-01-04
Application No.: US18135958
Application Date: 2023-04-18
Applicant: Intel Corporation
Inventor: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.

5.
Publication/Announcement No.: US20220113974A1
Publication/Announcement Date: 2022-04-14
Application No.: US17561029
Application Date: 2021-12-23
Applicant: Intel Corporation
Inventor: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
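The abstract describes multicast through hierarchical decoders whose fan-out is configurable at each level. As a rough model, assume (hypothetically) that every decoder at a given level forwards to the same set of child ports. The subarrays reached by one multicast are then the Cartesian product of the per-level port selections, read as a mixed-radix address. `multicast_targets` and its parameters are illustrative names, not taken from the patent.

```python
from itertools import product
from typing import List, Sequence, Set


def multicast_targets(fanouts: Sequence[int], selects: Sequence[Set[int]]) -> List[int]:
    """Flat indices of the subarrays reached by one multicast.

    fanouts[i]: number of child ports per decoder at hierarchy level i.
    selects[i]: child ports that every level-i decoder forwards to
                (a simplifying assumption for this sketch).
    """
    if len(fanouts) != len(selects):
        raise ValueError("need one port selection per level")
    for fanout, ports in zip(fanouts, selects):
        if not ports or any(not 0 <= p < fanout for p in ports):
            raise ValueError("each level must select valid child ports")
    targets = []
    for path in product(*(sorted(ports) for ports in selects)):
        index = 0
        for fanout, port in zip(fanouts, path):
            index = index * fanout + port  # mixed-radix position in the tree
        targets.append(index)
    return targets


# 2-level tree: 4 banks x 4 subarrays = 16 subarrays.
print(multicast_targets([4, 4], [{0, 2}, {1, 3}]))  # [1, 3, 9, 11]
print(multicast_targets([4, 4], [{1}, {2}]))        # [6] (unicast)
```

In this model, a data tile needed by compute tiles on subarrays 1, 3, 9 and 11 is read once and fanned out by the decoders, rather than being loaded four separate times.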