-
Publication No.: US20210117197A1
Publication Date: 2021-04-22
Application No.: US17132895
Filing Date: 2020-12-23
Applicant: Intel Corporation
Inventor: Steven Hsu , Amit Agarwal , Debabrata Mohapatra , Arnab Raha , Moongon Jung , Gautham Chinya , Ram Krishnamurthy
Abstract: Systems, apparatuses and methods identify a plurality of registers that are associated with a system-on-chip. The plurality of registers includes a first portion dedicated to write operations and a second portion dedicated to read operations. The technology writes data to the first portion of the plurality of registers, and transfers the data from the first portion to the second portion.
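The write/read split described here behaves like a double-banked register file. Below is a minimal Python sketch of that behavior, assuming a simple copy-based transfer; the class and method names are illustrative, not from the patent.

    class DoubleBankedRegisters:
        def __init__(self, size):
            self.write_bank = [0] * size  # portion dedicated to write operations
            self.read_bank = [0] * size   # portion dedicated to read operations

        def write(self, index, value):
            # Writes land only in the write-dedicated portion.
            self.write_bank[index] = value

        def transfer(self):
            # Publish written data to the read-dedicated portion in one step.
            self.read_bank = list(self.write_bank)

        def read(self, index):
            return self.read_bank[index]

    regs = DoubleBankedRegisters(4)
    regs.write(0, 42)
    regs.transfer()
    assert regs.read(0) == 42

Splitting the banks this way lets readers always observe a stable snapshot while new data is still being written.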
-
Publication No.: US12242861B2
Publication Date: 2025-03-04
Application No.: US18416303
Filing Date: 2024-01-18
Applicant: Intel Corporation
Inventor: Arnab Raha , Deepak Mathaikutty , Debabrata Mohapatra , Sang Kyun Kim , Gautham Chinya , Cormac Brick
Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. Compressed local data re-user circuitry determines whether a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
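As a rough illustration of the load-with-lookahead flow in this abstract, the sketch below loads a first section plus an additional amount of compressed data in one trip, then checks whether the extra data already contains the second section before fetching again. The function and parameter names are assumptions for the sketch.

    def run_two_sections(engine, fetch, section_size, extra_size):
        # One memory trip brings the first section plus lookahead data.
        data = fetch(0, section_size + extra_size)
        engine.execute(data[:section_size])        # first machine learning op
        extra = data[section_size:]
        if len(extra) >= section_size:
            # Second section already on hand: re-use local data, skip a fetch.
            engine.execute(extra[:section_size])
        else:
            engine.execute(fetch(section_size, section_size))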
-
Publication No.: US20230376274A1
Publication Date: 2023-11-23
Application No.: US18362529
Filing Date: 2023-07-31
Applicant: Intel Corporation
Inventor: Mark Anders , Arnab Raha , Amit Agarwal , Steven Hsu , Deepak Abraham Mathaikutty , Ram K. Krishnamurthy , Martin Power
CPC classification number: G06F7/5443 , G06F7/4876 , G06F7/485 , G06F5/012
Abstract: A fused dot-product multiply-accumulate (MAC) circuit may support variable precisions of floating-point data elements to perform computations (e.g., MAC operations) in deep learning operations. An operation mode of the circuit may be selected based on the precision of an input element. The operation mode may be a FP16 mode or a FP8 mode. In the FP8 mode, product exponents may be computed based on exponents of floating-point input elements. A maximum exponent may be selected from the product exponents. A global maximum exponent may be selected from a plurality of maximum exponents. A product mantissa may be computed and aligned with another product mantissa based on the difference between the global maximum exponent and the corresponding maximum exponent. An adder tree may accumulate the aligned product mantissas and compute a partial sum mantissa. The partial sum mantissa may be normalized using the global maximum exponent.
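A fixed-point model can make the alignment step concrete. The sketch below, under simplifying assumptions (unsigned mantissas, no rounding, a 24-bit product width), computes the maximum exponent, right-shifts each product mantissa by its distance from that maximum, and accumulates:

    def accumulate_products(products):
        # products: list of (mantissa, exponent) pairs for the a*b terms
        max_exp = max(e for _, e in products)        # (global) maximum exponent
        aligned = [m >> (max_exp - e) for m, e in products]
        partial_sum = sum(aligned)                   # adder-tree accumulation
        # Normalize the partial sum mantissa against the maximum exponent.
        while partial_sum >= (1 << 24):
            partial_sum >>= 1
            max_exp += 1
        return partial_sum, max_exp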
-
Publication No.: US20230059976A1
Publication Date: 2023-02-23
Application No.: US18047415
Filing Date: 2022-10-18
Applicant: Intel Corporation
Inventor: Deepak Abraham Mathaikutty , Arnab Raha , Raymond Jit-Hung Sung , Martin Power , Umer Iftikhar Cheema , David Thomas Bernard
IPC: G06N3/08
Abstract: A DNN accelerator may include a PE array performing MAC operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zero points from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and may be used by a MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied with a quantization scale to produce a floating-point value.
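The dataflow reads naturally as integer arithmetic followed by one rescale. A small reference model, with illustrative names and NumPy standing in for the multiplier array:

    import numpy as np

    def quantized_mac(q_act, q_wgt, act_zp, wgt_zp, scale):
        # Subtractors sit before the MAC unit: remove zero points once, then
        # the intermediate values can be reused across MAC cycles.
        inter_act = q_act.astype(np.int32) - act_zp
        inter_wgt = q_wgt.astype(np.int32) - wgt_zp
        acc = np.dot(inter_act, inter_wgt)    # sequential MAC cycles
        return acc * scale                    # quantization scale -> float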
-
Publication No.: US20220261623A1
Publication Date: 2022-08-18
Application No.: US17733692
Filing Date: 2022-04-29
Applicant: Intel Corporation
Inventor: Raymond Jit-Hung Sung , Debabrata Mohapatra , Arnab Raha , Deepak Abraham Mathaikutty , Praveen Kumar Gupta
Abstract: A DNN accelerator includes a column of PEs and an external adder assembly for performing depthwise convolution. Each PE includes register files, multipliers, and an internal adder assembly. Each register file can store an operand (input operand, weight operand, etc.) of the depthwise convolution. The operand includes a sequence of elements, each of which corresponds to a different depthwise channel. A multiplier can perform a sequence of multiplications on two operands, e.g., an input operand and a weight operand, and generate a product operand. The internal adder assembly can accumulate product operands and generate an output operand of the PE. The output operand includes output elements, each of which corresponds to a different depthwise channel. The operands may be reused in different rounds of operations by the multipliers. The external adder assembly can accumulate output operands of multiple PEs and generate an output operand of the PE column.
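The per-channel arithmetic can be modeled compactly. In the sketch below, each operand is a list with one element per depthwise channel; products are accumulated channel-wise inside a PE, and a separate function plays the role of the external adder assembly. Names are illustrative.

    def pe_output(input_ops, weight_ops):
        channels = len(input_ops[0])
        out = [0] * channels
        for inp, wgt in zip(input_ops, weight_ops):
            for c in range(channels):          # one multiplication per channel
                out[c] += inp[c] * wgt[c]      # internal adder assembly
        return out

    def column_output(pe_outputs):
        # External adder assembly: channel-wise sum across the PE column.
        return [sum(vals) for vals in zip(*pe_outputs)]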
-
Publication No.: US20220083843A1
Publication Date: 2022-03-17
Application No.: US17534976
Filing Date: 2021-11-24
Applicant: Intel Corporation
Inventor: Arnab Raha , Debabrata Mohapatra , Deepak Abraham Mathaikutty , Raymond Jit-Hung Sung , Cormac Michael Brick
Abstract: An apparatus is provided to access a weight vector of a layer in a sequence of layers in a DNN. The weight vector includes a first sequence of weights having different values. A bitmap is generated based on the weight vector. The bitmap includes a second sequence of bitmap elements. Each bitmap element corresponds to a different weight and has a value determined based at least on the value of the corresponding weight. The index of each bitmap element in the second sequence matches the index of the corresponding weight in the first sequence. A new bitmap is generated by rearranging the bitmap elements in the second sequence based on the values of the bitmap elements. The weight vector is rearranged based on the new bitmap. The rearranged weight vector is divided into subsets, each of which is assigned to a different PE for a MAC operation.
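One plausible reading of the rearrangement, sketched below under the assumption that the vector length divides evenly among the PEs: mark nonzero weights in a bitmap, stably sort indices by bitmap value, and split the reordered vector into per-PE subsets. The sort key is an illustrative policy, not necessarily the patent's exact one.

    def rearrange_and_split(weights, num_pes):
        bitmap = [1 if w != 0 else 0 for w in weights]   # index-aligned bitmap
        order = sorted(range(len(weights)),
                       key=lambda i: bitmap[i], reverse=True)
        rearranged = [weights[i] for i in order]         # follows the new bitmap
        step = len(rearranged) // num_pes                # assumes even division
        return [rearranged[i * step:(i + 1) * step] for i in range(num_pes)]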
-
Publication No.: US20220067524A1
Publication Date: 2022-03-03
Application No.: US17524333
Filing Date: 2021-11-11
Applicant: Intel Corporation
Inventor: Deepak Mathaikutty , Arnab Raha , Raymond Sung , Debabrata Mohapatra , Cormac Brick
Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory and stores the compressed data in a decode buffer. The compressed data, which is associated with a plurality of tensors, is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
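The alignment step amounts to expanding a dense stream of nonzero values back into position. A minimal decode model, assuming the compressed stream holds only the nonzero elements in order:

    def align(compressed, sparsity_bitmap):
        decoded, pos = [], 0
        for bit in sparsity_bitmap:
            if bit:
                decoded.append(compressed[pos])   # consume next stored value
                pos += 1
            else:
                decoded.append(0)                 # bitmap marks this slot zero
        return decoded

    assert align([5, 7], [1, 0, 0, 1]) == [5, 0, 0, 7]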
-
Publication No.: US20210042617A1
Publication Date: 2021-02-11
Application No.: US17081509
Filing Date: 2020-10-27
Applicant: Intel Corporation
Inventor: Gautham Chinya , Deepak Mathaikutty , Guruguhanathan Venkataramanan , Debabrata Mohapatra , Moongon Jung , Sang Kyun Kim , Arnab Raha , Cormac Brick
Abstract: Systems, apparatuses and methods may provide for technology that identifies an assignment of weights of a workload to a plurality of processing elements, where the workload is to be associated with a neural network. The technology generates a representation that is to represent whether each of the weights is a zero value or a non-zero value. The technology further stores the representation into partitions of a storage structure based on the assignment of the weights, where the partitions are each to be dedicated to a different one of the processing elements.
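A short sketch of the storage scheme, with illustrative names: each weight is summarized as zero or nonzero, and the summary lands in the partition dedicated to the processing element that owns the weight.

    def build_partitions(weights, assignment, num_pes):
        # assignment[i] names the processing element that owns weights[i].
        partitions = [[] for _ in range(num_pes)]   # one partition per PE
        for w, pe in zip(weights, assignment):
            partitions[pe].append(0 if w == 0 else 1)
        return partitions

    parts = build_partitions([0.0, 1.5, 0.0, -2.0], [0, 0, 1, 1], 2)
    assert parts == [[0, 1], [0, 1]]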
-
Publication No.: US10671744B2
Publication Date: 2020-06-02
Application No.: US15190396
Filing Date: 2016-06-23
Applicant: Intel Corporation
Inventor: Li Zhao , Manoj R. Sastry , Arnab Raha
IPC: G06F21/62
Abstract: Lightweight trusted execution technologies for internet-of-things devices are described. In response to a memory request at a page unit from an application executing in a current domain, the page unit is to map a current virtual address (VA) to a current physical address (PA). The policy enforcement logic (PEL) reads, from a secure domain cache (SDC), a domain value (DID) and a VA value that correspond to the current PA. The PEL grants access when the current domain and the DID correspond to the unprotected region or the current domain and the DID correspond to the secure domain region, the current domain is equal to the DID, and the current VA is equal to the VA value. The PEL grants data access and denies code access when the current domain corresponds to the secure domain region and the DID corresponds to the unprotected region.
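The policy rules condense into three cases. Restated as code with illustrative names (region_of maps a domain ID to its region; the SDC lookup yields the recorded DID and VA value for the physical address):

    UNPROTECTED, SECURE = "unprotected", "secure"

    def check_access(cur_domain, cur_va, sdc_entry, region_of, is_code_fetch):
        did, va = sdc_entry
        if region_of(cur_domain) == UNPROTECTED and region_of(did) == UNPROTECTED:
            return True                      # both sides unprotected
        if (region_of(cur_domain) == SECURE and region_of(did) == SECURE
                and cur_domain == did and cur_va == va):
            return True                      # matching secure domain and VA
        if region_of(cur_domain) == SECURE and region_of(did) == UNPROTECTED:
            return not is_code_fetch         # data access granted, code denied
        return False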
-
Publication No.: US20240220785A1
Publication Date: 2024-07-04
Application No.: US18408716
Filing Date: 2024-01-10
Applicant: Intel Corporation
Inventor: Gautham Chinya , Huichu Liu , Arnab Raha , Debabrata Mohapatra , Cormac Brick , Lance Hacking
CPC classification number: G06N3/063 , G06F9/3814 , G06F9/3877 , G06F9/4498 , G06F9/5027 , G06N5/04
Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the neural network system. The neural network accelerator also includes schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized output data to memory.
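The phases map directly onto a short driver loop. A sketch, where Engine is a stand-in and every method name is an assumption for illustration:

    class Engine:
        def load(self, tile):
            self.tile = tile                       # load phase
        def extract(self):
            return [x * 2 for x in self.tile]      # stand-in for arithmetic

    def distribute(engines, tiles, store, reorganize):
        for engine, tile in zip(engines, tiles):
            engine.load(tile)                                  # load phase
        outputs = [engine.extract() for engine in engines]     # extraction phase
        store(reorganize(outputs))                 # reorganize, then store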
-