-
Publication No.: US20220101091A1
Publication date: 2022-03-31
Application No.: US17550405
Filing date: 2021-12-14
Applicant: Intel Corporation
Inventor: Srivatsa Rangachar Srinivasa , Jainaveen Sundaram Priya , Bradley A. Jackson , Ambili Vengallur , Dileep John Kurian , Tanay Karnik
Abstract: A DNN accelerator includes a multiplication controller controlling whether to perform matrix computation based on weight values. The multiplication controller reads a weight matrix from a WRAM in the DNN accelerator and determines a row value for a row in the weight matrix. In an embodiment where the row value is one, a first switch sends a read request to the WRAM to read weights in the row and a second switch forms a data transmission path from an IRAM in the DNN accelerator to a PE in the DNN accelerator. The PE receives the weights and input data stored in the IRAM and performs MAC operations. In an embodiment where the row value is zero, the first and second switches are not triggered. No read request is sent to the WRAM and the data transmission path is not formed. The PE will not perform any MAC operations.
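The gating described in this abstract can be sketched in software as follows. This is only an illustrative model, not the claimed hardware: the abstract does not say how the row value is derived, so the rule used here (row value is one when the row holds at least one nonzero weight) is an assumption, and the names `row_values` and `gated_matvec` are hypothetical.

```python
def row_values(weights):
    """Return one flag per row of the weight matrix.

    Assumed rule (not stated in the abstract): the flag is 1 if the row
    contains at least one nonzero weight, otherwise 0.
    """
    return [1 if any(w != 0 for w in row) else 0 for row in weights]


def gated_matvec(weights, inputs):
    """Matrix-vector product that skips rows whose row value is zero.

    Returns (outputs, rows_read), where rows_read counts the rows for which
    a weight read and MAC operations were actually performed.
    """
    outputs = []
    rows_read = 0
    for row, flag in zip(weights, row_values(weights)):
        if flag == 0:
            # Models the untriggered switches: no weight read, no MACs.
            outputs.append(0)
            continue
        rows_read += 1
        acc = 0
        for w, x in zip(row, inputs):
            acc += w * x  # one multiply-accumulate (MAC) operation
        outputs.append(acc)
    return outputs, rows_read


outputs, rows_read = gated_matvec([[1, 2], [0, 0], [3, 0]], [4, 5])
```

In this example the all-zero middle row is never read, so only two of the three rows contribute MAC operations.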
-
Publication No.: US20220319162A1
Publication date: 2022-10-06
Application No.: US17845732
Filing date: 2022-06-21
Applicant: Intel Corporation
Inventor: Srivatsa Rangachar Srinivasa , Tanay Karnik , Dileep Kurian , Ranganath Krishnan , Jainaveen Sundaram Priya , Indranil Chakraborty
Abstract: Methods, apparatus, systems, and articles of manufacture providing a Bayesian compute unit with reconfigurable sampler and methods and apparatus to operate the same are disclosed. An example apparatus includes a number generator to generate a sequence of numbers; a multiplier to generate a plurality of products by multiplying respective numbers of the sequence of the numbers by a variance value; and an adder to generate a plurality of weights by adding a mean value to the plurality of products, the plurality of weights corresponding to a single probability distribution.
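The multiply-then-add datapath in this abstract can be sketched as below. This is a minimal software sketch under stated assumptions: the function name `sample_weights` and the pluggable `number_generator` argument are hypothetical, and the use of a standard normal generator in the example is an assumption, since the abstract does not say what sequence the number generator produces.

```python
import random


def sample_weights(mean, variance_value, count, number_generator):
    """Generate `count` weights as mean + variance_value * n.

    `number_generator` is any zero-argument callable returning the next
    number in the sequence (a hypothetical stand-in for the number generator
    named in the abstract).
    """
    numbers = [number_generator() for _ in range(count)]
    products = [n * variance_value for n in numbers]  # multiplier stage
    return [mean + p for p in products]               # adder stage


# Example with an assumed standard normal source and a fixed seed.
rng = random.Random(0)
weights = sample_weights(0.5, 0.1, 4, lambda: rng.gauss(0.0, 1.0))
```

Passing the generator as an argument loosely mirrors the idea of a reconfigurable sampler: swapping the callable changes the source sequence without touching the multiplier and adder stages.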
-
Publication No.: US20210385463A1
Publication date: 2021-12-09
Application No.: US17407384
Filing date: 2021-08-20
Applicant: Intel Corporation
Inventor: Palanivel Guruva reddiar , Praveen P. Nair , Shabbir Abbasali Saifee , Vikas Ahuja , Arshad Mehmood , Jainaveen Sundaram Priya
IPC: H04N19/142 , H04N19/172 , H04N19/139 , H04N19/176 , G06K9/00 , G06T7/11 , H04N19/124 , H04N19/107 , G06T7/246 , G06N3/04
Abstract: In one embodiment, a compute device includes interface circuitry and processing circuitry. The processing circuitry receives, via the interface circuitry, a current frame of a video stream to be encoded. The processing circuitry then determines whether a scene change occurs at the current frame. If a scene change occurs at the current frame, the processing circuitry detects the scene in the current frame by performing pixel segmentation on the current frame. If a scene change does not occur at the current frame, the processing circuitry detects the scene in the current frame by performing motion estimation on the current frame relative to a previous frame in which the scene was detected. Based on the scene detected in the current frame, the processing circuitry then generates one or more encoding parameters and provides those parameters to a video encoder to encode the current frame.
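The branch between the two scene-detection paths can be sketched as below. The abstract does not say how a scene change is determined, so the mean absolute pixel difference and the threshold used here are assumptions, as are the names `mean_abs_diff` and `choose_detection_path`. Frames are modelled as non-empty flat lists of pixel intensities.

```python
def mean_abs_diff(current, previous):
    """Mean absolute difference between two equally sized, non-empty frames."""
    return sum(abs(c - p) for c, p in zip(current, previous)) / len(current)


def choose_detection_path(current, previous, threshold=30.0):
    """Pick the scene-detection method for the current frame.

    Returns "segmentation" when a scene change is assumed (no previous frame,
    or the frame difference exceeds the hypothetical threshold), otherwise
    "motion_estimation" relative to the previous frame.
    """
    if previous is None or mean_abs_diff(current, previous) > threshold:
        return "segmentation"
    return "motion_estimation"


path = choose_detection_path([10, 10], [12, 9])
```

The point of the branch is that the comparatively expensive pixel segmentation runs only at scene changes, while other frames reuse the earlier detection through motion estimation.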
-
Publication No.: US20240320490A1
Publication date: 2024-09-26
Application No.: US18734487
Filing date: 2024-06-05
Applicant: Intel Corporation
Inventor: Jainaveen Sundaram Priya , Prerna Budhkar , Vui Seng Chua , Srivatsa Rangachar Srinivasa , Tanay Karnik
Abstract: A modified 2-pass version of the SoftMax operation can be implemented to reduce computational cost without loss of accuracy, in particular for deep learning neural networks such as transformer-based neural networks and large language models (LLMs). The first pass is modified to include two scalar operations at the end. At the end of the first pass, a first scalar operation is performed to calculate a logarithm of the denominator, and a second scalar operation is performed to calculate an operand value based on a sum of the logarithm of the denominator and the maximum value. The second pass is modified to perform addition and exponentiation. In the second pass, the operand value is subtracted from an element of an input tensor to obtain an exponent, and a base is raised to the exponent. The second pass avoids divisions.
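The two passes described in this abstract can be sketched as below. This is an illustrative reading, not the claimed implementation: the base is assumed to be e, the running-maximum update used to obtain the maximum and the denominator in a single first pass is an assumption, and the name `softmax_two_pass` is hypothetical. The input is assumed to be a non-empty list of finite floats.

```python
import math


def softmax_two_pass(xs):
    """SoftMax with a division-free second pass.

    Pass 1 tracks the running maximum m and the denominator
    d = sum(exp(x - m)), then folds both into one operand c = m + log(d).
    Pass 2 computes exp(x - c) per element, which equals exp(x - m) / d.
    """
    m = -math.inf
    d = 0.0
    for x in xs:
        new_m = max(m, x)
        # Rescale the partial sum whenever the running maximum increases.
        d = d * math.exp(m - new_m) + math.exp(x - new_m)
        m = new_m

    c = m + math.log(d)  # the two scalar operations at the end of pass 1

    return [math.exp(x - c) for x in xs]  # pass 2: subtract, exponentiate


probs = softmax_two_pass([1.0, 2.0, 3.0])
```

Because exp(x - m - log d) = exp(x - m) / d, the per-element division of the conventional formulation is replaced by a single logarithm computed once per input vector.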
-