-
Publication No.: US20240152575A1
Publication date: 2024-05-09
Application No.: US18414901
Application date: 2024-01-17
Applicant: Meta Platforms Technologies, LLC
Inventor: Alagappan Valliappan, Pierce I-Jen Chuang, Ganesh Venkatesh
IPC: G06F17/16
CPC classification number: G06F17/16, G06F7/5443
Abstract: Disclosed herein includes a system, a method, and a device for processing and converting data using matrix operations. Circuitry can partition an input of a first data format across a plurality of lookup tables each residing in a respective memory. The circuitry can access weight information from a load store memory, and the partitioned input on a per column basis from the plurality of lookup tables. The circuitry can perform a number of multiply-accumulate (MAC) operations per cycle between the weight information from the load store memory and the partitioned input read on a per column basis from the plurality of lookup tables. The number of MAC operations performed per cycle can correspond to a total number of columns of the plurality of lookup tables. The circuitry can generate, responsive to the MAC operations on the partitioned input, a plurality of outputs in a second data format.
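One possible reading of the abstract above can be sketched in software. The sketch assumes that each lookup table is a small row/column array, that one row is read from every table per cycle, and that each column read feeds one MAC. The names `partition` and `mac_over_tables`, the table shape, and the weight ordering are hypothetical. The data-format conversion step is not modeled.

```python
def partition(inputs, num_tables, cols):
    """Split a flat input list across num_tables tables with `cols` columns each."""
    per_table = len(inputs) // num_tables
    tables = []
    for t in range(num_tables):
        chunk = inputs[t * per_table:(t + 1) * per_table]
        # Each table is stored as rows of `cols` entries.
        tables.append([chunk[r:r + cols] for r in range(0, per_table, cols)])
    return tables


def mac_over_tables(tables, weights):
    """Accumulate weight * input, reading one row from every table per cycle.

    Returns (accumulated sum, MACs per cycle). Weights are consumed in
    cycle/table/column order, which is an assumption of this sketch.
    """
    macs_per_cycle = sum(len(table[0]) for table in tables)
    weight_iter = iter(weights)
    acc = 0
    for row in range(len(tables[0])):      # one loop iteration models one cycle
        for table in tables:               # all tables are read in that cycle
            for value in table[row]:       # one MAC per column
                acc += next(weight_iter) * value
    return acc, macs_per_cycle


inputs = list(range(1, 17))                # 16 input values
tables = partition(inputs, num_tables=2, cols=4)
total, macs_per_cycle = mac_over_tables(tables, weights=[2] * 16)
```

With two tables of four columns each, the sketch performs eight MACs per modeled cycle, matching the abstract's statement that the MAC count per cycle corresponds to the total number of columns across the tables.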
-
Publication No.: US11429394B2
Publication date: 2022-08-30
Application No.: US16997460
Application date: 2020-08-19
Applicant: Meta Platforms Technologies, LLC
Inventor: Alagappan Valliappan, Ganesh Venkatesh, Pierce I-Jen Chuang
Abstract: Disclosed herein includes improving computational efficiency of multiply-accumulate (MAC) operation. In one aspect, a computing device identifies, a first vector including non-zero elements of a base matrix, and a second vector indicating a location of each of the non-zero elements of the base matrix. In one aspect, the device determines a first element and a second element of the first vector. In one aspect, the device determines a third element and a fourth element of the second vector. In one aspect, the device determines i) a fifth element of an input vector according to the third element of the second vector, and ii) a sixth element of the input vector according to the fourth element of the second vector. In one aspect, the device causes a MAC circuitry to perform a dot product according to the first element, the second element, the fifth element, and the sixth element.
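As a rough illustration of the abstract above, the following sketch stores a matrix row as a vector of non-zero values plus a vector of their positions. It then gathers the matching input elements and performs a two-element dot product per step. The function names, the two-at-a-time pairing in software, and the zero padding for an odd count are assumptions for illustration, not details taken from the patent.

```python
def compress_row(row):
    """Return (values, indices) for the non-zero entries of a dense row."""
    values = [v for v in row if v != 0]
    indices = [i for i, v in enumerate(row) if v != 0]
    return values, indices


def sparse_dot(values, indices, x):
    """Dot product of a compressed row with a dense input x, two MACs per step."""
    acc = 0
    for k in range(0, len(values), 2):
        has_pair = k + 1 < len(values)
        # First and second elements: non-zero weights from the value vector.
        w0 = values[k]
        w1 = values[k + 1] if has_pair else 0   # pad with zero weight
        # Third and fourth elements: positions from the index vector.
        i0 = indices[k]
        i1 = indices[k + 1] if has_pair else 0  # any valid index; weight is zero
        # Fifth and sixth elements: input entries gathered at those positions.
        acc += w0 * x[i0] + w1 * x[i1]
    return acc


row = [0, 3, 0, 0, 5, 0, 2, 0]
x = [1, 2, 3, 4, 5, 6, 7, 8]
values, indices = compress_row(row)
result = sparse_dot(values, indices, x)
dense = sum(a * b for a, b in zip(row, x))
```

Only three multiplications contribute to the result here instead of eight, and `result` equals the dense dot product `dense`.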
-
Publication No.: US20240220259A1
Publication date: 2024-07-04
Application No.: US18525083
Application date: 2023-11-30
Applicant: Meta Platforms Technologies, LLC
Inventor: Tomonari Tohara, Vignesh Vivekraja, Alagappan Valliappan, Andrey Bushev, Javid Jaffari
IPC: G06F9/30
CPC classification number: G06F9/30178, G06F9/30038, G06F9/30134
Abstract: In one embodiment, a computing system may set data to a first group of registers. The first group of registers may be configured to be accessed during a single operation cycle. The system may set a number of patterns to a second group of registers. Each pattern of the number of patterns may include an array of index for the data stored in the first group of registers. The system may select, for a first vector register associated with a vector engine, a first pattern from the patterns stored in the second group of registers. The system may load a first portion of the data from the first group of registers to the first vector register based on the first pattern selected for the first vector register from the patterns stored in the second group of registers.
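The pattern-based load described above resembles a gather through an index array. A minimal sketch follows, assuming the first register group is modeled as a flat list and each pattern as a list of indices into it. The name `load_vector` and the example patterns are hypothetical.

```python
def load_vector(data_regs, pattern_regs, pattern_id):
    """Fill a vector register by gathering data through the selected pattern."""
    pattern = pattern_regs[pattern_id]
    return [data_regs[i] for i in pattern]


# First register group: the data.
data_regs = [10, 11, 12, 13, 14, 15, 16, 17]

# Second register group: index patterns into the data registers.
pattern_regs = [
    [0, 1, 2, 3],  # contiguous
    [0, 2, 4, 6],  # stride of two
    [3, 3, 3, 3],  # broadcast of one element
]

v0 = load_vector(data_regs, pattern_regs, 1)
v1 = load_vector(data_regs, pattern_regs, 2)
```

Selecting a different pattern per vector register lets the same stored data be read as a strided slice or as a broadcast without rewriting the data registers.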
-
Publication No.: US11899745B1
Publication date: 2024-02-13
Application No.: US16997401
Application date: 2020-08-19
Applicant: Meta Platforms Technologies, LLC
Inventor: Alagappan Valliappan, Ganesh Venkatesh, Pierce I-Jen Chuang
CPC classification number: G06F17/16, G06F7/5443
Abstract: Disclosed herein includes a system, a method, and a device for processing and converting data using matrix operations. Circuitry can partition an input of a first data format across a plurality of lookup tables each residing in a respective memory. The circuitry can access weight information from a load store memory, and the partitioned input on a per column basis from the plurality of lookup tables. The circuitry can perform a number of multiply-accumulate (MAC) operations per cycle between the weight information from the load store memory and the partitioned input read on a per column basis from the plurality of lookup tables. The number of MAC operations performed per cycle can correspond to a total number of columns of the plurality of lookup tables. The circuitry can generate, responsive to the MAC operations on the partitioned input, a plurality of outputs in a second data format.