Systems and methods for reading and writing sparse data in a neural network accelerator

    Publication number: US11954025B2

    Publication date: 2024-04-09

    Application number: US18126228

    Filing date: 2023-03-24

    CPC classification number: G06F12/04 G06F2212/1028 G06N3/08

    Abstract: Disclosed herein are a system, a method, and a device for reading and writing sparse data in a neural network accelerator. A mask identifying byte positions within a data word having non-zero values in memory can be accessed. Each bit of the mask can have a first value or a second value, the first value indicating that a byte of the data word corresponds to a non-zero byte value, the second value indicating that the byte of the data word corresponds to a zero byte value. The data word can be modified to have non-zero byte values stored at an end of a first side of the data word in the memory, and any zero byte values stored in a remainder of the data word. The modified data word can be written to the memory via at least a first slice of a plurality of slices that is configured to access the first side of the data word in the memory.
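
    Illustrative sketch (not taken from the patent): the mask-and-pack step described in this abstract might look roughly as follows in Python. The function names, the choice of bit value 1 for "non-zero", and the choice of the low-index end as the "first side" are all assumptions made for illustration.

```python
def pack_sparse_word(word: bytes) -> tuple[int, bytes]:
    """Return (mask, packed) for one data word.

    Assumption: bit i of the mask is 1 when byte i is non-zero and 0 when
    it is zero. Non-zero bytes are moved to the start of the word and the
    remainder is filled with zero bytes.
    """
    mask = 0
    nonzero = bytearray()
    for i, b in enumerate(word):
        if b != 0:
            mask |= 1 << i
            nonzero.append(b)
    # Pad with zero bytes so the packed word keeps the original width.
    packed = bytes(nonzero) + bytes(len(word) - len(nonzero))
    return mask, packed


def unpack_sparse_word(mask: int, packed: bytes) -> bytes:
    """Rebuild the original word from the mask and the packed word."""
    out = bytearray(len(packed))
    src = 0
    for i in range(len(packed)):
        if (mask >> i) & 1:
            out[i] = packed[src]
            src += 1
    return bytes(out)


word = bytes([0, 5, 0, 0, 7, 0, 9, 0])
mask, packed = pack_sparse_word(word)
restored = unpack_sparse_word(mask, packed)
```

    Because the mask records where each non-zero byte came from, the packed word can be expanded back to the original layout on read.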

    Systems and methods for speech or text processing using matrix operations

    Publication number: US11899745B1

    Publication date: 2024-02-13

    Application number: US16997401

    Filing date: 2020-08-19

    CPC classification number: G06F17/16 G06F7/5443

    Abstract: Disclosed herein are a system, a method, and a device for processing and converting data using matrix operations. Circuitry can partition an input of a first data format across a plurality of lookup tables each residing in a respective memory. The circuitry can access weight information from a load store memory, and the partitioned input on a per column basis from the plurality of lookup tables. The circuitry can perform a number of multiply-accumulate (MAC) operations per cycle between the weight information from the load store memory and the partitioned input read on a per column basis from the plurality of lookup tables. The number of MAC operations performed per cycle can correspond to a total number of columns of the plurality of lookup tables. The circuitry can generate, responsive to the MAC operations on the partitioned input, a plurality of outputs in a second data format.
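
    Illustrative sketch (a loose software analogy, not the patented circuitry): one way to picture a column-partitioned multiply-accumulate is to split the columns of an input matrix across several tables and, at each step, perform one MAC per table column. The helper names and the contiguous column split are assumptions; the abstract does not specify them.

```python
def partition_columns(x, num_tables):
    """Split the columns of x (a list of rows) into contiguous groups,
    one group per hypothetical lookup table."""
    n_cols = len(x[0])
    per_table = -(-n_cols // num_tables)  # ceiling division
    return [
        [row[t * per_table:(t + 1) * per_table] for row in x]
        for t in range(num_tables)
    ]


def mac_by_column(weights, tables):
    """Compute weights @ x from the partitioned tables.

    Each inner step (fixed m and k) models one cycle: it reads row k of
    every table and performs one MAC per column, so the MAC count per
    cycle equals the total number of columns across all tables.
    """
    total_cols = sum(len(t[0]) for t in tables)
    k_dim = len(tables[0])
    out = [[0] * total_cols for _ in weights]
    for m, w_row in enumerate(weights):
        for k in range(k_dim):
            col = 0
            for table in tables:
                for value in table[k]:
                    out[m][col] += w_row[k] * value
                    col += 1
    return out, total_cols


x = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
weights = [[1, 0],
           [2, 1]]
tables = partition_columns(x, num_tables=2)
out, macs_per_cycle = mac_by_column(weights, tables)
```

    In this model, adding tables or widening them raises the number of MACs done per step without changing the result of the matrix product.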

    Systems and methods for speech or text processing using matrix operations

    Publication number: US20240152575A1

    Publication date: 2024-05-09

    Application number: US18414901

    Filing date: 2024-01-17

    CPC classification number: G06F17/16 G06F7/5443

    Abstract: Disclosed herein are a system, a method, and a device for processing and converting data using matrix operations. Circuitry can partition an input of a first data format across a plurality of lookup tables each residing in a respective memory. The circuitry can access weight information from a load store memory, and the partitioned input on a per column basis from the plurality of lookup tables. The circuitry can perform a number of multiply-accumulate (MAC) operations per cycle between the weight information from the load store memory and the partitioned input read on a per column basis from the plurality of lookup tables. The number of MAC operations performed per cycle can correspond to a total number of columns of the plurality of lookup tables. The circuitry can generate, responsive to the MAC operations on the partitioned input, a plurality of outputs in a second data format.

    Systems and methods for reading and writing sparse data in a neural network accelerator

    Publication number: US20230229591A1

    Publication date: 2023-07-20

    Application number: US18126228

    Filing date: 2023-03-24

    CPC classification number: G06F12/04 G06F2212/1028 G06N3/08

    Abstract: Disclosed herein are a system, a method, and a device for reading and writing sparse data in a neural network accelerator. A mask identifying byte positions within a data word having non-zero values in memory can be accessed. Each bit of the mask can have a first value or a second value, the first value indicating that a byte of the data word corresponds to a non-zero byte value, the second value indicating that the byte of the data word corresponds to a zero byte value. The data word can be modified to have non-zero byte values stored at an end of a first side of the data word in the memory, and any zero byte values stored in a remainder of the data word. The modified data word can be written to the memory via at least a first slice of a plurality of slices that is configured to access the first side of the data word in the memory.

    Systems and methods for reading and writing sparse data in a neural network accelerator

    Publication number: US11630770B2

    Publication date: 2023-04-18

    Application number: US16509138

    Filing date: 2019-07-11

    Abstract: Disclosed herein are a system, a method, and a device for reading and writing sparse data in a neural network accelerator. A plurality of slices can be established to access a memory having an access size of a data word. A first slice can be configured to access a first side of the data word in memory. Circuitry can access a mask identifying byte positions within the data word having non-zero values. The circuitry can modify the data word to have non-zero byte values stored starting at an end of the first side, and any zero byte values stored in a remainder of the data word. A determination can be made whether a number of non-zero byte values is less than or equal to a first access size of the first slice. The circuitry can write the modified data word to the memory via at least the first slice.
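
    Illustrative sketch (assumptions labeled): the comparison between the number of non-zero bytes and the first slice's access size can be modeled as a count of set mask bits checked against cumulative slice sizes. The function name `slices_needed`, the `slice_sizes` list, and the 4-byte slice sizes in the usage lines are hypothetical; the abstract only states that the first slice is always used.

```python
def slices_needed(mask: int, slice_sizes: list[int]) -> int:
    """Return how many slices, starting from the first, are needed to
    cover the non-zero bytes of a packed data word.

    Assumption: each set bit in the mask marks one non-zero byte, and the
    packed non-zero bytes start at the side served by the first slice.
    """
    nonzero = bin(mask).count("1")
    covered = 0
    for count, size in enumerate(slice_sizes, start=1):
        covered += size
        if nonzero <= covered:
            return count
    raise ValueError("mask has more non-zero bytes than the slices can access")


# Hypothetical 8-byte word served by two 4-byte slices.
few = slices_needed(0b01010010, [4, 4])   # 3 non-zero bytes
many = slices_needed(0b00111110, [4, 4])  # 5 non-zero bytes
```

    When the non-zero bytes fit within the first slice's access size, only that slice needs to be active for the write; this is the case the determination in the abstract is checking for.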

    Efficient multiply-accumulation based on sparse matrix

    Publication number: US11429394B2

    Publication date: 2022-08-30

    Application number: US16997460

    Filing date: 2020-08-19

    Abstract: Disclosed herein is an approach for improving the computational efficiency of multiply-accumulate (MAC) operations. In one aspect, a computing device identifies a first vector including non-zero elements of a base matrix, and a second vector indicating a location of each of the non-zero elements of the base matrix. In one aspect, the device determines a first element and a second element of the first vector. In one aspect, the device determines a third element and a fourth element of the second vector. In one aspect, the device determines i) a fifth element of an input vector according to the third element of the second vector, and ii) a sixth element of the input vector according to the fourth element of the second vector. In one aspect, the device causes MAC circuitry to perform a dot product according to the first element, the second element, the fifth element, and the sixth element.
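
    Illustrative sketch (not the claimed implementation): the two-vector scheme in this abstract resembles a gather-then-dot-product over the non-zero entries of a sparse row. The function names and the `lanes=2` pairing, which mirrors the abstract's two-elements-at-a-time wording, are assumptions.

```python
def compress_row(row):
    """Return (values, locations): the non-zero elements of a row and the
    index of each one. These play the role of the first and second vectors."""
    values = [v for v in row if v != 0]
    locations = [i for i, v in enumerate(row) if v != 0]
    return values, locations


def sparse_dot(values, locations, x, lanes=2):
    """Dot product of a compressed row with input vector x.

    Each step takes `lanes` non-zero values and their locations, gathers
    the matching elements of x, and accumulates a small dot product.
    """
    acc = 0
    for start in range(0, len(values), lanes):
        vals = values[start:start + lanes]
        locs = locations[start:start + lanes]
        gathered = [x[i] for i in locs]
        acc += sum(v * g for v, g in zip(vals, gathered))
    return acc


row = [0, 3, 0, 0, 2, 0, 4]
x = [1, 2, 3, 4, 5, 6, 7]
values, locations = compress_row(row)
result = sparse_dot(values, locations, x)
dense = sum(r * xi for r, xi in zip(row, x))
```

    The sparse version performs three multiplications here instead of seven, since products with zero elements are never formed.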
