Accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits

    Publication number: US11714998B2

    Publication date: 2023-08-01

    Application number: US16909295

    Application date: 2020-06-23

    CPC classification number: G06N3/063 G06N3/0454 G06N3/088

    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format, and a sparsity hardware circuit to reduce computation on zero values at the multiply-add circuit, wherein the processor is to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
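
    The decomposition the abstract describes can be modeled in a few lines. The sketch below is a minimal, hypothetical Python model, not the patented circuit: it splits each signed 8-bit operand into an arithmetically shifted signed high nibble and an unsigned low nibble (the patent re-encodes into two signed low-precision numbers, which would also zero the high part of small negative values; this simpler split only does so for small non-negative values), rebuilds the wide product from up to four narrow multiplies, and skips any partial product whose high part is zero, i.e., the "sparsity in higher order bits." All names and the exact split are illustrative assumptions.

```python
# Hypothetical model of low-precision multiplication with high-order-bit
# sparsity skipping. Illustrative only; not the patented encoding.

def split_int8(x: int) -> tuple[int, int]:
    """Split int8 x into (hi, lo) with x == (hi << 4) + lo.

    hi is the arithmetically shifted signed high nibble in [-8, 7];
    lo is the unsigned low nibble in [0, 15].
    """
    hi = x >> 4        # arithmetic shift keeps the sign in the high part
    lo = x & 0xF
    assert (hi << 4) + lo == x
    return hi, lo

def narrow_multiply_add(a: int, b: int, acc: int = 0) -> int:
    """Compute acc + a*b using only narrow (4-bit-operand) multiplies.

    Partial products involving a zero high nibble are skipped: operands
    whose high nibble is zero (values 0..15 under this split) need fewer
    narrow multiplies, which is the sparsity the hardware exploits.
    """
    a_hi, a_lo = split_int8(a)
    b_hi, b_lo = split_int8(b)

    acc += a_lo * b_lo                # always needed
    if a_hi != 0:
        acc += (a_hi * b_lo) << 4     # skipped when a's high bits are zero
    if b_hi != 0:
        acc += (a_lo * b_hi) << 4     # skipped when b's high bits are zero
    if a_hi != 0 and b_hi != 0:
        acc += (a_hi * b_hi) << 8
    return acc

# Quick check against a direct wide multiply over the full int8 range.
for a in range(-128, 128):
    for b in range(-128, 128):
        assert narrow_multiply_add(a, b) == a * b
```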

    ACCELERATING NEURAL NETWORKS WITH LOW PRECISION-BASED MULTIPLICATION AND EXPLOITING SPARSITY IN HIGHER ORDER BITS

    Publication number: US20200320375A1

    Publication date: 2020-10-08

    Application number: US16909295

    Application date: 2020-06-23

    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format, and a sparsity hardware circuit to reduce computation on zero values at the multiply-add circuit, wherein the processor is to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.

    ACCELERATING NEURAL NETWORKS WITH LOW PRECISION-BASED MULTIPLICATION AND EXPLOITING SPARSITY IN HIGHER ORDER BITS

    Publication number: US20240005135A1

    Publication date: 2024-01-04

    Application number: US18135958

    Application date: 2023-04-18

    CPC classification number: G06N3/063 G06N3/045 G06N3/088

    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format, and a sparsity hardware circuit to reduce computation on zero values at the multiply-add circuit, wherein the processor is to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.

    HARDWARE-SOFTWARE CO-DESIGNED MULTI-CAST FOR IN-MEMORY COMPUTING ARCHITECTURES

    Publication number: US20220113974A1

    Publication date: 2022-04-14

    Application number: US17561029

    Application date: 2021-12-23

    Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders arranged in hierarchical levels that provide a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile that processes one or more "supertiles" of input data. The individual data tiles of an input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles are multicast to the respective processing circuits for efficient data loading and parallel computation.
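
    The multicast and supertile ideas lend themselves to a short software model. The sketch below is an assumption-laden illustration, not the patented design: two decoder levels (bank, then subarray) with configurable per-level selection masks stand in for the hierarchical decoders, a data tile is delivered once to every selected subarray, and the co-located "processing circuits" then reuse that tile for their compute tiles instead of each receiving a separate unicast write. All class and function names are hypothetical.

```python
# Hypothetical two-level multicast model for in-memory compute; the masks
# play the role of the configurable per-level fan-outs in the abstract.

from dataclasses import dataclass, field

@dataclass
class Subarray:
    """A memory subarray with a co-located processing circuit."""
    tiles: list = field(default_factory=list)

    def compute(self, weight_tile):
        # Toy "compute tile": multiply-accumulate of the most recently
        # multicast data tile against a locally stored weight tile.
        data = self.tiles[-1]
        return sum(d * w for d, w in zip(data, weight_tile))

@dataclass
class Bank:
    subarrays: list

def multicast(banks, bank_mask, subarray_mask, tile):
    """Deliver `tile` once per selected leaf instead of one unicast per consumer."""
    for b, bank in enumerate(banks):
        if not bank_mask[b]:
            continue                    # level-1 decoder prunes this bank
        for s, sub in enumerate(bank.subarrays):
            if subarray_mask[s]:
                sub.tiles.append(tile)  # level-2 decoder fans the tile out

# Usage: one input data tile feeds compute tiles in four subarrays at once.
banks = [Bank([Subarray() for _ in range(2)]) for _ in range(2)]
multicast(banks, bank_mask=[1, 1], subarray_mask=[1, 1], tile=[1, 2, 3])
results = [sub.compute([4, 5, 6]) for bank in banks for sub in bank.subarrays]
print(results)  # [32, 32, 32, 32]: the same tile reused by four compute tiles
```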
