Lowering hardware for neural networks

    Publication number: US11256977B2

    Publication date: 2022-02-22

    Application number: US15857909

    Filing date: 2017-12-29

    Applicant: Facebook, Inc.

    Abstract: A disclosed computing system may include a special-purpose hardware device having an input subsystem, a linearization subsystem, and a matrix multiplication unit. The input subsystem may facilitate on-the-fly convolution lowering within a neural network convolution layer by directing input volume patches to logical unit(s) of the device. The linearization subsystem may be configured to receive a patch from the input subsystem and to linearize the patch by arranging elements of the patch as a portion of a data matrix row. The matrix multiplication unit of the device may be configured to receive the data matrix from the linearization subsystem and to apply a filter matrix to the data matrix via a matrix multiplication operation. Various other methods, systems, and computer-readable media are also disclosed.
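
    The lowering described here reduces a convolution to a single matrix multiplication: each input-volume patch is linearized into one row of a data matrix, and the filters form a second matrix. Below is a minimal NumPy sketch of this lowering (often called im2col); the function name, shapes, and stride handling are illustrative assumptions, not taken from the patent.

        import numpy as np

        def lower_convolution(input_volume, filters, stride=1):
            """Lower a 2D convolution to a matrix multiplication (im2col sketch).

            input_volume: (C, H, W) channels, height, width
            filters:      (K, C, R, S) K filters of size C x R x S
            """
            C, H, W = input_volume.shape
            K, _, R, S = filters.shape
            out_h = (H - R) // stride + 1
            out_w = (W - S) // stride + 1

            # Linearize each patch into one row of the data matrix.
            rows = []
            for i in range(0, H - R + 1, stride):
                for j in range(0, W - S + 1, stride):
                    patch = input_volume[:, i:i + R, j:j + S]
                    rows.append(patch.reshape(-1))
            data_matrix = np.stack(rows)              # (out_h * out_w, C * R * S)

            # Filter matrix: one linearized filter per column.
            filter_matrix = filters.reshape(K, -1).T  # (C * R * S, K)

            # The convolution is now a single matrix multiplication.
            out = data_matrix @ filter_matrix         # (out_h * out_w, K)
            return out.T.reshape(K, out_h, out_w)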

    THREE-DIMENSIONAL CONVOLUTION PIPELINE WITH MEMORY ORGANIZER UNIT

    Publication number: US20210049426A1

    Publication date: 2021-02-18

    Application number: US16543239

    Filing date: 2019-08-16

    Applicant: Facebook, Inc.

    Abstract: A processor system comprises a memory organizer unit and a matrix computing unit. The memory organizer unit is configured to receive a request for three-dimensional data of a convolutional neural network layer. The requested three-dimensional data is obtained from memory, rearranged into an optimized linear order, and provided in that order to the matrix computing unit. The matrix computing unit is configured to perform at least a portion of a three-dimensional convolution using at least a portion of the provided rearranged data in the optimized linear order.
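
    Per the abstract, the memory organizer fetches three-dimensional activation data and hands it to the matrix computing unit in an optimized linear order; the abstract does not say which order. As one hedged illustration, the sketch below converts a channel-major (C, D, H, W) block into a linear stream with each voxel's channels contiguous, a layout that lets a matrix engine consume whole channel vectors sequentially. The layout choice and function name are assumptions.

        import numpy as np

        def rearrange_for_matrix_unit(volume_cdhw):
            """Rearrange 3D convolution input from channel-major (C, D, H, W)
            storage into a single linear order with each voxel's channels
            contiguous (D, H, W, C flattened). Layout is illustrative only."""
            channels_last = volume_cdhw.transpose(1, 2, 3, 0)   # (D, H, W, C)
            return np.ascontiguousarray(channels_last).reshape(-1)

        # Example: an 8-channel 4x4x4 volume becomes a 512-element linear buffer.
        volume = np.random.rand(8, 4, 4, 4)
        linear = rearrange_for_matrix_unit(volume)
        assert linear.shape == (8 * 4 * 4 * 4,)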

    Hardware accelerator pre-configured with coefficients for matrix-transform operations

    Publication number: US10372787B2

    Publication date: 2019-08-06

    Application number: US15839229

    Filing date: 2017-12-12

    Applicant: Facebook, Inc.

    Abstract: A special-purpose hardware accelerator may include a cache configured to store an input matrix related to performing a convolution operation, and a matrix-multiplication subsystem pre-configured with matrix-transform coefficients for performing matrix-transform operations. The matrix-multiplication subsystem may perform the convolution operation by (1) reading the input matrix from the cache, (2) transforming the input matrix via matrix multiplication, (3) transforming, via matrix multiplication, a parameter matrix that includes convolution parameters for performing the convolution operation, (4) applying the transformed parameter matrix to the transformed input matrix via an element-wise multiplication operation, and then (5) performing an inverse-transformation operation on the results of the element-wise multiplication operation to create an output matrix for the convolution operation. Various other systems and methods are also disclosed.
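
    The pipeline in this abstract (transform the input via matrix multiplication, transform the parameter matrix, multiply element-wise, then inverse-transform) has the shape of Winograd convolution. As a hedged sketch, the code below uses the well-known Winograd F(2x2, 3x3) coefficient matrices; whether these are the exact coefficients the accelerator pre-configures is an assumption.

        import numpy as np

        # Standard Winograd F(2x2, 3x3) transform coefficients (Lavin & Gray).
        B_T = np.array([[1,  0, -1,  0],
                        [0,  1,  1,  0],
                        [0, -1,  1,  0],
                        [0,  1,  0, -1]], dtype=float)
        G = np.array([[1.0,  0.0, 0.0],
                      [0.5,  0.5, 0.5],
                      [0.5, -0.5, 0.5],
                      [0.0,  0.0, 1.0]])
        A_T = np.array([[1, 1,  1,  0],
                        [0, 1, -1, -1]], dtype=float)

        def winograd_tile_2x2_3x3(input_tile_4x4, kernel_3x3):
            """Compute one 2x2 output tile of a 3x3 convolution via transform,
            element-wise multiplication, and inverse transform."""
            U = G @ kernel_3x3 @ G.T           # transformed parameter (filter) matrix
            V = B_T @ input_tile_4x4 @ B_T.T   # transformed input matrix
            M = U * V                          # element-wise multiplication
            return A_T @ M @ A_T.T             # inverse transform -> 2x2 output tile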

    HARDWARE FOR FLOATING-POINT ARITHMETIC IN MULTIPLE FORMATS

    Publication number: US20210255830A1

    Publication date: 2021-08-19

    Application number: US16795097

    Filing date: 2020-02-19

    Applicant: Facebook, Inc.

    IPC classification: G06F7/487 G06F7/485

    Abstract: A floating-point number in a first format representation is received. Based on an identification of the floating-point format type of the number, the different components of the first format representation are identified. These components are placed in corresponding components of a second format representation of the floating-point number, wherein the total number of bits of the second format representation is larger than that of the first. At least one of the components of the second format representation is padded with one or more zero bits. The floating-point number in the second format representation is stored in a register, and a multiplication using the second format representation of the floating-point number is performed.
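
    The component placement and zero padding described here map naturally onto widening bfloat16 to IEEE-754 single precision: the sign bit and 8-bit exponent carry over unchanged, and the 7-bit mantissa is padded with 16 zero bits. That concrete pairing is an assumption; the patent covers format conversion more generally. A small sketch:

        import struct

        def bfloat16_bits_to_float32(bf16_bits: int) -> float:
            """Widen a bfloat16 bit pattern to single precision.

            Sign and exponent are placed unchanged in the wider format and the
            mantissa is padded with 16 zero bits, so the whole conversion is a
            16-bit left shift of the bit pattern."""
            f32_bits = (bf16_bits & 0xFFFF) << 16
            return struct.unpack('>f', struct.pack('>I', f32_bits))[0]

        # Example: 0x3FC0 (sign 0, exponent 0x7F, mantissa 0b1000000) widens to 1.5,
        # and multiplications can then run on the wider register format.
        print(bfloat16_bits_to_float32(0x3FC0))   # 1.5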