DEVICE AND METHOD FOR FLEXIBLY SUMMING MATRIX VALUES

    Publication No.: US20210349965A1

    Publication Date: 2021-11-11

    Application No.: US16869303

    Filing Date: 2020-05-07

    Applicant: Facebook, Inc.

    Abstract: A device (e.g., an application-specific integrated circuit chip) includes a matrix transpose component, a matrix processing component, a data alignment component, and a data reduction component. The matrix transpose component is configured to transpose an input matrix of elements to output an output matrix of the elements that have been transposed, wherein: each element of the input matrix of elements is represented using a first number of bits, each value of a group of values stored in the input matrix is represented using a second number of bits greater than the first number of bits, and each value of the group of values is stored as split segments across more than one element of the elements of the input matrix. The matrix processing component is configured to multiply a first multiplication input matrix with a second multiplication input matrix, wherein the output matrix of the matrix transpose component is utilized as the first multiplication input matrix and a mask vector is utilized as the second multiplication input matrix. The data alignment component is configured to modify at least a portion of elements of a result of the matrix processing component. The data reduction component is configured to sum at least the elements of the modified result of the matrix processing component to determine a sum of the group of values.
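    The following is a minimal Python sketch of the transpose, mask-multiply, align and reduce flow in the abstract above. It assumes unsigned 16-bit values stored as two 8-bit segments (low segment first) in the columns of each row, and a 0/1 mask vector that selects which values are summed. The names split_values and masked_sum and this segment layout are illustrative assumptions, not the patent's design.

```python
import numpy as np

SEG_BITS = 8   # assumed width of one stored element (the "first number of bits")
NUM_SEGS = 2   # assumed segments per value, so each value is 16 bits
SEG_MASK = (1 << SEG_BITS) - 1

def split_values(values):
    """Store unsigned 16-bit values as an (n, NUM_SEGS) matrix of 8-bit segments,
    low segment in column 0."""
    values = np.asarray(values, dtype=np.uint32)
    segments = [(values >> (SEG_BITS * s)) & SEG_MASK for s in range(NUM_SEGS)]
    return np.stack(segments, axis=1).astype(np.uint8)

def masked_sum(input_matrix, mask):
    # Matrix transpose: row s of the result holds segment s of every value.
    transposed = input_matrix.T.astype(np.int64)
    # Matrix processing: multiply by the mask vector (1 = include value, 0 = skip it).
    partial = transposed @ np.asarray(mask, dtype=np.int64)
    # Data alignment: shift each per-segment sum to its bit position.
    aligned = [int(p) << (SEG_BITS * s) for s, p in enumerate(partial)]
    # Data reduction: add the aligned partial sums.
    return sum(aligned)
```

    For example, masked_sum(split_values([300, 65535, 1, 1024]), [1, 1, 0, 1]) adds 300, 65535 and 1024 to give 66859. Every multiplication sees only 8-bit element inputs; the alignment shift is what restores the full-width weighting of the high segments.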

    USING A LOW-BIT-WIDTH DOT PRODUCT ENGINE TO SUM HIGH-BIT-WIDTH NUMBERS

    Publication No.: US20210349690A1

    Publication Date: 2021-11-11

    Application No.: US16869281

    Filing Date: 2020-05-07

    Applicant: Facebook, Inc.

    Abstract: A device (e.g., an integrated circuit chip) includes a dot product processing component, a data alignment component, and an accumulator. The dot product processing component is configured to calculate a dot product of a first group of elements stored in a first storage unit with a second group of elements, wherein: each element of the first group of elements is represented using a first number of bits, each value of a group of values stored in the first storage unit is represented using a second number of bits greater than the first number of bits, and each value of the group of values is stored as split segments across more than one element of the elements of the first group of elements. The data alignment component is configured to receive results of the dot product processing component and modify one or more of the results of the dot product processing component. The accumulator is configured to sum outputs of the data alignment component to at least in part determine a sum of the group of values.
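    A Python sketch of this dot-product-based scheme, assuming an engine that consumes 8 elements per dot product, unsigned 16-bit values split into interleaved 8-bit segments (low segment first), and a 0/1 selector vector as the second group of elements. DOT_WIDTH, dot_product_engine, pack and sum_wide_values are hypothetical names chosen for illustration; the abstract does not specify widths or layout.

```python
DOT_WIDTH = 8   # hypothetical: elements the engine consumes per dot product
SEG_BITS = 8    # assumed element width (the "first number of bits")
NUM_SEGS = 2    # assumed segments per value, so each value is 16 bits
assert DOT_WIDTH % NUM_SEGS == 0  # keeps each value's segments inside one row

def dot_product_engine(a, b):
    """Model of a low-bit-width engine: 8-bit operands, wide integer result."""
    assert len(a) == len(b) == DOT_WIDTH
    assert all(0 <= x < (1 << SEG_BITS) for x in a)
    return sum(x * y for x, y in zip(a, b))

def pack(values):
    """Lay out unsigned 16-bit values as interleaved 8-bit segments (low first),
    zero-padded to a whole number of engine-width rows."""
    elems = []
    for v in values:
        for s in range(NUM_SEGS):
            elems.append((v >> (SEG_BITS * s)) & ((1 << SEG_BITS) - 1))
    elems += [0] * (-len(elems) % DOT_WIDTH)
    return elems

def sum_wide_values(storage):
    acc = 0  # accumulator
    for start in range(0, len(storage), DOT_WIDTH):
        row = storage[start:start + DOT_WIDTH]
        for s in range(NUM_SEGS):
            # Second group of elements: picks segment s of every value in the row.
            selector = [1 if i % NUM_SEGS == s else 0 for i in range(DOT_WIDTH)]
            result = dot_product_engine(row, selector)
            # Data alignment: weight the partial sum by the segment's bit position.
            acc += result << (SEG_BITS * s)
    return acc
```

    Here sum_wide_values(pack([1000, 65535, 7, 256, 4242])) evaluates to 71040, the ordinary sum of the inputs. Running one dot product per segment position keeps every engine operand at 8 bits; only the shift and the accumulator work at full width.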

    Systems and methods for reducing power consumption of convolution operations for artificial neural networks

    Publication No.: US11120328B1

    Publication Date: 2021-09-14

    Application No.: US16354665

    Filing Date: 2019-03-15

    Applicant: Facebook, Inc.

    Abstract: A computer-implemented method may include maintaining, within a local memory device (LMD) in a hardware accelerator, (1) a filter matrix that may include a set of filter vectors corresponding to a filter location in each of a set of filters of a convolutional layer of an artificial neural network, and (2) an activation matrix that may include a primary and a secondary set of activation vectors, each activation vector included in an activation volume. The method may also include (1) directing a matrix multiplication unit (MMU) in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix, (2) replacing (i) the filter matrix with an additional filter matrix, and (ii) the secondary set of activation vectors with an additional set of activation vectors, and (3) directing the MMU to execute an additional MMO using the additional filter matrix and the activation matrix.
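    One possible reading of the primary/secondary split, sketched in Python with NumPy for a stride-1 1D convolution. The activation matrix holds one activation vector (all channels at one position) per output column. When the filter location advances, only the vector that slid out of the window (treated here as the secondary set) is replaced, and the remaining vectors (the primary set) stay resident. The circular-buffer layout, the function names, and this interpretation of primary and secondary are assumptions, not details stated in the abstract.

```python
import numpy as np

def conv1d_reference(x, w):
    """Direct stride-1 convolution: x is (C, L), w is (F, C, K), result is (F, L-K+1)."""
    _, L = x.shape
    F, _, K = w.shape
    y = np.zeros((F, L - K + 1))
    for o in range(L - K + 1):
        y[:, o] = np.einsum("fck,ck->f", w, x[:, o:o + K])
    return y

def conv1d_with_reuse(x, w):
    """Same convolution as one matrix multiplication per filter location,
    reloading only one activation vector between multiplications."""
    _, L = x.shape
    F, _, K = w.shape
    n_out = L - K + 1
    act = x[:, :n_out].copy()  # activation matrix held in local memory
    loads = n_out              # activation vectors loaded so far
    y = np.zeros((F, n_out))
    for k in range(K):
        if k > 0:
            # Evict x[:, k-1] (no longer needed) and load x[:, k-1+n_out] into its slot;
            # the other n_out-1 vectors are reused unchanged.
            slot = (k - 1) % n_out
            act[:, slot] = x[:, k - 1 + n_out]
            loads += 1
        filt = w[:, :, k]      # filter matrix: one filter vector per filter at location k
        partial = filt @ act   # the matrix multiplication operation
        # Slot s holds the input at position p with p % n_out == s, which contributes
        # to output column (s - k) % n_out; rolling by -k restores output order.
        y += np.roll(partial, -k, axis=1)
    return y, loads
```

    With 10 input positions and a 3-tap filter, loads comes out to 10 (each input vector fetched once), compared with 8 × 3 = 24 if the whole activation matrix were rebuilt for every filter location.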

    HARDWARE ACCELERATED MATRIX MANIPULATION OPERATIONS USING PROCESSOR INSTRUCTIONS

    Publication No.: US20210173646A1

    Publication Date: 2021-06-10

    Application No.: US16708224

    Filing Date: 2019-12-09

    Applicant: Facebook, Inc.

    Abstract: A processor system comprises a shared memory and a processing element. The processing element includes a matrix processor unit and is in communication with the shared memory. The processing element is configured to receive a processor instruction specifying a data matrix and a matrix manipulation operation. A manipulation matrix based on the processor instruction is identified. The data matrix and the manipulation matrix are loaded into the matrix processor unit and a matrix operation is performed to determine a result matrix. The result matrix is outputted to a destination location.
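    A small NumPy sketch of the idea that a manipulation can be carried out as an ordinary matrix multiplication: each operation named by the instruction maps to a manipulation matrix, and the matrix processor unit only has to compute data @ manipulation. The operation names and the three example manipulations (column reversal, column duplication for upsampling, dropping odd columns for downsampling) are hypothetical; the abstract does not list which manipulations are supported.

```python
import numpy as np

def manipulation_matrix(op, n):
    """Build a manipulation matrix for a data matrix with n columns (hypothetical op names)."""
    if op == "reverse_columns":
        return np.eye(n)[::-1]  # anti-diagonal permutation matrix
    if op == "duplicate_columns":
        m = np.zeros((n, 2 * n))
        for i in range(n):
            m[i, 2 * i] = m[i, 2 * i + 1] = 1  # copy column i into output columns 2i, 2i+1
        return m
    if op == "drop_odd_columns":
        return np.eye(n)[:, ::2]  # keep columns 0, 2, 4, ...
    raise ValueError(f"unsupported manipulation: {op}")

def execute_instruction(data, op):
    """Identify the manipulation matrix for the instruction, then do one matrix multiply."""
    manip = manipulation_matrix(op, data.shape[1])
    return data @ manip
```

    In this sketch a new rearrangement needs only a new manipulation matrix; the multiply path that produces the result matrix is unchanged.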

    SUPPORT FOR DIFFERENT MATRIX MULTIPLICATIONS BY SELECTING ADDER TREE INTERMEDIATE RESULTS

    Publication No.: US20210125044A1

    Publication Date: 2021-04-29

    Application No.: US16667700

    Filing Date: 2019-10-29

    Applicant: Facebook, Inc.

    Abstract: A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders is selectively provided for use in determining an output result matrix. A control unit is used to instruct the matrix multiplication hardware unit to perform a plurality of different matrix multiplications in parallel by using a combined matrix that includes elements of a plurality of different operand matrices and utilize one or more selected ones of the intermediate results of the hierarchical tree of adders for use in determining the output result matrix that includes different groups of elements representing different multiplication results corresponding to different ones of the different operand matrices.
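    A plain-Python model of the intermediate-result selection, assuming an 8-multiplier unit with a binary adder tree: level 0 holds the products, each higher level sums adjacent pairs, and a dot product of length n (a power of two) can be read from level log2(n). With 4-element operands, two independent dot products come out of level 2 in a single pass. The function names and the fixed width of 8 are illustrative assumptions.

```python
def adder_tree_levels(a, b):
    """Multiply element-wise, then reduce pairwise; returns every level of the tree."""
    assert len(a) == len(b) and len(a) > 0 and len(a) & (len(a) - 1) == 0, \
        "width must be a power of two"
    levels = [[x * y for x, y in zip(a, b)]]  # level 0: multiplier outputs
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def parallel_dot_products(pairs, width=8):
    """Pack several equal-length (a, b) operand pairs into one combined input and
    select the tree level whose outputs are exactly their dot products."""
    n = len(pairs[0][0])
    assert all(len(a) == len(b) == n for a, b in pairs)
    assert n * len(pairs) == width and n & (n - 1) == 0
    combined_a = [x for a, _ in pairs for x in a]
    combined_b = [y for _, b in pairs for y in b]
    levels = adder_tree_levels(combined_a, combined_b)
    return levels[n.bit_length() - 1]  # level log2(n): sums of n adjacent products
```

    For the pairs ([1, 2, 3, 4], [5, 6, 7, 8]) and ([1, 1, 1, 1], [2, 3, 4, 5]) the result is [70, 14]. Reading the final level instead would give the single sum 84, which mixes the two operand pairs; selecting the intermediate level is what keeps them separate.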
