-
11.
Publication No.: US20210182196A1
Publication Date: 2021-06-17
Application No.: US16717998
Application Date: 2019-12-17
Applicant: Facebook, Inc.
Inventor: Olivia Wu, Abdulkadir Utku Diril, Krishnakumar Narayanan Nair, Aravind Kalaiah, Anup Ramesh Kadkol, Pankaj Kansal
IPC: G06F12/0813, G06F13/16, G06N3/02
Abstract: A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. Each request processing unit includes a plurality of decomposition units and a crossbar switch, the crossbar switch communicatively connecting each of the plurality of decomposition units to each of the plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine. The control logic unit is configured to access the plurality of memory units using a dynamically programmable distribution scheme.
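The abstract does not say what the distribution scheme looks like. The sketch below is purely illustrative and assumes a simple address-interleaving policy. The hypothetical `DistributionScheme` class spreads consecutive chunks of `interleave_bytes` bytes first across memory units and then across the banks inside each unit. Reprogramming `interleave_bytes` at run time stands in for the "dynamically programmable" aspect. None of these names come from the patent.

```python
from dataclasses import dataclass


@dataclass
class DistributionScheme:
    """Hypothetical address-interleaving policy (an assumption, not the patented scheme)."""

    num_units: int         # memory units attached to the processor
    banks_per_unit: int    # memory banks inside each unit
    interleave_bytes: int  # programmable size of each contiguous chunk

    def locate(self, address: int) -> tuple:
        """Map a byte address to (memory unit, bank, byte offset within that bank)."""
        chunk, within = divmod(address, self.interleave_bytes)
        unit = chunk % self.num_units                            # consecutive chunks rotate across units
        bank = (chunk // self.num_units) % self.banks_per_unit   # then across banks inside a unit
        row = chunk // (self.num_units * self.banks_per_unit)    # earlier chunks that landed in this bank
        return unit, bank, row * self.interleave_bytes + within


scheme = DistributionScheme(num_units=4, banks_per_unit=8, interleave_bytes=64)
print(scheme.locate(0))    # (0, 0, 0)
print(scheme.locate(64))   # (1, 0, 0)
print(scheme.locate(256))  # (0, 1, 0)
scheme.interleave_bytes = 128  # reprogram the scheme at run time
print(scheme.locate(64))   # (0, 0, 64)
```

Because every address maps to a distinct (unit, bank, offset) triple, changing the granularity only changes how data is spread. No location is ever shared.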
-
12.
Publication No.: US20210124794A1
Publication Date: 2021-04-29
Application No.: US16667791
Application Date: 2019-10-29
Applicant: Facebook, Inc.
Inventor: Krishnakumar Narayanan Nair, Olivia Wu, Ehsan Khish Ardestani Zadeh, Abdulkadir Utku Diril, Thomas Mark Ulrich, Yuchen Hao, Rakesh Komuravelli, Aravind Kalaiah
Abstract: A system comprises a data input vector unit, a weight input vector unit, and a plurality of calculation units of a matrix processor unit. The data input vector unit is configured to concurrently receive elements of different rows of a first and second data matrix. The weight input vector unit is configured to receive a combined weight vector and at least in part concurrently provide obtained weight elements of a first and second weight matrix to a corresponding first and second group of calculation units. Each calculation unit of the first and second group of calculation units is configured to multiply elements from the data input vector unit with elements of the corresponding weight matrix from the weight input vector unit and sum together multiplication results of the corresponding calculation unit to at least in part determine a corresponding element in a first or second convolution result matrix.
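This is a software sketch of the arithmetic in this abstract. It assumes two independent "valid" (unpadded) convolutions with square kernels of the same size. The two kernels are flattened into one combined weight vector, and each half feeds its own group of calculation units. Each unit multiplies a data window by its weights and sums the products. The hardware runs the two groups concurrently, whereas the hypothetical `dual_conv_combined` function simply loops over them.

```python
def dual_conv_combined(data_a, data_b, weight_a, weight_b):
    """Two independent 'valid' convolutions driven by one combined weight vector."""
    k = len(weight_a)  # assumes both kernels are k x k
    # Combined weight vector: flattened weight_a followed by flattened weight_b.
    combined = [w for row in weight_a for w in row] + [w for row in weight_b for w in row]
    groups = ((data_a, combined[:k * k]), (data_b, combined[k * k:]))
    results = []
    for data, weights in groups:  # in hardware the two groups run concurrently
        out_h, out_w = len(data) - k + 1, len(data[0]) - k + 1
        result = [[0] * out_w for _ in range(out_h)]
        for i in range(out_h):
            for j in range(out_w):
                # One calculation unit: multiply a k x k data window by its weights, then sum.
                window = [data[i + r][j + c] for r in range(k) for c in range(k)]
                result[i][j] = sum(d * w for d, w in zip(window, weights))
        results.append(result)
    return results


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
conv_a, conv_b = dual_conv_combined(data, data, [[1, 0], [0, 1]], [[1, 1], [1, 1]])
print(conv_a)  # [[6, 8], [12, 14]]
print(conv_b)  # [[12, 16], [24, 28]]
```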
-
13.
Publication No.: US20210349965A1
Publication Date: 2021-11-11
Application No.: US16869303
Application Date: 2020-05-07
Applicant: Facebook, Inc.
Abstract: A device (e.g., an application-specific integrated circuit chip) includes a matrix transpose component, a matrix processing component, a data alignment component, and a data reduction component. The matrix transpose component is configured to transpose an input matrix of elements to output an output matrix of the elements that have been transposed, wherein: each element of the input matrix of elements is represented using a first number of bits, each value of a group of values stored in the input matrix is represented using a second number of bits greater than the first number of bits, and each value of the group of values is stored as split segments across more than one element of the elements of the input matrix. The matrix processing component is configured to multiply a first multiplication input matrix with a second multiplication input matrix, wherein the output matrix of the matrix transpose component is utilized as the first multiplication input matrix and a mask vector is utilized as the second multiplication input matrix. The data alignment component is configured to modify at least a portion of elements of a result of the matrix processing component. The data reduction component is configured to sum at least the elements of the modified result of the matrix processing component to determine a sum of the group of values.
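The split-segment summation can be illustrated in a few lines. The sketch assumes 16-bit unsigned values stored as two little-endian 8-bit elements, and proceeds in four steps:

- Transposing gathers each segment position into one row.
- Multiplying by a 0/1 mask vector gives one partial sum per segment position.
- Alignment shifts each partial sum to its bit position.
- Reduction adds the aligned partial sums.

The function names and bit widths are illustrative assumptions, not taken from the patent.

```python
SEGMENT_BITS = 8  # assumed width of one stored element


def split_value(value, segments=2):
    """Split an unsigned value into little-endian SEGMENT_BITS-wide segments."""
    mask = (1 << SEGMENT_BITS) - 1
    return [(value >> (SEGMENT_BITS * s)) & mask for s in range(segments)]


def sum_split_values(values, mask_vector=None, segments=2):
    # Input matrix: one row per value, one narrow element per segment.
    input_matrix = [split_value(v, segments) for v in values]
    # Transpose: row s now holds segment s of every value.
    transposed = [list(col) for col in zip(*input_matrix)]
    # Matrix processing: multiply by a 0/1 mask vector (all values selected by default).
    if mask_vector is None:
        mask_vector = [1] * len(values)
    partial = [sum(e * m for e, m in zip(row, mask_vector)) for row in transposed]
    # Data alignment: shift each partial sum to its segment's bit position.
    aligned = [p << (SEGMENT_BITS * s) for s, p in enumerate(partial)]
    # Data reduction: add the aligned partial sums.
    return sum(aligned)


print(sum_split_values([1000, 300, 65535]))             # 66835
print(sum_split_values([1000, 300, 65535], [1, 0, 1]))  # 66535
```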
-
14.
Publication No.: US20210349690A1
Publication Date: 2021-11-11
Application No.: US16869281
Application Date: 2020-05-07
Applicant: Facebook, Inc.
Abstract: A device (e.g., an integrated circuit chip) includes a dot product processing component, a data alignment component, and an accumulator. The dot product processing component is configured to calculate a dot product of a first group of elements stored in a first storage unit with a second group of elements, wherein: each element of the first group of elements is represented using a first number of bits, each value of a group of values stored in the first storage unit is represented using a second number of bits greater than the first number of bits, and each value of the group of values is stored as split segments across more than one element of the elements of the first group of elements. The data alignment component is configured to receive results of the dot product processing component and modify one or more of the results of the dot product processing component. The accumulator is configured to sum outputs of the data alignment component to at least in part determine a sum of the group of values.
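Here is a sketch of the arithmetic in the dot-product variant. It assumes 32-bit unsigned values stored as four 8-bit segments, with a storage unit holding four values (16 elements). Each dot product uses a hypothetical mask vector as the second group of elements, selecting one segment position. The alignment step then shifts the result to that segment's bit position. The accumulator sums across segment positions and storage units.

```python
SEG_BITS = 8
SEGS_PER_VALUE = 4  # assumed: 32-bit values stored as four 8-bit segments


def to_storage_units(values, values_per_unit=4):
    """Flatten values into little-endian 8-bit elements grouped into storage units."""
    elements = [(v >> (SEG_BITS * s)) & 0xFF for v in values for s in range(SEGS_PER_VALUE)]
    size = values_per_unit * SEGS_PER_VALUE
    return [elements[i:i + size] for i in range(0, len(elements), size)]


def sum_values(values, values_per_unit=4):
    accumulator = 0
    for unit in to_storage_units(values, values_per_unit):
        for s in range(SEGS_PER_VALUE):
            # Second group of elements: a mask that selects segment s of every value.
            mask = [1 if i % SEGS_PER_VALUE == s else 0 for i in range(len(unit))]
            dot = sum(e * m for e, m in zip(unit, mask))
            # Data alignment: move the dot product to segment s's bit position.
            accumulator += dot << (SEG_BITS * s)
    return accumulator


values = [0xDEADBEEF, 1, 2**32 - 1, 12345, 7]
print(sum_values(values) == sum(values))  # True
```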
-
15.
Publication No.: US20210334072A1
Publication Date: 2021-10-28
Application No.: US16855927
Application Date: 2020-04-22
Applicant: Facebook, Inc.
Inventor: Rakesh Komuravelli, Krishnakumar Narayanan Nair, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian
Abstract: A processor system comprises a plurality of dot product processor units and element-wise multiplication units. The dot product processor units perform a depthwise convolution of a data matrix with a separate depthwise convolution weight matrix for each data matrix channel. Each dot product processor unit performs at least a portion of the depthwise convolution for one or more data matrix channels. The element-wise multiplication units perform multiplication operations of a pointwise convolution. Each element-wise multiplication unit applies to each depthwise convolution partial result element received from one or more of the dot product processor units a corresponding data element from each of a plurality of pointwise convolution weight filters to determine element-wise multiplication unit results. The processor system sums together different groups of data elements from the element-wise multiplication unit results to at least in part calculate different data elements of a result of the pointwise convolution.
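The depthwise-then-pointwise computation can be sketched in plain Python under a few assumptions:

- Data is channel-major (`data[c][i][j]`).
- There is one k x k depthwise kernel per channel.
- The convolution is unpadded ("valid").
- Each pointwise filter is given as one weight per channel.

In `pointwise`, each depthwise result element is multiplied by the matching weight of every pointwise filter (the element-wise multiplication step). Products belonging to the same filter are then summed across channels. This models the math only, not how the patent distributes the work across processor units.

```python
def depthwise(data, dw_weights):
    """data[c][i][j]; dw_weights[c] is a k x k kernel applied to channel c only."""
    k = len(dw_weights[0])
    out = []
    for channel, kernel in zip(data, dw_weights):
        out_h, out_w = len(channel) - k + 1, len(channel[0]) - k + 1
        out.append([[sum(channel[i + r][j + s] * kernel[r][s]
                         for r in range(k) for s in range(k))
                     for j in range(out_w)]
                    for i in range(out_h)])
    return out


def pointwise(dw_out, pw_filters):
    """pw_filters[f][c] is filter f's weight for channel c (a 1 x 1 convolution)."""
    out_h, out_w = len(dw_out[0]), len(dw_out[0][0])
    result = [[[0] * out_w for _ in range(out_h)] for _ in pw_filters]
    for c, channel in enumerate(dw_out):
        for i in range(out_h):
            for j in range(out_w):
                element = channel[i][j]  # one depthwise partial result element
                # Element-wise multiply by the matching weight of every pointwise filter,
                # then add each product into that filter's running sum.
                for f, filt in enumerate(pw_filters):
                    result[f][i][j] += filt[c] * element
    return result


data = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
dw = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
pw = [[1, 1], [2, -1]]
print(pointwise(depthwise(data, dw), pw))
# [[[10, 12], [16, 18]], [[8, 12], [20, 24]]]
```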
-
16.
Publication No.: US20210319076A1
Publication Date: 2021-10-14
Application No.: US16843645
Application Date: 2020-04-08
Applicant: Facebook, Inc.
Inventor: Rakesh Komuravelli, Krishnakumar Narayanan Nair, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian
Abstract: A processor system comprises a plurality of processing elements. Each processing element includes a corresponding convolution processor unit configured to perform a portion of a groupwise convolution. The corresponding convolution processor unit determines multiplication results by multiplying each data element of a portion of data elements in a convolution data matrix with a corresponding data element in a corresponding groupwise convolution weight matrix. The portion of data elements in the convolution data matrix that are multiplied belong to different channels and different groups. For each specific channel of the different channels, the corresponding convolution processor unit sums together at least some of the multiplication results belonging to the same specific channel to determine a corresponding channel convolution result data element. The processing elements sum together a portion of the channel convolution result data elements from a group of different convolution processor units to determine a groupwise convolution result data element.
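Here is a minimal sketch of groupwise (grouped) convolution. It assumes channel-major data, unpadded convolution, and channels split evenly into `groups`. Each output channel convolves only the input channels of its own group, producing one per-channel result for each. Those per-channel results are then summed element by element. The abstract also describes splitting this work across several convolution processor units, which the sketch does not model.

```python
def conv2d_valid(x, kernel):
    """Unpadded 2-D convolution (cross-correlation, as used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(x[i + r][j + s] * kernel[r][s] for r in range(kh) for s in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]


def groupwise_conv(data, weights, groups):
    """data[c][i][j]; weights[o][c_local] is the kernel output channel o applies to the
    c_local-th input channel of its own group."""
    in_per_group = len(data) // groups
    out_per_group = len(weights) // groups
    outputs = []
    for o, filt in enumerate(weights):
        g = o // out_per_group
        group_channels = data[g * in_per_group:(g + 1) * in_per_group]
        # One channel convolution result per input channel in the group ...
        per_channel = [conv2d_valid(x, k) for x, k in zip(group_channels, filt)]
        # ... summed element by element into the groupwise result for channel o.
        outputs.append([[sum(vals) for vals in zip(*rows)] for rows in zip(*per_channel)])
    return outputs


data = [[[1, 2], [3, 4]], [[10, 20], [30, 40]], [[100, 0], [0, 100]], [[1, 1], [1, 1]]]
weights = [[[[1]], [[1]]],   # output channel 0 sees input channels 0 and 1
           [[[1]], [[2]]]]   # output channel 1 sees input channels 2 and 3
print(groupwise_conv(data, weights, groups=2))
# [[[11, 22], [33, 44]], [[102, 2], [2, 102]]]
```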
-
17.
Publication No.: US20210294875A1
Publication Date: 2021-09-23
Application No.: US16826697
Application Date: 2020-03-23
Applicant: Facebook, Inc.
Inventor: Rakesh Komuravelli, Krishnakumar Narayanan Nair, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian
Abstract: A processor system comprises a hardware channel convolution processor unit and dot product processor unit. The channel convolution processor unit is configured to perform depthwise convolution, including by multiplying each data element of a first group of data elements of a convolution data matrix with a corresponding data element of a second group of data elements of a plurality of depthwise convolution weight matrices and summing together, for each specific channel, multiplication results corresponding to the specific channel to determine one corresponding result data element in a corresponding channel convolution result matrix to calculate a portion of depthwise convolution results. The dot product processor unit is configured to perform pointwise convolution, including applying pointwise weight matrices to the portion of depthwise convolution results to determine a portion of separable convolution results while at least another portion of the depthwise convolution results is being calculated by the processor system.
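The overlap between the two stages can be mimicked with a Python generator. The sketch assumes channel-major data, unpadded depthwise convolution, and pointwise filters given as one weight per channel. The generator `depthwise_rows` yields one output row of depthwise results at a time. `separable_conv` then applies the pointwise weights to each row before the next row is computed. The names are hypothetical, and the interleaving here is sequential rather than truly concurrent hardware.

```python
def depthwise_rows(data, dw_weights):
    """Yield depthwise results one output row at a time as row[c][j]."""
    k = len(dw_weights[0])
    out_h, out_w = len(data[0]) - k + 1, len(data[0][0]) - k + 1
    for i in range(out_h):
        yield [[sum(ch[i + r][j + s] * kern[r][s] for r in range(k) for s in range(k))
                for j in range(out_w)]
               for ch, kern in zip(data, dw_weights)]


def separable_conv(data, dw_weights, pw_filters):
    """Apply the pointwise (1 x 1) weights to each depthwise row as soon as it is
    produced, so the two stages interleave instead of running back to back."""
    result = [[] for _ in pw_filters]
    for row in depthwise_rows(data, dw_weights):
        for f, filt in enumerate(pw_filters):
            result[f].append([sum(filt[c] * row[c][j] for c in range(len(row)))
                              for j in range(len(row[0]))])
    return result


data = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
dw = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
pw = [[1, 1], [2, -1]]
print(separable_conv(data, dw, pw))
# [[[10, 12], [16, 18]], [[8, 12], [20, 24]]]
```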
-
18.
Publication No.: US11120328B1
Publication Date: 2021-09-14
Application No.: US16354665
Application Date: 2019-03-15
Applicant: Facebook, Inc.
Inventor: Krishnakumar Narayanan Nair
Abstract: A computer-implemented method may include maintaining, within a local memory device (LMD) in a hardware accelerator (1) a filter matrix that may include a set of filter vectors corresponding to a filter location in each of a set of filters of a convolutional layer of an artificial neural network, and (2) an activation matrix that may include a primary and a secondary set of activation vectors, each activation vector included in an activation volume. The method may also include (1) directing a matrix multiplication unit (MMU) in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix, (2) replacing (i) the filter matrix with an additional filter matrix, and (ii) the secondary set of activation vectors with an additional set of activation vectors, and (3) directing the MMU to execute an additional MMO using the additional filter matrix and the activation matrix.
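Running a convolution as a series of matrix multiplications, one per filter location, can be sketched for a 1-D convolution with stride 1 (a simplification of the 2-D case). For filter location `r`, the filter matrix holds each filter's weights at that location. The activation matrix holds the activation vectors that location touches, and the partial products are accumulated. The patent's split of activation vectors into primary and secondary sets, and its replacement policy, are not reproduced here.

```python
def matmul(a, b):
    """Plain-Python matrix product of an m x n and an n x p matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def conv1d_by_filter_location(act, filters):
    """act[p][c]: activation vectors; filters[k][r][c]: K filters of width R.
    Computes out[p][k] = sum over r, c of act[p + r][c] * filters[k][r][c]."""
    width = len(filters[0])
    channels = len(act[0])
    out_len = len(act) - width + 1
    out = [[0] * len(filters) for _ in range(out_len)]
    for r in range(width):
        # Filter matrix for location r: one row per channel, one column per filter.
        filter_matrix = [[f[r][c] for f in filters] for c in range(channels)]
        # Activation matrix: the activation vectors this filter location touches.
        activation_matrix = act[r:r + out_len]
        partial = matmul(activation_matrix, filter_matrix)
        # Accumulate this location's contribution into the output.
        out = [[o + p for o, p in zip(o_row, p_row)] for o_row, p_row in zip(out, partial)]
    return out


act = [[1, 0], [0, 1], [1, 1], [2, 0]]  # 4 positions, 2 channels
filters = [[[1, 2], [3, 4]]]            # 1 filter of width 2
print(conv1d_by_filter_location(act, filters))  # [[5], [9], [9]]
```

In this 1-D case, consecutive filter locations share all but one activation vector. Most of the activation matrix could therefore stay in local memory between multiplications.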
-
19.
Publication No.: US20210173646A1
Publication Date: 2021-06-10
Application No.: US16708224
Application Date: 2019-12-09
Applicant: Facebook, Inc.
Inventor: Thomas Mark Ulrich, Krishnakumar Narayanan Nair, Yuchen Hao
Abstract: A processor system comprises a shared memory and a processing element. The processing element includes a matrix processor unit and is in communication with the shared memory. The processing element is configured to receive a processor instruction specifying a data matrix and a matrix manipulation operation. A manipulation matrix based on the processor instruction is identified. The data matrix and the manipulation matrix are loaded into the matrix processor unit and a matrix operation is performed to determine a result matrix. The result matrix is outputted to a destination location.
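A data-movement operation can be expressed as a matrix multiplication. The sketch assumes the manipulation matrix is a 0/1 matrix applied on the right, so input column i is routed to output column j wherever M[i][j] is 1. The operation names `reverse_columns` and `duplicate_columns` and the helper functions are hypothetical examples, not instructions defined by the patent.

```python
def matmul(a, b):
    """Plain-Python matrix product of an m x n and an n x p matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def manipulation_matrix(op, n):
    """Hypothetical 0/1 manipulation matrices for an n-column data matrix.
    Right-multiplying by M copies input column i to output column j where M[i][j] == 1."""
    if op == "reverse_columns":
        return [[1 if j == n - 1 - i else 0 for j in range(n)] for i in range(n)]
    if op == "duplicate_columns":  # columns a, b become a, a, b, b
        return [[1 if j // 2 == i else 0 for j in range(2 * n)] for i in range(n)]
    raise ValueError(f"unsupported operation: {op}")


def execute(op, data):
    """Identify the manipulation matrix for the instruction, then do one matrix multiply."""
    return matmul(data, manipulation_matrix(op, len(data[0])))


print(execute("reverse_columns", [[1, 2, 3], [4, 5, 6]]))  # [[3, 2, 1], [6, 5, 4]]
print(execute("duplicate_columns", [[1, 2], [3, 4]]))     # [[1, 1, 2, 2], [3, 3, 4, 4]]
```

Expressing the manipulation as a multiplication lets the same matrix processor unit perform data movement without dedicated shuffle hardware, at the cost of one full matrix operation.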
-
20.
Publication No.: US20210125044A1
Publication Date: 2021-04-29
Application No.: US16667700
Application Date: 2019-10-29
Applicant: Facebook, Inc.
Inventor: Yuchen Hao, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh, Rakesh Komuravelli, Abdulkadir Utku Diril, Thomas Mark Ulrich
Abstract: A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders is selectively provided for use in determining an output result matrix. A control unit is used to instruct the matrix multiplication hardware unit to perform a plurality of different matrix multiplications in parallel by using a combined matrix that includes elements of a plurality of different operand matrices and utilize one or more selected ones of the intermediate results of the hierarchical tree of adders for use in determining the output result matrix that includes different groups of elements representing different multiplication results corresponding to different ones of the different operand matrices.
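This sketch shows how several independent dot products can be read out of one adder tree. It assumes a power-of-two number of multiplier lanes (8 here) and operand rows whose length is also a power of two. The operands are packed into combined vectors and multiplied lane by lane. The products are then summed through a binary adder tree, and the function returns the tree level whose nodes each cover exactly one row pair. All names are illustrative.

```python
def adder_tree(products):
    """Return every level of a binary adder tree: level 0 holds the products and the
    last level holds the single final sum. Assumes a power-of-two lane count."""
    levels = [list(products)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def packed_dot_products(a_rows, b_rows, lanes=8):
    """Pack several short dot products into one lanes-wide multiply and read each result
    from the adder-tree level whose nodes cover exactly one operand row."""
    length = len(a_rows[0])
    assert lanes & (lanes - 1) == 0, "lane count must be a power of two"
    assert length & (length - 1) == 0, "row length must be a power of two"
    assert len(a_rows) * length == lanes, "combined operands must fill every lane"
    combined_a = [x for row in a_rows for x in row]
    combined_b = [x for row in b_rows for x in row]
    products = [x * y for x, y in zip(combined_a, combined_b)]  # element-wise multipliers
    levels = adder_tree(products)
    # Level log2(length) has one node per `length` consecutive lanes.
    return levels[length.bit_length() - 1]


print(packed_dot_products([[1, 2, 3, 4], [5, 6, 7, 8]],
                          [[1, 1, 1, 1], [1, 0, 1, 0]]))      # [10, 12]
print(packed_dot_products([[1, 2], [3, 4], [5, 6], [7, 8]],
                          [[1, 1], [2, 2], [0, 1], [1, 0]]))  # [3, 14, 6, 7]
```

Selecting an intermediate level instead of the final sum is what lets one multiply-and-add pass serve several smaller operand matrices at once.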