-
公开(公告)号:US11256977B2
公开(公告)日:2022-02-22
申请号:US15857909
申请日:2017-12-29
申请人: Facebook, Inc.
摘要: A disclosed computing system may include a special-purpose hardware device having an input subsystem, a linearization subsystem, and a matrix multiplication unit. The input subsystem may facilitate on-the-fly convolution lowering within a neural network convolution layer by directing input volume patches to logical unit(s) of the device. The linearization subsystem may be configured to receive a patch from the input subsystem and to linearize the patch by arranging elements of the patch as a portion of a data matrix row. The matrix multiplication unit of device may be configured to receive the data matrix from the linearization subsystem and to apply a filter matrix to the data matrix via a matrix multiplication operation. Various other methods, systems, and computer-readable media are also disclosed.
-
公开(公告)号:US20210049426A1
公开(公告)日:2021-02-18
申请号:US16543239
申请日:2019-08-16
申请人: Facebook, Inc.
摘要: A processor system comprises a memory organizer unit and a matrix computing unit. The memory organizer unit is configured to receive a request for a three-dimensional data of a convolutional neural network layer. The requested three-dimensional data is obtained from a memory. The obtained three-dimensional data is rearranged in an optimized linear order and the rearranged data in the optimized linear order is provided to the matrix computing unit. The matrix computing unit is configured to perform at least a portion of a three-dimensional convolution using at least a portion of the provided rearranged data in the optimized linear order.
-
公开(公告)号:US10372787B2
公开(公告)日:2019-08-06
申请号:US15839229
申请日:2017-12-12
申请人: Facebook, Inc.
IPC分类号: G06F17/16 , G06F7/523 , G06F12/0875 , G06N3/02 , G06N20/00
摘要: A special-purpose hardware accelerator may include a cache configured to store an input matrix related to performing a convolution operation and a matrix-multiplication subsystem pre-configured with matrix-transform coefficients for performing matrix-transform operations. The matrix-multiplication subsystem may perform the convolution operation by (1) reading the input matrix from the cache, (2) transforming the input matrix via matrix multiplication, (3) transforming, via matrix multiplication, a parameter matrix that includes convolution parameters for performing the convolution operation, (4) applying the transformed parameter matrix to the transformed input matrix via an element-wise multiplication operation, and then (5) performing an inverse-transformation operation on the results of the element-wise multiplication operation to create an output matrix for the convolution operation. Various other systems and methods are also disclosed.
-
公开(公告)号:US20190205094A1
公开(公告)日:2019-07-04
申请号:US15857998
申请日:2017-12-29
申请人: Facebook, Inc.
IPC分类号: G06F7/523
CPC分类号: G06F7/523 , G06F7/5443 , G06F2207/382 , G06N3/0481 , G06N3/063 , G06N3/08 , G06N5/022
摘要: The disclosed method may include (1) receiving a precision level of each weight associated with each input of a node of a computational model, (2) identifying, for each weight, one of a plurality of multiplier groups, where each multiplier group may include a plurality of hardware multipliers of a corresponding bit width, and where the corresponding bit width of the plurality of hardware multipliers of the one of the plurality of multiplier groups may be sufficient to multiply the weight by the associated input, and (3) multiplying each weight by its associated input using an available hardware multiplier of the one of the plurality of multiplier groups identified for the weight. Various other processing elements, methods, and systems are also disclosed.
-
公开(公告)号:US20210326051A1
公开(公告)日:2021-10-21
申请号:US17307828
申请日:2021-05-04
申请人: Facebook, Inc.
发明人: Abdulkadir Utku Diril , Olivia Wu , Krishnakumar Narayanan Nair , Aravind Kalaiah , Anup Ramesh Kadkol , Pankaj Kansal
摘要: A system comprises a processor and a plurality of memory units. The processor is coupled to each of the plurality of memory units by a plurality of network connections. The processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array. Each processing element that is located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements located along a same axis of the two-dimensional array.
-
公开(公告)号:US20210271451A1
公开(公告)日:2021-09-02
申请号:US16805339
申请日:2020-02-28
申请人: Facebook, Inc.
发明人: Krishnakumar Narayanan Nair , Rakesh Komuravelli , Abdulkadir Utku Diril , Ehsan Khish Ardestani Zadeh , Yuchen Hao , Martin Schatz , Thomas Mark Ulrich , Olivia Wu , Anup Ramesh Kadkol , Amin Firoozshahian
摘要: A processor system comprises two groups of registers and a hardware channel convolution processor unit. The first group of registers is configured to store data elements of channels of a portion of a convolution data matrix. Each register stores at least one data element from each channel. The second group of registers is configured to store data elements of convolution weight matrices including a separate matrix for each channel. Each register stores at least one data element from each matrix. The hardware channel convolution processor unit is configured to multiply each data element in a first and second portion of the first group of registers with a corresponding data element in the second group of registers to determine corresponding multiplication results and sum together the multiplication results for each specific channel to determine two corresponding channel convolution result data elements in a corresponding channel convolution result matrix.
-
公开(公告)号:US20210255830A1
公开(公告)日:2021-08-19
申请号:US16795097
申请日:2020-02-19
申请人: Facebook, Inc.
发明人: Thomas Mark Ulrich , Abdulkadir Utku Diril , Krishnakumar Narayanan Nair , Zhao Wang , Rakesh Komuravelli
摘要: A floating-point number in a first format representation is received. Based on an identification of a floating-point format type of the floating-point number, different components of the first format representation are identified. The different components of the first format representation are placed in corresponding components of a second format representation of the floating-point number, wherein a total number of bits of the second format representation is larger than a total number of bits of the first format representation. At least one of the components of the second format representation is padded with one or more zero bits. The floating-point number in the second format representation is stored in a register. A multiplication using the second format representation of the floating-point number is performed.
-
公开(公告)号:US11054998B1
公开(公告)日:2021-07-06
申请号:US16712253
申请日:2019-12-12
申请人: Facebook, Inc.
发明人: Abdulkadir Utku Diril , Olivia Wu , Krishnakumar Narayanan Nair , Aravind Kalaiah , Anup Ramesh Kadkol , Pankaj Kansal
摘要: A system comprises a processor and a plurality of memory units. The processor is coupled to each of the plurality of memory units by a plurality of network connections. The processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array. Each processing element that is located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements located along a same axis of the two-dimensional array.
-
公开(公告)号:US20210192359A1
公开(公告)日:2021-06-24
申请号:US16722636
申请日:2019-12-20
申请人: Facebook, Inc.
发明人: Ehsan Khish Ardestani Zadeh , Martin Schatz , Krishnakumar Narayanan Nair , Yuchen Hao , Abdulkadir Utku Diril , Rakesh Komuravelli
摘要: The disclosed computer-implemented method may include (1) receiving, at a hardware accelerator that supports an ANN, an activation data set that is to undergo a convolution operation via a filter kernel of the ANN, (2) receiving, at the hardware accelerator, an argument indicating that the filter kernel exceeds at least one boundary of the activation data set when slid across a certain position during the convolution operation, (3) determining, based at least in part on the argument, that the hardware accelerator is to generate padding data at the boundary of the activation data set in connection with the certain position of the filter kernel, and then (4) performing, at the hardware accelerator, the convolution operation by processing a portion of the activation data set and the padding data when the filter kernel slides across the certain position. Various other systems and methods are also disclosed.
-
公开(公告)号:US20210165691A1
公开(公告)日:2021-06-03
申请号:US16701019
申请日:2019-12-02
申请人: Facebook, Inc.
发明人: Abdulkadir Utku Diril , Olivia Wu , Krishnakumar Narayanan Nair , Anup Ramesh Kadkol , Aravind Kalaiah , Pankaj Kansal
摘要: A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine. The control logic unit is configured to access data from the plurality of memory units using a dynamically programmable distribution scheme.
-
-
-
-
-
-
-
-
-