Patent search ap:("Facebook Page Inc.") AND inv:"Rakesh Komuravelli"

1.

Patent application
FLOATING POINT MULTIPLY HARDWARE USING DECOMPOSED COMPONENT NUMBERS (in force)

Publication number: US20220107782A1

Publication date: 2022-04-07

Application number: US17506506

Filing date: 2021-10-20

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Anup Ramesh Kadkol, Ehsan Khish Ardestani Zadeh, Olivia Wu, Yuchen Hao, Thomas Mark Ulrich, Rakesh Komuravelli

IPC: G06F7/487, G06N3/02, G06F17/16, G06F7/485

Abstract: A processor system comprises one or more logic units configured to receive a processor instruction identifying a first floating point number to be multiplied with a second floating point number. The floating point numbers are each decomposed into a group of a plurality of component numbers, wherein a number of bits used to represent each floating point number is greater than a number of bits used to represent any component number in each group of the plurality of component numbers. The component numbers of the first group are multiplied with the component numbers of the second group to determine intermediate multiplication results that are summed together to determine an effective result that represents a result of multiplying the first floating point number with the second floating point number.
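
The decomposition described in this abstract can be illustrated with a small software sketch. The abstract does not state the component format, so the code below assumes, purely for illustration, that each 32-bit float is split into three components with an 8-bit significand (bfloat16-style). The function names are hypothetical and this is not the patented hardware design.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def truncate_to_8bit_significand(x):
    """Keep the sign, exponent and top 7 mantissa bits of a binary32 value.

    Assumption: a bfloat16-style component format, chosen for illustration.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def decompose(x, parts=3):
    """Split a binary32 value into narrow components that sum back to it."""
    remainder = to_f32(x)
    components = []
    for _ in range(parts):
        component = truncate_to_8bit_significand(remainder)
        components.append(component)
        remainder -= component  # exact, because Python floats are binary64
    return components


def decomposed_multiply(x, y):
    """Multiply by summing all pairwise products of the narrow components."""
    return sum(a * b for a in decompose(x) for b in decompose(y))
```

For normal binary32 inputs, three 8-bit components cover the full 24-bit significand, so the summed intermediate products reproduce the exact product of the two original values.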

2.

Patent application
MAPPING CONVOLUTION TO A PARTITION CHANNEL CONVOLUTION ENGINE (in force)

Publication number: US20210271451A1

Publication date: 2021-09-02

Application number: US16805339

Filing date: 2020-02-28

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Rakesh Komuravelli, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian

IPC: G06F7/544, G06F17/15, G06N20/00

Abstract: A processor system comprises two groups of registers and a hardware channel convolution processor unit. The first group of registers is configured to store data elements of channels of a portion of a convolution data matrix. Each register stores at least one data element from each channel. The second group of registers is configured to store data elements of convolution weight matrices including a separate matrix for each channel. Each register stores at least one data element from each matrix. The hardware channel convolution processor unit is configured to multiply each data element in a first and second portion of the first group of registers with a corresponding data element in the second group of registers to determine corresponding multiplication results and sum together the multiplication results for each specific channel to determine two corresponding channel convolution result data elements in a corresponding channel convolution result matrix.
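
The per-channel multiply-and-sum in this abstract corresponds to what is commonly called a channel (depthwise) convolution. The sketch below is a plain software reference for that arithmetic only. It does not model the register groups or the hardware unit, and the function name and the `[channel][row][column]` data layout are assumptions.

```python
def channel_convolution(data, weights):
    """Convolve each channel with its own weight matrix.

    data:    [channel][row][col] nested lists
    weights: [channel][k][k] nested lists, one square matrix per channel
    Returns  [channel][row][col] results (no padding, stride 1).
    """
    results = []
    for channel_data, channel_weights in zip(data, weights):
        k = len(channel_weights)
        height, width = len(channel_data), len(channel_data[0])
        channel_result = [
            [
                # Multiply each data element with its weight, then sum
                # the products belonging to this channel only.
                sum(
                    channel_data[r + i][c + j] * channel_weights[i][j]
                    for i in range(k)
                    for j in range(k)
                )
                for c in range(width - k + 1)
            ]
            for r in range(height - k + 1)
        ]
        results.append(channel_result)
    return results
```

Products are never summed across channels here, which is what distinguishes this operation from an ordinary convolution.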

3.

Patent application
HARDWARE FOR FLOATING-POINT ARITHMETIC IN MULTIPLE FORMATS (in force)

Publication number: US20210255830A1

Publication date: 2021-08-19

Application number: US16795097

Filing date: 2020-02-19

Applicant: Facebook, Inc.

Inventor: Thomas Mark Ulrich, Abdulkadir Utku Diril, Krishnakumar Narayanan Nair, Zhao Wang, Rakesh Komuravelli

IPC: G06F7/487, G06F7/485

Abstract: A floating-point number in a first format representation is received. Based on an identification of a floating-point format type of the floating-point number, different components of the first format representation are identified. The different components of the first format representation are placed in corresponding components of a second format representation of the floating-point number, wherein a total number of bits of the second format representation is larger than a total number of bits of the first format representation. At least one of the components of the second format representation is padded with one or more zero bits. The floating-point number in the second format representation is stored in a register. A multiplication using the second format representation of the floating-point number is performed.
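
As a rough illustration of placing the fields of a narrow format into a wider one with zero padding, the sketch below widens 16-bit values to IEEE 754 binary32. The abstract does not name the formats involved, so bfloat16 and fp16 as inputs and binary32 as the wider format are assumptions. The exponent re-biasing for fp16 is also an addition needed for this particular pair of formats, and the function names are hypothetical.

```python
import struct


def widen_to_fp32_bits(bits16, fmt):
    """Place the sign, exponent and mantissa of a 16-bit float in binary32."""
    sign = (bits16 >> 15) & 0x1
    if fmt == "bfloat16":  # 1 sign, 8 exponent, 7 mantissa bits
        exponent = (bits16 >> 7) & 0xFF
        mantissa = (bits16 & 0x7F) << 16  # pad mantissa with 16 zero bits
    elif fmt == "fp16":  # 1 sign, 5 exponent, 10 mantissa bits
        raw_exponent = (bits16 >> 10) & 0x1F
        raw_mantissa = bits16 & 0x3FF
        if raw_exponent == 0x1F or (raw_exponent == 0 and raw_mantissa != 0):
            raise ValueError("sketch handles only zero and normal fp16 values")
        # Re-bias from 15 to 127; a zero exponent field stays zero.
        exponent = 0 if raw_exponent == 0 else raw_exponent + (127 - 15)
        mantissa = raw_mantissa << 13  # pad mantissa with 13 zero bits
    else:
        raise ValueError("unknown format: " + fmt)
    return (sign << 31) | (exponent << 23) | mantissa


def bits_to_float(bits32):
    """Reinterpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits32))[0]
```

Once both operands are held in the same wider representation, a single multiplier can serve either input format, for example `bits_to_float(widen_to_fp32_bits(0x3FC0, "bfloat16")) * bits_to_float(widen_to_fp32_bits(0xC000, "fp16"))`.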

4.

Patent application
SYSTEMS AND METHODS FOR REDUCING DATA MOVEMENT DURING CONVOLUTION OPERATIONS IN ARTIFICIAL NEURAL NETWORKS (in force)

Publication number: US20210192359A1

Publication date: 2021-06-24

Application number: US16722636

Filing date: 2019-12-20

Applicant: Facebook, Inc.

Inventor: Ehsan Khish Ardestani Zadeh, Martin Schatz, Krishnakumar Narayanan Nair, Yuchen Hao, Abdulkadir Utku Diril, Rakesh Komuravelli

IPC: G06N3/10, G06F17/15, G06N3/04

Abstract: The disclosed computer-implemented method may include (1) receiving, at a hardware accelerator that supports an ANN, an activation data set that is to undergo a convolution operation via a filter kernel of the ANN, (2) receiving, at the hardware accelerator, an argument indicating that the filter kernel exceeds at least one boundary of the activation data set when slid across a certain position during the convolution operation, (3) determining, based at least in part on the argument, that the hardware accelerator is to generate padding data at the boundary of the activation data set in connection with the certain position of the filter kernel, and then (4) performing, at the hardware accelerator, the convolution operation by processing a portion of the activation data set and the padding data when the filter kernel slides across the certain position. Various other systems and methods are also disclosed.
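
The idea of generating padding where it is needed, rather than moving a pre-padded activation set, can be sketched in software as follows. Here the "argument" from the abstract is modeled as a single integer `pad` and the padding value is assumed to be zero. Both choices, and the function name, are illustrative assumptions rather than details taken from the patent.

```python
def conv2d_implicit_padding(activations, kernel, pad):
    """Stride-1 2D convolution that synthesizes zero padding on the fly.

    Only the unpadded activations are stored. Any kernel position that
    reaches past a boundary reads a generated zero instead.
    """
    height, width = len(activations), len(activations[0])
    k = len(kernel)

    def read(r, c):
        # Generate padding data for out-of-bounds coordinates.
        if 0 <= r < height and 0 <= c < width:
            return activations[r][c]
        return 0

    out_height = height + 2 * pad - k + 1
    out_width = width + 2 * pad - k + 1
    return [
        [
            sum(
                read(r - pad + i, c - pad + j) * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_width)
        ]
        for r in range(out_height)
    ]
```

With `pad=0` the function reduces to an ordinary unpadded convolution, so the same routine covers both interior and boundary kernel positions.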

5.

Granted patent
Floating point multiply hardware using decomposed component numbers (in force)

Publication number: US11188303B2

Publication date: 2021-11-30

Application number: US16591042

Filing date: 2019-10-02

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Anup Ramesh Kadkol, Ehsan Khish Ardestani Zadeh, Olivia Wu, Yuchen Hao, Thomas Mark Ulrich, Rakesh Komuravelli

IPC: G06F7/487, G06F7/485, G06F17/16, G06N3/02

Abstract: A processor system comprises one or more logic units configured to receive a processor instruction identifying a first floating point number to be multiplied with a second floating point number. The floating point numbers are each decomposed into a group of a plurality of component numbers, wherein a number of bits used to represent each floating point number is greater than a number of bits used to represent any component number in each group of the plurality of component numbers. The component numbers of the first group are multiplied with the component numbers of the second group to determine intermediate multiplication results that are summed together to determine an effective result that represents a result of multiplying the first floating point number with the second floating point number.

6.

Patent application
HIGH THROUGHPUT MATRIX PROCESSOR WITH SUPPORT FOR CONCURRENTLY PROCESSING MULTIPLE MATRICES (in force)

Publication number: US20210124794A1

Publication date: 2021-04-29

Application number: US16667791

Filing date: 2019-10-29

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Olivia Wu, Ehsan Khish Ardestani Zadeh, Abdulkadir Utku Diril, Thomas Mark Ulrich, Yuchen Hao, Rakesh Komuravelli, Aravind Kalaiah

IPC: G06F17/16, G06F7/544, G06F17/15

Abstract: A system comprises a data input vector unit, a weight input vector unit, and a plurality of calculation units of a matrix processor unit. The data input vector unit is configured to concurrently receive elements of different rows of a first and second data matrix. The weight input vector unit is configured to receive a combined weight vector and at least in part concurrently provide obtained weight elements of a first and second weight matrix to a corresponding first and second group of calculation units. Each calculation unit of the first and second group of calculation units is configured to multiply elements from the data input vector unit with elements of the corresponding weight matrix from the weight input vector unit and sum together multiplication results of the corresponding calculation unit to at least in part determine a corresponding element in a first or second convolution result matrix.

7.

Patent application
MAPPING CONVOLUTION TO CONNECTED PROCESSING ELEMENTS USING DISTRIBUTED PIPELINED SEPARABLE CONVOLUTION OPERATIONS (in force)

Publication number: US20210334072A1

Publication date: 2021-10-28

Application number: US16855927

Filing date: 2020-04-22

Applicant: Facebook, Inc.

Inventor: Rakesh Komuravelli, Krishnakumar Narayanan Nair, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian

IPC: G06F7/544, G06F7/50, G06F7/523, G06F17/16, G06N3/08, G06N20/00

Abstract: A processor system comprises a plurality of dot product processor units and element-wise multiplication units. The dot product processor units perform a depthwise convolution of a data matrix with a separate depthwise convolution weight matrix for each data matrix channel. Each dot product processor unit performs at least a portion of the depthwise convolution for one or more data matrix channels. The element-wise multiplication units perform multiplication operations of a pointwise convolution. Each element-wise multiplication unit applies to each depthwise convolution partial result element received from one or more of the dot product processor units a corresponding data element from each of a plurality of pointwise convolution weight filters to determine element-wise multiplication unit results. The processor system sums together different groups of data elements from the element-wise multiplication unit results to at least in part calculate different data elements of a result of the pointwise convolution.
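
The pointwise stage in this abstract can be read as two steps: an element-wise multiplication of each depthwise result element by the matching element of every pointwise filter, followed by a summation over groups of those products. The sketch below mirrors that split in software. The data layout, the function name, and the choice to sum across channels are assumptions, and the distribution of work across processing elements is not modeled.

```python
def pointwise_convolution(depthwise_results, pointwise_filters):
    """Apply 1x1 (pointwise) filters to depthwise convolution results.

    depthwise_results: [channel][row][col] nested lists
    pointwise_filters: [filter][channel] nested lists (one weight per channel)
    Returns            [filter][row][col] results.
    """
    channels = len(depthwise_results)
    height = len(depthwise_results[0])
    width = len(depthwise_results[0][0])
    output = []
    for weights in pointwise_filters:
        # Step 1: element-wise multiplication, one product per element.
        products = [
            [
                [weights[ch] * depthwise_results[ch][r][c] for c in range(width)]
                for r in range(height)
            ]
            for ch in range(channels)
        ]
        # Step 2: sum the group of products that share a spatial position.
        output.append(
            [
                [sum(products[ch][r][c] for ch in range(channels)) for c in range(width)]
                for r in range(height)
            ]
        )
    return output
```

Keeping the multiplication and the summation as separate steps makes it easier to see how partial results from different units could be combined later.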

8.

Patent application
GROUPED CONVOLUTION USING POINT-TO-POINT CONNECTED CHANNEL CONVOLUTION ENGINES (in force)

Publication number: US20210319076A1

Publication date: 2021-10-14

Application number: US16843645

Filing date: 2020-04-08

Applicant: Facebook, Inc.

Inventor: Rakesh Komuravelli, Krishnakumar Narayanan Nair, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian

IPC: G06F17/15, G06F17/16, G06F7/544, G06N3/063

Abstract: A processor system comprises a plurality of processing elements. Each processing element includes a corresponding convolution processor unit configured to perform a portion of a groupwise convolution. The corresponding convolution processor unit determines multiplication results by multiplying each data element of a portion of data elements in a convolution data matrix with a corresponding data element in a corresponding groupwise convolution weight matrix. The portion of data elements in the convolution data matrix that are multiplied belong to different channels and different groups. For each specific channel of the different channels, the corresponding convolution processor unit sums together at least some of the multiplication results belonging to the same specific channel to determine a corresponding channel convolution result data element. The processing elements sum together a portion of the channel convolution result data elements from a group of different convolution processor units to determine a groupwise convolution result data element.

9.

Patent application
PIPELINED POINTWISE CONVOLUTION USING PER-CHANNEL CONVOLUTION OPERATIONS (in force)

Publication number: US20210294875A1

Publication date: 2021-09-23

Application number: US16826697

Filing date: 2020-03-23

Applicant: Facebook, Inc.

Inventor: Rakesh Komuravelli, Krishnakumar Narayanan Nair, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian

IPC: G06F17/15, G06F17/16

Abstract: A processor system comprises a hardware channel convolution processor unit and dot product processor unit. The channel convolution processor unit is configured to perform depthwise convolution, including by multiplying each data element of a first group of data elements of a convolution data matrix with a corresponding data element of a second group of data elements of a plurality of depthwise convolution weight matrices and summing together, for each specific channel, multiplication results corresponding to the specific channel to determine one corresponding result data element in a corresponding channel convolution result matrix to calculate a portion of depthwise convolution results. The dot product processor unit is configured to perform pointwise convolution, including applying pointwise weight matrices to the portion of depthwise convolution results to determine a portion of separable convolution results while at least another portion of the depthwise convolution results is being calculated by the processor system.

10.

Patent application
SUPPORT FOR DIFFERENT MATRIX MULTIPLICATIONS BY SELECTING ADDER TREE INTERMEDIATE RESULTS (in force)

Publication number: US20210125044A1

Publication date: 2021-04-29

Application number: US16667700

Filing date: 2019-10-29

Applicant: Facebook, Inc.

Inventor: Yuchen Hao, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh, Rakesh Komuravelli, Abdulkadir Utku Diril, Thomas Mark Ulrich

IPC: G06N3/063, G06F17/16

Abstract: A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders is selectively provided for use in determining an output result matrix. A control unit is used to instruct the matrix multiplication hardware unit to perform a plurality of different matrix multiplications in parallel by using a combined matrix that includes elements of a plurality of different operand matrices and utilize one or more selected ones of the intermediate results of the hierarchical tree of adders for use in determining the output result matrix that includes different groups of elements representing different multiplication results corresponding to different ones of the different operand matrices.
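
A minimal software model of selecting intermediate adder-tree results is sketched below, assuming a binary tree over a power-of-two number of multipliers. The function names and the `level` parameter are hypothetical. In the abstract the selection is made by a control unit, which is not modeled here.

```python
def adder_tree_levels(values):
    """Return every level of a binary adder tree, from inputs to final sum.

    Assumption: len(values) is a power of two.
    """
    levels = [list(values)]
    while len(levels[-1]) > 1:
        previous = levels[-1]
        levels.append(
            [previous[i] + previous[i + 1] for i in range(0, len(previous), 2)]
        )
    return levels


def multiply_and_select(first, second, level):
    """Element-wise multiply two groups, then tap the tree at `level`.

    level 0 returns the raw products. The last level returns one full dot
    product. Levels in between return several shorter dot products at once.
    """
    products = [a * b for a, b in zip(first, second)]
    return adder_tree_levels(products)[level]
```

For example, packing two 4-element operand rows into one 8-element combined vector and tapping level 2 yields both 4-element dot products from a single pass, while tapping level 3 yields one 8-element dot product.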
