-
Publication Number: US20240403646A1
Publication Date: 2024-12-05
Application Number: US18798323
Filing Date: 2024-08-08
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
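The scheduling described above (loss gradient first, then a re-run of forward propagation to regenerate intermediates, then backward propagation) resembles activation recomputation. A minimal NumPy sketch of that ordering for a hypothetical two-layer network follows; all shapes, the MSE loss, and the learning rate are illustrative assumptions, not details from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny network: y = relu(x @ W1) @ W2.
W1 = rng.standard_normal((4, 8)) * 0.1
W2 = rng.standard_normal((8, 3)) * 0.1
x = rng.standard_normal((5, 4))
target = rng.standard_normal((5, 3))

# Step 1: loss gradient operation. Only the final output is needed here,
# so the intermediate activation need not stay resident in memory.
h = np.maximum(x @ W1, 0.0)               # produced, then assumed discarded
y = h @ W2
data_grad = 2.0 * (y - target) / y.size   # d(MSE)/dy

# Step 2: after the loss gradient completes, forward propagation runs
# again to regenerate the intermediate outputs needed by backprop.
h = np.maximum(x @ W1, 0.0)

# Step 3: backward propagation combines the data gradients with the
# regenerated intermediates to produce weight gradients.
grad_W2 = h.T @ data_grad
grad_h = (data_grad @ W2.T) * (h > 0)
grad_W1 = x.T @ grad_h

# Step 4: the host receives the weight gradients and updates the weights.
lr = 0.1
W1 -= lr * grad_W1
W2 -= lr * grad_W2
```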
-
Publication Number: US12130885B1
Publication Date: 2024-10-29
Application Number: US18052527
Filing Date: 2022-11-03
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
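One way to picture the conversion is a 2:4-style constraint: if every group of four consecutive weights in a row has at most two nonzeros, the matrix can be split into two matrices that each have at most one nonzero per group and that sum back to the original. The group size, nonzero budget, and round-robin assignment below are illustrative assumptions, not the patent's actual condition:

```python
import numpy as np

GROUP = 4      # group size (assumed)
MAX_NNZ = 2    # nonzeros allowed per group in the original matrix (assumed)

def satisfies_sparsity(w):
    """Hypothetical sparsity condition: every length-GROUP slice of each
    row contains at most MAX_NNZ nonzero weights."""
    rows, cols = w.shape
    for r in range(rows):
        for g in range(0, cols, GROUP):
            if np.count_nonzero(w[r, g:g + GROUP]) > MAX_NNZ:
                return False
    return True

def split_into_constrained(w):
    """Split w into MAX_NNZ matrices, each with at most one nonzero per
    group, so that their elementwise sum reproduces w exactly."""
    parts = [np.zeros_like(w) for _ in range(MAX_NNZ)]
    rows, cols = w.shape
    for r in range(rows):
        for g in range(0, cols, GROUP):
            idx = np.flatnonzero(w[r, g:g + GROUP])
            for k, c in enumerate(idx):      # round-robin over the parts
                parts[k][r, g + c] = w[r, g + c]
    return parts
```

A matrix multiplication by `w` then decomposes into one sparse multiplication per part, with the partial results summed.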
-
Publication Number: US20240232630A1
Publication Date: 2024-07-11
Application Number: US18221454
Filing Date: 2023-07-13
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Richard John Heaton
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with the second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients when the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
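The key idea is overlapping communication with computation: the first portion of the higher layer's gradients is in flight while the lower layer's backward pass runs. A loose host-side sketch using a Python thread as a stand-in for the asynchronous hardware interface (the `exchange` helper and all sizes are hypothetical):

```python
import threading
import numpy as np

def exchange(portion, results):
    """Stand-in for the hardware interface exchanging gradients with a
    second computer system (here it just records a copy)."""
    results.append(portion.copy())

rng = np.random.default_rng(1)
grad_layer2 = rng.standard_normal(8)   # second (higher) layer gradients
received = []

# Split the second-layer weight gradients into portions.
first_part, rest = grad_layer2[:4], grad_layer2[4:]

# Start exchanging the first portion asynchronously.
t = threading.Thread(target=exchange, args=(first_part, received))
t.start()

# Backward propagation for the first (lower) layer overlaps the exchange.
grad_layer1 = rng.standard_normal(8)

t.join()
exchange(grad_layer1, received)   # then transmit the lower-layer gradients
exchange(rest, received)          # and the remaining second-layer portion
```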
-
Publication Number: US11941528B2
Publication Date: 2024-03-26
Application Number: US16588603
Filing Date: 2019-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Richard John Heaton
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with the second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients when the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
-
Publication Number: US11803736B1
Publication Date: 2023-10-31
Application Number: US16917015
Filing Date: 2020-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
CPC classification number: G06N3/063 , G06F7/5443 , G06F9/3893 , G06F17/16 , G06F2207/4824
Abstract: A systolic array can implement an architecture tailored to perform matrix multiplications on constrained fine-grained sparse weight matrices. Each processing element in the systolic array may include a weight register configured to store a weight value, and a multiplexor configured to select a feature map (FMAP) input element from multiple FMAP input data buses based on metadata associated with the weight value. Each processing element may also include a multiplier configured to multiply the selected feature map input element with the weight value to generate a multiplication result, and an adder configured to add the multiplication result to a partial sum input to generate a partial sum output.
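The per-element datapath described here (metadata-driven multiplexor, multiplier, adder) can be sketched as a single-cycle function. This is a simplified model, not the hardware's actual interface; treating the metadata directly as a bus index is an assumption:

```python
def pe_step(weight, metadata, fmap_buses, partial_sum_in):
    """One cycle of a simplified sparse processing element: the metadata
    selects which FMAP input bus feeds the multiplier, and the product is
    accumulated into the partial sum flowing through the array."""
    selected = fmap_buses[metadata]   # multiplexor: pick one FMAP bus
    product = selected * weight       # multiplier
    return partial_sum_in + product   # adder: partial sum output
```

Because each processing element stores only a nonzero weight and its metadata, zero weights never occupy a multiplier, which is what makes the constrained sparse format pay off in a systolic array.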
-
Publication Number: US10929063B1
Publication Date: 2021-02-23
Application Number: US16368538
Filing Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja , Yu Zhou , Ron Diamant , Randy Renfu Huang , Richard John Heaton
Abstract: Systems and methods for assisted indirect memory addressing are provided. Some computing systems move data between levels of a hierarchical memory system. To accommodate data movement for computing systems that do not natively support indirect addressing between levels of the memory hierarchy, a direct memory access (DMA) engine is used to fetch data. The DMA engine executes a first set of memory instructions that modify a second set of memory instructions to fetch data stored at one level of the memory hierarchy from dynamically computed indirect addresses stored in memory locations at another level of the memory hierarchy.
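The self-modifying descriptor trick can be illustrated with a toy DMA model: a first set of instructions patches the source addresses of a second set, which then copies from the dynamically computed locations. The memory layout and instruction encoding below are entirely hypothetical:

```python
# Simulated flat memory: addresses 0-15 hold data values, addresses 16-19
# hold indirect addresses computed at runtime (layout is hypothetical).
memory = list(range(100, 116)) + [3, 0, 7, 12]

def run_dma(instrs):
    """Execute toy DMA instructions in order. 'patch' copies a value from
    memory into another instruction's src field; 'copy' fetches data."""
    out = []
    for op in instrs:
        if op["kind"] == "patch":
            instrs[op["target"]]["src"] = memory[op["src"]]
        elif op["kind"] == "copy":
            out.append(memory[op["src"]])
    return out

# The first set of instructions (0-1) rewrites the second set (2-3) so the
# copies fetch from the indirect addresses stored at memory[16], memory[18].
program = [
    {"kind": "patch", "src": 16, "target": 2},
    {"kind": "patch", "src": 18, "target": 3},
    {"kind": "copy", "src": None},
    {"kind": "copy", "src": None},
]
fetched = run_dma(program)
```

Because the patch step runs before the copies, the engine needs no native indirect-addressing mode: indirection is folded into ordinary sequential descriptor execution.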
-
Publication Number: US12205013B1
Publication Date: 2025-01-21
Application Number: US17009483
Filing Date: 2020-09-01
Applicant: Amazon Technologies, Inc.
Inventor: Thiam Khean Hah , Randy Renfu Huang , Richard John Heaton , Ron Diamant , Vignesh Vivekraja
Abstract: Accelerated convolution of neural networks can be performed by executing N computing engines (CEs) of a neural network processor in parallel. An input dataset can be divided spatially into N chunks such that a respective last portion of each chunk overlaps with a respective first portion of a subsequent chunk. Portions of each chunk can be processed by a respective CE to generate a respective portion of an output dataset. The overlapping intermediate states computed by each CE from processing the overlapping portion can be stored locally for sharing with a subsequent CE using an on-chip bus.
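The spatial split with overlapping halos can be shown in one dimension: each chunk carries the last `len(kernel) - 1` elements of its neighbor so that the per-chunk results concatenate into the full convolution. This sketch computes the halo redundantly rather than sharing it over an on-chip bus as the abstract describes:

```python
import numpy as np

def conv1d_valid(x, k):
    """Plain 'valid' 1-D convolution (correlation), as one CE computes."""
    n = len(x) - len(k) + 1
    return np.array([float(np.dot(x[i:i + len(k)], k)) for i in range(n)])

def chunked_conv(x, k, n_chunks):
    """Split x spatially into n_chunks whose last len(k)-1 input elements
    overlap the next chunk, run each chunk independently, and concatenate.
    Assumes the output length divides evenly by n_chunks for simplicity."""
    halo = len(k) - 1
    out_len = len(x) - halo
    step = out_len // n_chunks
    pieces = []
    for c in range(n_chunks):
        start = c * step
        stop = len(x) if c == n_chunks - 1 else start + step + halo
        pieces.append(conv1d_valid(x[start:stop], k))
    return np.concatenate(pieces)
```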
-
Publication Number: US12182695B1
Publication Date: 2024-12-31
Application Number: US18474129
Filing Date: 2023-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: A systolic array can implement an architecture tailored to perform matrix multiplications on sparse matrices. Each processing element in the systolic array may include a register configured to store a value, and a multiplexor configured to select an input element from multiple input data buses based on metadata associated with the value. Each processing element may also include a multiplier configured to multiply the selected input element with the value to generate a multiplication result, and an adder configured to add the multiplication result to a partial sum input to generate a partial sum output.
-
Publication Number: US12073199B2
Publication Date: 2024-08-27
Application Number: US16433786
Filing Date: 2019-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja , Randy Renfu Huang , Yu Zhou , Ron Diamant , Richard John Heaton
CPC classification number: G06F8/4441 , G06N3/04 , G06N3/10
Abstract: In various implementations, provided are systems and methods for reducing neural network processing. A compiler may generate instructions from source code for a neural network having a repeatable set of operations. The instructions may include a plurality of blocks. The compiler may add an overwrite instruction to the plurality of blocks that, when executed by one or more execution engines, triggers an overwrite action. The overwrite action causes the instructions of subsequent blocks to be overwritten with NOP instructions. The overwrite action is triggered only when a condition is satisfied.
-
Publication Number: US20230306249A1
Publication Date: 2023-09-28
Application Number: US18134726
Filing Date: 2023-04-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T Huynh , Vignesh Vivekraja
CPC classification number: G06N3/063 , G06F7/50 , G06F7/523 , G06F7/5443 , G06F7/78 , G06F9/5027 , G06F17/153
Abstract: In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.
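The stride-based pairing of weight elements with input subsets is easiest to see in one dimension: each weight element at position `k` contributes input element `i` to output position `i * stride + k`. The per-weight gather formulation below mirrors that idea loosely; it omits the rotation and instruction-encoded addressing of the actual design:

```python
import numpy as np

def transposed_conv1d(x, w, stride):
    """Reference 1-D transposed convolution: scatter each input element,
    scaled by the whole kernel, into the output."""
    out = np.zeros(stride * (len(x) - 1) + len(w))
    for i, xi in enumerate(x):
        out[i * stride:i * stride + len(w)] += xi * w
    return out

def transposed_conv1d_per_weight(x, w, stride):
    """Per-weight formulation: for each weight element, determine the
    output positions its input subset maps to and accumulate there."""
    out = np.zeros(stride * (len(x) - 1) + len(w))
    for k, wk in enumerate(w):
        for i in range(len(x)):
            out[i * stride + k] += x[i] * wk
    return out
```

Grouping the work per weight element is what lets a systolic array hold one weight stationary while streaming its matching input subset through.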