-
Publication No.: US20210304010A1
Publication Date: 2021-09-30
Application No.: US16836421
Filing Date: 2020-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
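
A minimal numpy sketch of the operation ordering this abstract describes: the loss gradient runs first, a forward pass then regenerates the intermediate outputs, and backward propagation combines the two. The single linear layer and MSE loss are illustrative assumptions, not the patented apparatus.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))          # input batch
w = rng.normal(size=(4, 3))          # layer weights
y_true = rng.normal(size=(8, 3))     # targets

# 1. Loss gradient operation: compute dL/dy first; the forward pass used
#    here does not keep its intermediate outputs.
y_pred = x @ w
data_grads = 2.0 * (y_pred - y_true) / x.shape[0]   # MSE loss gradient

# 2. After the loss gradient completes, a forward propagation operation
#    regenerates the intermediate outputs (here, the layer input).
intermediates = x

# 3. Backward propagation combines data gradients and intermediates.
weight_grads = intermediates.T @ data_grads

# 4. The host receives the weight gradients and updates the weights.
w -= 0.01 * weight_grads
```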
-
Publication No.: US20210247984A1
Publication Date: 2021-08-12
Application No.: US17243415
Filing Date: 2021-04-28
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Drazen Borkovic , Jindrich Zejda , Randy Renfu Huang , Ron Diamant
Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
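
One way to picture the reordering pass is a greedy scheduler that prefers to issue a ready operation on an engine different from the one just used, keeping both engines busy; the Op structure, engine names, and heuristic below are invented for illustration and are not the disclosed compiler.

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    engine: str          # which execution engine runs it
    deps: tuple = ()     # names of operations it depends on

def reorder(ops):
    """Greedy pass: issue a ready op on an engine not used by the
    previously scheduled op, without violating dependencies."""
    done, ordered, pending = set(), [], list(ops)
    while pending:
        last_engine = ordered[-1].engine if ordered else None
        ready = [o for o in pending if all(d in done for d in o.deps)]
        pick = next((o for o in ready if o.engine != last_engine), ready[0])
        pending.remove(pick)
        done.add(pick.name)
        ordered.append(pick)
    return ordered

ops = [Op("matmul1", "pe_array"), Op("matmul2", "pe_array"),
       Op("pool1", "pooling", deps=("matmul1",))]
print([o.name for o in reorder(ops)])  # ['matmul1', 'pool1', 'matmul2']
```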
-
Publication No.: US10997277B1
Publication Date: 2021-05-04
Application No.: US16364837
Filing Date: 2019-03-26
Applicant: Amazon Technologies, Inc.
Inventor: Yu Zhou , Vignesh Vivekraja , Ron Diamant
Abstract: An integrated circuit device such as a neural network accelerator can be programmed to select a numerical value based on a multinomial distribution. In various examples, the integrated circuit device can include an execution engine that includes multiple separate execution units. The multiple execution units can operate in parallel on different streams of data. For example, to make a selection based on a multinomial distribution, the execution units can be configured to perform cumulative sums on sets of numerical values, where the numerical values represent probabilities. In this example, to then obtain cumulative sums across the sets of numerical values, the largest values from the sets can be accumulated and then added, in parallel, to the sets. The resulting cumulative sum across all the numerical values can then be used to randomly select a specific index, which can provide a particular numerical value as the selected value.
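
The parallel cumulative-sum trick lends itself to a short numpy sketch: per-set prefix sums run independently, the set totals become offsets, and one uniform draw selects an index. The set sizes here are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.random(16)
probs /= probs.sum()                 # a multinomial distribution

sets = probs.reshape(4, 4)           # 4 execution units, 4 values each
local = np.cumsum(sets, axis=1)      # per-set cumulative sums, in parallel
totals = local[:, -1]                # largest value of each set = set total
offsets = np.concatenate(([0.0], np.cumsum(totals)[:-1]))
full = (local + offsets[:, None]).ravel()   # cumulative sum across all sets

u = rng.random()                     # uniform draw in [0, 1)
index = int(np.searchsorted(full, u))
print(index, probs[index])           # index selected with probability probs[index]
```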
-
Publication No.: US20210097396A1
Publication Date: 2021-04-01
Application No.: US16588603
Filing Date: 2019-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Richard John Heaton
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with a second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients while the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
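
A rough sketch of the overlap, with a worker thread standing in for the hardware interface; the gradient shapes and the exchange function are hypothetical.

```python
import threading
import numpy as np

def exchange(grads):
    """Stand-in for transferring gradients to the second computer system
    via the hardware interface."""
    pass

rng = np.random.default_rng(0)
second_layer_grads = rng.normal(size=(1024,))
first_portion, remaining = np.split(second_layer_grads, [256])

# start exchanging the first portion of the layer-2 weight gradients
xfer = threading.Thread(target=exchange, args=(first_portion,))
xfer.start()

# backward propagation for the lower (first) layer proceeds while the
# exchange is underway
first_layer_grads = rng.normal(size=(512,))

xfer.join()
exchange(first_layer_grads)          # transmit first-layer gradients
exchange(remaining)                  # transmit remaining layer-2 portions
```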
-
Publication No.: US10884707B1
Publication Date: 2021-01-05
Application No.: US16455201
Filing Date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li , Ron Diamant , Jeffrey T. Huynh , Yu Zhou , Se jong Oh
Abstract: Provided are systems and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array, and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
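
A numpy sketch of the block-transpose idea: the identity multiplication leaves each block's values intact, and the transpose falls out of mapping column partitions of the results buffer to row partitions of the buffer memory. The 4x4 array size is an assumption.

```python
import numpy as np

ARRAY = 4                                   # systolic array is ARRAY x ARRAY
rng = np.random.default_rng(0)
tensor = rng.normal(size=(8, 8))
out = np.empty_like(tensor)

for i in range(0, 8, ARRAY):
    for j in range(0, 8, ARRAY):
        block = tensor[i:i+ARRAY, j:j+ARRAY]
        psum = np.eye(ARRAY) @ block        # identity multiplication
        # column partition k of the results buffer holds psum[:, k];
        # mapping it to row partition k of the buffer memory transposes
        for k in range(ARRAY):
            out[j+k, i:i+ARRAY] = psum[:, k]

assert np.allclose(out, tensor.T)
```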
-
Publication No.: US10761939B1
Publication Date: 2020-09-01
Application No.: US16219489
Filing Date: 2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Thomas A. Volpe , Ron Diamant , Mark Anthony Banse
Abstract: A circuit at an interface between a device and an interconnect fabric is configured to track outstanding transactions associated with the device and ensure the completion of the outstanding transactions before rebooting or powering down the device. In some embodiments, the circuit is also configurable to provide appropriate responses when the device is powered down or being rebooted, so that other devices in the system can continue to operate without knowing that the device is inactive and will not hang waiting for a response from it.
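
A software model of the tracking behavior, with a counter for outstanding transactions, a drain-before-power-down step, and a default responder for the inactive device; the class and method names are invented for illustration of the idea, not a hardware description.

```python
class InterfaceTracker:
    def __init__(self):
        self.outstanding = 0
        self.device_active = True

    def request_sent(self):
        self.outstanding += 1

    def response_received(self):
        self.outstanding -= 1

    def quiesce_and_power_down(self):
        # ensure completion of outstanding transactions before power-down
        while self.outstanding > 0:
            self.response_received()     # stand-in for draining responses
        self.device_active = False

    def handle_incoming(self, request):
        if self.device_active:
            return "forward to device"
        return "default response"        # keeps the fabric from hanging

t = InterfaceTracker()
t.request_sent(); t.request_sent()
t.quiesce_and_power_down()
print(t.handle_incoming("read"))         # -> "default response"
```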
-
Publication No.: US10761822B1
Publication Date: 2020-09-01
Application No.: US16217858
Filing Date: 2018-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic , Jindrich Zejda , Taemin Kim , Ron Diamant
IPC: G06F11/36 , G06F17/10 , G06F3/048 , G06F13/40 , G06F17/50 , G06F9/30 , G06F12/00 , G06F8/41 , G06N3/02 , G06F12/1081 , G06F12/06 , G06F12/0888 , G06F8/34 , G06F9/50 , G06F9/455
Abstract: Provided are systems and methods for generating program code for an integrated circuit, where instructions in the code synchronize computation engines that support non-blocking instructions. In various examples, a computing device can receive an input data set including operations to be performed by an integrated circuit device and dependencies between the operations. The input data set can include a non-blocking instruction, and an operation that requires that the non-blocking instruction be completed. The computing device can generate instructions for performing the operation including a particular instruction to wait for a value to be set in a register of the integrated circuit device. The computing device can further generate program code including the non-blocking instruction and the instructions for performing the operation, wherein the non-blocking instruction is configured to set the value in the register.
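
A sketch of the code-generation idea: the generator pairs each non-blocking instruction with a register it sets on completion and emits a wait-on-register instruction before the dependent operation. The instruction mnemonics and data structures are hypothetical.

```python
def generate_program(ops, deps):
    """ops: list of (name, blocking) tuples; deps: {consumer: producer}."""
    program, regs = [], {}
    for name, blocking in ops:
        if name in deps:
            producer = deps[name]
            # particular instruction: wait for the register to be set
            program.append(f"WAIT_REG r{regs[producer]} == 1")
        if not blocking:
            reg = len(regs)
            regs[name] = reg
            # the non-blocking instruction is configured to set the register
            program.append(f"{name} (non-blocking, set r{reg} on done)")
        else:
            program.append(name)
    return program

prog = generate_program(
    ops=[("dma_load", False), ("matmul", True)],
    deps={"matmul": "dma_load"},
)
print("\n".join(prog))
```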
-
Publication No.: US10742555B1
Publication Date: 2020-08-11
Application No.: US15838245
Filing Date: 2017-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Leah Shalev , Ron Diamant , Erez Izenberg , Nafea Bshara
IPC: H04L12/803 , H04L12/26 , H04L29/06 , H04L12/721 , H04L12/743 , H04L12/707 , H04J3/06
Abstract: A method and corresponding apparatus for detecting network congestion. The method includes capturing, using a local clock of a sender device, a send time of an outgoing packet sent from the sender device to a receiver device through a forward route, and capturing, using the local clock of the sender device, a receive time of an acknowledgment packet sent from the receiver device to the sender device through a backward route. The acknowledgment packet contains timing information, generated using a local clock of the receiver device, for determining an internal latency of the receiver device. A round trip time is computed as a difference between the send time and the receive time. The internal latency is subtracted from the round trip time to compute a total propagation time. If the total propagation time is above a threshold, the forward route and the backward route are changed.
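
The timing arithmetic reduces to a few lines; all numbers and the threshold below are illustrative.

```python
# Times captured with the sender's local clock, except the receiver's
# internal latency, which comes from timing info in the acknowledgment.
send_time = 1_000.000                # ms: outgoing packet sent
recv_time = 1_000.850                # ms: acknowledgment packet received
receiver_internal_latency = 0.300    # ms: reported by the receiver

round_trip_time = recv_time - send_time                          # 0.850 ms
total_propagation = round_trip_time - receiver_internal_latency  # 0.550 ms

THRESHOLD = 0.500                    # ms: illustrative congestion threshold
if total_propagation > THRESHOLD:
    print("congestion detected: change forward and backward routes")
```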
-
Publication No.: US10664282B1
Publication Date: 2020-05-26
Application No.: US16266731
Filing Date: 2019-02-04
Applicant: Amazon Technologies, Inc.
Inventor: Ilya Minkin , Ron Diamant , Mohammad El-Shabani , Dana Michelle Vantrease
Abstract: Methods for repeated execution of program code by an execution engine are provided. In order to execute large programs, the instruction buffer of an execution engine may be refilled many times with program code to complete one execution of the program. At completion of program execution, the program code needed to begin re-execution of the program is no longer in the instruction buffer. A runtime driver program can load instructions into the instruction buffer, or can cause instructions to be loaded. Once the instructions are loaded, the execution engine may be able to re-execute the instructions without needing further assistance from the runtime driver.
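
A toy model of the refill flow: the program exceeds the buffer, the buffer is refilled chunk by chunk during a run, and the driver restores the program's head before re-execution. Sizes and names are arbitrary assumptions.

```python
BUF_SIZE = 4
program = [f"instr_{i}" for i in range(10)]   # program larger than the buffer
buffer = program[:BUF_SIZE]                   # buffer initially holds the head

def execute():
    pc = 0
    while pc < len(program):
        buffer[:] = program[pc:pc + BUF_SIZE]   # refilled many times per run
        for instr in buffer:
            pass                                # execute the instruction
        pc += len(buffer)

execute()
# After a full run the buffer holds the tail of the program, so the runtime
# driver reloads the first chunk before the program can be re-executed.
buffer[:] = program[:BUF_SIZE]
execute()
```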
-
Publication No.: US10592250B1
Publication Date: 2020-03-17
Application No.: US16014646
Filing Date: 2018-06-21
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Ilya Minkin
Abstract: Disclosed herein are techniques for self-refilling an instruction buffer by an execution engine while the execution engine executes instructions in the instruction buffer. An instruction loader splits instruction code into sections of code and creates a data store (e.g., a DMA ring) for loading the sections of code into the instruction buffer. In some embodiments, an instruction is added to some sections of code. The instruction, when executed by the execution engine, triggers the loading of one or more sections of code into the instruction buffer based on one or more entries in the data store. In some embodiments, a hardware logic in the execution engine is configured to trigger the loading of the sections of code into the instruction buffer. In some embodiments, the one or more sections of code are loaded into the instruction buffer through a refill page that is different from the instruction buffer.
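
A sketch of the self-refill scheme: the loader splits the code into sections, builds a ring of descriptors, and appends a trigger instruction to each section that loads the next one into the buffer; names and sizes are illustrative, not the disclosed design.

```python
SECTION = 3
code = [f"instr_{i}" for i in range(9)]

# split instruction code into sections and build the descriptor ring
sections = [code[i:i + SECTION] for i in range(0, len(code), SECTION)]
dma_ring = [{"src_section": i, "dst": "instruction_buffer"}
            for i in range(len(sections))]

# append a refill-trigger instruction to every section but the last; when
# executed, it loads the next section via the corresponding ring entry
for i, section in enumerate(sections[:-1]):
    section.append(f"trigger_dma(ring_entry={i + 1})")

for s in sections:
    print(s)
```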
-