Patent search ap:("AMAZON TECHNOLOGIES INC.") AND inv:"Ron Diamant"

181.

Invention publication
NEURAL NETWORK TRAINING UNDER MEMORY RESTRAINT Under examination - published

Publication (announcement) number: US20230196113A1

Publication (announcement) date: 2023-06-22

Application number: US18112036

Application date: 2023-02-21

Applicant: Amazon Technologies, Inc.

Inventor: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja

IPC: G06N3/04

CPC classification number: G06N3/084, G06N3/04

Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
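
The ordering in this abstract (loss gradient first, then a repeated forward propagation, then backward propagation) can be illustrated with a toy two-weight network. This is only a sketch under assumptions that are not in the abstract: the network is scalar (h = w1 * x, y = w2 * h), the loss is 0.5 * (y - target)^2, and an initial forward pass keeps only the final output so that the intermediate output h has to be regenerated. The function name `train_step` and the learning rate are hypothetical.

```python
def train_step(w1, w2, x, target, lr=0.1):
    """One training step that recomputes the intermediate output instead of storing it."""
    # Assumed initial pass: only the final output y is kept, h is discarded.
    y = w2 * (w1 * x)
    # Loss gradient operation for loss = 0.5 * (y - target) ** 2.
    dy = y - target
    # Forward propagation is repeated to regenerate the intermediate output h.
    h = w1 * x
    # Backward propagation: weight gradients from the data gradient and h.
    dw2 = dy * h
    dw1 = dy * w2 * x
    # Weight update from the weight gradients.
    return w1 - lr * dw1, w2 - lr * dw2
```

The trade-off shown is extra compute (a second forward pass) in exchange for not holding intermediate outputs in memory while the loss gradient is computed.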

182.

Invention publication
DYNAMIC PROCESSING ELEMENT ARRAY EXPANSION Under examination - published

Publication (announcement) number: US20230153620A1

Publication (announcement) date: 2023-05-18

Application number: US18154576

Application date: 2023-01-13

Applicant: Amazon Technologies, Inc.

Inventor: Randy Renfu Huang, Ron Diamant, Richard John Heaton

IPC: G06N3/08, G06N3/04

CPC classification number: G06N3/08, G06N3/04

Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, dividing the tensor operation into a set of sub-operations, and generating instructions for performing a plurality of sub-operations of the set of sub-operations on respective computing engines of a plurality of computing engines on a same integrated circuit device or on different integrated circuit devices. Each sub-operation of the set of sub-operations generates a portion of a final output of the tensor operation. An inference is made based on a result of a sub-operation of the plurality of sub-operations, or based on results of the plurality of sub-operations.
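
As a rough illustration of dividing a tensor operation into sub-operations that each produce a portion of the final output, the sketch below splits a matrix product by output columns. It assumes plain nested lists and runs the sub-operations sequentially as a stand-in for separate computing engines; the function names and the column-wise split are illustrative choices, not taken from the patent.

```python
def matmul(a, b):
    """Plain-Python matrix product of row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(b, parts):
    """Split matrix b into `parts` contiguous column blocks."""
    n_cols = len(b[0])
    bounds = [n_cols * i // parts for i in range(parts + 1)]
    return [[row[lo:hi] for row in b] for lo, hi in zip(bounds, bounds[1:])]


def partitioned_matmul(a, b, engines=2):
    """Compute a @ b as independent sub-operations, one per (hypothetical) engine."""
    # Each sub-operation yields a column slice of the final output.
    partials = [matmul(a, block) for block in split_columns(b, engines)]
    # Concatenate the column slices row by row to form the full result.
    return [sum((p[i] for p in partials), []) for i in range(len(a))]
```

Because each slice is a valid part of the output on its own, a consumer could also act on a single partial result, which matches the abstract's remark that an inference may be based on one sub-operation's result.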

183.

Granted invention
Scheduling for locality of reference to memory In force

Publication (announcement) number: US11625269B1

Publication (announcement) date: 2023-04-11

Application number: US17301343

Application date: 2021-03-31

Applicant: Amazon Technologies, Inc.

Inventor: Robert Geva, Taylor Goodhart, Ron Diamant, Preston Pengra Briggs

IPC: G06F9/48, G06F8/41, G06N3/063, G06F7/24

Abstract: A technique for scheduling instructions includes obtaining a set of instructions that operate on memory objects, and determining the dependencies of the memory objects. The memory objects are then sorted into a sequence of memory objects based on the dependencies of the memory objects, and the set of instructions are scheduled into a sequence of instructions according to the sequence of memory objects. Sorting memory objects allows instructions that operate on the same memory object to be kept together. This helps minimize spilling conditions because intervening instructions that do not operate on the same memory object can be avoided.
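
The scheduling idea can be sketched as follows: order the memory objects by their dependencies, then order the instructions by the position of the object each one writes. This is a simplified sketch under stated assumptions: every instruction writes exactly one memory object, the object dependency graph is acyclic, and an object is fully written before any other object's instructions read it. The `Instr` type and `schedule` function are hypothetical names; `graphlib.TopologicalSorter` is from the Python standard library (3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Instr:
    name: str
    writes: str        # memory object this instruction produces or updates
    reads: tuple = ()  # memory objects this instruction consumes


def schedule(instrs):
    """Group instructions by memory object, with objects in dependency order."""
    # An object depends on every other object read while writing it.
    deps = {}
    for ins in instrs:
        deps.setdefault(ins.writes, set()).update(
            r for r in ins.reads if r != ins.writes
        )
    # Sort the memory objects so that dependencies come first.
    order = {obj: i for i, obj in enumerate(TopologicalSorter(deps).static_order())}
    # A stable sort keeps the original order among instructions on the same object.
    return sorted(instrs, key=lambda ins: order[ins.writes])
```

With an interleaved input such as writes to A, D, A, D followed by a write to B that reads A and D, the result places both A instructions together and both D instructions together before B, so no unrelated instruction sits between two uses of the same object.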

184.

Granted invention
Neural network training under memory restraint In force

Publication (announcement) number: US11610128B2

Publication (announcement) date: 2023-03-21

Application number: US16836421

Application date: 2020-03-31

Applicant: Amazon Technologies, Inc.

Inventor: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja

IPC: G06N3/08, G06N3/084, G06N3/04

Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.

185.

Granted invention
Neural network operation reordering for parallel execution In force

Publication (announcement) number: US11567778B2

Publication (announcement) date: 2023-01-31

Application number: US17243415

Application date: 2021-04-28

Applicant: Amazon Technologies, Inc.

Inventor: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant

IPC: G06F9/44, G06F9/38, G06F9/50, G06N3/04, G06N3/08

Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.

186.

Invention application
PROCESSING FOR MULTIPLE INPUT DATA SETS In force

Publication (announcement) number: US20230014783A1

Publication (announcement) date: 2023-01-19

Application number: US17951084

Application date: 2022-09-22

Applicant: Amazon Technologies, Inc.

Inventor: Dana Michelle Vantrease, Ron Diamant, Thomas A. Volpe, Randy Huang

IPC: G06N3/08, G06F3/06, G06N3/04

Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.

187.

Invention application
AUTO-DETECTION OF INTERCONNECT HANGS IN INTEGRATED CIRCUITS In force

Publication (announcement) number: US20220413980A1

Publication (announcement) date: 2022-12-29

Application number: US17896739

Application date: 2022-08-26

Applicant: Amazon Technologies, Inc.

Inventor: Noga Smith, Ron Diamant, Saar Gross

IPC: G06F11/30, G06F11/14, G06F13/28

Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus, and validate that all paths are functional.
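
The check described in this abstract (issue a read, wait a fixed period, request a hard reset if the read did not complete) can be modeled in software as a timeout on a blocking call. The sketch below is only an analogy: `issue_read` and `request_hard_reset` are hypothetical callables standing in for the DMA read and the reset request, and a background thread stands in for the internal bus transaction.

```python
import threading


def bus_is_hanging(issue_read, timeout_s=0.1):
    """Return True if the dummy read does not complete within timeout_s seconds."""
    done = threading.Event()

    def worker():
        issue_read()  # blocks indefinitely if the (modeled) bus hangs
        done.set()

    # Daemon thread so a hung read does not keep the process alive.
    threading.Thread(target=worker, daemon=True).start()
    return not done.wait(timeout_s)


def init_after_soft_reset(issue_read, request_hard_reset, timeout_s=0.1):
    """Run the dummy read during initialization; escalate to a hard reset on timeout."""
    if bus_is_hanging(issue_read, timeout_s):
        request_hard_reset()
        return False
    return True
```

The default of 0.1 s mirrors the 100 ms example in the abstract; a completed read needs no further action, while a timeout triggers the reset request.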

188.

Granted invention
Arbitrating throttling recommendations for a systolic array In force

Publication (announcement) number: US11520731B1

Publication (announcement) date: 2022-12-06

Application number: US17091964

Application date: 2020-11-06

Applicant: Amazon Technologies, Inc.

Inventor: Ron Diamant, Thomas A Volpe

IPC: G06F15/80, G06F9/30, G06N20/00, G06F15/78, G06F15/17, G06F13/38, G06F13/366, G06F1/3206, G06F15/76

Abstract: Throttling recommendations for a systolic array may be arbitrated. Throttling recommendations may be received at an arbiter for a systolic array from different sources, such as one or more monitors implemented in an integrated circuit along with the systolic array or sources external to the integrated circuit with the systolic array. A strongest throttling recommendation may be selected. The rate at which data enters the systolic array may be modified according to the strongest throttling recommendation.
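
A minimal sketch of the arbitration step, assuming each recommendation is encoded as a fraction between 0.0 (no throttling) and 1.0 (full stop), so that the "strongest" recommendation is simply the largest value. That encoding and both function names are assumptions made for illustration; the abstract does not say how recommendations are represented.

```python
def arbitrate(recommendations):
    """Select the strongest recommendation; with no sources, do not throttle."""
    return max(recommendations, default=0.0)


def throttled_rate(base_rate, recommendations):
    """Scale the data entry rate according to the strongest recommendation."""
    level = arbitrate(recommendations)
    return base_rate * (1.0 - level)
```

For example, with recommendations of 0.1, 0.5 and 0.25 from three monitors, the arbiter picks 0.5 and a base rate of 1000 items per cycle window drops to 500.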

189.

Granted invention
Data replication for accelerator In force

Publication (announcement) number: US11500802B1

Publication (announcement) date: 2022-11-15

Application number: US17301344

Application date: 2021-03-31

Applicant: Amazon Technologies, Inc.

Inventor: Kun Xu, Ron Diamant, Patricio Kaplan, Henry Wang

IPC: G06F13/28, G06F15/80, G06F15/173

Abstract: A direct memory access (DMA) engine can be used to multicast data from system memory to a target memory for loading into an array. The DMA engine may include a controller that is configured to receive a data transfer request, and generate a set of write operations for the output interface. The set of write operations can include, for each of multiple partitions of the target memory, a write operation to write usable data from the multicast data to an address offset in the corresponding partition, and an additional write operation to write filler data from the multicast data to a null device address.
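
The set of write operations described here can be sketched as a small generator: for every partition, one write of the usable data to an offset inside that partition and one write of the filler data to a null device address. The layout of the multicast data (usable bytes followed by filler bytes), the `WriteOp` fields, and the `NULL_DEVICE_ADDR` value are all assumptions made for illustration.

```python
from dataclasses import dataclass

NULL_DEVICE_ADDR = 0xFFFF_0000  # hypothetical address whose writes are discarded


@dataclass(frozen=True)
class WriteOp:
    dest: int        # destination address
    src_offset: int  # offset into the multicast data
    length: int      # number of bytes to write


def build_writes(partition_bases, offset, usable_len, filler_len):
    """Generate two write operations per target-memory partition."""
    ops = []
    for base in partition_bases:
        # Usable data lands at the same offset within each partition.
        ops.append(WriteOp(base + offset, 0, usable_len))
        # Filler data is routed to the null device instead of the partition.
        ops.append(WriteOp(NULL_DEVICE_ADDR, usable_len, filler_len))
    return ops
```

Calling `build_writes([0x0, 0x1000], 0x40, 96, 32)` yields four operations: the usable 96 bytes go to 0x40 and 0x1040, and the 32 filler bytes go to the null device each time.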

190.

Granted invention
Auto-detection of interconnect hangs in integrated circuits In force

Publication (announcement) number: US11429503B1

Publication (announcement) date: 2022-08-30

Application number: US16456902

Application date: 2019-06-28

Applicant: Amazon Technologies, Inc.

Inventor: Noga Smith, Ron Diamant, Saar Gross

IPC: G06F11/14, G06F11/30, G06F13/28

Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus, and validate that all paths are functional.
