-
Publication No.: US20230196113A1
Publication Date: 2023-06-22
Application No.: US18112036
Filing Date: 2023-02-21
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
IPC: G06N3/04
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
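The recompute-then-backward flow in this abstract can be sketched with a toy two-layer scalar network y = w2 * (w1 * x) under squared-error loss. The point is the ordering the claim describes: the loss gradient is computed first, forward propagation is then re-run to regenerate the intermediate output (instead of keeping it in memory), and only then does backward propagation combine the two. The toy model and all names are illustrative, not taken from the patent.

```python
def loss_gradient(w1, w2, x, t):
    # Loss gradient operation: dL/dy for L = (y - t)^2.
    y = w2 * (w1 * x)
    return 2.0 * (y - t)

def forward(w1, x):
    # Forward propagation re-run after the loss gradient completes,
    # regenerating the intermediate output h = w1 * x.
    return w1 * x

def backward(data_grad, h, w2, x):
    # Backward propagation: dL/dw1 = dL/dy * w2 * x, dL/dw2 = dL/dy * h.
    return data_grad * w2 * x, data_grad * h

def train_step(w1, w2, x, t, lr=0.1):
    g = loss_gradient(w1, w2, x, t)    # 1. loss gradient -> data gradient
    h = forward(w1, x)                 # 2. recompute intermediate output
    g1, g2 = backward(g, h, w2, x)     # 3. weight gradients
    return w1 - lr * g1, w2 - lr * g2  # 4. host updates the weights
```

Recomputing h trades a second forward pass for not storing intermediate outputs between the original forward pass and the backward pass.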
-
Publication No.: US20230153620A1
Publication Date: 2023-05-18
Application No.: US18154576
Filing Date: 2023-01-13
Applicant: Amazon Technologies, Inc.
Inventor: Randy Renfu Huang , Ron Diamant , Richard John Heaton
Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, dividing the tensor operation into a set of sub-operations, and generating instructions for performing a plurality of sub-operations of the set of sub-operations on respective computing engines of a plurality of computing engines on a same integrated circuit device or on different integrated circuit devices. Each sub-operation of the set of sub-operations generates a portion of a final output of the tensor operation. An inference is made based on a result of a sub-operation of the plurality of sub-operations, or based on results of the plurality of sub-operations.
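The division described here can be illustrated with a matrix multiplication split into row-block sub-operations, each producing a disjoint portion of the final output. This is a sketch: the computing engines are simulated by plain function calls, and the row-block split is one of many possible divisions.

```python
def matmul(a, b):
    # Plain matrix multiply, standing in for one computing engine's work.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def split_rows(a, n):
    # Divide the tensor operation into n sub-operations over row blocks.
    k = (len(a) + n - 1) // n
    return [a[i:i + k] for i in range(0, len(a), k)]

def run_on_engines(a, b, n_engines=2):
    # Each sub-operation yields a portion of the final output; the
    # portions concatenate into the full result (or an inference can be
    # made from any single portion as it becomes available).
    parts = [matmul(block, b) for block in split_rows(a, n_engines)]
    return [row for part in parts for row in part]
```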
-
Publication No.: US11625269B1
Publication Date: 2023-04-11
Application No.: US17301343
Filing Date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Robert Geva , Taylor Goodhart , Ron Diamant , Preston Pengra Briggs
Abstract: A technique for scheduling instructions includes obtaining a set of instructions that operate on memory objects, and determining the dependencies of the memory objects. The memory objects are then sorted into a sequence of memory objects based on the dependencies of the memory objects, and the set of instructions are scheduled into a sequence of instructions according to the sequence of memory objects. Sorting memory objects allows instructions that operate on the same memory object to be kept together. This helps minimize spilling conditions because intervening instructions that do not operate on the same memory object can be avoided.
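The sorting step can be sketched as a topological sort over memory objects followed by object-by-object instruction emission, so instructions touching the same object stay adjacent and fewer values remain live across intervening instructions. Object and instruction names are illustrative; the real scheduler's cost model is not shown.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def schedule(instructions, deps):
    # instructions: {memory_object: [instruction, ...]}
    # deps: {memory_object: set of memory objects it depends on}
    # Sort objects so dependencies come first, then emit each object's
    # instructions as one contiguous group.
    order = TopologicalSorter(deps).static_order()
    return [ins for obj in order for ins in instructions.get(obj, [])]
```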
-
Publication No.: US11610128B2
Publication Date: 2023-03-21
Application No.: US16836421
Filing Date: 2020-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
-
Publication No.: US11567778B2
Publication Date: 2023-01-31
Application No.: US17243415
Filing Date: 2021-04-28
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Drazen Borkovic , Jindrich Zejda , Randy Renfu Huang , Ron Diamant
Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
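The reordering idea can be sketched greedily: each operation is tagged with the execution engine that performs it, and a run of back-to-back operations on one engine (which leaves the other engines idle) is broken up by hoisting a ready, independent operation for a different engine in between. The engine names and the greedy policy are illustrative, not the compiler's actual cost model.

```python
def reorder(ops):
    # ops: list of (name, engine, deps) with deps a set of op names
    out, done, remaining = [], set(), list(ops)
    while remaining:
        last_engine = out[-1][1] if out else None
        ready = [o for o in remaining if o[2] <= done]
        # Prefer a ready op on a different engine than the previous op,
        # so two engines can overlap instead of one sitting idle.
        pick = next((o for o in ready if o[1] != last_engine), ready[0])
        remaining.remove(pick)
        done.add(pick[0])
        out.append(pick)
    return [name for name, _, _ in out]
```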
-
Publication No.: US20230014783A1
Publication Date: 2023-01-19
Application No.: US17951084
Filing Date: 2022-09-22
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant , Thomas A. Volpe , Randy Huang
Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
-
Publication No.: US20220413980A1
Publication Date: 2022-12-29
Application No.: US17896739
Filing Date: 2022-08-26
Applicant: Amazon Technologies, Inc.
Inventor: Noga Smith , Ron Diamant , Saar Gross
Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus, and validate that all paths are functional.
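The detection loop can be sketched as follows: a dummy DMA read is issued over the internal bus after a soft reset, and if it does not complete within the timeout (e.g., 100 ms), the bus is assumed to be hanging and a hard reset is requested. The callables stand in for real DMA hardware; the polling interval is an assumption.

```python
import time

def check_internal_bus(issue_read, read_done,
                       timeout_s=0.1, poll_s=0.001):
    # issue_read(): starts the dummy DMA read over the internal bus
    # read_done(): returns True once the read has completed
    issue_read()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_done():
            return "ok"               # read completed: bus functional
        time.sleep(poll_s)
    return "hard_reset"               # read never completed: bus hung
```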
-
Publication No.: US11520731B1
Publication Date: 2022-12-06
Application No.: US17091964
Filing Date: 2020-11-06
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Thomas A. Volpe
IPC: G06F15/80 , G06F9/30 , G06N20/00 , G06F15/78 , G06F15/17 , G06F13/38 , G06F13/366 , G06F1/3206 , G06F15/76
Abstract: Throttling recommendations for a systolic array may be arbitrated. Throttling recommendations may be received at an arbiter for a systolic array from different sources, such as one or more monitors implemented in an integrated circuit along with the systolic array or sources external to the integrated circuit with the systolic array. A strongest throttling recommendation may be selected. The rate at which data enters the systolic array may be modified according to the strongest throttling recommendation.
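The arbitration step can be sketched directly: recommendations arrive from several sources (on-chip monitors, external inputs), each expressed here as the fraction of the full input rate to keep, and the arbiter selects the strongest (most restrictive) one for the systolic array. The fractional encoding is an illustrative assumption.

```python
def arbitrate(recommendations, full_rate):
    # recommendations: {source_name: fraction_of_full_rate_to_keep}
    # The strongest recommendation keeps the smallest fraction of the
    # full rate; with no recommendations, the array runs at full rate.
    strongest = min(recommendations.values(), default=1.0)
    return full_rate * strongest
```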
-
Publication No.: US11500802B1
Publication Date: 2022-11-15
Application No.: US17301344
Filing Date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant , Patricio Kaplan , Henry Wang
IPC: G06F13/28 , G06F15/80 , G06F15/173
Abstract: A direct memory access (DMA) engine can be used to multicast data from system memory to a target memory for loading into an array. The DMA engine may include a controller that is configured to receive a data transfer request, and generate a set of write operations for the output interface. The set of write operations can include, for each of multiple partitions of the target memory, a write operation to write usable data from the multicast data to an address offset in the corresponding partition, and an additional write operation to write filler data from the multicast data to a null device address.
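The descriptor generation can be sketched as follows: for every target-memory partition the controller emits one write that places the usable slice of the multicast data at the partition's address offset, plus one write that sends the remaining (filler) bytes to a null device address, so each partition consumes the same multicast stream. The address constant and tuple layout are illustrative assumptions.

```python
NULL_DEVICE_ADDR = 0xFFFF0000  # hypothetical null device address

def make_write_ops(data, n_partitions, usable_len, offset):
    # For each partition: one write of the usable slice at the offset,
    # and one additional write of the filler bytes to the null device.
    ops = []
    for p in range(n_partitions):
        ops.append(("partition", p, offset, data[:usable_len]))
        ops.append(("null", NULL_DEVICE_ADDR, 0, data[usable_len:]))
    return ops
```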
-
Publication No.: US11429503B1
Publication Date: 2022-08-30
Application No.: US16456902
Filing Date: 2019-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Noga Smith , Ron Diamant , Saar Gross
Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus, and validate that all paths are functional.