-
Publication No.: US12210940B1
Publication date: 2025-01-28
Application No.: US17091853
Filing date: 2020-11-06
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Thomas A Volpe
Abstract: On-circuit activity monitoring may be performed to modify integrated circuit processing. An activity monitor may be implemented on an integrated circuit to monitor activity measurements of processing data at another portion of the integrated circuit. A change to activity measurements may be detected and cause the activity monitor to modify the rate at which data enters the other portion of the integrated circuit for processing.
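
As a rough software analogy only, the monitoring loop in this abstract might be sketched as follows. The class name, the sliding-window baseline, the threshold, and the two rate values are all hypothetical; the abstract does not say how a change is detected or how the input rate is adjusted.

```python
from collections import deque


class ActivityMonitor:
    """Hypothetical model: lower the input rate when activity jumps."""

    def __init__(self, window=4, threshold=0.5, full_rate=1.0, throttled_rate=0.5):
        self.samples = deque(maxlen=window)  # recent activity measurements
        self.threshold = threshold           # assumed change that triggers throttling
        self.full_rate = full_rate
        self.throttled_rate = throttled_rate
        self.rate = full_rate

    def observe(self, activity):
        """Record one measurement and return the allowed input rate."""
        if self.samples:
            baseline = sum(self.samples) / len(self.samples)
            jumped = activity - baseline > self.threshold
            self.rate = self.throttled_rate if jumped else self.full_rate
        self.samples.append(activity)
        return self.rate
```

In this sketch, feeding the measurements 0.1, 0.1, 0.1, 0.9, 0.9 keeps the full rate for the first three samples and drops to the throttled rate for the last two.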
-
Publication No.: US12182691B1
Publication date: 2024-12-31
Application No.: US17249900
Filing date: 2021-03-17
Applicant: Amazon Technologies, Inc.
Inventor: Sundeep Amirineni, Akshay Balasubramanian, Joshua Wayne Bowman, Ron Diamant, Paul Gilbert Meyer, Thomas Elmer
Abstract: To improve performance of a computational array, the architecture of the array can be modified to allow the processing engines of a column to operate in parallel and the clock frequency of the array to be increased. The processing engines of each column of the array can be grouped into a series of row groups. The processing engines of each row group can be loaded with input values, and computations on the input values can be carried out in parallel to generate the column output. One or more flip-flop stages can be inserted into the computational logic of each of the processing engines. The computational logic can then be distributed across the flip-flop stages to reduce the propagation delay between flip-flop stages of the processing engine, hence allowing the clock frequency of the array to be increased.
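
The row-group idea can be illustrated with a small functional model. This is a sketch under assumptions: the function name and the `group_size` parameter are hypothetical, each processing engine is assumed to compute one weight-times-input product, and the final summation of per-group partial results is an assumed way of forming the column output.

```python
def column_output(weights, inputs, group_size):
    """Model one array column whose processing engines are split into row groups."""
    n = len(weights)
    # Partition the column's row indices into consecutive row groups.
    groups = [range(start, min(start + group_size, n))
              for start in range(0, n, group_size)]
    # Each group's partial sum depends only on its own rows, so hardware
    # could evaluate the groups in parallel; here they run one after another.
    partials = [sum(weights[i] * inputs[i] for i in group) for group in groups]
    # Assumed final step: combine the partial sums into the column output.
    return sum(partials)
```

The result is the same for any group size; the grouping only changes how much of the work is independent.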
-
Publication No.: US20240403646A1
Publication date: 2024-12-05
Application No.: US18798323
Filing date: 2024-08-08
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
-
Publication No.: US12130885B1
Publication date: 2024-10-29
Application No.: US18052527
Filing date: 2022-11-03
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
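
One way to picture the conversion step is the sketch below. It assumes, purely for illustration, that the constraint is "at most `n` nonzero weights in every group of `m` consecutive columns"; the abstract does not define the constraint, and the function name `split_n_of_m` is hypothetical. The matrices it returns each satisfy that assumed constraint and sum element-wise to the original matrix.

```python
def split_n_of_m(weights, n=2, m=4):
    """Split a weight matrix into matrices with at most n nonzeros per m-wide group."""
    rows, cols = len(weights), len(weights[0])
    parts = []
    for r in range(rows):
        for start in range(0, cols, m):
            # Columns holding nonzero weights inside this group of m columns.
            nonzero_cols = [c for c in range(start, min(start + m, cols))
                            if weights[r][c] != 0]
            # Hand out the nonzeros n at a time to successive output matrices.
            for k in range(0, len(nonzero_cols), n):
                idx = k // n
                while len(parts) <= idx:
                    parts.append([[0] * cols for _ in range(rows)])
                for c in nonzero_cols[k:k + n]:
                    parts[idx][r][c] = weights[r][c]
    return parts
```

For a row group holding three nonzero weights with `n=2`, the first two land in the first output matrix and the third in a second one, so the original multiplication becomes two sparse multiplications whose results are added.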
-
Publication No.: US12045475B1
Publication date: 2024-07-23
Application No.: US17457502
Filing date: 2021-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Patricio Kaplan, Sundeep Amirineni, Laura Sharpless, Ron Diamant, Akshay Balasubramanian
CPC classification number: G06F3/0631, G06F3/0604, G06F3/064, G06F3/0656, G06F3/0659, G06F3/0679, G06F12/0246
Abstract: Techniques for implementing a dynamically resizable memory region for alternative use in a memory are described. The techniques may include using two concurrent address maps corresponding to two address ranges for a memory represented as an array of memory blocks. The first address range can be mapped to the memory with starting addresses of the memory blocks incrementing sequentially along each row. The second address range can be mapped to the memory with starting addresses of the memory blocks incrementing sequentially along each column. When an access request is received having a target address belonging to the first address range, the target address is provided as the memory address to access the memory. When an access request is received having a target address belonging to the second address range, the target address is translated by address translation logic into a memory address to access the memory.
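
The two address maps can be modeled with a short translation function. This is a minimal sketch, assuming the first address range starts at 0 and covers the whole memory, the second range starts at a hypothetical `second_base` beyond it, and all blocks have the same size; none of these layout details come from the abstract.

```python
def translate(target, rows, cols, block_size, second_base):
    """Map a target address to a memory address for a rows x cols array of blocks."""
    mem_size = rows * cols * block_size
    if 0 <= target < mem_size:
        # First range: blocks are numbered along each row, which is taken
        # here to be the native layout, so the address passes through.
        return target
    offset = target - second_base
    if not 0 <= offset < mem_size:
        raise ValueError("address outside both ranges")
    # Second range: blocks are numbered along each column.
    block, within = divmod(offset, block_size)
    col, row = divmod(block, rows)
    # Convert the (row, col) block position back to the row-wise layout.
    return (row * cols + col) * block_size + within
```

With 2 rows, 3 columns, and 16-byte blocks, the second block of the column-wise range is the block at row 1, column 0, which is block 3 of the row-wise layout.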
-
Publication No.: US20240232630A1
Publication date: 2024-07-11
Application No.: US18221454
Filing date: 2023-07-13
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Richard John Heaton
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with the second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients when the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
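
The ordering described in this abstract can be sketched with a single-worker thread pool standing in for the hardware interface. Everything named here is hypothetical: `compute_grads1` and `send` are caller-supplied stand-ins, gradients are plain lists, and `grads2` is assumed to be non-empty. The sketch shows only the overlap of the first-layer backward pass with the first exchange, not a real collective operation.

```python
from concurrent.futures import ThreadPoolExecutor


def backward_and_exchange(grads2, compute_grads1, send, num_portions=2):
    """Overlap the layer-1 backward pass with the exchange of layer-2 gradients."""
    if not grads2:
        raise ValueError("grads2 must be non-empty")
    size = -(-len(grads2) // num_portions)  # ceiling division
    portions = [grads2[i:i + size] for i in range(0, len(grads2), size)]
    # One worker models a single link that transmits requests in order.
    with ThreadPoolExecutor(max_workers=1) as link:
        pending = [link.submit(send, "layer2", portions[0])]
        # Runs on the caller's thread while the first portion is in flight.
        grads1 = compute_grads1()
        pending.append(link.submit(send, "layer1", grads1))
        pending += [link.submit(send, "layer2", p) for p in portions[1:]]
        for future in pending:
            future.result()  # surface any transmission error
    return grads1
```

Because the single worker handles submissions in order, the link sees the first portion of the second-layer gradients, then the first-layer gradients, then the remaining second-layer portions.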
-
Publication No.: US12008368B2
Publication date: 2024-06-11
Application No.: US17934147
Filing date: 2022-09-21
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Sheng Xu, Ron Diamant
CPC classification number: G06F9/30036, G06F9/30145, G06F9/3555
Abstract: A technique to execute transpose and compute operations may include retrieving a set of machine instructions from an instruction buffer of a data processor. The instruction buffer has multiple entries, and each entry stores one machine instruction. A machine instruction from the set of machine instructions is executed to transpose a submatrix of an input tensor and perform computations on column elements of the submatrix. The machine instruction combines the transpose operation with computational operations into a single machine instruction.
-
Publication No.: US11941528B2
Publication date: 2024-03-26
Application No.: US16588603
Filing date: 2019-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Richard John Heaton
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with the second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients when the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
-
Publication No.: US11880682B2
Publication date: 2024-01-23
Application No.: US17363894
Filing date: 2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
CPC classification number: G06F9/3001, G06F15/8046
Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
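
The reducer's bit-length reduction can be illustrated in software. The sketch below assumes, for illustration only, that the first bit-length is a 32-bit IEEE 754 float and the second is its upper 16 bits, and it uses round-to-nearest-even as one possible rounding rule; the abstract does not specify formats or a rounding mode. The names `reduce_fp32` and `expand` are hypothetical.

```python
import struct


def reduce_fp32(value, kept_bits=16, rounding=True):
    """Keep the top kept_bits of a float32 bit pattern, dropping trailing bits."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    drop = 32 - kept_bits
    if rounding:
        # Round to nearest, ties to even, before discarding the trailing bits.
        bits += (1 << (drop - 1)) - 1 + ((bits >> drop) & 1)
    return (bits >> drop) & ((1 << kept_bits) - 1)


def expand(reduced, kept_bits=16):
    """Pad a reduced pattern with zero bits to read it back as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", reduced << (32 - kept_bits)))[0]
```

For the input 1.005859375, plain truncation yields 1.0 while the rounded reduction yields 1.0078125, which shows why a reducer might round instead of only dropping trailing bits.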
-
Publication No.: US11880289B2
Publication date: 2024-01-23
Application No.: US17896739
Filing date: 2022-08-26
Applicant: Amazon Technologies, Inc.
Inventor: Noga Smith, Ron Diamant, Saar Gross
CPC classification number: G06F11/3027, G06F11/1441, G06F13/28
Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus, and validate that all paths are functional.
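
The check described in this abstract amounts to a read with a timeout. A minimal sketch follows, assuming two hypothetical callbacks: `issue_dma_read`, which starts the dummy read and calls `on_complete` when it finishes, and `request_hard_reset`. The 100 ms default mirrors the example period in the abstract; real firmware would poll hardware status instead of using a Python event.

```python
import threading


def check_bus_after_soft_reset(issue_dma_read, request_hard_reset, timeout_s=0.1):
    """Return True if the dummy DMA read completes in time, else request a hard reset."""
    done = threading.Event()
    # Start the dummy read over the internal bus; completion sets the event.
    issue_dma_read(on_complete=done.set)
    if done.wait(timeout_s):
        return True   # read completed, no further action needed
    request_hard_reset()  # read never completed, so the bus is treated as hanging
    return False
```

A healthy bus can be modeled by a read function that calls `on_complete` immediately, and a hanging bus by one that never calls it.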