-
Publication No.: US12210438B1
Publication Date: 2025-01-28
Application No.: US17947949
Filing Date: 2022-09-19
Applicant: Amazon Technologies, Inc.
Inventor: Samuel Jacob, Drazen Borkovic, Yu Zhou, Mohammad El-Shabani
Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
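The two-engine halt sequence in this abstract (halt points in both streams, confirm the first engine reached its point, then step the second engine to its own point) can be illustrated with a small sketch. The `Engine` class, its fields, and `break_at_layer` are hypothetical names invented for illustration; the real accelerator mechanics are not described in the abstract.

```python
class Engine:
    """Toy execution engine: advances a program counter through an
    instruction stream until it hits a halt point patched into the
    stream, or stalls on a dependency (hypothetical model)."""

    def __init__(self, halt_pc, stall_pc=None):
        self.pc = 0
        self.halt_pc = halt_pc
        self.stall_pc = stall_pc  # pc where the engine blocks waiting on a peer
        self.halted = False

    def run(self):
        # Run until the halt point, or until stalled on a dependency.
        while self.pc < self.halt_pc:
            if self.stall_pc is not None and self.pc == self.stall_pc:
                break
            self.pc += 1
        self.halted = True

    def step(self):
        # Debugger-driven single step: release any stall and advance one instruction.
        self.stall_pc = None
        self.pc += 1


def break_at_layer(first, second):
    """Sketch of the breakpoint scheme in the abstract: both engines
    halt, the first is confirmed to be at its halt point, then the
    second is stepped until it reaches its own halt point."""
    first.run()
    second.run()
    # Both engines have halted; the first must be at its halt point.
    assert first.halted and first.pc == first.halt_pc
    # Move the second engine through instructions to its halt point.
    while second.pc < second.halt_pc:
        second.step()
    return first.pc, second.pc
```

With a first engine halting at instruction 5 and a second engine that stalled at instruction 3 short of its halt point at 8, `break_at_layer` leaves both engines aligned at the same layer boundary.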
-
Publication No.: US10997277B1
Publication Date: 2021-05-04
Application No.: US16364837
Filing Date: 2019-03-26
Applicant: Amazon Technologies, Inc.
Inventor: Yu Zhou, Vignesh Vivekraja, Ron Diamant
Abstract: An integrated circuit device such as a neural network accelerator can be programmed to select a numerical value based on a multinomial distribution. In various examples, the integrated circuit device can include an execution engine that includes multiple separate execution units. The multiple execution units can operate in parallel on different streams of data. For example, to make a selection based on a multinomial distribution, the execution units can be configured to perform cumulative sums on sets of numerical values, where the numerical values represent probabilities. In this example, to then obtain cumulative sums across the sets of numerical values, the largest values from the sets can be accumulated, and then added, in parallel to the sets. The resulting cumulative sum across all the numerical values can then be used to randomly select a specific index, which can provide a particular numerical value as the selected value.
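The split-cumsum scheme in this abstract (per-unit cumulative sums, slice totals accumulated into offsets, offsets added back to form one global cumulative sum, then a random draw against it) can be sketched in a few lines. All names here (`parallel_cumsum_select`, `num_units`) are invented for illustration, and the per-unit loops stand in for hardware lanes that would actually run in parallel.

```python
import random


def parallel_cumsum_select(probs, num_units=4, rng=None):
    """Select an index from a multinomial distribution, assuming the
    scheme in the abstract: each execution unit computes a cumulative
    sum over its own slice, the largest (last) value of each slice is
    accumulated into an offset, and the offsets are added back to the
    slices to yield one cumulative sum across all values."""
    rng = rng or random.random
    # Split the probabilities into per-unit slices (one "stream" per unit).
    n = len(probs)
    chunk = (n + num_units - 1) // num_units
    slices = [probs[i:i + chunk] for i in range(0, n, chunk)]
    # Each unit computes a local cumulative sum (in parallel on hardware).
    local = []
    for s in slices:
        acc, out = 0.0, []
        for p in s:
            acc += p
            out.append(acc)
        local.append(out)
    # Accumulate the largest value of each slice into per-slice offsets.
    offsets, total = [], 0.0
    for out in local:
        offsets.append(total)
        total += out[-1]
    # Add each offset to its slice to obtain the global cumulative sum.
    cumulative = [v + off for out, off in zip(local, offsets) for v in out]
    # Randomly select an index against the global cumulative sum.
    draw = rng() * cumulative[-1]
    for i, c in enumerate(cumulative):
        if draw <= c:
            return i
    return len(cumulative) - 1
```

Passing a fixed `rng` makes the selection deterministic, which is convenient for checking the cumulative-sum construction.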
-
Publication No.: US10884707B1
Publication Date: 2021-01-05
Application No.: US16455201
Filing Date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
Abstract: Provided are systems and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
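The block decomposition and identity multiplication in this abstract can be modeled in plain Python. `transpose_via_identity` and `tile` are hypothetical names; the identity product is computed explicitly (rather than by a systolic array) to show why the column-to-row remapping yields a transpose.

```python
def transpose_via_identity(matrix, tile=2):
    """Transpose a matrix by decomposing it into tile x tile blocks,
    performing an identity multiplication on each block, and mapping
    column partitions of the results buffer to row partitions of the
    output, assuming the flow described in the abstract."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            h = min(tile, rows - r0)
            w = min(tile, cols - c0)
            # Load one h x w block into the "systolic array".
            block = [[matrix[r0 + i][c0 + j] for j in range(w)]
                     for i in range(h)]
            # Identity multiplication: I @ block leaves the values
            # unchanged, but each product lands in a column partition
            # of the results buffer.
            product = [[sum((1 if i == k else 0) * block[k][j]
                            for k in range(h)) for j in range(w)]
                       for i in range(h)]
            # Map column partitions of the results buffer to row
            # partitions of the buffer memory: column j becomes row j.
            for j in range(w):
                for i in range(h):
                    out[c0 + j][r0 + i] = product[i][j]
    return out
```

Because the identity product preserves the block, the transpose is effected entirely by where the column partitions are written back, which is the point of the technique.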
-
Publication No.: US12073199B2
Publication Date: 2024-08-27
Application No.: US16433786
Filing Date: 2019-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja, Randy Renfu Huang, Yu Zhou, Ron Diamant, Richard John Heaton
CPC classification number: G06F8/4441 , G06N3/04 , G06N3/10
Abstract: In various implementations, provided are systems and methods for reducing neural network processing. A compiler may generate instructions from source code for a neural network having a repeatable set of operations. The instructions may include a plurality of blocks. The compiler may add an overwrite instruction to the plurality of blocks that, when executed by one or more execution engines, triggers an overwrite action. The overwrite action causes the instructions of subsequent blocks to be overwritten with NOP instructions. The overwrite action is triggered only when a condition is satisfied.
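The conditional NOP-overwrite mechanism in this abstract can be sketched as an interpreter over instruction blocks. `run_blocks`, the `"nop"` marker, and the `condition` callback are all invented for illustration; in the patented scheme the overwrite instruction is emitted by the compiler and the condition is evaluated by the execution engines.

```python
def run_blocks(blocks, condition):
    """Sketch of the scheme in the abstract: each block ends with an
    overwrite action that, when `condition` is satisfied, replaces the
    instructions of all subsequent blocks with NOPs, so the remaining
    repeatable iterations fall through without doing work."""
    NOP = "nop"
    executed = []
    for idx, block in enumerate(blocks):
        for instr in block:
            if instr != NOP:
                executed.append(instr)  # "execute" the instruction
        # Overwrite action at the end of the block, triggered only
        # when the condition is satisfied.
        if condition(idx):
            for later in blocks[idx + 1:]:
                later[:] = [NOP] * len(later)
    return executed
```

Overwriting the later blocks in place (rather than branching) keeps the engines' instruction streams fixed-length, which is why the scheme suits accelerators without conditional control flow.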
-
Publication No.: US11467946B1
Publication Date: 2022-10-11
Application No.: US16368351
Filing Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Samuel Jacob, Drazen Borkovic, Yu Zhou, Mohammad El-Shabani
Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
-
Publication No.: US11347480B2
Publication Date: 2022-05-31
Application No.: US17122136
Filing Date: 2020-12-15
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
-
Publication No.: US20210096823A1
Publication Date: 2021-04-01
Application No.: US17122136
Filing Date: 2020-12-15
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
-
Publication No.: US10929063B1
Publication Date: 2021-02-23
Application No.: US16368538
Filing Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja, Yu Zhou, Ron Diamant, Randy Renfu Huang, Richard John Heaton
Abstract: Systems and methods for assisted indirect memory addressing are provided. Some computing systems move data between levels of a hierarchical memory system. To accommodate data movement for computing systems that do not natively support indirect addressing between levels of the memory hierarchy, a direct memory access (DMA) engine is used to fetch data. The DMA engine executes a first set of memory instructions that modify a second set of memory instructions to fetch data stored at one level of the memory hierarchy from dynamically computed indirect addresses stored in memory locations at another level of the memory hierarchy.
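The two-descriptor pattern in this abstract (a first set of memory instructions rewrites a second set so the second fetches from dynamically computed addresses) can be modeled with a flat dictionary standing in for the memory hierarchy. `dma_indirect_fetch` and the descriptor layout are hypothetical, chosen only to make the indirection visible.

```python
def dma_indirect_fetch(memory, pointer_addrs):
    """Sketch of DMA-assisted indirect addressing, assuming the flow in
    the abstract: the first set of memory instructions reads the
    dynamically computed addresses out of memory and patches them into
    the second set, which then fetches the actual data."""
    # First set of memory instructions: read the indirect addresses and
    # rewrite the source fields of the second descriptor.
    second_descriptor = {"sources": [memory[a] for a in pointer_addrs]}
    # Second set of memory instructions: fetch the data at the patched
    # (indirect) addresses.
    return [memory[src] for src in second_descriptor["sources"]]
```

The point of the rewrite step is that the DMA engine never needs native indirect addressing: by the time the second descriptor runs, its source addresses are ordinary direct addresses.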
-