-
Publication No.: US12210438B1
Publication Date: 2025-01-28
Application No.: US17947949
Filing Date: 2022-09-19
Applicant: Amazon Technologies, Inc.
Inventor: Samuel Jacob, Drazen Borkovic, Yu Zhou, Mohammad El-Shabani
Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
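The two-engine halt sequence in this abstract (halt points in both streams, confirm the first engine reached its point, then step the second engine to its own point) can be illustrated with a small sketch. The `Engine` class, its fields, and `break_at_layer` are hypothetical names invented for illustration; the real accelerator mechanics are not described in the abstract.

```python
class Engine:
    """Toy execution engine: advances a program counter through an
    instruction stream until it hits a halt point patched into the
    stream, or stalls on a dependency (hypothetical model)."""

    def __init__(self, halt_pc, stall_pc=None):
        self.pc = 0
        self.halt_pc = halt_pc
        self.stall_pc = stall_pc  # pc where the engine blocks waiting on a peer
        self.halted = False

    def run(self):
        # Run until the halt point, or until stalled on a dependency.
        while self.pc < self.halt_pc:
            if self.stall_pc is not None and self.pc == self.stall_pc:
                break
            self.pc += 1
        self.halted = True

    def step(self):
        # Debugger-driven single step: release any stall and advance one instruction.
        self.stall_pc = None
        self.pc += 1


def break_at_layer(first, second):
    """Sketch of the breakpoint scheme in the abstract: both engines
    halt, the first is confirmed to be at its halt point, then the
    second is stepped until it reaches its own halt point."""
    first.run()
    second.run()
    # Both engines have halted; the first must be at its halt point.
    assert first.halted and first.pc == first.halt_pc
    # Move the second engine through instructions to its halt point.
    while second.pc < second.halt_pc:
        second.step()
    return first.pc, second.pc
```

With a first engine halting at instruction 5 and a second engine that stalled at instruction 3 short of its halt point at 8, `break_at_layer` leaves both engines aligned at the same layer boundary.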
-
Publication No.: US10997277B1
Publication Date: 2021-05-04
Application No.: US16364837
Filing Date: 2019-03-26
Applicant: Amazon Technologies, Inc.
Inventor: Yu Zhou, Vignesh Vivekraja, Ron Diamant
Abstract: An integrated circuit device such as a neural network accelerator can be programmed to select a numerical value based on a multinomial distribution. In various examples, the integrated circuit device can include an execution engine that includes multiple separate execution units. The multiple execution units can operate in parallel on different streams of data. For example, to make a selection based on a multinomial distribution, the execution units can be configured to perform cumulative sums on sets of numerical values, where the numerical values represent probabilities. In this example, to then obtain cumulative sums across the sets of numerical values, the largest values from the sets can be accumulated, and then added, in parallel to the sets. The resulting cumulative sum across all the numerical values can then be used to randomly select a specific index, which can provide a particular numerical value as the selected value.
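The split-cumsum scheme in this abstract (per-unit cumulative sums, slice totals accumulated into offsets, offsets added back to form one global cumulative sum, then a random draw against it) can be sketched in a few lines. All names here (`parallel_cumsum_select`, `num_units`) are invented for illustration, and the per-unit loops stand in for hardware lanes that would actually run in parallel.

```python
import random


def parallel_cumsum_select(probs, num_units=4, rng=None):
    """Select an index from a multinomial distribution, assuming the
    scheme in the abstract: each execution unit computes a cumulative
    sum over its own slice, the largest (last) value of each slice is
    accumulated into an offset, and the offsets are added back to the
    slices to yield one cumulative sum across all values."""
    rng = rng or random.random
    # Split the probabilities into per-unit slices (one "stream" per unit).
    n = len(probs)
    chunk = (n + num_units - 1) // num_units
    slices = [probs[i:i + chunk] for i in range(0, n, chunk)]
    # Each unit computes a local cumulative sum (in parallel on hardware).
    local = []
    for s in slices:
        acc, out = 0.0, []
        for p in s:
            acc += p
            out.append(acc)
        local.append(out)
    # Accumulate the largest value of each slice into per-slice offsets.
    offsets, total = [], 0.0
    for out in local:
        offsets.append(total)
        total += out[-1]
    # Add each offset to its slice to obtain the global cumulative sum.
    cumulative = [v + off for out, off in zip(local, offsets) for v in out]
    # Randomly select an index against the global cumulative sum.
    draw = rng() * cumulative[-1]
    for i, c in enumerate(cumulative):
        if draw <= c:
            return i
    return len(cumulative) - 1
```

Passing a fixed `rng` makes the selection deterministic, which is convenient for checking the cumulative-sum construction.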
-
Publication No.: US10884707B1
Publication Date: 2021-01-05
Application No.: US16455201
Filing Date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
Abstract: Provided are systems and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
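The block decomposition and identity multiplication in this abstract can be modeled in plain Python. `transpose_via_identity` and `tile` are hypothetical names; the identity product is computed explicitly (rather than by a systolic array) to show why the column-to-row remapping yields a transpose.

```python
def transpose_via_identity(matrix, tile=2):
    """Transpose a matrix by decomposing it into tile x tile blocks,
    performing an identity multiplication on each block, and mapping
    column partitions of the results buffer to row partitions of the
    output, assuming the flow described in the abstract."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            h = min(tile, rows - r0)
            w = min(tile, cols - c0)
            # Load one h x w block into the "systolic array".
            block = [[matrix[r0 + i][c0 + j] for j in range(w)]
                     for i in range(h)]
            # Identity multiplication: I @ block leaves the values
            # unchanged, but each product lands in a column partition
            # of the results buffer.
            product = [[sum((1 if i == k else 0) * block[k][j]
                            for k in range(h)) for j in range(w)]
                       for i in range(h)]
            # Map column partitions of the results buffer to row
            # partitions of the buffer memory: column j becomes row j.
            for j in range(w):
                for i in range(h):
                    out[c0 + j][r0 + i] = product[i][j]
    return out
```

Because the identity product preserves the block, the transpose is effected entirely by where the column partitions are written back, which is the point of the technique.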
-
Publication No.: US12073199B2
Publication Date: 2024-08-27
Application No.: US16433786
Filing Date: 2019-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja, Randy Renfu Huang, Yu Zhou, Ron Diamant, Richard John Heaton
CPC classification number: G06F8/4441 , G06N3/04 , G06N3/10
Abstract: In various implementations, provided are systems and methods for reducing neural network processing. A compiler may generate instructions from source code for a neural network having a repeatable set of operations. The instructions may include a plurality of blocks. The compiler may add an overwrite instruction to the plurality of blocks that, when executed by one or more execution engines, triggers an overwrite action. The overwrite action causes the instructions of subsequent blocks to be overwritten with NOP instructions. The overwrite action is triggered only when a condition is satisfied.
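The conditional NOP-overwrite mechanism in this abstract can be sketched as an interpreter over instruction blocks. `run_blocks`, the `"nop"` marker, and the `condition` callback are all invented for illustration; in the patented scheme the overwrite instruction is emitted by the compiler and the condition is evaluated by the execution engines.

```python
def run_blocks(blocks, condition):
    """Sketch of the scheme in the abstract: each block ends with an
    overwrite action that, when `condition` is satisfied, replaces the
    instructions of all subsequent blocks with NOPs, so the remaining
    repeatable iterations fall through without doing work."""
    NOP = "nop"
    executed = []
    for idx, block in enumerate(blocks):
        for instr in block:
            if instr != NOP:
                executed.append(instr)  # "execute" the instruction
        # Overwrite action at the end of the block, triggered only
        # when the condition is satisfied.
        if condition(idx):
            for later in blocks[idx + 1:]:
                later[:] = [NOP] * len(later)
    return executed
```

Overwriting the later blocks in place (rather than branching) keeps the engines' instruction streams fixed-length, which is why the scheme suits accelerators without conditional control flow.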
-
Publication No.: US11467946B1
Publication Date: 2022-10-11
Application No.: US16368351
Filing Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Samuel Jacob, Drazen Borkovic, Yu Zhou, Mohammad El-Shabani
Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
-
Publication No.: US11347480B2
Publication Date: 2022-05-31
Application No.: US17122136
Filing Date: 2020-12-15
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
-
Publication No.: US20210096823A1
Publication Date: 2021-04-01
Application No.: US17122136
Filing Date: 2020-12-15
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
-
Publication No.: US10929063B1
Publication Date: 2021-02-23
Application No.: US16368538
Filing Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja, Yu Zhou, Ron Diamant, Randy Renfu Huang, Richard John Heaton
Abstract: Systems and methods for assisted indirect memory addressing are provided. Some computing systems move data between levels of a hierarchical memory system. To accommodate data movement for computing systems that do not natively support indirect addressing between levels of the memory hierarchy, a direct memory access (DMA) engine is used to fetch data. The DMA engine executes a first set of memory instructions that modify a second set of memory instructions to fetch data stored at one level of the memory hierarchy from dynamically computed indirect addresses stored in memory locations at another level of the memory hierarchy.
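The two-descriptor pattern in this abstract (a first set of memory instructions rewrites a second set so the second fetches from dynamically computed addresses) can be modeled with a flat dictionary standing in for the memory hierarchy. `dma_indirect_fetch` and the descriptor layout are hypothetical, chosen only to make the indirection visible.

```python
def dma_indirect_fetch(memory, pointer_addrs):
    """Sketch of DMA-assisted indirect addressing, assuming the flow in
    the abstract: the first set of memory instructions reads the
    dynamically computed addresses out of memory and patches them into
    the second set, which then fetches the actual data."""
    # First set of memory instructions: read the indirect addresses and
    # rewrite the source fields of the second descriptor.
    second_descriptor = {"sources": [memory[a] for a in pointer_addrs]}
    # Second set of memory instructions: fetch the data at the patched
    # (indirect) addresses.
    return [memory[src] for src in second_descriptor["sources"]]
```

The point of the rewrite step is that the DMA engine never needs native indirect addressing: by the time the second descriptor runs, its source addresses are ordinary direct addresses.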
-