Patent search ap:("Amazon Technologies Inc.") AND inv:"Jeffrey T. Huynh"

21.

Patent application
DILATED CONVOLUTION USING SYSTOLIC ARRAY (in force)

Publication No.: US20220292163A1

Publication date: 2022-09-15

Application No.: US17832039

Filing date: 2022-06-03

Applicant: Amazon Technologies, Inc.

Inventor: Jeffrey T. Huynh, Ron Diamant

IPC: G06F17/15, G06F15/80, H04L49/9047, G06V10/75, G06V30/413

Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
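The selection rule in this abstract can be illustrated with a small software model. The following is a minimal sketch, not the patented implementation: it assumes a 2-D input, stride 1, and no padding, and the names `select_subset` and `dilated_conv2d` are hypothetical. For the weight at coordinates (r, s) and dilation rate d, the input elements that this weight multiplies start at row r*d and column s*d, and each weight contributes one partial sum to every output element.

```python
def select_subset(inputs, r, s, rate, out_h, out_w):
    """Return the input elements paired with the weight at coordinates (r, s).

    Assumption: stride 1, no padding. The window is offset by
    rate * coordinate in each dimension.
    """
    return [row[s * rate: s * rate + out_w]
            for row in inputs[r * rate: r * rate + out_h]]


def dilated_conv2d(inputs, weights, rate):
    """Dilated convolution computed one weight element at a time."""
    in_h, in_w = len(inputs), len(inputs[0])
    k_h, k_w = len(weights), len(weights[0])
    out_h = in_h - (k_h - 1) * rate
    out_w = in_w - (k_w - 1) * rate
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(k_h):
        for s in range(k_w):
            # One weight is "loaded", then combined with its input subset.
            subset = select_subset(inputs, r, s, rate, out_h, out_w)
            w = weights[r][s]
            for i in range(out_h):
                for j in range(out_w):
                    out[i][j] += w * subset[i][j]  # accumulate partial sums
    return out


if __name__ == "__main__":
    image = [[5 * i + j for j in range(5)] for i in range(5)]
    kernel = [[1, 0], [0, 1]]
    for row in dilated_conv2d(image, kernel, rate=2):
        print(row)
```

With a rate of 1 the same code reduces to an ordinary convolution, which is a convenient way to sanity-check the indexing.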

22.

Granted patent
Transpose operations using processing element array (in force)

Publication No.: US11347480B2

Publication date: 2022-05-31

Application No.: US17122136

Filing date: 2020-12-15

Applicant: Amazon Technologies, Inc.

Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh

IPC: G06F7/78, G06F7/50, G06F7/523, G06F8/41, G06F9/38, G06F9/50, G06N3/063

Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
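The block-wise flow described in this abstract can be mimicked in software. This is a rough sketch under stated assumptions, not the hardware behavior: the tile size stands in for the systolic array dimensions, the identity multiplication is modeled as an ordinary matrix product that leaves values unchanged, and the "column partition to row partition" mapping is modeled as writing column j of each product to row j of the output. The function names are hypothetical.

```python
def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose_by_blocks(tensor, tile):
    """Transpose a 2-D tensor tile by tile using identity multiplications."""
    rows, cols = len(tensor), len(tensor[0])
    out = [[0] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Decompose: cut a block no larger than tile x tile.
            block = [row[c0:c0 + tile] for row in tensor[r0:r0 + tile]]
            # Identity multiplication: values pass through unchanged.
            product = matmul(block, identity(len(block[0])))
            # Map each column of the product to a row of the output.
            for j in range(len(product[0])):
                for i in range(len(product)):
                    out[c0 + j][r0 + i] = product[i][j]
    return out


if __name__ == "__main__":
    t = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
    for row in transpose_by_blocks(t, tile=2):
        print(row)
```

The multiplication does no arithmetic work here; its role in the sketch is only to show that routing data through a matrix-multiply unit with an identity operand preserves the values while the write-back step changes the layout.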

23.

Granted patent
Compile-time scheduling (in force)

Publication No.: US11003429B1

Publication date: 2021-05-11

Application No.: US16266915

Filing date: 2019-02-04

Applicant: Amazon Technologies, Inc.

Inventor: Jindrich Zejda, Jeffrey T. Huynh, Tobias Joseph Kastulus Edler von Koch, Drazen Borkovic, Taemin Kim

IPC: G06F8/41, G06F16/901, G06F15/80

Abstract: Scheduling of the operations of an integrated circuit device such as a hardware accelerator, including scheduling of movement of data into and out of the accelerator, can be performed by a compiler that produces program code for the accelerator. The compiler can produce a graph that represents operations to be performed by the accelerator. Using the graph, the compiler can determine estimated execution times for the operations represented by each node in the graph. The compiler can schedule operations by determining an estimated execution time for a set of dependent operations that depend from an operation. The compiler can then select an operation that has a shortest estimated execution time from among a set of operations and which has a set of dependent operations that has a longest estimated execution time as compared to other sets of dependent operations.
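One way to read the selection rule in this abstract is as a list scheduler over a dependency graph. The sketch below is an interpretation, not the patented algorithm, and rests on several assumptions: the graph maps each operation to the operations that depend on it, the time of a dependent set is taken as the sum over all transitive dependents, and the two criteria are combined as "shortest own time first, longest dependent time as tie-break". The operation names and timings are invented for illustration.

```python
def dependents_time(graph, times, op):
    """Total estimated time of all operations that transitively depend on op."""
    seen = set()
    stack = list(graph.get(op, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return sum(times[n] for n in seen)


def schedule(graph, times):
    """Order operations so that dependencies always run first."""
    # Count unmet dependencies for every operation.
    pending = {op: 0 for op in times}
    for dependents in graph.values():
        for d in dependents:
            pending[d] += 1
    ready = [op for op, n in pending.items() if n == 0]
    order = []
    while ready:
        # Assumed priority: shortest own time, then longest dependent set,
        # then name (only to make the result deterministic).
        chosen = min(
            ready,
            key=lambda op: (times[op], -dependents_time(graph, times, op), op),
        )
        ready.remove(chosen)
        order.append(chosen)
        for d in graph.get(chosen, []):
            pending[d] -= 1
            if pending[d] == 0:
                ready.append(d)
    return order


if __name__ == "__main__":
    times = {"load_a": 2, "load_b": 2, "matmul": 5, "act": 1, "store": 1}
    graph = {"load_a": ["matmul"], "load_b": ["act"],
             "matmul": ["store"], "act": ["store"]}
    print(schedule(graph, times))
```

In this example both loads have the same estimated time, so the tie-break decides: `load_a` goes first because more estimated work (the matmul and the store) is waiting on it.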

24.

Patent application
TRANSPOSE OPERATIONS USING PROCESSING ELEMENT ARRAY (in force)

Publication No.: US20210096823A1

Publication date: 2021-04-01

Application No.: US17122136

Filing date: 2020-12-15

Applicant: Amazon Technologies, Inc.

Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh

IPC: G06F7/78, G06F9/38, G06F7/523, G06F9/50, G06F7/50, G06F8/41, G06N3/063

Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.

25.

Granted patent
Registers for restricted memory (in force)

Publication No.: US10678479B1

Publication date: 2020-06-09

Application No.: US16204943

Filing date: 2018-11-29

Applicant: Amazon Technologies, Inc.

Inventor: Ron Diamant, Randy Renfu Huang, Sundeep Amirineni, Jeffrey T. Huynh

IPC: G06F12/00, G06F3/06, G06F13/28, G06N3/02

Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank, and can read from and write to only the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
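The parallel and serial register operations in this abstract can be pictured with a toy memory model. The sketch below is illustrative only and assumes one register per bank, a shared address for parallel accesses, and consecutive addresses for serial accesses; the class and function names are hypothetical and do not come from the patent.

```python
class BankedMemory:
    """Toy model: several memory banks plus one register per bank."""

    def __init__(self, num_banks, bank_size):
        self.banks = [[0] * bank_size for _ in range(num_banks)]
        self.registers = [0] * num_banks

    def parallel_load(self, addr):
        # Every register reads the same address of its own bank.
        self.registers = [bank[addr] for bank in self.banks]

    def parallel_store(self, addr):
        # Every register writes the same address of its own bank.
        for bank, value in zip(self.banks, self.registers):
            bank[addr] = value

    def serial_load(self, bank_index, addr):
        # Registers are filled one after another from a single bank.
        for i in range(len(self.registers)):
            self.registers[i] = self.banks[bank_index][addr + i]

    def serial_store(self, bank_index, addr):
        # Registers are drained one after another into a single bank.
        for i, value in enumerate(self.registers):
            self.banks[bank_index][addr + i] = value


def gather(mem, src_addr, dst_bank, dst_addr):
    """Many banks into one: parallel load, then serial store."""
    mem.parallel_load(src_addr)
    mem.serial_store(dst_bank, dst_addr)


def scatter(mem, src_bank, src_addr, dst_addr):
    """One bank into many: serial load, then parallel store."""
    mem.serial_load(src_bank, src_addr)
    mem.parallel_store(dst_addr)


if __name__ == "__main__":
    mem = BankedMemory(num_banks=4, bank_size=8)
    for i, bank in enumerate(mem.banks):
        bank[0] = 10 + i
    gather(mem, src_addr=0, dst_bank=2, dst_addr=4)
    print(mem.banks[2])
```

The registers act as the only path between banks in this model, which mirrors the abstract's constraint that each execution component can touch only its own bank.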
