-
Publication No.: US11467992B1
Publication Date: 2022-10-11
Application No.: US17031668
Filing Date: 2020-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan , Ron Diamant
Abstract: In one example, an apparatus comprises: a local on-chip memory; a computation engine configured to generate local data and to store the local data at the local on-chip memory; and a controller. The apparatus is configured to be coupled with a second device via an interconnect, the second device comprising a local memory. The controller is configured to: fetch the local data from the local on-chip memory; fetch remote data generated by another device from a local off-chip memory; generate output data based on combining the local data and the remote data; and store, via the interconnect, the output data at the local memory of the second device.
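The combine-and-forward step described above resembles one hop of a ring-style collective. The sketch below is a minimal illustration, not the patented implementation; the function name, the elementwise-add reduction, and the flat-list model of the peer's memory are all assumptions.

```python
def combine_and_forward(local_onchip, remote_offchip, peer_memory, offset):
    """Illustrative controller step: fetch local data and remote data,
    combine them (here: elementwise add), and write the output into the
    second device's local memory at the given offset."""
    output = [a + b for a, b in zip(local_onchip, remote_offchip)]  # combine
    peer_memory[offset:offset + len(output)] = output  # write via interconnect
    return output
```

In this toy model the "interconnect write" is just a slice assignment into the peer's buffer; the real device would issue the store over the physical interconnect.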
-
Publication No.: US20220292163A1
Publication Date: 2022-09-15
Application No.: US17832039
Filing Date: 2022-06-03
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant
IPC: G06F17/15 , G06F15/80 , H04L49/9047 , G06V10/75 , G06V30/413
Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory to load into the systolic array for first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and the coordinates of the first weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
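The subset-selection rule can be illustrated for a 1-D dilated convolution. This is a simplified sketch under assumed conventions (unit stride, zero-based coordinates); the function name and signature are illustrative, not from the patent.

```python
def select_input_indices(weight_coord, rate, out_len, stride=1):
    """Return the input indices that the weight element at `weight_coord`
    multiplies for each output position of a 1-D dilated convolution.
    `rate` is the dilation rate: the spacing between sampled inputs."""
    return [o * stride + weight_coord * rate for o in range(out_len)]
```

For example, with dilation rate 2, the weight at coordinate 1 touches inputs 2, 3, 4 for the first three outputs, while the weight at coordinate 0 touches inputs 0, 1, 2: the selected subset shifts by `weight_coord * rate`.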
-
Publication No.: US11347480B2
Publication Date: 2022-05-31
Application No.: US17122136
Filing Date: 2020-12-15
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li , Ron Diamant , Jeffrey T. Huynh , Yu Zhou , Se jong Oh
Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
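The identity-multiplication trick can be shown with a toy model of a weight-stationary systolic array. Assuming (as an illustration, not from the patent) that the array accumulates results so that the column partitions hold W&#8315;transposed-times-input, feeding an identity matrix through leaves the block's transpose in the results buffer:

```python
def systolic_matmul(weights, inputs):
    """Toy weight-stationary systolic array: out[j][k] accumulates
    weights[i][j] * inputs[i][k], so the results buffer holds W^T @ X."""
    rows, cols = len(weights), len(weights[0])
    n = len(inputs[0])
    out = [[0] * n for _ in range(cols)]       # column partitions
    for j in range(cols):
        for k in range(n):
            out[j][k] = sum(weights[i][j] * inputs[i][k] for i in range(rows))
    return out

def transpose_block(block):
    """Identity multiplication: with X = I, the column partitions of the
    results buffer hold W^T, i.e. the transposed block."""
    rows = len(block)
    identity = [[1 if i == j else 0 for j in range(rows)] for i in range(rows)]
    return systolic_matmul(block, identity)
```

No dedicated transpose hardware is needed in this model; the transpose falls out of how the array maps accumulation results to column partitions.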
-
Publication No.: US11138106B1
Publication Date: 2021-10-05
Application No.: US16836780
Filing Date: 2020-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang
Abstract: Provided are integrated circuit devices and methods for operating integrated circuit devices. In various examples, the integrated circuit device can include a target port operable to receive transactions from a master port. The target port can be configured with a multicast address range that is associated with a plurality of indices corresponding to memory banks of the device. When the target port receives a write transaction that has an address within the multicast address range, the target port can determine an index from the plurality of indices, and can use the index to determine a second address, which combines the index and an offset value with the address. The target port can then use the second address to write the data to the memory.
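One way the address translation could work is sketched below. All constants and the round-robin index policy are assumptions for illustration; the real ranges, bank layout, and index-selection policy are device-configured.

```python
# Illustrative constants, not from the patent.
MULTICAST_BASE = 0x4000_0000   # start of the multicast address range
BANK_BASE      = 0x5000_0000   # base address of memory bank 0
BANK_SIZE      = 0x1000        # bytes per bank
NUM_BANKS      = 4

def resolve_write(address, index):
    """Translate a write into the multicast range into a per-bank address.
    The second address combines the bank index and the offset with the
    original address; non-multicast writes pass through unchanged."""
    if not (MULTICAST_BASE <= address < MULTICAST_BASE + BANK_SIZE):
        return address, index                      # ordinary write
    offset = address - MULTICAST_BASE              # offset within a bank
    second = BANK_BASE + index * BANK_SIZE + offset
    return second, (index + 1) % NUM_BANKS         # advance round-robin
```

Each write landing in the multicast window is thus redirected to a different bank while preserving the offset within the bank.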
-
Publication No.: US20210295168A1
Publication Date: 2021-09-23
Application No.: US16827444
Filing Date: 2020-03-23
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant
Abstract: Techniques for exchanging compressed gradient data within a distributed system are disclosed. A set of gradients is computed at a first worker node of the distributed system using a neural network model and a set of weights associated with the neural network model. Each gradient in the set having a value less than a threshold is clipped, resulting in non-clipped data elements and clipped data elements. A mapping indicating which of the gradients correspond to non-clipped data elements and which correspond to clipped data elements is generated. Compressed data is generated based on the non-clipped data elements. The mapping and the compressed data are transmitted from the first worker node to a second worker node of the distributed system.
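The clip/map/compress pipeline can be sketched in a few lines. This is a minimal sketch assuming magnitude-based clipping to zero and a boolean bitmap as the mapping; the function names and the zero-fill reconstruction are illustrative choices, not details from the patent.

```python
def compress_gradients(grads, threshold):
    """Clip gradients with magnitude below `threshold`, build a bitmap
    mapping of kept positions, and keep only the surviving values."""
    mapping = [abs(g) >= threshold for g in grads]      # True = non-clipped
    compressed = [g for g, keep in zip(grads, mapping) if keep]
    return mapping, compressed

def decompress_gradients(mapping, compressed):
    """Reconstruct the full gradient vector at the receiving worker,
    zero-filling the clipped slots."""
    values = iter(compressed)
    return [next(values) if keep else 0.0 for keep in mapping]
```

Only the bitmap and the surviving values cross the wire, which shrinks the exchange whenever many gradients fall below the threshold.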
-
Publication No.: US20210096823A1
Publication Date: 2021-04-01
Application No.: US17122136
Filing Date: 2020-12-15
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li , Ron Diamant , Jeffrey T. Huynh , Yu Zhou , Se jong Oh
Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
-
Publication No.: US10956584B1
Publication Date: 2021-03-23
Application No.: US16141770
Filing Date: 2018-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Randy Renfu Huang , Ron Diamant , David James Borland
Abstract: Systems and methods for performing neural network processing are provided. In one example, a system comprises a neural network processor comprising: a data decryption engine that receives encrypted data and decrypts the encrypted data, the encrypted data comprising at least one of: encrypted weights data, encrypted input data, or encrypted instruction data related to a neural network model; and a computing engine that receives the weights data and performs computations of neural network processing using the input data and the weights data, based on the instruction data.
-
Publication No.: US10929063B1
Publication Date: 2021-02-23
Application No.: US16368538
Filing Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja , Yu Zhou , Ron Diamant , Randy Renfu Huang , Richard John Heaton
Abstract: Systems and methods for assisted indirect memory addressing are provided. Some computing systems move data between levels of a hierarchical memory system. To accommodate data movement for computing systems that do not natively support indirect addressing between levels of the memory hierarchy, a direct memory access (DMA) engine is used to fetch data. The DMA engine executes a first set of memory instructions that modify a second set of memory instructions to fetch data stored at one level of the memory hierarchy from dynamically computed indirect addresses stored in memory locations at another level of the memory hierarchy.
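The two-phase DMA scheme can be modeled in miniature: a first set of instructions reads dynamically computed addresses out of memory and patches them into a second set, which then performs the actual fetches. The dictionary-based instruction encoding below is purely illustrative; real DMA descriptors are hardware-defined.

```python
def run_indirect_dma(first_set, second_set, memory):
    """Phase 1: each first-set instruction reads an indirect address from
    `memory` and rewrites the source field of a second-set instruction.
    Phase 2: the patched second set fetches the data."""
    for instr in first_set:
        indirect_addr = memory[instr["addr_slot"]]     # dynamically computed
        second_set[instr["patch"]]["src"] = indirect_addr
    return [memory[d["src"]] for d in second_set]      # gathered data
```

This emulates indirect addressing on an engine that only supports direct addresses: the indirection is resolved by letting one instruction stream modify another before it runs.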
-
Publication No.: US10901492B1
Publication Date: 2021-01-26
Application No.: US16369696
Filing Date: 2019-03-29
Applicant: Amazon Technologies, Inc.
Inventor: Nafea Bshara , Ron Diamant , Randy Renfu Huang , Ali Ghassan Saidi
Abstract: Techniques are described for power reduction in a computer processor based on detection of whether data destined for input to an arithmetic logic unit (ALU) has a particular value. The data is written to a register prior to performing an arithmetic or logical operation using the data as an operand. Depending on a timing of when the data is supplied to the register, the determination is made before or after the data is written to the register, and a memory associated with the register is updated with a result of the determination. Contents of the memory are used to make a decision whether to allow the ALU to perform the arithmetic or logical operation. The memory can be implemented as a non-architectural register.
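The gating decision can be sketched in software terms. Here the `zero_flags` dictionary stands in for the non-architectural register that records whether an operand was written with the particular value (zero, in this assumed example); the operation set and return conventions are illustrative only.

```python
def maybe_execute(op, a, b, zero_flags):
    """Consult the flag memory before firing the ALU: if a flagged operand
    makes the result known in advance (multiply by zero), skip the
    operation entirely and save the switching power."""
    if op == "mul" and (zero_flags.get("a") or zero_flags.get("b")):
        return 0                       # ALU gated off; result is known
    return a * b if op == "mul" else a + b
```

Because the flag is checked instead of the operand itself, the decision can be made regardless of whether the data has reached the register yet, matching the before-or-after timing described in the abstract.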
-
Publication No.: US10896001B1
Publication Date: 2021-01-19
Application No.: US16145050
Filing Date: 2018-09-27
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A. Volpe , Nafea Bshara , Raymond Scott Whiteside , Ron Diamant
Abstract: Provided are integrated circuit devices and methods for operating integrated circuit devices. In various examples, an integrated circuit device can be operable to determine, at a point in time during operation of the integrated circuit device, to generate a notification. The notification can include a type and a timestamp indicating the point in time. The notification can also include information about an internal status of the integrated circuit at the point in time. The device can further select a queue from a plurality of queues in a processor memory of the computing system that includes the integrated circuit. The device can further generate a write transaction including the notification, where the write transaction is addressed to the queue. The device can further output the write transaction using a communication interface of the device.
-