Memory access operation in distributed computing system

    Publication No.: US11467992B1

    Publication Date: 2022-10-11

    Application No.: US17031668

    Application Date: 2020-09-24

    Abstract: In one example, an apparatus comprises: a local on-chip memory; a computation engine configured to generate local data and to store the local data at the local on-chip memory; and a controller. The apparatus is configured to be coupled with a second device via an interconnect, the second device comprising a local memory. The controller is configured to: fetch the local data from the local on-chip memory; fetch remote data generated by another device from a local off-chip memory; generate output data based on combining the local data and the remote data; and store, via the interconnect, the output data at the local memory of the second device.
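The combine-and-forward step the abstract describes can be sketched as follows. This is an illustrative model only, not the patented hardware: the function names, the list-based "memories", and the element-wise sum as the combining operation are all assumptions.

```python
def combine_and_forward(local_mem, remote_mem, next_device_mem, n):
    # Fetch locally generated data (on-chip) and remote data (off-chip).
    local = [local_mem[i] for i in range(n)]
    remote = [remote_mem[i] for i in range(n)]
    # Combine the two; an element-wise sum is assumed here, as in a
    # reduce step of distributed training.
    output = [a + b for a, b in zip(local, remote)]
    # Write the output over the interconnect into the second device's
    # local memory, modeled as a plain list.
    for i, v in enumerate(output):
        next_device_mem[i] = v
    return output
```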

    DILATED CONVOLUTION USING SYSTOLIC ARRAY

    Publication No.: US20220292163A1

    Publication Date: 2022-09-15

    Application No.: US17832039

    Application Date: 2022-06-03

Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; load a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the first weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
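A minimal one-dimensional sketch of the input-selection rule, assuming the conventional indexing for dilated convolution (the function names and the stride parameter are illustrative, not from the patent): for weight element k at dilation rate r, output position o reads input o*stride + k*r.

```python
def select_inputs_for_weight(inputs, k, rate, num_outputs, stride=1):
    # For weight element k of a dilated convolution with the given rate,
    # output position o is fed by inputs[o*stride + k*rate].
    return [inputs[o * stride + k * rate] for o in range(num_outputs)]

def dilated_conv1d(inputs, weights, rate):
    num_outputs = len(inputs) - (len(weights) - 1) * rate
    outputs = [0] * num_outputs
    # One pass per weight element, mirroring loading one weight into the
    # systolic array and streaming the matching input subset through it.
    for k, w in enumerate(weights):
        subset = select_inputs_for_weight(inputs, k, rate, num_outputs)
        for o, x in enumerate(subset):
            outputs[o] += w * x
    return outputs
```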

    Transpose operations using processing element array

    Publication No.: US11347480B2

    Publication Date: 2022-05-31

    Application No.: US17122136

    Application Date: 2020-12-15

Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
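The block-wise scheme can be sketched in NumPy, with a literal identity matrix multiplication standing in for the systolic-array pass and the final column-to-row remapping done by writing each product's transpose into the destination. All names here are illustrative assumptions.

```python
import numpy as np

def transpose_via_identity(tensor, block):
    # Decompose the tensor into block x block tiles sized to the systolic
    # array, run each tile through an "identity multiplication" (I @ tile
    # here stands in for the array), and map the product's columns to rows
    # of the destination buffer.
    n, m = tensor.shape
    out = np.empty((m, n), dtype=tensor.dtype)
    I = np.eye(block, dtype=tensor.dtype)
    for r in range(0, n, block):
        for c in range(0, m, block):
            tile = tensor[r:r + block, c:c + block]
            # Edge tiles may be smaller than the full block.
            result = I[:tile.shape[0], :tile.shape[0]] @ tile
            # Column partitions of the result become row partitions
            # of the output buffer.
            out[c:c + tile.shape[1], r:r + tile.shape[0]] = result.T
    return out
```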

    Target port with distributed transactions

    Publication No.: US11138106B1

    Publication Date: 2021-10-05

    Application No.: US16836780

    Application Date: 2020-03-31

Abstract: Provided are integrated circuit devices and methods for operating integrated circuit devices. In various examples, the integrated circuit device can include a target port operable to receive transactions from a master port. The target port can be configured with a multicast address range that is associated with a plurality of indices corresponding to memory banks of the device. When the target port receives a write transaction whose address is within the multicast address range, the target port can determine an index from the plurality of indices and an offset value from the address, and can combine the index and the offset value to form a second address. The target port can then use the second address to write the data to the memory.
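A sketch of the address rewriting, under the assumption of a linear bank layout (each bank's base address is index * bank_stride); the function name and parameters are illustrative, not the patent's terminology.

```python
def route_write_address(address, multicast_base, multicast_size,
                        bank_indices, bank_stride):
    # Addresses inside the multicast range fan out to one second address
    # per bank index: each combines the bank base (index * bank_stride)
    # with the offset of the original address within the range.
    if not (multicast_base <= address < multicast_base + multicast_size):
        return [address]  # outside the range: ordinary unicast write
    offset = address - multicast_base
    return [idx * bank_stride + offset for idx in bank_indices]
```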

    GRADIENT COMPRESSION FOR DISTRIBUTED TRAINING

    Publication No.: US20210295168A1

    Publication Date: 2021-09-23

    Application No.: US16827444

    Application Date: 2020-03-23

Inventors: Kun Xu; Ron Diamant

Abstract: Techniques for exchanging compressed gradient data within a distributed system are disclosed. A set of gradients is computed at a first worker node of the distributed system using a neural network model and a set of weights associated with the neural network model. Each of the set of gradients having a value less than a threshold is clipped, resulting in non-clipped data elements and clipped data elements. A mapping indicating which of the set of gradients correspond to non-clipped data elements and which of the set of gradients correspond to clipped data elements is generated. Compressed data is generated based on the non-clipped data elements. The mapping and the compressed data are transmitted from the first worker node to a second worker node of the distributed system.
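The clip-map-compress round trip can be sketched as follows; the magnitude comparison and zero-reconstruction on the receiving side are assumptions about details the abstract leaves open.

```python
def compress_gradients(grads, threshold):
    # Clip gradients whose magnitude falls below the threshold, keeping a
    # boolean mapping of which positions survived so the receiver can
    # scatter the compressed values back into place.
    mapping = [abs(g) >= threshold for g in grads]
    compressed = [g for g, keep in zip(grads, mapping) if keep]
    return mapping, compressed

def decompress_gradients(mapping, compressed):
    # Clipped positions are reconstructed as zeros at the second worker.
    it = iter(compressed)
    return [next(it) if keep else 0.0 for keep in mapping]
```

Only the mapping (one bit per gradient) and the surviving values cross the network, which is the source of the bandwidth saving when most gradients are small.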

    TRANSPOSE OPERATIONS USING PROCESSING ELEMENT ARRAY

    Publication No.: US20210096823A1

    Publication Date: 2021-04-01

    Application No.: US17122136

    Application Date: 2020-12-15

Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.

    Secure data processing
    Invention Grant

    Publication No.: US10956584B1

    Publication Date: 2021-03-23

    Application No.: US16141770

    Application Date: 2018-09-25

Abstract: Systems and methods for performing neural network processing are provided. In one example, a system comprises a neural network processor comprising: a data decryption engine that receives encrypted data and decrypts the encrypted data, the encrypted data comprising at least one of: encrypted weights data, encrypted input data, or encrypted instruction data related to a neural network model; and a computing engine that receives the weights data and performs computations of the neural network processing using the input data and the weights data, based on the instruction data.
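A toy sketch of the decrypt-then-compute flow. The repeating-XOR keystream below is a deliberately simple stand-in for the data decryption engine (a real engine would use a proper cipher such as AES), and the dot product stands in for the computing engine; all names are illustrative.

```python
def xor_keystream_decrypt(ciphertext, key):
    # NOT real cryptography: a repeating-XOR keystream standing in for
    # the hardware decryption engine.
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(ciphertext))

def secure_inference(encrypted_weights, encrypted_inputs, key):
    # Weights and inputs are decrypted inside the processor boundary;
    # plaintext never leaves it.
    weights = list(xor_keystream_decrypt(encrypted_weights, key))
    inputs = list(xor_keystream_decrypt(encrypted_inputs, key))
    # Computing engine: here a dot product stands in for the full
    # neural network computation.
    return sum(w * x for w, x in zip(weights, inputs))
```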

    Assisted indirect memory addressing
    Invention Grant

    Publication No.: US10929063B1

    Publication Date: 2021-02-23

    Application No.: US16368538

    Application Date: 2019-03-28

    Abstract: Systems and methods for assisted indirect memory addressing are provided. Some computing systems move data between levels of a hierarchical memory system. To accommodate data movement for computing systems that do not natively support indirect addressing between levels of the memory hierarchy, a direct memory access (DMA) engine is used to fetch data. The DMA engine executes a first set of memory instructions that modify a second set of memory instructions to fetch data stored at one level of the memory hierarchy from dynamically computed indirect addresses stored in memory locations at another level of the memory hierarchy.
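The two-stage descriptor patching can be sketched as follows, with memory modeled as a dictionary; the function name, the flat address space, and the element-size parameter are illustrative assumptions.

```python
def assisted_indirect_fetch(memory, index_table_addr, count, data_base,
                            elem_size=1):
    # First instruction set: read the dynamically computed indices stored
    # at one level of the memory hierarchy.
    indices = [memory[index_table_addr + i] for i in range(count)]
    # These reads "modify" the second instruction set by patching the
    # source addresses it will fetch from.
    patched_descriptors = [data_base + idx * elem_size for idx in indices]
    # Second instruction set: fetch data at the indirect addresses.
    return [memory[addr] for addr in patched_descriptors]
```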

    Power reduction in processor pipeline by detecting zeros

    Publication No.: US10901492B1

    Publication Date: 2021-01-26

    Application No.: US16369696

    Application Date: 2019-03-29

    Abstract: Techniques are described for power reduction in a computer processor based on detection of whether data destined for input to an arithmetic logic unit (ALU) has a particular value. The data is written to a register prior to performing an arithmetic or logical operation using the data as an operand. Depending on a timing of when the data is supplied to the register, the determination is made before or after the data is written to the register, and a memory associated with the register is updated with a result of the determination. Contents of the memory are used to make a decision whether to allow the ALU to perform the arithmetic or logical operation. The memory can be implemented as a non-architectural register.
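A behavioral sketch of the zero-tracking register and the gating decision; the class name and the multiply-only gating policy are illustrative assumptions, and a dictionary stands in for the non-architectural register.

```python
class ZeroTrackingRegisterFile:
    def __init__(self):
        self.regs = {}
        # The non-architectural memory described above: one flag per
        # register, recording whether the last write was zero.
        self.zero_flags = {}

    def write(self, name, value):
        # The zero check happens alongside the register write.
        self.regs[name] = value
        self.zero_flags[name] = (value == 0)

    def mul(self, ra, rb):
        # Consult the flags before clocking the ALU: a multiply with a
        # known-zero operand needs no arithmetic at all.
        if self.zero_flags[ra] or self.zero_flags[rb]:
            return 0
        return self.regs[ra] * self.regs[rb]  # normal ALU path
```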

    Notifications in integrated circuits

    Publication No.: US10896001B1

    Publication Date: 2021-01-19

    Application No.: US16145050

    Application Date: 2018-09-27

Abstract: Provided are integrated circuit devices and methods for operating integrated circuit devices. In various examples, an integrated circuit device can be operable to determine, at a point in time during operation of the integrated circuit device, to generate a notification. The notification can include a type and a timestamp indicating the point in time. The notification can also include information about an internal status of the integrated circuit at the point in time. The device can further select a queue from a plurality of queues in a processor memory of the computing system that includes the integrated circuit. The device can further generate a write transaction including the notification, where the write transaction is addressed to the queue. The device can further output the write transaction using a communication interface of the device.
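The notification path can be sketched as building a payload (type, timestamp, internal status) and addressing a write transaction to a selected queue. The dictionary layout and the `select` callback modeling the queue-selection policy are illustrative assumptions.

```python
import time

def make_notification_write(kind, status, queues, select):
    # Build the notification: a type, a timestamp for the point in time,
    # and the device's internal status at that moment.
    notification = {
        'type': kind,
        'timestamp': time.monotonic_ns(),
        'status': status,
    }
    # Select one of the queues in processor memory; `select` stands in
    # for the device's queue-selection logic.
    queue_addr = queues[select(kind, queues)]
    # The write transaction is addressed to the chosen queue.
    return {'address': queue_addr, 'payload': notification}
```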
