-
Publication No.: US11714992B1
Publication date: 2023-08-01
Application No.: US16219760
Filing date: 2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton, Randy Renfu Huang, Ron Diamant
IPC: G06F16/00, G06N3/04, G06F9/30, G06F16/901, G06F9/48
CPC classification number: G06N3/04, G06F9/4881, G06F9/30003, G06F16/9024
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions. The system further includes a compiler configured to: identify a computational subgraph from a computational graph of a neural network model; compute a subgraph identifier for the computational subgraph; based on whether the subgraph identifier is included in the plurality of subgraph identifiers, either: obtain, from the database, first instructions associated with the subgraph identifier; or generate second instructions representing the computational subgraph; and provide the first instructions or the second instructions for execution by a neural network processor to perform computation operations for the neural network model.
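The lookup-or-generate flow in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the identifier is taken to be a SHA-256 digest of the operator list, `generate_instructions` is a stand-in for real code generation, and writing newly generated instructions back to the database is not stated in the abstract.

```python
import hashlib


def subgraph_id(ops):
    """Hypothetical identifier: SHA-256 digest of the subgraph's operator list."""
    canonical = "|".join(ops)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def generate_instructions(ops):
    """Stand-in for code generation: one pseudo-instruction per operator."""
    return [f"EXEC {op}" for op in ops]


def instructions_for(ops, database):
    """Return (instructions, cache_hit) for a subgraph."""
    key = subgraph_id(ops)
    if key in database:
        # Identifier is known: reuse the stored ("first") instructions.
        return database[key], True
    # Identifier is unknown: generate new ("second") instructions.
    instructions = generate_instructions(ops)
    database[key] = instructions  # assumption: cache the result for later reuse
    return instructions, False


database = {}
conv_block = ["conv2d", "bias_add", "relu"]
first, hit_first = instructions_for(conv_block, database)
second, hit_second = instructions_for(conv_block, database)
```

In this sketch the first call misses and generates instructions, while the second call for the same operator list is served from the dictionary.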
-
Publication No.: US11625453B1
Publication date: 2023-04-11
Application No.: US16712699
Filing date: 2019-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Ron Diamant
Abstract: To improve utilization of a systolic array, each row of the array is provided with a number of general purpose row input data buses. Each of the general purpose row input data buses can be operable to transfer either feature map (FMAP) input elements or weight values into the processing elements of the corresponding row of the array. By using such general purpose row input data buses, concurrent matrix multiplications as well as faster background weight loading can be achieved in the array.
-
Publication No.: US11550736B1
Publication date: 2023-01-10
Application No.: US17449581
Filing date: 2021-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu, Ron Diamant, Ilya Minkin, Mohammad El-Shabani, Raymond S. Whiteside, Uday Shilton Udayaselvam
Abstract: To reduce direct memory access (DMA) overhead, a tensorized descriptor can be used to generate a series of memory descriptors to perform a series of DMA data transfers. The tensorized descriptor may include attributes such as a stride and a memory descriptor template, which can be used to generate the series of memory descriptors. Hence, instead of having to retrieve each of the memory descriptors to perform the series of DMA transfers, a single tensorized descriptor can be retrieved to perform a series of data transfers.
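As a rough illustration of how one tensorized descriptor might expand into a series of memory descriptors, consider the sketch below. The field names (`src`, `dst`, `length`), the separate source and destination strides, and the `count` attribute are assumptions; the abstract only mentions a stride and a memory descriptor template.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemoryDescriptor:
    src: int      # source address (hypothetical field)
    dst: int      # destination address (hypothetical field)
    length: int   # bytes to transfer (hypothetical field)


@dataclass(frozen=True)
class TensorizedDescriptor:
    template: MemoryDescriptor  # descriptor for the first transfer
    src_stride: int             # bytes added to src per transfer (assumed)
    dst_stride: int             # bytes added to dst per transfer (assumed)
    count: int                  # number of transfers to generate (assumed)


def expand(td):
    """Generate the series of memory descriptors from one tensorized descriptor."""
    return [
        replace(
            td.template,
            src=td.template.src + i * td.src_stride,
            dst=td.template.dst + i * td.dst_stride,
        )
        for i in range(td.count)
    ]


td = TensorizedDescriptor(MemoryDescriptor(0x1000, 0x8000, 64), 256, 64, 3)
descriptors = expand(td)
```

Only `td` would need to be fetched; the three per-transfer descriptors are derived from it locally.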
-
Publication No.: US20230004384A1
Publication date: 2023-01-05
Application No.: US17363894
Filing date: 2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
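The idea of shortening an input by removing trailing bits, with or without rounding, can be sketched as follows. The formats are assumptions: the sketch takes a 32-bit IEEE 754 float and drops its 16 trailing bits, and the rounding variant uses round-to-nearest-even. The abstract does not name specific bit-lengths or a rounding mode, and NaN or overflow cases are ignored here.

```python
import struct


def float_to_bits(x):
    """Reinterpret a Python float as its 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def reduce_truncate(x, drop=16):
    """Shorten the input by discarding the `drop` trailing bits."""
    return float_to_bits(x) >> drop


def reduce_round(x, drop=16):
    """Shorten the input, rounding to nearest with ties to even (assumed mode)."""
    bits = float_to_bits(x)
    half = 1 << (drop - 1)
    lsb = (bits >> drop) & 1  # lowest bit that survives the reduction
    return (bits + half - 1 + lsb) >> drop


value = bits_to_float(0x3F80C000)  # slightly above 1.0
truncated = reduce_truncate(value)
rounded = reduce_round(value)
```

Here truncation keeps the pattern of 1.0 while rounding moves up to the next representable reduced value, which is the difference between the two reducer behaviors mentioned in the abstract.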
-
Publication No.: US11483296B1
Publication date: 2022-10-25
Application No.: US16917367
Filing date: 2020-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Nafea Bshara, Leah Shalev, Erez Izenberg
Abstract: A hardware security accelerator includes a configurable parser that is configured to receive a packet and to extract from the packet headers associated with a set of protocols. The security accelerator also includes a packet type detection unit to determine a type of the packet in response to the set of protocols and to generate a packet type identifier indicative of the type of the packet. A configurable security unit includes a configuration unit and a configurable security engine. The configuration unit configures the configurable security engine according to the type of the packet and to content of at least one of the headers extracted from the packet. The configurable security engine performs security processing of the packet to provide at least one security result.
-
Publication No.: US11468304B1
Publication date: 2022-10-11
Application No.: US16696377
Filing date: 2019-11-26
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant
Abstract: In one example, a hardware accelerator comprises an event register that stores an event; a hardware execution engine; and a controller configured to: extract, from an instruction, parameters of an operation to be performed by the hardware execution engine, and a synchronization primitive of a plurality of synchronization primitives for the event; and based on the synchronization primitive, perform at least one of: controlling a start time of the operation at the hardware execution engine, or determining whether to access the event register. The synchronization primitives include a set operation to set the event and/or a wait operation to suspend the operation at the hardware execution engine until the event is set. The plurality of synchronization primitives define different conditions to be satisfied in order to perform the set operation.
-
Publication No.: US11467973B1
Publication date: 2022-10-11
Application No.: US16146332
Filing date: 2018-09-28
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Sundeep Amirineni
IPC: G06F12/00, G06F12/10, G06F12/1018, G06F12/1027, G06F12/1081, G06F12/1045, G06F12/1009, G06F12/1036, G06F12/1072
Abstract: Systems and methods are provided to perform fine-grained memory accesses using a memory controller. The memory controller can access elements stored in memory across multiple dimensions of a matrix. The memory controller can perform accesses to non-contiguous memory locations by skipping zero or more elements across any dimension of the matrix.
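Skipping elements along each matrix dimension amounts to strided address generation. The sketch below is one way to picture it, assuming a row-major layout described by per-dimension strides. The function name and its parameters are hypothetical and not taken from the patent.

```python
from itertools import product


def strided_addresses(base, shape, strides, skips):
    """List flat addresses visited when skipping `skips[d]` elements along dimension d.

    shape   -- number of elements per dimension
    strides -- address distance between neighbors along each dimension
    skips   -- elements skipped between accesses per dimension (0 = contiguous)
    """
    ranges = [range(0, n, skip + 1) for n, skip in zip(shape, skips)]
    return [
        base + sum(i * stride for i, stride in zip(index, strides))
        for index in product(*ranges)
    ]


# 4 x 6 row-major matrix: skip 1 row and 2 columns between accesses.
addresses = strided_addresses(base=0, shape=(4, 6), strides=(6, 1), skips=(1, 2))
```

With these arguments the accesses touch rows 0 and 2 and columns 0 and 3, so the visited locations are non-contiguous in memory.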
-
Publication No.: US11294841B1
Publication date: 2022-04-05
Application No.: US16985056
Filing date: 2020-08-04
Applicant: Amazon Technologies, Inc.
Inventor: Adiel Sarusi, Ron Diamant, Ori Weber, Erez Izenberg
Abstract: Techniques disclosed herein relate to dynamically configurable multi-stage pipeline processing units. In one embodiment, a circuit includes a plurality of processing engines and a plurality of switches. Each of the plurality of processing engines includes an input port and an output port. Each of the plurality of switches comprises two input ports and two output ports. For each processing engine, the input port of the processing engine is electrically coupled to one of the switches, the output port of the processing engine is electrically coupled to another one of the switches, and the input port of the processing engine is electrically coupled to the output port of each of the processing engines by the switches.
-
Publication No.: US11294599B1
Publication date: 2022-04-05
Application No.: US16891438
Filing date: 2020-06-03
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Randy Renfu Huang, Sundeep Amirineni, Jeffrey T. Huynh
Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
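The two data movements described at the end of this abstract can be modeled in software as a toy sketch. Memory banks are plain Python lists, the register set is a temporary list, and the function names are hypothetical. Real hardware would perform the parallel step in a single operation, which a loop or comprehension only imitates.

```python
def gather_into_one(banks, dest):
    """Parallel load (one value per bank), then serial store into banks[dest]."""
    # Parallel step: each register reads the last value of its own bank.
    registers = [bank[-1] for bank in banks]
    # Serial step: registers drain one at a time into the destination bank.
    for value in registers:
        banks[dest].append(value)
    return registers


def scatter_from_one(banks, src):
    """Serial load from banks[src], then parallel store (one value per bank)."""
    registers = []
    # Serial step: fill the registers one at a time from the source bank.
    for i in range(len(banks)):
        registers.append(banks[src][i])
    # Parallel step: each register writes to its own bank.
    for bank, value in zip(banks, registers):
        bank.append(value)
    return registers


many = [[1], [2], [3]]
gathered = gather_into_one(many, dest=0)

one = [[7, 8, 9], [], []]
scattered = scatter_from_one(one, src=0)
```

After `gather_into_one`, bank 0 holds a value from every bank; after `scatter_from_one`, each bank holds one value that originated in bank 0.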
-
Publication No.: US11263517B1
Publication date: 2022-03-01
Application No.: US15908080
Filing date: 2018-02-28
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Randy Huang
Abstract: Disclosed herein are techniques for obtaining weights for neural network computations. In one embodiment, an integrated circuit may include an arithmetic circuit configured to perform arithmetic operations for a neural network. The integrated circuit may also include a weight processing circuit configured to: acquire data from a memory device; receive configuration information indicating a size of each quantized weight of a set of quantized weights; extract the set of quantized weights from the data based on the size of each weight indicated by the configuration information; perform de-quantization processing on the set of quantized weights to generate a set of de-quantized weights; and provide the set of de-quantized weights to the arithmetic circuit to enable the arithmetic circuit to perform the arithmetic operations. The memory device may be part of or external to the integrated circuit.
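The extract-then-de-quantize sequence can be sketched in a few lines. The packing order (unsigned fields, least significant bits first) and the affine de-quantization with a `scale` and `zero_point` are assumptions chosen for illustration. The abstract only states that the weight size comes from configuration information.

```python
def extract_quantized(data, bits, count):
    """Unpack `count` unsigned weights of `bits` bits each from packed bytes.

    Assumes weights are packed back to back, least significant bits first.
    """
    stream = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(stream >> (i * bits)) & mask for i in range(count)]


def dequantize(quantized, scale, zero_point):
    """Affine de-quantization (assumed scheme): w = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in quantized]


packed = bytes([0x21, 0x43])            # four 4-bit weights: 1, 2, 3, 4
quantized = extract_quantized(packed, bits=4, count=4)
weights = dequantize(quantized, scale=0.5, zero_point=2)
```

Changing `bits` to 2 or 8 reinterprets the same bytes with a different weight size, which mirrors the role of the configuration information.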