-
Publication Number: US10678508B2
Publication Date: 2020-06-09
Application Number: US15934681
Filing Date: 2018-03-23
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Randy Huang , Ron Diamant , Thomas Elmer , Sundeep Amirineni
Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural networks. A computer-implemented method includes receiving low-precision inputs for a convolution operation from a storage device, and subtracting a low-precision value representing a high-precision zero value from the low-precision inputs to generate difference values, where the low-precision inputs are asymmetrically quantized from high-precision inputs. The method also includes performing multiplication and summation operations on the difference values to generate a sum of products, and generating a high-precision output by scaling the sum of products with a scaling factor.
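Below is a minimal Python sketch of the quantized multiply-accumulate flow the abstract describes (subtract a zero-point value, multiply and sum in integer arithmetic, then rescale); the function and parameter names (quantized_dot, zero_point, scale) are illustrative assumptions, not taken from the patent.

def quantized_dot(x_q, w_q, x_zero_point, w_zero_point, x_scale, w_scale):
    """Recover a high-precision dot product from asymmetrically quantized inputs.

    x_q, w_q      -- low-precision (e.g. uint8) input values
    *_zero_point  -- low-precision values representing a high-precision zero
    *_scale       -- scaling factors from the quantization scheme
    """
    # Subtract the low-precision zero points to form signed difference values.
    x_diff = [int(x) - x_zero_point for x in x_q]
    w_diff = [int(w) - w_zero_point for w in w_q]
    # Multiply and sum the difference values using integer arithmetic only.
    acc = sum(xd * wd for xd, wd in zip(x_diff, w_diff))
    # Scale the integer sum of products to produce the high-precision output.
    return acc * x_scale * w_scale

# Example: two uint8-style vectors quantized with zero point 128 and scale 0.02.
print(quantized_dot([130, 125, 140], [120, 135, 128], 128, 128, 0.02, 0.02))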
-
Publication Number: US12271732B1
Publication Date: 2025-04-08
Application Number: US17937333
Filing Date: 2022-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Ron Diamant , Sundeep Amirineni
Abstract: A technique to program a compute channel having multiple computational circuit blocks coupled in series in a pipeline can include receiving a machine instruction for the compute channel. The machine instruction is decoded to obtain an opcode, and the opcode can be used as an index to access an opcode entry in an opcode table. The opcode entry contains a pointer to a microoperation, and the pointer can be used to access the microoperation, which is represented by a control entry in a control table and a datapath configuration entry in a datapath table. The microoperation can then be issued to the compute channel by configuring the compute channel with the control entry and the datapath configuration entry.
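A minimal Python sketch of the table-driven decode path described above, assuming plain dictionaries and lists for the opcode, control, and datapath tables; the table contents, field names, and instruction encoding are illustrative assumptions, not taken from the patent.

# Opcode table: opcode value -> pointer (index) into the microoperation tables.
OPCODE_TABLE = {
    0x01: 0,   # e.g. an elementwise add microoperation
    0x02: 1,   # e.g. a max-reduction microoperation
}

# Control table and datapath table, indexed by the same pointer.
CONTROL_TABLE = [
    {"op": "add", "stages_enabled": [0, 1]},
    {"op": "max", "stages_enabled": [0, 1, 2]},
]
DATAPATH_TABLE = [
    {"input_sel": "sram", "output_sel": "accumulator"},
    {"input_sel": "accumulator", "output_sel": "sram"},
]

def issue(machine_instruction):
    """Decode a machine instruction and return the microoperation configuration."""
    opcode = machine_instruction >> 8            # assumed instruction encoding
    pointer = OPCODE_TABLE[opcode]               # opcode used as index into the opcode table
    control_entry = CONTROL_TABLE[pointer]       # microoperation control entry
    datapath_entry = DATAPATH_TABLE[pointer]     # microoperation datapath configuration
    # In hardware these entries would configure the compute channel; here we return them.
    return control_entry, datapath_entry

print(issue(0x0100))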
-
Publication Number: US12182691B1
Publication Date: 2024-12-31
Application Number: US17249900
Filing Date: 2021-03-17
Applicant: Amazon Technologies, Inc.
Inventor: Sundeep Amirineni , Akshay Balasubramanian , Joshua Wayne Bowman , Ron Diamant , Paul Gilbert Meyer , Thomas Elmer
Abstract: To improve performance of a computational array, the architecture of the array can be modified to allow the processing engines of a column to operate in parallel and the clock frequency of the array to be increased. The processing engines of each column of the array can be grouped into a series of row groups. The processing engines of each row group can be loaded with input values, and computations on the input values can be carried out in parallel to generate the column output. One or more flip-flop stages can be inserted into the computational logic of each of the processing engines. The computational logic can then be distributed across the flip-flop stages to reduce the propagation delay between flip-flop stages of the processing engine, hence allowing the clock frequency of the array to be increased.
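As a rough software analogue, the sketch below models a column whose processing engines are split into row groups that compute their partial sums in parallel before being combined into the column output; the group size and the use of a thread pool are illustrative assumptions and do not model the flip-flop pipelining itself.

from concurrent.futures import ThreadPoolExecutor

def column_output(inputs, weights, group_size=4):
    # Split the column's processing engines into row groups.
    groups = [
        list(zip(inputs[i:i + group_size], weights[i:i + group_size]))
        for i in range(0, len(inputs), group_size)
    ]

    def group_partial_sum(group):
        # Every engine in the group has been loaded with its input value;
        # the group's computations proceed in parallel with the other groups.
        return sum(x * w for x, w in group)

    # Row groups produce their partial sums concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(group_partial_sum, groups))
    # Combine the per-group partial sums into the column output.
    return sum(partials)

print(column_output(list(range(8)), [1] * 8))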
-
Publication Number: US12045475B1
Publication Date: 2024-07-23
Application Number: US17457502
Filing Date: 2021-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Patricio Kaplan , Sundeep Amirineni , Laura Sharpless , Ron Diamant , Akshay Balasubramanian
CPC classification number: G06F3/0631 , G06F3/0604 , G06F3/064 , G06F3/0656 , G06F3/0659 , G06F3/0679 , G06F12/0246
Abstract: Techniques for implementing a dynamically resizable memory region for alternative use in a memory are described. The techniques may include using two concurrent address maps corresponding to two address ranges for a memory represented as an array of memory blocks. The first address range can be mapped to the memory with starting addresses of the memory blocks incrementing sequentially along each row. The second address range can be mapped to the memory with starting addresses of the memory blocks incrementing sequentially along each column. When an access request is received having a target address belonging to the first address range, the target address is provided as the memory address to access the memory. When an access request is received having a target address belonging to the second address range, the target address is translated by address translation logic into a memory address to access the memory.
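A minimal Python sketch of the two concurrent address maps, assuming a memory arranged as a ROWS x COLS array of fixed-size blocks; the block size, base addresses, and range layout are illustrative assumptions, not taken from the patent.

ROWS, COLS = 4, 8
BLOCK_SIZE = 0x1000                       # bytes per memory block (assumed)
RANGE0_BASE = 0x0000_0000                 # first address range (row-major map)
RANGE1_BASE = 0x1000_0000                 # second address range (column-major map)
RANGE_SPAN = ROWS * COLS * BLOCK_SIZE

def translate(target_address):
    """Return the memory address for a request's target address."""
    if RANGE0_BASE <= target_address < RANGE0_BASE + RANGE_SPAN:
        # First range: block starting addresses increment along each row,
        # matching the memory layout, so no translation is needed.
        return target_address
    if RANGE1_BASE <= target_address < RANGE1_BASE + RANGE_SPAN:
        # Second range: block starting addresses increment along each column,
        # so the row/column interpretation of the block index is swapped.
        offset = target_address - RANGE1_BASE
        block, within = divmod(offset, BLOCK_SIZE)
        col, row = divmod(block, ROWS)    # column-major block ordering
        return RANGE0_BASE + (row * COLS + col) * BLOCK_SIZE + within
    raise ValueError("address outside both ranges")

# The block at row 1, column 2 is reachable through either address range.
print(hex(translate(RANGE0_BASE + (1 * COLS + 2) * BLOCK_SIZE)))
print(hex(translate(RANGE1_BASE + (2 * ROWS + 1) * BLOCK_SIZE)))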
-
Publication Number: US11868875B1
Publication Date: 2024-01-09
Application Number: US16127170
Filing Date: 2018-09-10
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang , Jeffrey T. Huynh , Sundeep Amirineni
Abstract: Provided are systems and methods for operating a neural network processor, wherein the processor includes an input selector circuit that can be configured to select the data that will be input into the processor's computational array. In various implementations, the selector circuit can determine, for a row of the array, whether the row input will be the output from a buffer memory or data that the input selector circuit has selected for a different row. The row can receive an input feature map from a set of input data or an input feature map that was selected for inputting into a different row, such that the input feature map is input into more than one row at a time. The selector circuit can also include a delay circuit, so that the duplicated input feature map can be input into the computational array later than the original input feature map.
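The sketch below is a rough software model of the row-input selection, assuming a per-row configuration that either takes the row's own feature map from the buffer memory or a delayed duplicate of the feature map selected for another row; the configuration format and delay handling are illustrative assumptions.

def select_row_inputs(buffer_rows, row_config, delay=1):
    """buffer_rows: per-row input feature maps read from the buffer memory.
    row_config: per-row entry, either ("buffer",) or ("duplicate", source_row).
    Returns the input sequence fed to each row, with duplicates delayed."""
    selected = []
    for row, cfg in enumerate(row_config):
        if cfg[0] == "buffer":
            # Row receives its own feature map from the buffer memory.
            selected.append(list(buffer_rows[row]))
        else:
            # Row receives a duplicate of the feature map selected for another
            # row, delayed so it enters the array later than the original.
            selected.append([None] * delay + list(buffer_rows[cfg[1]]))
    return selected

rows = [[1, 2, 3], [4, 5, 6]]
print(select_row_inputs(rows, [("buffer",), ("duplicate", 0)]))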
-
Publication Number: US11775430B1
Publication Date: 2023-10-03
Application Number: US17000842
Filing Date: 2020-08-24
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Sundeep Amirineni , Akshay Balasubramanian , Eyal Freund
IPC: G06F12/08 , G11C11/419 , G11C11/418 , G06N3/063
CPC classification number: G06F12/08 , G06N3/063 , G11C11/418 , G11C11/419
Abstract: Disclosed herein are techniques for performing memory access. In one embodiment, an integrated circuit includes a port and an access engine. The integrated circuit is coupled with a memory device. The access engine is configured to: receive, from an access requester device, a request to access data stored at a memory device; and based on receiving the request: provide, via the port, a sequential access of a plurality of portions of the data to the access requester device; and access the plurality of portions of the data in a parallel form at the memory device for the access requester device. The sequential access can include a sequential write access or a sequential read access of the plurality of portions of the data.
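A minimal Python sketch of the access-engine behavior, assuming a memory device that reads or writes all portions of a row in parallel while the requester sees a sequential stream through a single port; the class and method names are illustrative assumptions.

class AccessEngine:
    def __init__(self, memory_device):
        self.memory = memory_device        # modeled as a list of rows of portions

    def sequential_read(self, row):
        """Serve a read request as a sequence of portions over the port."""
        # The engine accesses all portions of the row in parallel at the memory...
        portions = self.memory[row][:]     # single parallel access, modeled as a copy
        # ...and streams them to the access requester one portion at a time.
        for portion in portions:
            yield portion

    def sequential_write(self, row, stream):
        """Accept portions sequentially over the port, then commit them in parallel."""
        buffered = list(stream)            # gather the sequential write data
        self.memory[row] = buffered        # single parallel write to the memory device

engine = AccessEngine([[0, 0, 0, 0]])
engine.sequential_write(0, iter([1, 2, 3, 4]))
print(list(engine.sequential_read(0)))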
-
Publication Number: US10817260B1
Publication Date: 2020-10-27
Application Number: US16007749
Filing Date: 2018-06-13
Applicant: Amazon Technologies, Inc.
Inventor: Randy Huang , Ron Diamant , Thomas Elmer , Sundeep Amirineni , Thomas A. Volpe
Abstract: Systems and methods are provided to skip multiplication operations with zeros in processing elements of the systolic array to reduce dynamic power consumption. A value of zero can be detected on an input data element entering each row of the array and respective zero indicators may be generated. These respective zero indicators may be passed to all the processing elements in the respective rows. The multiplication operation with the zero value can be skipped in each processing element based on the zero indicators, thus reducing dynamic power consumption.
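A minimal Python model of the zero-skipping idea, assuming one zero indicator generated per input data element and shared by every processing element in that row; the accumulator layout is an illustrative assumption.

def row_multiply_accumulate(input_element, weights, accumulators):
    """Update one row's accumulators with input_element * weight per column."""
    # Detect zero once as the input data element enters the row and generate
    # the indicator passed to all processing elements in that row.
    zero_indicator = (input_element == 0)
    for col, weight in enumerate(weights):
        if zero_indicator:
            # Skip the multiplication entirely; the partial sum is unchanged,
            # which is what saves dynamic power in the hardware.
            continue
        accumulators[col] += input_element * weight
    return accumulators

print(row_multiply_accumulate(0, [3, 5, 7], [10, 10, 10]))   # multiplications skipped
print(row_multiply_accumulate(2, [3, 5, 7], [10, 10, 10]))   # normal updates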
-
Publication Number: US10795742B1
Publication Date: 2020-10-06
Application Number: US15638080
Filing Date: 2017-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Asif Khan , Sundeep Amirineni , Kiran Kalkunte Seshadri , Nafea Bshara
Abstract: Disclosed are techniques for implementing client configurable logic within a computer system. The computer system can be a cloud infrastructure. The techniques can include determining that the client configurable logic has performed an errant action.
-
Publication Number: US10740432B1
Publication Date: 2020-08-11
Application Number: US16219604
Filing Date: 2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang , Mohammad El-Shabani , Sundeep Amirineni , Kenneth Wayne Patton , Willis Wang
Abstract: Methods and systems for performing hardware computations of mathematical functions are provided. In one example, a system comprises a mapping table that maps each base value of a plurality of base values to parameters related to a mathematical function; a selection module configured to select, based on an input value, a first base value and first parameters mapped to the first base value in the mapping table; and arithmetic circuits configured to: receive, from the mapping table, the first base value and the first parameters; and compute, based on a relationship between the input value and the first base value, and based on the first parameters, an estimated output value of the mathematical function for the input value.
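A minimal Python sketch of the table-based evaluation, assuming each base value maps to a function value and slope and the estimate is a first-order extrapolation from the selected base value; the example table (for exp) and the base-value spacing are illustrative assumptions.

import math

STEP = 0.25
# Mapping table: base value -> parameters (function value, derivative) for exp.
MAPPING_TABLE = {
    round(i * STEP, 2): (math.exp(i * STEP), math.exp(i * STEP))
    for i in range(0, 9)                   # base values 0.0 .. 2.0
}

def approx_exp(x):
    # Selection: pick the closest base value at or below the input value.
    base = round(math.floor(x / STEP) * STEP, 2)
    f_base, slope = MAPPING_TABLE[base]    # first parameters for that base value
    # Arithmetic: estimate from the relationship between input and base value.
    return f_base + slope * (x - base)

print(approx_exp(1.37), math.exp(1.37))    # estimated vs. exact value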
-
Publication Number: US10678479B1
Publication Date: 2020-06-09
Application Number: US16204943
Filing Date: 2018-11-29
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang , Sundeep Amirineni , Jeffrey T. Huynh
Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank, and can read from and write to only the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
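A minimal Python model of the parallel-then-serial register operations used to move data from many memory banks into one; the assumption that one element is taken from each bank and packed into consecutive locations of the destination bank is illustrative, not taken from the patent.

def gather_to_one_bank(banks, src_index, dst_bank, dst_start):
    """Move one value from every memory bank into a single bank."""
    # Parallel load: every register loads from its associated bank in one step.
    registers = [bank[src_index] for bank in banks]
    # Serial store: the registers are written one after another into one bank.
    for offset, value in enumerate(registers):
        banks[dst_bank][dst_start + offset] = value
    return banks

banks = [[1, 0, 0, 0], [2, 0, 0, 0], [3, 0, 0, 0], [4, 0, 0, 0]]
print(gather_to_one_bank(banks, 0, 0, 0))

Reversing the two steps (a serial load of one bank into the registers followed by a parallel store) models the opposite movement, from one bank into many banks.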