-
Publication No.: US10768856B1
Publication Date: 2020-09-08
Application No.: US15919167
Filing Date: 2018-03-12
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Sundeep Amirineni , Akshay Balasubramanian , Eyal Freund
IPC: G06F3/06 , G06N3/04 , G06N3/063 , G11C11/413
Abstract: Disclosed herein are techniques for performing memory access. In one embodiment, an integrated circuit may include a memory device, a first port to receive first data elements from a memory access circuit within a first time period, and a second port to transmit second data elements to the memory access circuit within a second time period. The memory access circuit may receive the first data elements from the memory device within a third time period shorter than the first time period and transmit, via the first port, the received first data elements to a first processing circuit sequentially within the first time period. The memory access circuit may receive, via the second port, the second data elements from a second processing circuit sequentially within the second time period, and store the received second data elements in the memory device within a fourth time period shorter than the second time period.
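The core of this abstract is rate matching: the memory access circuit reads a burst of elements from the memory device in a short window, then streams them to the processing circuit one per cycle over a longer window. A minimal sketch of that idea (the cycle counts and helper names are illustrative, not from the patent):

```python
# Hypothetical sketch: fetch a burst of elements in a short period, then
# stream them out sequentially over a longer period, one element per cycle.

def burst_then_stream(memory, start, count, fetch_cycles=1):
    """Fetch `count` elements in `fetch_cycles` cycles (the short period),
    then emit one element per cycle (the long period).

    Returns a list of (cycle, element) pairs showing the sequential output."""
    burst = memory[start:start + count]   # fast, wide read from the memory device
    # Streaming takes `count` further cycles -- the longer time period.
    return [(fetch_cycles + i, elem) for i, elem in enumerate(burst)]

memory = list(range(100, 132))
out = burst_then_stream(memory, start=4, count=8)
# Elements 104..111 appear on cycles 1..8, one per cycle.
```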
-
Publication No.: US10587514B1
Publication Date: 2020-03-10
Application No.: US14977468
Filing Date: 2015-12-21
Applicant: Amazon Technologies, Inc.
Inventor: Bijendra Singh , Thomas A. Volpe , Sundeep Amirineni
IPC: H04L12/741
Abstract: Packet processing pipelines may implement filtering of control plane decisions. When network packets are received various types of decision-making and processing is performed. In order to complete processing for the network packet, some decisions may need to be determined by a control plane for the packet processing pipeline, such as a general processor. Requests for control plane decisions for received network packets may be filtered prior to sending the requests to the control plane based on whether the same control plane decisions have been requested for previously received network packets. For control plane decisions with outstanding control plane decision requests, an additional control plane decision request for the network packet may be blocked, whereas control plane decisions with no outstanding control plane decision requests may be allowed.
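The filtering behavior described above, blocking duplicate control-plane requests while one is outstanding, can be sketched as a small deduplicating filter (class and key names are assumptions for illustration):

```python
class ControlPlaneFilter:
    """Sketch of the filtering idea: block a control-plane decision request
    when the same decision is already outstanding for an earlier packet."""

    def __init__(self):
        self.outstanding = set()   # decision keys with an in-flight request
        self.pending = {}          # decision key -> packets waiting on it

    def request(self, key, packet):
        """Return True if a request is actually sent to the control plane."""
        self.pending.setdefault(key, []).append(packet)
        if key in self.outstanding:
            return False           # duplicate: block the additional request
        self.outstanding.add(key)
        return True                # first request for this decision: allow

    def decision(self, key, action):
        """Control plane answered; release all packets waiting on this key."""
        self.outstanding.discard(key)
        return [(pkt, action) for pkt in self.pending.pop(key, [])]

f = ControlPlaneFilter()
sent_first = f.request("dst=10.0.0.1", "pkt1")    # True: sent upward
sent_dup = f.request("dst=10.0.0.1", "pkt2")      # False: blocked
released = f.decision("dst=10.0.0.1", "fwd:eth0") # both packets released
```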
-
Publication No.: US12242853B1
Publication Date: 2025-03-04
Application No.: US17937335
Filing Date: 2022-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Ron Diamant , Sundeep Amirineni
Abstract: A compute channel having a compute pipeline of compute stages can be configured using a configuration pipeline with a control table and a datapath table. The control table stores control entries corresponding to respective microoperations, and each control entry includes control information for the compute channel. A datapath table stores datapath configuration entries corresponding to respective microoperations, and each datapath configuration entry has a datapath configuration that includes computational circuit block configurations to configure respective computational circuit blocks in the compute pipeline of the compute channel. Control logic can issue a microoperation to the compute channel by configuring the compute channel according to the control information of the microoperation obtained from the control table, and by inputting the datapath configuration of the microoperation obtained from the datapath table into the configuration pipeline of the compute channel.
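The two-table lookup can be modeled in a few lines: a microoperation indexes a control table for channel-level control information and a datapath table for per-block configurations. The table contents and operation names below are illustrative assumptions, not the patent's actual encodings:

```python
# Hypothetical model: each microoperation has a control entry and a datapath
# entry; the datapath entry holds one configuration per computational circuit
# block in the compute pipeline.

CONTROL_TABLE = {
    "madd": {"active_stages": 2, "output_enable": True},
    "copy": {"active_stages": 1, "output_enable": True},
}

DATAPATH_TABLE = {
    # One circuit-block configuration per compute stage, in pipeline order.
    "madd": [{"op": "mul", "operand": 3}, {"op": "add", "operand": 5}],
    "copy": [{"op": "nop"}, {"op": "nop"}],
}

def issue(uop, value):
    """Configure the (simulated) compute pipeline for `uop` and push `value`."""
    control = CONTROL_TABLE[uop]
    for block_cfg in DATAPATH_TABLE[uop][: control["active_stages"]]:
        if block_cfg["op"] == "mul":
            value *= block_cfg["operand"]
        elif block_cfg["op"] == "add":
            value += block_cfg["operand"]
    return value if control["output_enable"] else None

result = issue("madd", 4)   # (4 * 3) + 5 = 17
```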
-
Publication No.: US20220350775A1
Publication Date: 2022-11-03
Application No.: US17659642
Filing Date: 2022-04-18
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe , Sundeep Amirineni , Thomas Elmer
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
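A functional sketch of the columnar-bus idea (not the patented RTL): with two busses per column, even-indexed processing elements accumulate on bus 0 and odd-indexed elements on bus 1, so each partial sum bypasses the intervening elements on the other bus:

```python
# Sketch of one systolic-array column with multiple columnar busses. Each PE
# adds its product to the partial sum on its own bus; that partial sum passes
# directly to the next PE on the same bus, skipping PEs on other busses.

def column_with_busses(weights, activations, num_busses=2):
    """PE i contributes weights[i] * activations[i] to bus (i % num_busses).

    Returns one final partial sum per bus."""
    partial_sums = [0] * num_busses
    for i, (w, x) in enumerate(zip(weights, activations)):
        bus = i % num_busses
        partial_sums[bus] += w * x
    return partial_sums

sums = column_with_busses([1, 2, 3, 4], [10, 20, 30, 40])
# Bus 0 carries PEs 0 and 2: 1*10 + 3*30 = 100
# Bus 1 carries PEs 1 and 3: 2*20 + 4*40 = 200
total = sum(sums)   # 300, matching the single-bus dot product
```

Because the two running sums never wait on each other, the busses can operate in parallel for speed, or tolerate extra latency inside individual processing elements.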
-
Publication No.: US11467973B1
Publication Date: 2022-10-11
Application No.: US16146332
Filing Date: 2018-09-28
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Sundeep Amirineni
IPC: G06F12/00 , G06F12/10 , G06F12/1018 , G06F12/1027 , G06F12/1081 , G06F12/1045 , G06F12/1009 , G06F12/1036 , G06F12/1072
Abstract: Systems and methods are provided to perform fine-grained memory accesses using a memory controller. The memory controller can access elements stored in memory across multiple dimensions of a matrix. The memory controller can perform accesses to non-contiguous memory locations by skipping zero or more elements across any dimension of the matrix.
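The skipping behavior amounts to per-dimension strided address generation: a step larger than one in any dimension skips elements, producing non-contiguous addresses. A minimal sketch (the generator's interface is an assumption for illustration):

```python
# Hypothetical address generator for fine-grained matrix accesses: each
# dimension has a count and a step in elements; steps larger than one skip
# zero or more elements along that dimension.

def matrix_addresses(base, dims):
    """dims is a list of (count, step_in_elements) pairs, outermost first.

    Returns a flat element address for every accessed position."""
    addrs = [base]
    for count, step in dims:
        addrs = [a + i * step for a in addrs for i in range(count)]
    return addrs

# A 4x8 row-major matrix at base 0: read every other element of rows 0 and 2.
addrs = matrix_addresses(0, [(2, 16), (4, 2)])
# -> [0, 2, 4, 6, 16, 18, 20, 22]
```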
-
Publication No.: US11308027B1
Publication Date: 2022-04-19
Application No.: US16915795
Filing Date: 2020-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe , Sundeep Amirineni , Thomas Elmer
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
-
Publication No.: US11294599B1
Publication Date: 2022-04-05
Application No.: US16891438
Filing Date: 2020-06-03
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang , Sundeep Amirineni , Jeffrey T. Huynh
Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
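The parallel-then-serial pattern can be shown with a toy model: one register per memory bank, a parallel load pulling one element from every bank at once, and a serial store draining the registers into a single bank (class and method names are assumptions for illustration):

```python
# Sketch of the gather pattern: one register per bank, loaded in parallel,
# then stored serially so data from many banks lands in one bank.

class BankedMemory:
    def __init__(self, num_banks, bank_size):
        self.banks = [[0] * bank_size for _ in range(num_banks)]
        self.regs = [0] * num_banks        # one register per memory bank

    def parallel_load(self, addr):
        """Each register loads from its own bank at the same address."""
        for b in range(len(self.banks)):
            self.regs[b] = self.banks[b][addr]

    def serial_store(self, bank, start):
        """Registers drain one at a time into consecutive slots of one bank."""
        for i, value in enumerate(self.regs):
            self.banks[bank][start + i] = value

mem = BankedMemory(num_banks=4, bank_size=8)
for b in range(4):
    mem.banks[b][0] = 10 * b       # banks hold 0, 10, 20, 30 at address 0
mem.parallel_load(0)               # parallel: one element from each bank
mem.serial_store(bank=0, start=4)  # serial: all four land in bank 0
# bank 0 slots 4..7 now hold [0, 10, 20, 30]
```

Reversing the order (serial load, parallel store) scatters data from one bank into many banks, the second movement the abstract describes.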
-
Publication No.: US10990408B1
Publication Date: 2021-04-27
Application No.: US16582573
Filing Date: 2019-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Akshay Balasubramanian , Sundeep Amirineni
Abstract: Methods for place-and-route aware data pipelining for an integrated circuit device are provided. In large integrated circuits, the physical distance a data signal must travel between a signal source in a master circuit block partition and a signal destination in a servant circuit block partition can exceed the distance the signal can travel in a single clock cycle. To maintain timing requirements of the integrated circuit, a longest physical distance and signal delay for a datapath between master and servant circuit block partitions can be determined and pipelining registers added. Datapaths of master circuit block partitions further away from the servant circuit block can have more pipelining registers added within the master circuit block than datapaths of master circuit block partitions that are closer to the servant circuit block.
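The register-insertion rule reduces to simple arithmetic on distance: if a signal can travel only so far per clock cycle, the path must be broken into segments no longer than that reach. A minimal model (the distance unit and reach value are illustrative assumptions):

```python
# Hypothetical model: the number of pipelining registers on a datapath grows
# with the physical distance between the master partition's signal source and
# the servant partition's signal destination.

import math

def pipeline_registers(distance_um, reach_per_cycle_um):
    """Registers needed so no path segment exceeds one clock cycle of travel."""
    if distance_um <= reach_per_cycle_um:
        return 0                                 # reachable in a single cycle
    return math.ceil(distance_um / reach_per_cycle_um) - 1

# A farther master partition needs more registers than a closer one.
near = pipeline_registers(900, 1000)    # 0: within one cycle's reach
far = pipeline_registers(3500, 1000)    # 3: path split into four segments
```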
-
Publication No.: US10983754B2
Publication Date: 2021-04-20
Application No.: US16891010
Filing Date: 2020-06-02
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Randy Huang , Ron Diamant , Thomas Elmer , Sundeep Amirineni
Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural network. In one example, an apparatus comprises a first circuit, a second circuit, and a third circuit. The first circuit is configured to: receive first values in a first format, the first values being generated from one or more asymmetric quantization operations of second values in a second format, and generate difference values based on subtracting a third value from each of the first values, the third value representing a zero value in the first format. The second circuit is configured to generate a sum of products in the first format using the difference values. The third circuit is configured to convert the sum of products from the first format to the second format based on scaling the sum of products with a scaling factor.
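The three circuits map naturally onto three small functions: subtract the zero point, accumulate integer products, then scale back to the second format. The zero point and scaling factor below are illustrative values, not from the patent:

```python
# Sketch of the three-circuit flow for asymmetrically quantized operands.
# Assumed constants (ZERO_POINT, SCALE) are for illustration only.

ZERO_POINT = 128   # the first-format value that represents real 0.0
SCALE = 0.05       # scaling factor from first format back to second format

def subtract_zero_point(quantized):
    """First circuit: turn asymmetric quantized values into signed offsets."""
    return [q - ZERO_POINT for q in quantized]

def sum_of_products(a_offsets, b_offsets):
    """Second circuit: integer multiply-accumulate on the difference values."""
    return sum(x * y for x, y in zip(a_offsets, b_offsets))

def dequantize(acc):
    """Third circuit: scale the sum of products into the second format."""
    return acc * SCALE

a = subtract_zero_point([130, 126])   # -> [2, -2]
b = subtract_zero_point([138, 128])   # -> [10, 0]
acc = sum_of_products(a, b)           # 2*10 + (-2)*0 = 20
result = dequantize(acc)              # 20 * 0.05 = 1.0
```

Subtracting the zero point first means the accumulator works on symmetric signed offsets, so the multiply-accumulate hardware never needs to track per-operand zero points.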
-
Publication No.: US10733498B1
Publication Date: 2020-08-04
Application No.: US16215405
Filing Date: 2018-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Sundeep Amirineni , Mohammad El-Shabani
Abstract: Methods and systems for supporting parametric function computations in hardware circuits are proposed. In one example, a system comprises a hardware mapping table, a control circuit, and arithmetic circuits. The control circuit is configured to: in a first mode of operation, forward a set of parameters of a non-parametric function associated with an input value from the hardware mapping table to the arithmetic circuits to compute a first approximation of the non-parametric function at the input value; and in a second mode of operation, based on information indicating whether the input value is in a first input range or in a second input range from the hardware mapping table, forward a first parameter or a second parameter of a parametric function to the arithmetic circuits to compute, respectively, a second approximation or a third approximation of the parametric function at the input value.
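The two operating modes can be sketched with lookup tables standing in for the hardware mapping table. The example functions (a piecewise-linear fixed function and a leaky-ReLU-style parametric function) and all table values are illustrative assumptions:

```python
# Hypothetical sketch of the two modes. Mode 1: look up per-segment parameters
# of a fixed (non-parametric) function. Mode 2: forward one of two parameters
# of a parametric function based on the input range the value falls in.

# Mode 1 table: (slope, intercept) per input segment; values are illustrative.
NONPARAMETRIC_TABLE = {
    (-4.0, 0.0): (0.1, 0.0),
    (0.0, 4.0): (0.9, 0.0),
}

# Mode 2 table: one parameter per input range of a leaky-ReLU-like function.
PARAMETRIC_TABLE = {"negative_slope": 0.01, "positive_slope": 1.0}

def evaluate(x, mode):
    if mode == 1:
        # Forward the segment's parameters to the arithmetic circuits.
        for (lo, hi), (slope, intercept) in NONPARAMETRIC_TABLE.items():
            if lo <= x < hi:
                return slope * x + intercept   # first approximation
        raise ValueError("x outside table range")
    # Mode 2: pick the parameter for x's input range, then compute.
    slope = (PARAMETRIC_TABLE["negative_slope"] if x < 0
             else PARAMETRIC_TABLE["positive_slope"])
    return slope * x                           # second or third approximation

y1 = evaluate(2.0, mode=1)    # 0.9 * 2.0 = 1.8
y2 = evaluate(-3.0, mode=2)   # 0.01 * -3.0 = -0.03
```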