-
Publication No.: US11423313B1
Publication Date: 2022-08-23
Application No.: US16218082
Application Date: 2018-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Sundeep Amirineni , Mohammad El-Shabani , Kenneth Wayne Patton , Thomas Elmer
Abstract: Methods and systems for performing hardware approximation of a function are provided. In one example, a system comprises a controller, configurable arithmetic circuits, and a mapping table. The mapping table stores a first set of function parameters in a first mode of operation and stores a second set of function parameters in a second mode of operation. Depending on the mode of operation, the controller may configure the arithmetic circuits to compute a first approximation result of a function at an input value based on the first set of function parameters, or to compute a second approximation result of the function at the input value based on the second set of function parameters and to perform post-processing, such as quantization, of the second approximation result.
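A minimal Python sketch of one way such a scheme could look, assuming a piecewise-linear approximation: a precomputed mapping table holds per-segment (slope, intercept) parameters, the configurable arithmetic circuit is modelled as a multiply-add, and the second mode adds quantization as post-processing. The table layout, segment count, and quantization scale are illustrative assumptions, not the patent's concrete design.

    import math

    # Hypothetical table layout: one (slope, intercept) pair per input segment.
    def build_table(func, lo, hi, segments):
        """Precompute linear-segment parameters approximating func on [lo, hi]."""
        step = (hi - lo) / segments
        table = []
        for i in range(segments):
            x0, x1 = lo + i * step, lo + (i + 1) * step
            slope = (func(x1) - func(x0)) / (x1 - x0)
            table.append((slope, func(x0) - slope * x0))
        return table

    def approximate(x, table, lo, hi, mode=1, scale=1 / 128):
        """Evaluate the piecewise-linear approximation; mode 2 quantizes the result."""
        segments = len(table)
        idx = min(int((x - lo) / (hi - lo) * segments), segments - 1)
        slope, intercept = table[idx]
        result = slope * x + intercept          # configurable multiply-add
        if mode == 2:                           # post-processing: quantization
            result = round(result / scale) * scale
        return result

    exp_table = build_table(math.exp, -4.0, 4.0, segments=64)
    print(approximate(1.0, exp_table, -4.0, 4.0, mode=2))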
-
Publication No.: US11409685B1
Publication Date: 2022-08-09
Application No.: US17031653
Application Date: 2020-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan , Ron Diamant
IPC: G06F13/40 , G06F13/28 , G06F13/42 , G06F9/48 , G06N3/08 , G06F15/173 , H04L67/1095 , H04L49/90 , H04L49/15
Abstract: In one example, a method comprises: receiving, by a hardware data processor and from a network adapter, a transfer complete message indicating that the network adapter has initiated a transfer of data received from a network to the hardware data processor, the transfer being performed over an interconnect coupled between the hardware data processor and the network adapter; based on receiving the transfer complete message, performing, by the hardware data processor, a flush operation to fetch any remaining portion of the data buffered in the interconnect to a local memory of the hardware data processor; based on determining that the flush operation is complete, storing, by the hardware data processor, the transfer complete message at the local memory; and based on determining that the transfer complete message is stored at the local memory, starting a computation operation on the data at the hardware data processor or performing an error handling operation.
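A sketch of the ordering the abstract describes, assuming the interconnect buffer can be modelled as a queue: the completion message arrives first, a flush drains anything still buffered in the interconnect into local memory, and only then is the message committed and computation (or error handling) started. The class, method, and message names are hypothetical.

    import queue

    class HardwareDataProcessor:
        def __init__(self, interconnect_buffer):
            self.interconnect_buffer = interconnect_buffer   # data still in flight
            self.local_memory = {}

        def on_transfer_complete(self, message):
            # 1. Flush: drain any remaining payload buffered in the interconnect
            #    into local memory before trusting the completion message.
            flushed_ok = self._flush(message["transfer_id"])
            # 2. Only after the flush finishes is the completion message stored.
            self.local_memory[("msg", message["transfer_id"])] = message
            # 3. Start computing on the data, or fall back to error handling.
            if flushed_ok:
                self._compute(message["transfer_id"])
            else:
                self._handle_error(message["transfer_id"])

        def _flush(self, transfer_id):
            try:
                while True:
                    chunk = self.interconnect_buffer.get_nowait()
                    self.local_memory.setdefault(("data", transfer_id), []).append(chunk)
            except queue.Empty:
                return True

        def _compute(self, transfer_id):
            print("computing on", self.local_memory[("data", transfer_id)])

        def _handle_error(self, transfer_id):
            print("error handling for transfer", transfer_id)

    buf = queue.Queue()
    for chunk in (b"part1", b"part2"):
        buf.put(chunk)
    HardwareDataProcessor(buf).on_transfer_complete({"transfer_id": 7})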
-
Publication No.: US11379555B2
Publication Date: 2022-07-05
Application No.: US16457503
Application Date: 2019-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant
IPC: G06F17/15 , G06V10/75 , G06F15/80 , G06V30/413 , H04L49/9047
Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory to load into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the first weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
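A small Python sketch, under the assumption of a 2-D, stride-1 dilated convolution, of how the input subset touched by one weight element can be derived from its coordinates and the dilation rate, and how accumulating one weight element per pass reproduces the full dilated convolution. All function names are illustrative.

    import numpy as np

    def inputs_for_weight(input_h, input_w, kernel_h, kernel_w, rate, wy, wx):
        """Return the (row, col) input coordinates that weight element (wy, wx)
        multiplies in a stride-1 dilated convolution with dilation `rate`."""
        out_h = input_h - rate * (kernel_h - 1)
        out_w = input_w - rate * (kernel_w - 1)
        coords = [(oy + rate * wy, ox + rate * wx)
                  for oy in range(out_h) for ox in range(out_w)]
        return out_h, out_w, coords

    def dilated_conv(x, w, rate):
        """Reference dilated convolution built by accumulating one weight element
        at a time, mirroring the per-weight scheduling described above."""
        kh, kw = w.shape
        out_h, out_w, _ = inputs_for_weight(*x.shape, kh, kw, rate, 0, 0)
        out = np.zeros((out_h, out_w))
        for wy in range(kh):
            for wx in range(kw):
                _, _, coords = inputs_for_weight(*x.shape, kh, kw, rate, wy, wx)
                subset = np.array([x[r, c] for r, c in coords]).reshape(out_h, out_w)
                out += w[wy, wx] * subset       # one pass over the selected subset
        return out

    x = np.arange(36, dtype=float).reshape(6, 6)
    w = np.ones((3, 3))
    print(dilated_conv(x, w, rate=2))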
-
Publication No.: US20220188073A1
Publication Date: 2022-06-16
Application No.: US17247475
Application Date: 2020-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Joshua Wayne Bowman , Thomas A. Volpe , Sundeep Amirineni , Nishith Desai , Ron Diamant
Abstract: To reduce power consumption, data bits or a portion of a data register that is not expected to toggle frequently can be grouped together and clock-gated independently from the rest of the data register. The grouping of the data bits can be determined based on the data types of the workload being operated on. For a data register configured to store a numeric value that supports multiple data types, the portion of the data register being clock-gated may store a group of data bits that are unused for one or more data types of the multiple data types supported by the data register. The portion of the data register being clock-gated can also be a group of data bits that remain unchanged or have a constant value for numeric values within a certain numeric range that is frequently operated on.
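A short sketch modelling which bit groups of a 32-bit register stay idle for a given data type and how a per-group clock enable could be derived from that. The data-type layouts and the three bit groups below are illustrative assumptions, not the patent's actual grouping.

    # Illustrative only: which bit ranges of a 32-bit register actually toggle
    # for each supported data type (assumed layouts).
    ACTIVE_BITS = {
        "fp32": range(0, 32),    # full register toggles
        "bf16": range(16, 32),   # lower 16 bits unused -> candidates for gating
        "int8": range(0, 8),     # upper 24 bits unused -> candidates for gating
    }

    BIT_GROUPS = {"byte0": range(0, 8), "byte1": range(8, 16),
                  "high_half": range(16, 32)}

    def clock_enables(dtype):
        """Return, per bit group, whether its clock is enabled (True) or
        gated off (False) when the register holds values of `dtype`."""
        active = set(ACTIVE_BITS[dtype])
        return {name: bool(active & set(bits)) for name, bits in BIT_GROUPS.items()}

    for dtype in ACTIVE_BITS:
        print(dtype, clock_enables(dtype))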
-
Publication No.: US11354258B1
Publication Date: 2022-06-07
Application No.: US17038623
Application Date: 2020-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan , Ron Diamant
IPC: G06F13/16 , G06F15/173 , G06F13/38 , G06F13/24
Abstract: In one example, an apparatus comprises: a first local memory, a computation engine configured to generate local data and to store the local data at the first local memory, and a controller. The apparatus is coupled with a host processor and a second device via an interconnect, the second device comprising a second local memory, the host processor hosting an application. The controller is configured to: receive, from the second device, a first message indicating that first data is stored in the second local memory; based on the first message: fetch the first data from the second local memory via the interconnect; control the computation engine to perform a computation operation on the first data to generate second data to support the application hosted by the host processor; and transmit, to the second device, a second message indicating that the second data is stored in the first local memory.
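A minimal message-driven sketch of the fetch, compute, and notify sequence the abstract describes. The device classes, the "data_ready"/"result_ready" messages, and the direct read of the peer's local memory are hypothetical stand-ins for the interconnect transfers.

    class Device:
        def __init__(self, name):
            self.name = name
            self.local_memory = {}

    class Accelerator(Device):
        def on_message(self, peer, msg):
            if msg["kind"] == "data_ready":
                # Fetch the peer's data over the interconnect (modelled here as
                # a direct read of its local memory).
                first_data = peer.local_memory[msg["addr"]]
                # Compute locally and keep the result in the first local memory.
                second_data = [x * x for x in first_data]      # placeholder op
                self.local_memory["result"] = second_data
                # Tell the peer where the result lives, without host involvement.
                peer_msg = {"kind": "result_ready", "addr": "result"}
                print(f"{self.name} -> {peer.name}: {peer_msg}")

    network_adapter = Device("second_device")
    network_adapter.local_memory["buf0"] = [1, 2, 3]
    Accelerator("first_device").on_message(network_adapter,
                                           {"kind": "data_ready", "addr": "buf0"})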
-
Publication No.: US11321179B1
Publication Date: 2022-05-03
Application No.: US17001145
Application Date: 2020-08-24
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Thomas A. Volpe , Ron Diamant , Mark Anthony Banse
Abstract: A circuit at an interface between a device and an interconnect fabric is configured to track outstanding transactions associated with the device and ensure the completion of the outstanding transactions before rebooting or powering down the device. In some embodiments, the circuit is also configurable to provide appropriate responses when the device is powered down or is being rebooted such that other devices in the system can still operate even without knowing that the device is inactive and would not hang because no response is received from the device.
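A sketch of one plausible realization: an outstanding-transaction counter that blocks power-down until in-flight transactions drain, and a default-response path that answers on behalf of an inactive device so other initiators do not hang. The counter-based scheme and method names are assumptions.

    class InterfaceCircuit:
        def __init__(self):
            self.outstanding = 0
            self.device_active = True

        def request_issued(self):
            self.outstanding += 1

        def response_received(self):
            self.outstanding -= 1

        def prepare_power_down(self):
            """Refuse to power the device down while transactions are in flight."""
            if self.outstanding == 0:
                self.device_active = False
                return True
            return False

        def incoming_request(self, req):
            if self.device_active:
                return ("forwarded_to_device", req)
            # Device is down or rebooting: reply on its behalf so the initiator
            # does not hang waiting for a response that will never come.
            return ("default_response", req)

    circ = InterfaceCircuit()
    circ.request_issued()
    print(circ.prepare_power_down())   # False: one transaction still outstanding
    circ.response_received()
    print(circ.prepare_power_down())   # True: safe to power down
    print(circ.incoming_request("read 0x1000"))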
-
Publication No.: US11157452B2
Publication Date: 2021-10-26
Application No.: US15590898
Application Date: 2017-05-09
Applicant: Amazon Technologies, Inc.
Inventor: Nafea Bshara , Leah Shalev , Erez Izenberg , Georgy Machulsky , Ron Diamant
IPC: G06F16/174 , G06F16/27 , G06F16/901
Abstract: A method for in-band de-duplication may include receiving, by a hardware accelerator, a packet of a first sequence of packets that conveys a first data chunk; and applying a data chunk hash calculation process to the received packet while taking into account a hash calculation result obtained when applying the data chunk hash calculation process to a last packet of the first sequence that preceded the received packet; wherein the calculation of the first data chunk hash value is initiated before reception of the entire first data chunk by the hardware accelerator is completed.
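A short sketch of the incremental, per-packet hashing idea: the chunk hash state is updated as each packet arrives, carrying the preceding packet's result forward, so the digest is ready almost as soon as the last packet lands. SHA-256 via hashlib is an assumption; the abstract does not name a hash function.

    import hashlib

    def chunk_hash_inband(packets):
        """Update the data-chunk hash packet by packet, carrying the intermediate
        hash state forward, instead of waiting for the whole chunk to arrive."""
        state = hashlib.sha256()            # assumed hash; not specified above
        for packet in packets:              # packets of the first sequence, in order
            state.update(packet)            # builds on the preceding packet's result
        return state.hexdigest()

    packets = [b"chunk-part-0", b"chunk-part-1", b"chunk-part-2"]
    print(chunk_hash_inband(packets))
    # A matching digest in a de-duplication index means the chunk is already stored.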
-
Publication No.: US20210158132A1
Publication Date: 2021-05-27
Application No.: US16698461
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
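A sketch of the compiler-side decision described above: if applying one filter element needs only one row per input channel and that count is below the array height, several filter elements of the same filter can be mapped onto different row groups and issued as separate instructions in the same pass. The planning function, threshold logic, and sizes are assumptions.

    def plan_row_assignments(num_input_channels, num_array_rows, filter_elements):
        """Pack several filter elements of a filter onto different row groups
        when one element would use fewer rows than the array provides."""
        if num_input_channels >= num_array_rows:
            # Array is already full: one filter element per pass.
            return [[fe] for fe in filter_elements]
        per_pass = num_array_rows // num_input_channels   # row groups available
        plan = []
        for i in range(0, len(filter_elements), per_pass):
            group = filter_elements[i:i + per_pass]
            # Each element in `group` becomes its own instruction, targeting a
            # different block of rows, but they execute in the same pass.
            plan.append(group)
        return plan

    # 3x3 filter, 16 input channels, 128-row array -> 8 filter elements per pass.
    elements = [(r, s) for r in range(3) for s in range(3)]
    for pass_idx, group in enumerate(plan_row_assignments(16, 128, elements)):
        print(f"pass {pass_idx}: filter elements {group}")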
-
Publication No.: US10990650B1
Publication Date: 2021-04-27
Application No.: US15933339
Application Date: 2018-03-22
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant
Abstract: Systems and methods are provided to eliminate multiplication operations with zero padding data for convolution computations. A multiplication matrix is generated from an input feature map matrix with padding by adjusting coordinates and dimensions of the input feature map matrix to exclude padding data. The multiplication matrix is used to perform matrix multiplications with respective weight values which results in fewer computations as compared to matrix multiplications which include the zero padding data.
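A sketch of the padding-exclusion idea: instead of materializing a zero-padded input, the loop bounds are clipped to the coordinates of the real input, so positions that would multiply padding are never computed. It assumes stride 1 and symmetric padding; the function is illustrative, not the patented matrix construction.

    import numpy as np

    def conv2d_skip_padding(x, w, pad):
        """Stride-1 convolution that never multiplies by padding: indices are
        clipped to the unpadded input instead of padding x with zeros."""
        h, wid = x.shape
        kh, kw = w.shape
        out = np.zeros((h + 2 * pad - kh + 1, wid + 2 * pad - kw + 1))
        for oy in range(out.shape[0]):
            for ox in range(out.shape[1]):
                for ky in range(kh):
                    for kx in range(kw):
                        iy, ix = oy + ky - pad, ox + kx - pad
                        if 0 <= iy < h and 0 <= ix < wid:   # inside real input only
                            out[oy, ox] += w[ky, kx] * x[iy, ix]
        return out

    x = np.arange(16, dtype=float).reshape(4, 4)
    w = np.ones((3, 3))
    print(conv2d_skip_padding(x, w, pad=1))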
-
Publication No.: US10943167B1
Publication Date: 2021-03-09
Application No.: US16538698
Application Date: 2019-08-12
Applicant: Amazon Technologies, Inc.
Inventor: Sundeep Amirineni , Ron Diamant , Randy Huang , Thomas A. Volpe
Abstract: Disclosed herein are techniques for performing neural network computations. In one embodiment, an apparatus includes an array of processing elements, the array having configurable dimensions. The apparatus further includes a controller configured to set the dimensions of the array of processing elements based on at least one of: a first number of input data sets to be received by the array, or a second number of output data sets to be output by the array.
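A minimal sketch of the dimension-setting rule: rows follow the number of input data sets, columns follow the number of output data sets, each capped at the physical array size. The capping rule and the physical dimensions are assumptions for illustration.

    # Assumed rule: rows track input data sets, columns track output data sets,
    # both clamped to the physical processing-element array.
    PHYSICAL_ROWS, PHYSICAL_COLS = 128, 64

    def configure_array(num_input_sets, num_output_sets):
        rows = min(num_input_sets, PHYSICAL_ROWS)
        cols = min(num_output_sets, PHYSICAL_COLS)
        return rows, cols

    # e.g. a layer with 32 input feature maps and 16 output feature maps
    print(configure_array(num_input_sets=32, num_output_sets=16))   # (32, 16)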