-
Publication number: US12204757B1
Publication date: 2025-01-21
Application number: US18067514
Application date: 2022-12-16
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant , Ilya Minkin , Raymond S. Whiteside
IPC: G06F3/06
Abstract: A technique for processing strong ordered transactions in a direct memory access engine may include retrieving a memory descriptor to perform a strong ordered transaction, and delaying the strong ordered transaction until pending write transactions associated with previous memory descriptors retrieved prior to the memory descriptor are complete. Subsequent transactions associated with memory descriptors following the memory descriptor are allowed to be issued while waiting for the pending write transactions to complete. Upon completion of the pending write transactions, the strong ordered transaction is performed.
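The ordering rule here is compact enough to model directly. Below is a minimal Python sketch of that behavior, assuming illustrative names (DmaOrderingQueue, issue, complete_write) that do not come from the patent.

```python
class DmaOrderingQueue:
    """Toy model of the rule in the abstract: a strong ordered transaction
    waits for writes from earlier descriptors, while transactions from
    later descriptors may still issue in the meantime."""

    def __init__(self):
        self.pending_writes = set()   # in-flight writes, by transaction id
        self.blocked = []             # (strong txn id, earlier writes it awaits)

    def issue(self, txn_id, strong_ordered=False):
        if strong_ordered and self.pending_writes:
            # Delay: snapshot exactly which earlier writes must drain first.
            self.blocked.append((txn_id, set(self.pending_writes)))
            return False
        if not strong_ordered:
            self.pending_writes.add(txn_id)
        return True   # issued now; later descriptors are never held up

    def complete_write(self, txn_id):
        self.pending_writes.discard(txn_id)
        ready = []
        for strong_id, deps in self.blocked:
            deps.discard(txn_id)
            if not deps:
                ready.append(strong_id)   # all earlier writes done: perform it
        self.blocked = [(s, d) for s, d in self.blocked if d]
        return ready

q = DmaOrderingQueue()
q.issue("w1")
q.issue("w2")
q.issue("barrier", strong_ordered=True)           # delayed behind w1 and w2
q.issue("w3")                                     # a later write still issues
print(q.complete_write("w1"), q.complete_write("w2"))   # [] ['barrier']
```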
-
Publication number: US12093801B1
Publication date: 2024-09-17
Application number: US18142952
Application date: 2023-05-03
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Randy Renfu Huang , Ron Diamant
IPC: G06F16/00 , G06F9/30 , G06F9/48 , G06F16/901 , G06N3/04
CPC classification number: G06N3/04 , G06F9/30003 , G06F9/4881 , G06F16/9024
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions. The system further includes a compiler configured to: identify a computational subgraph from a computational graph of a neural network model; compute a subgraph identifier for the computational subgraph; based on whether the subgraph identifier is included in the plurality of subgraph identifiers, either: obtain, from the database, first instructions associated with the subgraph identifier; or generate second instructions representing the computational subgraph; and provide the first instructions or the second instructions for execution by a neural network processor to perform computation operations for the neural network model.
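The compile-or-reuse flow lends itself to a short sketch. The following Python is a hedged illustration, with a plain dict standing in for the database and a SHA-256 hash as one plausible way to compute a subgraph identifier; none of these names come from the patent.

```python
import hashlib

instruction_db = {}   # stand-in for the database of cached instructions

def subgraph_identifier(subgraph_ops):
    """Derive a stable identifier from a canonical description of the
    subgraph; hashing is one plausible way to compute it."""
    canonical = "|".join(subgraph_ops)
    return hashlib.sha256(canonical.encode()).hexdigest()

def instructions_for(subgraph_ops, compile_fn):
    sid = subgraph_identifier(subgraph_ops)
    if sid in instruction_db:
        return instruction_db[sid]        # reuse previously generated code
    instrs = compile_fn(subgraph_ops)     # compile this subgraph fresh
    instruction_db[sid] = instrs          # cache it for later models
    return instrs

# "Compiling" is mocked as string formatting for the example.
compile_fn = lambda ops: [f"EXEC {op}" for op in ops]
print(instructions_for(["matmul", "relu"], compile_fn))   # compiled fresh
print(instructions_for(["matmul", "relu"], compile_fn))   # served from cache
```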
-
Publication number: US12067492B2
Publication date: 2024-08-20
Application number: US18144129
Application date: 2023-05-05
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant , Thomas A. Volpe , Randy Huang
CPC classification number: G06N3/082 , G06F3/0604 , G06F3/0644 , G06F3/0673 , G06N3/045
Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
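One way to read the schedule in the abstract is as a sequence of reconfigure-then-process steps that interleave two contexts. The Python below is a toy model under that reading; the Engine class and the lambda "layers" are purely illustrative.

```python
class Engine:
    """Toy stand-in for the computing engine; a 'configuration' here is
    just a function implementing one layer."""
    def configure(self, layer_fn):
        self.layer_fn = layer_fn
    def process(self, data):
        return self.layer_fn(data)

layer1 = lambda x: x + 1
layer2 = lambda x: x * 2
layer3 = lambda x: x - 3

engine = Engine()
engine.configure(layer2)          # first configuration: second layer
ctx1_l2 = engine.process(10)      # first context data -> layer-2 output
engine.configure(layer1)          # second configuration: first layer
ctx2_l1 = engine.process(20)      # second context data -> layer-1 output
engine.configure(layer3)          # third configuration: third layer
print(engine.process(ctx1_l2), engine.process(ctx2_l1))   # both results
```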
-
Publication number: US12056072B1
Publication date: 2024-08-06
Application number: US17457603
Application date: 2021-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan , Ron Diamant
CPC classification number: G06F13/28 , G06F3/0611 , G06F3/0655 , G06F3/0679 , G06F2213/28
Abstract: Techniques to reduce the latency of data transfer notifications in a computing system are disclosed. The techniques can include receiving, at a memory, a first access request of a set of access requests associated with a data transfer. The first access request has a token and an access count indicating the number of access requests in the set of access requests. A counter is initiated to count the number of received access requests having the token. When additional access requests belonging to the set of access requests are received, the counter is incremented for each of the additional access requests being received. A notification is transmitted to an integrated circuit component in response to receiving the last access request of the set of access requests having the token to notify the integrated circuit component that the memory is ready for access.
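The token-and-count mechanism reduces to a small bookkeeping structure. Here is a speculative Python sketch; NotificationTracker, on_access, and the notify callback are invented names for illustration.

```python
from collections import defaultdict

class NotificationTracker:
    """Counts access requests per token and fires a notification once the
    last request of the set arrives."""
    def __init__(self, notify):
        self.counts = defaultdict(int)
        self.expected = {}
        self.notify = notify

    def on_access(self, token, access_count):
        # The access count rides along on each request in the set.
        self.expected[token] = access_count
        self.counts[token] += 1
        if self.counts[token] == self.expected[token]:
            self.notify(token)        # memory is ready for this transfer
            del self.counts[token], self.expected[token]

tracker = NotificationTracker(lambda t: print(f"token {t}: data ready"))
for _ in range(3):
    tracker.on_access(token=7, access_count=3)   # notifies on the third
```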
-
Publication number: US11960566B1
Publication date: 2024-04-16
Application number: US17229742
Application date: 2021-04-13
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant
Abstract: Systems and methods are provided to eliminate multiplication operations with zero padding data for convolution computations. A multiplication matrix is generated from an input feature map matrix with padding by adjusting coordinates and dimensions of the input feature map matrix to exclude padding data. The multiplication matrix is used to perform matrix multiplications with respective weight values which results in fewer computations as compared to matrix multiplications which include the zero padding data.
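The key idea, multiplying only against real input data by adjusting window coordinates, can be shown with a direct convolution. This NumPy sketch clips each filter window to the unpadded input instead of materializing zero padding; it illustrates the idea, not the patented matrix construction.

```python
import numpy as np

def conv2d_skip_padding(x, w, pad):
    """Direct convolution that never multiplies against padding zeros:
    each filter window is clipped to the real input instead of reading
    from a zero-padded copy."""
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H + 2 * pad - kH + 1, W + 2 * pad - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Clip the window coordinates so they stay inside the input.
            r0, c0 = max(0, i - pad), max(0, j - pad)
            r1, c1 = min(H, i - pad + kH), min(W, j - pad + kW)
            xs = x[r0:r1, c0:c1]
            ws = w[r0 - (i - pad):r1 - (i - pad),
                   c0 - (j - pad):c1 - (j - pad)]
            out[i, j] = np.sum(xs * ws)   # only real data participates
    return out

x = np.arange(9.0).reshape(3, 3)
w = np.ones((3, 3))
print(conv2d_skip_padding(x, w, pad=1))   # matches a zero-padded conv
```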
-
Publication number: US11875247B1
Publication date: 2024-01-16
Application number: US16905769
Application date: 2020-06-18
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Ron Diamant
Abstract: An acceleration engine with multiple accelerators may share a common set of data that is used by each accelerator to perform computations on input data. The set of shared data can be loaded into the acceleration engine from an external memory. Instead of accessing the external memory multiple times to load the set of shared data into each accelerator, the external memory can be accessed once using direct memory access to load the set of shared data into the first accelerator. The set of shared data can then be serially loaded from one accelerator to the next accelerator in the acceleration engine using direct memory access. To achieve data parallelism and reduce computation time, a runtime driver may split the input data into data batches, and each accelerator can perform computations on a different batch of input data with the common set of shared data.
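The load-once, forward-serially, split-the-batch flow can be sketched in a few lines. The Python below is a toy model; Accel, load_and_run, and the striped batch split are assumptions made for the example.

```python
def load_and_run(accelerators, external_memory, inputs):
    """Sketch of the described flow: one DMA read from external memory,
    then serial accelerator-to-accelerator copies, then data-parallel runs."""
    accelerators[0].weights = list(external_memory)   # single external read
    for prev, nxt in zip(accelerators, accelerators[1:]):
        nxt.weights = list(prev.weights)              # serial on-chip transfer
    # Runtime driver splits the input into one batch per accelerator.
    batches = [inputs[i::len(accelerators)] for i in range(len(accelerators))]
    return [acc.run(batch) for acc, batch in zip(accelerators, batches)]

class Accel:
    def run(self, batch):
        return [x * sum(self.weights) for x in batch]   # toy computation

accels = [Accel() for _ in range(4)]
print(load_and_run(accels, external_memory=[0.5, 0.5], inputs=list(range(8))))
```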
-
Publication number: US11868878B1
Publication date: 2024-01-09
Application number: US15934523
Application date: 2018-03-23
Applicant: Amazon Technologies, Inc.
Inventor: Randy Huang , Ron Diamant
IPC: G06N3/08 , G06N5/046 , G06F18/2413 , G06F18/2431
CPC classification number: G06N3/08 , G06F18/2413 , G06F18/2431 , G06N5/046
Abstract: Disclosed herein are techniques for implementing a large fully-connected layer in an artificial neural network. The large fully-connected layer is grouped into multiple fully-connected subnetworks. Each fully-connected subnetwork is configured to classify an object into an unknown class or a class in a subset of target classes. If the object is classified as the unknown class by a fully-connected subnetwork, a next fully-connected subnetwork may be used to further classify the object. In some embodiments, the fully-connected layer is grouped based on a ranking of target classes.
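The cascade of subnetworks behaves like a chain of classifiers with an escape class. A minimal sketch, assuming toy subnetworks implemented as plain functions:

```python
def cascaded_classify(x, subnetworks, unknown="unknown"):
    """Try each fully-connected subnetwork in turn; stop at the first one
    that claims the object, otherwise fall through to the next subset."""
    for subnet in subnetworks:
        label = subnet(x)
        if label != unknown:
            return label
    return unknown

# Toy subnetworks over ranked class subsets (most likely classes first).
subnet_a = lambda x: "cat" if x < 5 else "unknown"
subnet_b = lambda x: "dog" if x < 10 else "unknown"
print(cascaded_classify(3, [subnet_a, subnet_b]))   # -> cat
print(cascaded_classify(7, [subnet_a, subnet_b]))   # -> dog
```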
-
Publication number: US11797853B2
Publication date: 2023-10-24
Application number: US17951084
Application date: 2022-09-22
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant , Thomas A. Volpe , Randy Huang
CPC classification number: G06N3/082 , G06F3/0604 , G06F3/0644 , G06F3/0673 , G06N3/045
Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
-
Publication number: US11741350B2
Publication date: 2023-08-29
Application number: US16698461
Application date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
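One plausible reading of the row-threshold decision: when a single filter element would occupy fewer rows than the array has, pack several filter elements into disjoint row groups, one instruction each. The sketch below collapses the threshold to the full array height for simplicity; all names are illustrative, not from the patent.

```python
def plan_instructions(num_input_channels, num_rows, filter_elems):
    """If one filter element only fills num_input_channels rows and that
    is below the array height, emit instructions that map different
    filter elements onto the spare rows."""
    instructions = []
    if num_input_channels < num_rows:
        per_pass = max(1, num_rows // num_input_channels)
        for i in range(0, len(filter_elems), per_pass):
            group = filter_elems[i:i + per_pass]
            # One instruction per filter element, targeting disjoint rows.
            for slot, elem in enumerate(group):
                rows = range(slot * num_input_channels,
                             (slot + 1) * num_input_channels)
                instructions.append((f"filter elem {elem}", list(rows)))
    else:
        instructions = [(f"filter elem {e}", list(range(num_rows)))
                        for e in filter_elems]
    return instructions

for instr in plan_instructions(num_input_channels=3, num_rows=8,
                               filter_elems=["w00", "w01"]):
    print(instr)   # two instructions sharing one pass over the array
```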
-
Publication number: US11704211B1
Publication date: 2023-07-18
Application number: US17643292
Application date: 2021-12-08
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan , Ron Diamant , Brian Robert Silver
CPC classification number: G06F11/2094 , G06F2201/82
Abstract: Techniques for avoiding uncorrectable errors in a memory device can include detecting a correctable error pattern of a memory page of a memory device, and determining that the correctable error pattern of the memory page satisfies a page migration condition. Upon satisfying the page migration condition, write accesses to the memory page are prevented from reaching a memory controller of the memory device. The contents of the memory page are then migrated to a reserved page, and a mapping table is updated to replace accesses to the memory page with accesses to the reserved page.
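The migration flow maps naturally onto a small state machine. The following Python is a hedged sketch; PageMigrator, the error threshold of 3, and the dict-based mapping table are assumptions for illustration.

```python
class PageMigrator:
    """Sketch of the described flow: on a correctable-error pattern, hold
    off writes to the page, copy it to a reserved page, and remap accesses."""
    def __init__(self, memory, reserved_pages, threshold=3):
        self.memory = memory              # page_id -> contents
        self.reserved = list(reserved_pages)
        self.threshold = threshold
        self.errors = {}
        self.remap = {}                   # old page -> reserved page

    def on_correctable_error(self, page):
        self.errors[page] = self.errors.get(page, 0) + 1
        if self.errors[page] >= self.threshold and page not in self.remap:
            target = self.reserved.pop(0)
            # Writes to `page` would be blocked while the copy is in flight.
            self.memory[target] = self.memory[page]   # migrate contents
            self.remap[page] = target                 # update mapping table

    def resolve(self, page):
        return self.remap.get(page, page)

m = PageMigrator({"p1": b"data"}, reserved_pages=["r1"])
for _ in range(3):
    m.on_correctable_error("p1")
print(m.resolve("p1"))   # -> r1, accesses now go to the reserved page
```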