-
Publication No.: US12045475B1
Publication Date: 2024-07-23
Application No.: US17457502
Application Date: 2021-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Patricio Kaplan, Sundeep Amirineni, Laura Sharpless, Ron Diamant, Akshay Balasubramanian
CPC classification number: G06F3/0631, G06F3/0604, G06F3/064, G06F3/0656, G06F3/0659, G06F3/0679, G06F12/0246
Abstract: Techniques for implementing a dynamically resizable memory region for alternative use in a memory are described. The techniques may include using two concurrent address maps corresponding to two address ranges for a memory represented as an array of memory blocks. The first address range can be mapped to the memory with starting addresses of the memory blocks incrementing sequentially along each row. The second address range can be mapped to the memory with starting addresses of the memory blocks incrementing sequentially along each column. When an access request is received having a target address belonging to the first address range, the target address is provided as the memory address to access the memory. When an access request is received having a target address belonging to the second address range, the target address is translated by address translation logic into a memory address to access the memory.
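As a rough illustration of the dual address-map scheme described above, the sketch below models the memory as a small grid of blocks. The grid dimensions, block size, and range base addresses (BLOCK_SIZE, RANGE1_BASE, and so on) are invented for the example and are not taken from the patent; it shows only the row-major versus column-major translation the abstract describes.

```python
# Minimal sketch of two concurrent address maps over one memory, assuming a
# small 4x8 grid of blocks and invented base addresses for the two ranges.

BLOCK_SIZE = 4      # bytes per block (illustrative)
ROWS, COLS = 4, 8   # memory modeled as a 4x8 array of blocks

RANGE0_BASE = 0x0000   # first address range: row-major, passed through unchanged
RANGE1_BASE = 0x1000   # second address range: column-major, translated

def translate(target_addr: int) -> int:
    """Return the physical (row-major) memory address for a target address."""
    if target_addr < RANGE1_BASE:
        # First range: addresses already increment sequentially along each row.
        return target_addr
    # Second range: addresses increment sequentially along each column, so
    # convert the column-major block index back to a row-major address.
    offset = target_addr - RANGE1_BASE
    block, byte = divmod(offset, BLOCK_SIZE)
    col, row = divmod(block, ROWS)          # column-major block ordering
    return (row * COLS + col) * BLOCK_SIZE + byte

if __name__ == "__main__":
    # Block (row=1, col=2) starts at row-major address (1*8 + 2)*4 = 40.
    assert translate(40) == 40                              # first range: passthrough
    # Same block reached via the second range: column-major index 2*4 + 1 = 9.
    assert translate(RANGE1_BASE + 9 * BLOCK_SIZE) == 40
    print("address translation sketch OK")
```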
-
Publication No.: US11467992B1
Publication Date: 2022-10-11
Application No.: US17031668
Application Date: 2020-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan, Ron Diamant
Abstract: In one example, an apparatus comprises: a local on-chip memory; a computation engine configured to generate local data and to store the local data at the local on-chip memory; and a controller. The apparatus is configured to be coupled with a second device via an interconnect, the second device comprising a local memory. The controller is configured to: fetch the local data from the local on-chip memory; fetch remote data generated by another device from a local off-chip memory; generate output data based on combining the local data and the remote data; and store, via the interconnect, the output data at the local memory of the second device.
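A simplified software model of the described data path may help. The sketch below is my own illustration (the Device class and its fields are assumptions, not the claimed hardware): a controller combines on-chip local data with remote data staged in off-chip memory and writes the result into a peer device's memory over the interconnect.

```python
# Toy model of the apparatus: compute engine writes local data on-chip, the
# controller combines it with remote data from off-chip memory and pushes the
# output to a second device's memory. All names here are assumptions.

class Device:
    def __init__(self, name):
        self.name = name
        self.on_chip = {}    # local on-chip memory (written by the compute engine)
        self.off_chip = {}   # local off-chip memory (holds remote data)
        self.memory = {}     # local memory visible to peer devices

    def compute(self, key, values):
        # Computation engine: generate local data and store it on-chip.
        self.on_chip[key] = values

    def controller_reduce_and_push(self, key, peer: "Device"):
        local = self.on_chip[key]               # fetch local data from on-chip memory
        remote = self.off_chip[key]             # fetch remote data from off-chip memory
        output = [a + b for a, b in zip(local, remote)]   # combine the two
        peer.memory[key] = output               # store via the interconnect

dev_a, dev_b = Device("A"), Device("B")
dev_a.compute("grad0", [1.0, 2.0, 3.0])
dev_a.off_chip["grad0"] = [0.5, 0.5, 0.5]       # arrived earlier from another device
dev_a.controller_reduce_and_push("grad0", dev_b)
assert dev_b.memory["grad0"] == [1.5, 2.5, 3.5]
```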
-
Publication No.: US12056072B1
Publication Date: 2024-08-06
Application No.: US17457603
Application Date: 2021-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan, Ron Diamant
CPC classification number: G06F13/28, G06F3/0611, G06F3/0655, G06F3/0679, G06F2213/28
Abstract: Techniques to reduce the latency of data transfer notifications in a computing system are disclosed. The techniques can include receiving, at a memory, a first access request of a set of access requests associated with a data transfer. The first access request has a token and an access count indicating the number of access requests in the set of access requests. A counter is initiated to count the number of received access requests having the token. When additional access requests belonging to the set of access requests are received, the counter is incremented for each of the additional access requests being received. A notification is transmitted to an integrated circuit component in response to receiving the last access request of the set of access requests having the token to notify the integrated circuit component that the memory is ready for access.
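The token-and-counter bookkeeping can be sketched in a few lines. The TransferTracker class and its method names below are invented for illustration and only approximate the behavior described in the abstract.

```python
# Sketch of the notification scheme: the first request in a set carries a
# token and the total access count; a counter tracks arrivals, and the
# notification fires when the last request for that token lands.

class TransferTracker:
    def __init__(self, notify):
        self.notify = notify     # callback toward the integrated circuit component
        self.expected = {}       # token -> total number of access requests in the set
        self.received = {}       # token -> number of requests received so far

    def on_access_request(self, token, access_count=None):
        if token not in self.expected:
            # First request of the set: record the access count, start the counter.
            self.expected[token] = access_count
            self.received[token] = 0
        self.received[token] += 1
        if self.received[token] == self.expected[token]:
            self.notify(token)   # memory is ready for access

done = []
tracker = TransferTracker(notify=done.append)
tracker.on_access_request(token=7, access_count=3)   # first request carries the count
tracker.on_access_request(token=7)
tracker.on_access_request(token=7)                   # last request triggers the notification
assert done == [7]
```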
-
Publication No.: US11704211B1
Publication Date: 2023-07-18
Application No.: US17643292
Application Date: 2021-12-08
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan, Ron Diamant, Brian Robert Silver
CPC classification number: G06F11/2094, G06F2201/82
Abstract: Techniques for avoiding uncorrectable errors in a memory device can include detecting a correctable error pattern of a memory page of a memory device, and determining that the correctable error pattern of the memory page satisfies a page migration condition. Upon satisfying the page migration condition, write accesses to the memory page are prevented from reaching a memory controller of the memory device. The contents of the memory page are then migrated to a reserved page, and a mapping table is updated to replace accesses to the memory page with accesses to the reserved page.
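A rough software sketch of the migration flow follows. The ERROR_THRESHOLD value, the MemoryDevice class, and its data structures are assumptions made for the example, not details from the patent.

```python
# Sketch of page migration on a correctable-error pattern: once a page crosses
# the (assumed) threshold, writes are blocked, its contents are copied to a
# reserved page, and a mapping table redirects subsequent accesses.

ERROR_THRESHOLD = 3   # assumed page-migration condition: correctable errors per page

class MemoryDevice:
    def __init__(self):
        self.pages = {0: b"data0", 1: b"data1"}
        self.reserved = {100: b""}               # reserved spare page
        self.error_counts = {0: 0, 1: 0}
        self.remap = {}                          # mapping table: old page -> reserved page
        self.write_blocked = set()

    def report_correctable_error(self, page):
        self.error_counts[page] += 1
        if self.error_counts[page] == ERROR_THRESHOLD:
            self.migrate(page)

    def migrate(self, page):
        self.write_blocked.add(page)             # keep writes from reaching the controller
        spare = next(iter(self.reserved))
        self.reserved[spare] = self.pages[page]  # copy page contents to the reserved page
        self.remap[page] = spare                 # redirect future accesses
        self.write_blocked.discard(page)

    def read(self, page):
        target = self.remap.get(page, page)
        return self.reserved.get(target, self.pages.get(target))

dev = MemoryDevice()
for _ in range(ERROR_THRESHOLD):
    dev.report_correctable_error(0)
assert 0 in dev.remap
assert dev.read(0) == b"data0"                   # reads now served from the reserved page
```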
-
Publication No.: US20210303988A1
Publication Date: 2021-09-30
Application No.: US16835161
Application Date: 2020-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan, Ron Diamant
Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
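The overlap described above can be pictured with a toy scheduler. The sketch below uses a thread pool to stand in for the second worker node and placeholder arithmetic for the gradient math, so it illustrates only the pipelining of computation and synchronization, not the actual distributed training protocol.

```python
# Toy illustration of the overlap: worker 1 starts computing the next set of
# gradients while worker 2 is still producing synchronized gradients from the
# previous set. The gradient and synchronization math is placeholder only.

from concurrent.futures import ThreadPoolExecutor

def compute_gradients(model_id, weights):
    # Worker 1: gradients for one model / weight set (placeholder arithmetic).
    return [w * 0.1 for w in weights]

def synchronize_gradients(grads):
    # Worker 2: produce synchronized gradients (placeholder: identity here).
    return list(grads)

weights_1, weights_2 = [1.0, 2.0], [3.0, 4.0]

with ThreadPoolExecutor(max_workers=2) as pool:
    grads_1 = compute_gradients("model1", weights_1)
    sync_1 = pool.submit(synchronize_gradients, grads_1)   # runs on "worker 2"
    grads_2 = compute_gradients("model2", weights_2)       # overlaps with sync of set 1
    sync_2 = pool.submit(synchronize_gradients, grads_2)
    print(sync_1.result(), sync_2.result())
```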
-
Publication No.: US11676021B1
Publication Date: 2023-06-13
Application No.: US17947355
Application Date: 2022-09-19
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan, Ron Diamant
Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
-
Publication No.: US11468325B2
Publication Date: 2022-10-11
Application No.: US16835161
Application Date: 2020-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan, Ron Diamant
Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
-
Publication No.: US20220318604A1
Publication Date: 2022-10-06
Application No.: US17301271
Application Date: 2021-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu, Ron Diamant, Patricio Kaplan
Abstract: To reduce the storage size of weight tensors and speed up loading of weight tensors from system memory, a compression technique can be employed to remove zero values from a weight tensor before storing the weight tensor in system memory. A sparsity threshold can be enforced to achieve a compression ratio target by forcing small weight values to zero during training. When the weight tensor is loaded from system memory, a direct memory access (DMA) engine with an in-line decompression unit can decompress the weight tensor on-the-fly. By performing the decompression in the DMA engine, expansion of the weight values back to the original weight tensor size can be carried out in parallel while other neural network computations are being performed by the processing unit.
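The two pieces described, thresholding small weights to zero and packing/unpacking the nonzero values, can be sketched in plain Python. The helper names and the index-value encoding below are my own choices and only illustrate the general idea, not the DMA engine's actual on-the-fly decompression format.

```python
# Sketch of sparsity thresholding plus zero-removal compression and the
# corresponding decompression (what the in-line decompression unit would
# reconstruct). The encoding is an assumption made for the example.

def apply_sparsity_threshold(weights, threshold):
    # During training, small weights are forced to zero to reach the
    # compression-ratio target.
    return [0.0 if abs(w) < threshold else w for w in weights]

def compress(weights):
    # Store only the nonzero values together with their positions.
    return [(i, w) for i, w in enumerate(weights) if w != 0.0], len(weights)

def decompress(pairs, length):
    # Expand the compressed tensor back to its original size.
    out = [0.0] * length
    for i, w in pairs:
        out[i] = w
    return out

weights = [0.01, -0.9, 0.0, 0.002, 1.3]
sparse = apply_sparsity_threshold(weights, threshold=0.05)
packed, n = compress(sparse)
assert decompress(packed, n) == [0.0, -0.9, 0.0, 0.0, 1.3]
```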
-
Publication No.: US12254398B2
Publication Date: 2025-03-18
Application No.: US17301271
Application Date: 2021-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu, Ron Diamant, Patricio Kaplan
Abstract: To reduce the storage size of weight tensors and speed up loading of weight tensors from system memory, a compression technique can be employed to remove zero values from a weight tensor before storing the weight tensor in system memory. A sparsity threshold can be enforced to achieve a compression ratio target by forcing small weight values to zero during training. When the weight tensor is loaded from system memory, a direct memory access (DMA) engine with an in-line decompression unit can decompress the weight tensor on-the-fly. By performing the decompression in the DMA engine, expansion of the weight values back to the original weight tensor size can be carried out in parallel while other neural network computations are being performed by the processing unit.
-
Publication No.: US11948352B2
Publication Date: 2024-04-02
Application No.: US16831060
Application Date: 2020-03-26
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan, Randy Renfu Huang
CPC classification number: G06V10/955, G06N3/063, G06N3/084, G06N5/046, G06N20/00, G06V10/764, G06V10/82
Abstract: The exchange of weight gradients among the processing nodes can introduce a substantial bottleneck to the training process. Instead of remaining idle during the weight gradients exchange process, a processing node can update its own set of weights for the next iteration of the training process using the processing node's local weight gradients. The next iteration of training can be started by using these speculative weights until the weight gradients exchange process completes and a global weights update is available. If the speculative weights are close enough to the weight values from the global weights update, the training process at the processing node can continue training using the results computed from the speculative weights to reduce the overall training time.
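An illustrative sketch of the speculative-weights idea follows. The learning rate, the closeness tolerance, and the averaging rule are assumptions made for the example, since the abstract does not specify them.

```python
# Sketch of speculative weight updates: update locally with local gradients,
# start the next iteration from those weights, and keep the speculative
# results only if they land close enough to the globally synchronized weights.

LR = 0.1
TOLERANCE = 1e-2   # assumed closeness criterion, not specified in the abstract

def local_update(weights, local_grads):
    # Speculative weights computed from the node's own gradients.
    return [w - LR * g for w, g in zip(weights, local_grads)]

def global_update(weights, all_grads):
    # Weights from the synchronized (here: averaged) gradients of all nodes.
    avg = [sum(gs) / len(gs) for gs in zip(*all_grads)]
    return [w - LR * g for w, g in zip(weights, avg)]

weights = [1.0, 2.0]
local_grads = [0.10, 0.20]
all_grads = [local_grads, [0.12, 0.18]]             # this node's and a peer's gradients

speculative = local_update(weights, local_grads)     # next iteration starts from these
synchronized = global_update(weights, all_grads)     # arrives after the exchange completes

close_enough = all(abs(s - g) <= TOLERANCE for s, g in zip(speculative, synchronized))
print("keep speculative results" if close_enough else "redo with synchronized weights")
```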