-
Publication number: US12204757B1
Publication date: 2025-01-21
Application number: US18067514
Application date: 2022-12-16
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant , Ilya Minkin , Raymond S. Whiteside
IPC: G06F3/06
Abstract: A technique for processing strong ordered transactions in a direct memory access engine may include retrieving a memory descriptor to perform a strong ordered transaction, and delaying the strong ordered transaction until pending write transactions associated with previous memory descriptors retrieved prior to the memory descriptor are complete. Subsequent transactions associated with memory descriptors following the memory descriptor are allowed to be issued while waiting for the pending write transactions to complete. Upon completion of the pending write transactions, the strong ordered transaction is performed.
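The ordering rule here is compact enough to model directly. Below is a minimal Python sketch of that behavior, assuming illustrative names (DmaOrderingQueue, issue, complete_write) that do not come from the patent.

```python
class DmaOrderingQueue:
    """Toy model of the rule in the abstract: a strong ordered transaction
    waits for writes from earlier descriptors, while transactions from
    later descriptors may still issue in the meantime."""

    def __init__(self):
        self.pending_writes = set()   # in-flight writes, by transaction id
        self.blocked = []             # (strong txn id, earlier writes it awaits)

    def issue(self, txn_id, strong_ordered=False):
        if strong_ordered and self.pending_writes:
            # Delay: snapshot exactly which earlier writes must drain first.
            self.blocked.append((txn_id, set(self.pending_writes)))
            return False
        if not strong_ordered:
            self.pending_writes.add(txn_id)
        return True   # issued now; later descriptors are never held up

    def complete_write(self, txn_id):
        self.pending_writes.discard(txn_id)
        ready = []
        for strong_id, deps in self.blocked:
            deps.discard(txn_id)
            if not deps:
                ready.append(strong_id)   # all earlier writes done: perform it
        self.blocked = [(s, d) for s, d in self.blocked if d]
        return ready

q = DmaOrderingQueue()
q.issue("w1")
q.issue("w2")
q.issue("barrier", strong_ordered=True)           # delayed behind w1 and w2
q.issue("w3")                                     # a later write still issues
print(q.complete_write("w1"), q.complete_write("w2"))   # [] ['barrier']
```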
-
Publication number: US12093801B1
Publication date: 2024-09-17
Application number: US18142952
Application date: 2023-05-03
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Randy Renfu Huang , Ron Diamant
IPC: G06F16/00 , G06F9/30 , G06F9/48 , G06F16/901 , G06N3/04
CPC classification number: G06N3/04 , G06F9/30003 , G06F9/4881 , G06F16/9024
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions. The system further includes a compiler configured to: identify a computational subgraph from a computational graph of a neural network model; compute a subgraph identifier for the computational subgraph; based on whether the subgraph identifier is included in the plurality of subgraph identifiers, either: obtain, from the database, first instructions associated with the subgraph identifier; or generate second instructions representing the computational subgraph; and provide the first instructions or the second instructions for execution by a neural network processor to perform computation operations for the neural network model.
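The compile-or-reuse flow lends itself to a short sketch. The following Python is a hedged illustration, with a plain dict standing in for the database and a SHA-256 hash as one plausible way to compute a subgraph identifier; none of these names come from the patent.

```python
import hashlib

instruction_db = {}   # stand-in for the database of cached instructions

def subgraph_identifier(subgraph_ops):
    """Derive a stable identifier from a canonical description of the
    subgraph; hashing is one plausible way to compute it."""
    canonical = "|".join(subgraph_ops)
    return hashlib.sha256(canonical.encode()).hexdigest()

def instructions_for(subgraph_ops, compile_fn):
    sid = subgraph_identifier(subgraph_ops)
    if sid in instruction_db:
        return instruction_db[sid]        # reuse previously generated code
    instrs = compile_fn(subgraph_ops)     # compile this subgraph fresh
    instruction_db[sid] = instrs          # cache it for later models
    return instrs

# "Compiling" is mocked as string formatting for the example.
compile_fn = lambda ops: [f"EXEC {op}" for op in ops]
print(instructions_for(["matmul", "relu"], compile_fn))   # compiled fresh
print(instructions_for(["matmul", "relu"], compile_fn))   # served from cache
```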
-
Publication number: US12067492B2
Publication date: 2024-08-20
Application number: US18144129
Application date: 2023-05-05
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant , Thomas A. Volpe , Randy Huang
CPC classification number: G06N3/082 , G06F3/0604 , G06F3/0644 , G06F3/0673 , G06N3/045
Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
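One way to read the schedule in the abstract is as a sequence of reconfigure-then-process steps that interleave two contexts. The Python below is a toy model under that reading; the Engine class and the lambda "layers" are purely illustrative.

```python
class Engine:
    """Toy stand-in for the computing engine; a 'configuration' here is
    just a function implementing one layer."""
    def configure(self, layer_fn):
        self.layer_fn = layer_fn
    def process(self, data):
        return self.layer_fn(data)

layer1 = lambda x: x + 1
layer2 = lambda x: x * 2
layer3 = lambda x: x - 3

engine = Engine()
engine.configure(layer2)          # first configuration: second layer
ctx1_l2 = engine.process(10)      # first context data -> layer-2 output
engine.configure(layer1)          # second configuration: first layer
ctx2_l1 = engine.process(20)      # second context data -> layer-1 output
engine.configure(layer3)          # third configuration: third layer
print(engine.process(ctx1_l2), engine.process(ctx2_l1))   # both results
```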
-
Publication number: US12056072B1
Publication date: 2024-08-06
Application number: US17457603
Application date: 2021-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan , Ron Diamant
CPC classification number: G06F13/28 , G06F3/0611 , G06F3/0655 , G06F3/0679 , G06F2213/28
Abstract: Techniques to reduce the latency of data transfer notifications in a computing system are disclosed. The techniques can include receiving, at a memory, a first access request of a set of access requests associated with a data transfer. The first access request has a token and an access count indicating the number of access requests in the set of access requests. A counter is initiated to count the number of received access requests having the token. When additional access requests belonging to the set of access requests are received, the counter is incremented for each of the additional access requests being received. A notification is transmitted to an integrated circuit component in response to receiving the last access request of the set of access requests having the token to notify the integrated circuit component that the memory is ready for access.
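The token-and-count mechanism reduces to a small bookkeeping structure. Here is a speculative Python sketch; NotificationTracker, on_access, and the notify callback are invented names for illustration.

```python
from collections import defaultdict

class NotificationTracker:
    """Counts access requests per token and fires a notification once the
    last request of the set arrives."""
    def __init__(self, notify):
        self.counts = defaultdict(int)
        self.expected = {}
        self.notify = notify

    def on_access(self, token, access_count):
        # The access count rides along on each request in the set.
        self.expected[token] = access_count
        self.counts[token] += 1
        if self.counts[token] == self.expected[token]:
            self.notify(token)        # memory is ready for this transfer
            del self.counts[token], self.expected[token]

tracker = NotificationTracker(lambda t: print(f"token {t}: data ready"))
for _ in range(3):
    tracker.on_access(token=7, access_count=3)   # notifies on the third
```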
-
Publication number: US11960566B1
Publication date: 2024-04-16
Application number: US17229742
Application date: 2021-04-13
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant
Abstract: Systems and methods are provided to eliminate multiplication operations with zero padding data for convolution computations. A multiplication matrix is generated from an input feature map matrix with padding by adjusting coordinates and dimensions of the input feature map matrix to exclude padding data. The multiplication matrix is used to perform matrix multiplications with respective weight values which results in fewer computations as compared to matrix multiplications which include the zero padding data.
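The key idea, multiplying only against real input data by adjusting window coordinates, can be shown with a direct convolution. This NumPy sketch clips each filter window to the unpadded input instead of materializing zero padding; it illustrates the idea, not the patented matrix construction.

```python
import numpy as np

def conv2d_skip_padding(x, w, pad):
    """Direct convolution that never multiplies against padding zeros:
    each filter window is clipped to the real input instead of reading
    from a zero-padded copy."""
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H + 2 * pad - kH + 1, W + 2 * pad - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Clip the window coordinates so they stay inside the input.
            r0, c0 = max(0, i - pad), max(0, j - pad)
            r1, c1 = min(H, i - pad + kH), min(W, j - pad + kW)
            xs = x[r0:r1, c0:c1]
            ws = w[r0 - (i - pad):r1 - (i - pad),
                   c0 - (j - pad):c1 - (j - pad)]
            out[i, j] = np.sum(xs * ws)   # only real data participates
    return out

x = np.arange(9.0).reshape(3, 3)
w = np.ones((3, 3))
print(conv2d_skip_padding(x, w, pad=1))   # matches a zero-padded conv
```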
-
Publication number: US11875247B1
Publication date: 2024-01-16
Application number: US16905769
Application date: 2020-06-18
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Ron Diamant
Abstract: An acceleration engine with multiple accelerators may share a common set of data that is used by each accelerator to perform computations on input data. The set of shared data can be loaded into the acceleration engine from an external memory. Instead of accessing the external memory multiple times to load the set of shared data into each accelerator, the external memory can be accessed once using direct memory access to load the set of shared data into the first accelerator. The set of shared data can then be serially loaded from one accelerator to the next accelerator in the acceleration engine using direct memory access. To achieve data parallelism and reduce computation time, a runtime driver may split the input data into data batches, and each accelerator can perform computations on a different batch of input data with the common set of shared data.
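The load-once, forward-serially, split-the-batch flow can be sketched in a few lines. The Python below is a toy model; Accel, load_and_run, and the striped batch split are assumptions made for the example.

```python
def load_and_run(accelerators, external_memory, inputs):
    """Sketch of the described flow: one DMA read from external memory,
    then serial accelerator-to-accelerator copies, then data-parallel runs."""
    accelerators[0].weights = list(external_memory)   # single external read
    for prev, nxt in zip(accelerators, accelerators[1:]):
        nxt.weights = list(prev.weights)              # serial on-chip transfer
    # Runtime driver splits the input into one batch per accelerator.
    batches = [inputs[i::len(accelerators)] for i in range(len(accelerators))]
    return [acc.run(batch) for acc, batch in zip(accelerators, batches)]

class Accel:
    def run(self, batch):
        return [x * sum(self.weights) for x in batch]   # toy computation

accels = [Accel() for _ in range(4)]
print(load_and_run(accels, external_memory=[0.5, 0.5], inputs=list(range(8))))
```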
-
Publication number: US11868878B1
Publication date: 2024-01-09
Application number: US15934523
Application date: 2018-03-23
Applicant: Amazon Technologies, Inc.
Inventor: Randy Huang , Ron Diamant
IPC: G06N3/08 , G06N5/046 , G06F18/2413 , G06F18/2431
CPC classification number: G06N3/08 , G06F18/2413 , G06F18/2431 , G06N5/046
Abstract: Disclosed herein are techniques for implementing a large fully-connected layer in an artificial neural network. The large fully-connected layer is grouped into multiple fully-connected subnetworks. Each fully-connected subnetwork is configured to classify an object into an unknown class or a class in a subset of target classes. If the object is classified as the unknown class by a fully-connected subnetwork, a next fully-connected subnetwork may be used to further classify the object. In some embodiments, the fully-connected layer is grouped based on a ranking of target classes.
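The cascade of subnetworks behaves like a chain of classifiers with an escape class. A minimal sketch, assuming toy subnetworks implemented as plain functions:

```python
def cascaded_classify(x, subnetworks, unknown="unknown"):
    """Try each fully-connected subnetwork in turn; stop at the first one
    that claims the object, otherwise fall through to the next subset."""
    for subnet in subnetworks:
        label = subnet(x)
        if label != unknown:
            return label
    return unknown

# Toy subnetworks over ranked class subsets (most likely classes first).
subnet_a = lambda x: "cat" if x < 5 else "unknown"
subnet_b = lambda x: "dog" if x < 10 else "unknown"
print(cascaded_classify(3, [subnet_a, subnet_b]))   # -> cat
print(cascaded_classify(7, [subnet_a, subnet_b]))   # -> dog
```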
-
Publication number: US11797853B2
Publication date: 2023-10-24
Application number: US17951084
Application date: 2022-09-22
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant , Thomas A. Volpe , Randy Huang
CPC classification number: G06N3/082 , G06F3/0604 , G06F3/0644 , G06F3/0673 , G06N3/045
Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
-
Publication number: US11741350B2
Publication date: 2023-08-29
Application number: US16698461
Application date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
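One plausible reading of the row-threshold decision: when a single filter element would occupy fewer rows than the array has, pack several filter elements into disjoint row groups, one instruction each. The sketch below collapses the threshold to the full array height for simplicity; all names are illustrative, not from the patent.

```python
def plan_instructions(num_input_channels, num_rows, filter_elems):
    """If one filter element only fills num_input_channels rows and that
    is below the array height, emit instructions that map different
    filter elements onto the spare rows."""
    instructions = []
    if num_input_channels < num_rows:
        per_pass = max(1, num_rows // num_input_channels)
        for i in range(0, len(filter_elems), per_pass):
            group = filter_elems[i:i + per_pass]
            # One instruction per filter element, targeting disjoint rows.
            for slot, elem in enumerate(group):
                rows = range(slot * num_input_channels,
                             (slot + 1) * num_input_channels)
                instructions.append((f"filter elem {elem}", list(rows)))
    else:
        instructions = [(f"filter elem {e}", list(range(num_rows)))
                        for e in filter_elems]
    return instructions

for instr in plan_instructions(num_input_channels=3, num_rows=8,
                               filter_elems=["w00", "w01"]):
    print(instr)   # two instructions sharing one pass over the array
```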
-
Publication number: US11704211B1
Publication date: 2023-07-18
Application number: US17643292
Application date: 2021-12-08
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan , Ron Diamant , Brian Robert Silver
CPC classification number: G06F11/2094 , G06F2201/82
Abstract: Techniques for avoiding uncorrectable errors in a memory device can include detecting a correctable error pattern of a memory page of a memory device, and determining that the correctable error pattern of the memory page satisfies a page migration condition. Upon satisfying the page migration condition, write accesses to the memory page are prevented from reaching a memory controller of the memory device. The contents of the memory page are then migrated to a reserved page, and a mapping table is updated to replace accesses to the memory page with accesses to the reserved page.
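The migration flow maps naturally onto a small state machine. The following Python is a hedged sketch; PageMigrator, the error threshold of 3, and the dict-based mapping table are assumptions for illustration.

```python
class PageMigrator:
    """Sketch of the described flow: on a correctable-error pattern, hold
    off writes to the page, copy it to a reserved page, and remap accesses."""
    def __init__(self, memory, reserved_pages, threshold=3):
        self.memory = memory              # page_id -> contents
        self.reserved = list(reserved_pages)
        self.threshold = threshold
        self.errors = {}
        self.remap = {}                   # old page -> reserved page

    def on_correctable_error(self, page):
        self.errors[page] = self.errors.get(page, 0) + 1
        if self.errors[page] >= self.threshold and page not in self.remap:
            target = self.reserved.pop(0)
            # Writes to `page` would be blocked while the copy is in flight.
            self.memory[target] = self.memory[page]   # migrate contents
            self.remap[page] = target                 # update mapping table

    def resolve(self, page):
        return self.remap.get(page, page)

m = PageMigrator({"p1": b"data"}, reserved_pages=["r1"])
for _ in range(3):
    m.on_correctable_error("p1")
print(m.resolve("p1"))   # -> r1, accesses now go to the reserved page
```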