-
Publication Number: US12001352B1
Publication Date: 2024-06-04
Application Number: US17937395
Filing Date: 2022-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Rashika Kheria , Ron Diamant , Se Wang Oh , Guy Nakibly
CPC classification number: G06F13/1621 , G06F9/466
Abstract: Techniques are provided to maintain data coherency for data transfers among data processing devices in a distributed computing environment. A data buffer in each data processing device can be mapped to an address range that is assigned to transactions that allow out-of-order completions, and a message buffer in each data processing device can be mapped to an address range that is assigned to transactions that follow transaction ordering. Thus, based on the mapping, a transaction to store a set of data into the data buffer is completed before a transaction to write a synchronization message into the message buffer indicating that the set of data is stored, irrespective of the transaction ordering indicated by each individual transaction.
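A minimal sketch of the mapping described above, with assumed address ranges and names: writes into the data range may be buffered and completed out of order, but a write into the message range first drains all pending data writes, so the synchronization message can never overtake the data it announces.

```python
# Assumed address ranges for the two buffer mappings.
DATA_BASE, DATA_SIZE = 0x0000, 0x8000   # relaxed: out-of-order completion allowed
MSG_BASE,  MSG_SIZE  = 0x8000, 0x1000   # ordered: follows transaction ordering

class Interconnect:
    def __init__(self):
        self.pending_data = []          # data writes not yet committed
        self.memory = {}

    def write(self, addr, value):
        if DATA_BASE <= addr < DATA_BASE + DATA_SIZE:
            self.pending_data.append((addr, value))   # may commit later, in any order
        else:
            self._drain()                             # ordered write: flush data first
            self.memory[addr] = value

    def _drain(self):
        for addr, value in self.pending_data:
            self.memory[addr] = value
        self.pending_data.clear()

ic = Interconnect()
ic.write(0x0010, 42)        # data element: buffered, completion may be deferred
ic.write(MSG_BASE, "done")  # sync message: forces the data write to commit first
```

The ordering attribute is implied purely by the target address, which is the point of the mapping: the devices issuing the transactions do not need to request ordering explicitly.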
-
Publication Number: US11983128B1
Publication Date: 2024-05-14
Application Number: US18067109
Filing Date: 2022-12-16
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant , Ilya Minkin , Mohammad El-Shabani , Raymond S. Whiteside , Uday Shilton Udayaselvam
CPC classification number: G06F13/30 , G06F13/1621 , G06F13/1642
Abstract: Techniques to reduce overhead in a direct memory access (DMA) engine can include processing descriptors from a descriptor queue to obtain a striding configuration to generate tensorized memory descriptors. The striding configuration can include, for each striding dimension, a stride and a repetition number indicating a number of times to repeat striding in the corresponding striding dimension. One or more sets of tensorized memory descriptors can be generated based on the striding configuration. Data transfers are then performed based on the generated tensorized memory descriptors.
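A hypothetical sketch of descriptor expansion under the striding configuration described above: each dimension contributes a (stride, repetition) pair, and the Cartesian product of the repetitions yields one tensorized memory descriptor per point of the iteration space. The function name and descriptor shape are illustrative.

```python
from itertools import product

def tensorize(base_addr, length, striding):
    """Expand a base descriptor into tensorized descriptors.

    striding: list of (stride, repeat) pairs, one per striding dimension.
    Returns (address, length) tuples, one per element of the iteration space.
    """
    descriptors = []
    for idx in product(*(range(rep) for _, rep in striding)):
        offset = sum(i * stride for i, (stride, _) in zip(idx, striding))
        descriptors.append((base_addr + offset, length))
    return descriptors

# 2-D example: 3 row groups separated by 256 bytes, 2 columns separated by 16 bytes.
descs = tensorize(0x1000, 16, [(256, 3), (16, 2)])
```

Generating the descriptors from a compact configuration rather than enqueueing each one individually is what reduces the per-transfer overhead the abstract refers to.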
-
Publication Number: US11816559B2
Publication Date: 2023-11-14
Application Number: US17832039
Filing Date: 2022-06-03
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant
IPC: G06N3/063 , G06F15/80 , G06F17/15 , H04L49/9047 , G06V30/413
CPC classification number: G06N3/063 , G06F15/8046 , G06F17/153 , G06V30/413 , H04L49/9047
Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; load a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the first weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
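A sketch, with assumed shapes, of how the input subset for one weight of a dilated convolution is selected: the weight at coordinate (r, s) with dilation rate d multiplies input positions offset by (r·d, s·d) for every output position, so each weight streams against a strided slice of the input.

```python
import numpy as np

def inputs_for_weight(x, r, s, rate, out_h, out_w):
    """Select the input patch that weight element (r, s) multiplies."""
    return x[r * rate : r * rate + out_h, s * rate : s * rate + out_w]

x = np.arange(36, dtype=np.float32).reshape(6, 6)
w = np.ones((2, 2), dtype=np.float32)
rate, out_h, out_w = 2, 4, 4            # 2x2 kernel, dilation rate 2 -> 4x4 output

# Accumulate one partial product per weight element, mirroring how the
# systolic array streams each loaded weight against its selected inputs.
y = np.zeros((out_h, out_w), dtype=np.float32)
for r in range(2):
    for s in range(2):
        y += w[r, s] * inputs_for_weight(x, r, s, rate, out_h, out_w)
```

Selecting the subset per weight element, rather than materializing a dilated kernel with explicit zeros, avoids wasting multiply cycles on the zero positions.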
-
Publication Number: US20230359876A1
Publication Date: 2023-11-09
Application Number: US18352768
Filing Date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
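A hypothetical sketch of the compile-time check described above: if fewer rows of the processing element array would be occupied than a threshold, each input matrix is split across multiple rows to raise concurrency. Array size, threshold, and function names are assumptions.

```python
PE_ROWS = 128                           # assumed processing element array height

def plan_rows(num_inputs, threshold=0.5):
    """Return (rows_used, split_factor) after the under-utilization check.

    num_inputs: number of input matrices, each normally assigned one row.
    """
    if num_inputs >= threshold * PE_ROWS:
        return num_inputs, 1            # well utilized: one row per input matrix
    # Under-utilized: spread each input matrix over as many rows as fit.
    split = PE_ROWS // num_inputs
    return num_inputs * split, split

rows, split = plan_rows(16)             # e.g. 16 input matrices on a 128-row array
```

With 16 inputs the unmodified convolution would occupy only 16 of 128 rows; splitting each input eight ways fills the array, which is the utilization gain the added instructions are meant to achieve.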
-
Publication Number: US20230351186A1
Publication Date: 2023-11-02
Application Number: US18144129
Filing Date: 2023-05-05
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant , Thomas A. Volpe , Randy Huang
CPC classification number: G06N3/082 , G06F3/0604 , G06F3/0644 , G06F3/0673 , G06N3/045
Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
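A toy sketch of the interleaved schedule the abstract describes, with a closure standing in for one reconfigurable computing engine: layer 2 of context 1 runs first, the engine is switched to layer 1 for context 2, and a third configuration then runs layer 3 over both contexts' outputs. All names are illustrative.

```python
def configure(layer):
    """Stand-in for setting the engine to implement one neural network layer."""
    return lambda data: f"L{layer}({data})"

engine = configure(2)              # first configuration: layer 2
c1_l2 = engine("c1_l1_out")        # first context second layer output
engine = configure(1)              # switch to the second configuration: layer 1
c2_l1 = engine("c2_in")            # second context first layer output
engine = configure(3)              # third configuration: layer 3
results = engine(c1_l2), engine(c2_l1)   # processing results for both contexts
```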
-
Publication Number: US11741345B2
Publication Date: 2023-08-29
Application Number: US17033573
Filing Date: 2020-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Randy Huang , Ron Diamant
CPC classification number: G06N3/045 , G06F3/061 , G06F3/065 , G06F3/0683 , G06F13/28 , G06F13/4068 , G06F15/80
Abstract: Provided are systems, methods, and integrated circuits for a neural network processing system. In various implementations, the system can include a first array of processing engines coupled to a first set of memory banks and a second array of processing engines coupled to a second set of memory banks. The first and second sets of memory banks can store all the weight values for a neural network, with the weight values stored before any input data is received. Upon receiving input data, the system performs a task defined for the neural network. Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task.
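A sketch of the two-array flow with assumed matrix shapes: weights for both stages are resident before any input arrives, the first array of processing engines produces an intermediate result, and the second array computes the final result after the copy.

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.standard_normal((4, 8)).astype(np.float32)   # preloaded in the first bank set
w2 = rng.standard_normal((8, 3)).astype(np.float32)   # preloaded in the second bank set

def run_task(x):
    intermediate = x @ w1          # computed by the first array of processing engines
    copied = intermediate.copy()   # copied into the second set of memory banks
    return copied @ w2             # second array computes the final result

y = run_task(rng.standard_normal((1, 4)).astype(np.float32))
```

Because both weight sets are loaded ahead of time, the only data movement on the critical path is the input and the intermediate copy, not the (typically much larger) weights.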
-
Publication Number: US11720523B2
Publication Date: 2023-08-08
Application Number: US16653578
Filing Date: 2019-10-15
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant
CPC classification number: G06F15/8046 , G06F15/173 , G06F17/15 , G06F17/16 , G06N3/02 , G06N3/045 , G06N3/063
Abstract: A processing element (PE) of a systolic array can perform neural network computations on two or more data elements of an input data set using the same weight, generating two or more output data elements of a corresponding output data set. Based on the size of the input data set and the input data type, the systolic array can process a single data element or multiple data elements in parallel.
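A sketch, with an assumed packing scheme, of one processing element reusing a single weight across two packed data elements: with a narrow input type, two elements share one multiply-accumulate step per cycle.

```python
import numpy as np

def pe_step(packed_inputs, weight, packed_accumulators):
    """One PE cycle: multiply every packed element by the same weight and accumulate."""
    return packed_accumulators + weight * packed_inputs

acc = np.zeros(2, dtype=np.int32)                        # one accumulator per lane
acc = pe_step(np.array([3, 5], dtype=np.int8), 2, acc)   # two narrow elements, one weight
```

Whether one or two elements are packed per step would be decided from the input data type, as the abstract notes: narrower types leave room to pack more elements into the same datapath.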
-
Publication Number: US11676021B1
Publication Date: 2023-06-13
Application Number: US17947355
Filing Date: 2022-09-19
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan , Ron Diamant
Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
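A hypothetical sketch of the overlap described above, with threads standing in for worker nodes: the first worker hands off model A's gradients and immediately computes model B's gradients while the second worker synchronizes the first set. The halving in the synchronization step is a stand-in for an all-reduce; all names and values are illustrative.

```python
import threading
import queue

to_sync = queue.Queue()
synced = []

def second_worker():
    """Receive gradient sets and compute synchronized gradients for each."""
    for _ in range(2):
        name, grads = to_sync.get()
        synced.append((name, [g / 2 for g in grads]))   # stand-in for all-reduce

def first_worker():
    grads_a = [2.0, 4.0]                 # compute gradients for the first model
    to_sync.put(("A", grads_a))          # transmit; do NOT wait for synchronization
    grads_b = [6.0, 8.0]                 # overlap: compute the second model's gradients
    to_sync.put(("B", grads_b))

t = threading.Thread(target=second_worker)
t.start()
first_worker()
t.join()
```

The point of the arrangement is that the first worker never idles waiting for synchronization: gradient computation for the second model hides the communication latency of the first.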
-
Publication Number: US11636569B1
Publication Date: 2023-04-25
Application Number: US17029609
Filing Date: 2020-09-23
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant
Abstract: In one example, an apparatus comprises: a buffer memory; and a memory access circuit configured to: fetch, from a first memory, a set of first groups of data elements of a first matrix, each first group of data elements being stored at consecutive memory addresses at the first memory; based on a first configuration, store the set of first groups of data elements at consecutive memory addresses or at non-consecutive memory addresses at the buffer memory; based on a second configuration that defines a memory address offset, fetch a set of second groups of the data elements from the buffer memory, each second group of the data elements being stored at consecutive memory addresses of the buffer memory, each second group being separated by the memory address offset in the buffer memory; and store each fetched second group at consecutive addresses of a destination memory to form a second matrix.
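A sketch, with assumed parameters, of the two-pass restructuring: the first pass stores contiguous row groups of the first matrix into the buffer, and the second pass gathers groups separated by a fixed memory address offset (the row pitch) back out, yielding a block-transposed second matrix.

```python
import numpy as np

def restructure(first_matrix, group):
    """Restructure a matrix via a buffer using fixed-offset group gathers."""
    rows, cols = first_matrix.shape
    buffer = first_matrix.reshape(-1)          # pass 1: rows stored contiguously
    # Pass 2: gather groups of `group` elements separated by the row pitch `cols`.
    out = np.empty((cols // group, rows * group), dtype=first_matrix.dtype)
    for j in range(cols // group):
        for i in range(rows):
            out[j, i * group:(i + 1) * group] = \
                buffer[i * cols + j * group : i * cols + (j + 1) * group]
    return out

a = np.arange(8).reshape(2, 4)                 # 2x4 first matrix
b = restructure(a, group=2)                    # gather groups of 2 at offset 4
```

Keeping every individual access to consecutive addresses, in both passes, is what makes the restructuring efficient on memories that favor sequential bursts.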
-
Publication Number: US20230004523A1
Publication Date: 2023-01-05
Application Number: US17363900
Filing Date: 2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Thomas A. Volpe , Ron Diamant , Joshua Wayne Bowman , Nishith Desai , Thomas Elmer
Abstract: Systems and methods are provided to perform multiply-accumulate operations on reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reducer can receive a particular input and generate multiple reduced inputs from it. The reduced inputs can include reduced input data elements and/or reduced weights. The systolic array may lack support for inputs of a first bit-length, so the reducers reduce a given input from the first bit-length to a second, shorter bit-length and provide multiple reduced inputs of that shorter bit-length to the array. The systolic array may perform multiply-accumulate operations on each unique combination of the reduced input data elements and the reduced weights to generate multiple partial outputs, then sum the partial outputs to generate the output.
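A hypothetical sketch of the reduction scheme on integers: an input wider than the array supports is split into high and low halves, every combination of the halves is multiplied, and the shifted partial products are summed to recover the full-precision result. Bit widths and function names are assumptions.

```python
def split(x, low_bits=8):
    """Reduce one wide unsigned integer into (high, low) shorter-bit-length parts."""
    return x >> low_bits, x & ((1 << low_bits) - 1)

def mac_reduced(a, w, low_bits=8):
    """Multiply via partial products over each unique combination of reduced inputs."""
    a_hi, a_lo = split(a, low_bits)
    w_hi, w_lo = split(w, low_bits)
    return ((a_hi * w_hi) << (2 * low_bits)) \
         + ((a_hi * w_lo + a_lo * w_hi) << low_bits) \
         + (a_lo * w_lo)
```

The identity behind the sketch is simply (a_hi·2^k + a_lo)(w_hi·2^k + w_lo) expanded into four partial products, which is why summing the shifted partials reproduces the exact wide product.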