-
Publication No.: US11507378B1
Publication Date: 2022-11-22
Application No.: US17188548
Filing Date: 2021-03-01
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Sundeep Amirineni, Mohammad El-Shabani, Sagar Sonar, Kenneth Wayne Patton
Abstract: In one example, an integrated circuit comprises: a memory configured to store a first mapping between a first opcode and first control information and a second mapping between the first opcode and second control information; a processing engine configured to perform processing operations based on control information fetched from the memory; and a controller configured to: at a first time, provide the first opcode to the memory to, based on the first mapping stored in the memory, fetch the first control information for the processing engine, to enable the processing engine to perform a first processing operation based on the first control information; and at a second time, provide the first opcode to the memory to, based on the second mapping stored in the memory, fetch the second control information for the processing engine, to enable the processing engine to perform a second processing operation based on the second control information.
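The mechanism is a programmable lookup: the opcode-to-control mapping lives in memory and can be rewritten between operations, so the same opcode drives different behavior at different times. A minimal Python sketch of that behavior follows; the class, field, and opcode names are illustrative, not taken from the patent.

```python
class RemappableOpcodeTable:
    """Memory holding opcode-to-control-information mappings
    that the controller can reprogram between operations."""

    def __init__(self):
        self._table = {}

    def program(self, opcode, control_info):
        # Overwrite the mapping for this opcode with new control information.
        self._table[opcode] = control_info

    def fetch(self, opcode):
        # Return whatever control information is currently mapped.
        return self._table[opcode]


class ProcessingEngine:
    def execute(self, control_info):
        # Stand-in for a hardware operation selected by control bits.
        op = control_info["op"]
        a, b = control_info["operands"]
        return op(a, b)


table = RemappableOpcodeTable()
engine = ProcessingEngine()

# First time: opcode 0x1 is mapped to an "add" operation.
table.program(0x1, {"op": lambda a, b: a + b, "operands": (2, 3)})
print(engine.execute(table.fetch(0x1)))  # 5

# Second time: the same opcode is remapped to a "multiply" operation.
table.program(0x1, {"op": lambda a, b: a * b, "operands": (2, 3)})
print(engine.execute(table.fetch(0x1)))  # 6
```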
-
Publication No.: US11468325B2
Publication Date: 2022-10-11
Application No.: US16835161
Filing Date: 2020-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan, Ron Diamant
Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
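The key claim is overlap: while one set of gradients is being synchronized, the worker is already computing the next set. A toy Python sketch of that pipelining, with a thread standing in for the second worker and element-wise averaging standing in for gradient synchronization (all placeholder logic, not the patented implementation):

```python
import threading

def compute_gradients(weights, data):
    # Placeholder for a backward pass over one neural network model.
    return [w * d for w, d in zip(weights, data)]

def synchronize(local_grads, remote_grads, out):
    # Placeholder for the second worker's synchronization step
    # (element-wise averaging standing in for an all-reduce).
    out.extend((a + b) / 2 for a, b in zip(local_grads, remote_grads))

data = [1.0, 2.0]
weights_1, weights_2 = [0.1, 0.2], [0.3, 0.4]

# First worker: compute gradients for the first model and hand them off.
grads_1 = compute_gradients(weights_1, data)
synced_1 = []
sync = threading.Thread(target=synchronize, args=(grads_1, [0.2, 0.5], synced_1))
sync.start()

# While the first set is being synchronized, the worker already computes
# the second model's gradients -- the overlap the abstract describes.
grads_2 = compute_gradients(weights_2, data)

sync.join()
print(synced_1, grads_2)  # [0.15, 0.45] [0.3, 0.8]
```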
-
Publication No.: US20220318604A1
Publication Date: 2022-10-06
Application No.: US17301271
Filing Date: 2021-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu, Ron Diamant, Patricio Kaplan
Abstract: To reduce the storage size of weight tensors and speed up loading of weight tensors from system memory, a compression technique can be employed to remove zero values from a weight tensor before storing the weight tensor in system memory. A sparsity threshold can be enforced to achieve a compression ratio target by forcing small weight values to zero during training. When the weight tensor is loaded from system memory, a direct memory access (DMA) engine with an in-line decompression unit can decompress the weight tensor on-the-fly. By performing the decompression in the DMA engine, expansion of the weight values back to the original weight tensor size can be carried out in parallel while other neural network computations are being performed by the processing unit.
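A rough Python sketch of the compress/decompress round trip, with a boolean bitmask standing in for the compression metadata and NumPy in place of the DMA engine's in-line decompression unit (names and format are illustrative assumptions):

```python
import numpy as np

def sparsify(weights, threshold):
    # Force small weights to zero during training so the tensor meets a
    # compression-ratio target (the "sparsity threshold").
    out = weights.copy()
    out[np.abs(out) < threshold] = 0.0
    return out

def compress(weights):
    # Keep only the nonzero values plus a presence bitmask.
    mask = weights != 0.0
    return weights[mask], mask

def decompress(values, mask):
    # What an in-line decompression unit in the DMA engine would do on
    # the fly: re-expand the values to the original tensor shape.
    out = np.zeros(mask.shape, dtype=values.dtype)
    out[mask] = values
    return out

w = np.array([0.01, -0.8, 0.002, 0.5, -0.003, 0.9])
values, mask = compress(sparsify(w, threshold=0.05))
print(values)                    # [-0.8  0.5  0.9] -- what lands in memory
print(decompress(values, mask))  # [ 0.  -0.8  0.   0.5  0.   0.9]
```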
-
Publication No.: US11334358B2
Publication Date: 2022-05-17
Application No.: US16707857
Filing Date: 2019-12-09
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant
Abstract: In one example, a hardware accelerator comprises: a programmable hardware instruction decoder programmed to store a plurality of opcodes; a programmable instruction schema mapping table implemented as a content addressable memory (CAM) and programmed to map the plurality of opcodes to a plurality of definitions of operands in a plurality of instructions; a hardware execution engine; and a controller configured to: receive an instruction that includes a first opcode of the plurality of opcodes; control the hardware instruction decoder to extract the first opcode from the instruction; obtain, from the instruction schema mapping table and based on the first opcode, a first definition of a first operand; and forward the instruction and the first definition to the hardware execution engine to control the hardware execution engine to extract the first operand from the instruction based on the first definition, and execute the instruction based on the first operand.
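A small Python sketch of this decode path, with a dict standing in for the CAM-based instruction schema mapping table and a hypothetical 16-bit instruction format (the field layout is invented for illustration):

```python
# Hypothetical 16-bit instruction format: the top 4 bits hold the opcode,
# and the operand layout depends on the opcode. Each table entry lists
# operand definitions as (name, shift, mask).
SCHEMA_TABLE = {
    0x1: [("dst", 8, 0xF), ("src", 4, 0xF)],   # register-register form
    0x2: [("dst", 8, 0xF), ("imm", 0, 0xFF)],  # register-immediate form
}

def decode(instruction):
    opcode = (instruction >> 12) & 0xF            # extract the opcode field
    schema = SCHEMA_TABLE[opcode]                 # look up operand definitions
    operands = {name: (instruction >> shift) & mask
                for name, shift, mask in schema}  # extract per the schema
    return opcode, operands

print(decode(0x1370))  # (1, {'dst': 3, 'src': 7})
print(decode(0x2345))  # (2, {'dst': 3, 'imm': 69})
```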
-
Publication No.: US11308396B2
Publication Date: 2022-04-19
Application No.: US16455329
Filing Date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Se jong Oh, Ron Diamant, Randy Renfu Huang
Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and the debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
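In outline, the debugger runs the reduced networks from shortest to longest and reports the first length whose device tensor diverges from the reference. A toy sketch of that loop, with integers standing in for tensors and a deliberately buggy target (all stand-ins, not the actual debugger):

```python
def shortest_failing_length(lengths, run_on_target, reference_tensors):
    # For each reduced network length (shortest first): compile and run on
    # the target processor, then compare the device tensor against the
    # reference tensor. Return the shortest length that mismatches.
    for length in sorted(lengths):
        device_tensor = run_on_target(length)  # compile + execute stand-in
        if device_tensor != reference_tensors[length]:
            return length
    return None  # every reduced network matched the reference

# Toy stand-ins: integer "tensors", and a target that diverges once the
# network is at least three layers deep.
reference = {1: 10, 2: 20, 3: 30, 4: 40}
buggy_target = lambda n: reference[n] + (1 if n >= 3 else 0)
print(shortest_failing_length([4, 2, 3, 1], buggy_target, reference))  # 3
```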
-
Publication No.: US11275997B1
Publication Date: 2022-03-15
Application No.: US15967318
Filing Date: 2018-04-30
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease, Ron Diamant, Sundeep Amirineni
Abstract: Disclosed herein are techniques for obtaining weights for neural network computations. In one embodiment, an integrated circuit may include memory configured to store a first weight and a second weight; a row of processing elements comprising a first processing element and a second processing element, the first processing element comprising a first weight register, the second processing element comprising a second weight register, both the first weight register and the second weight register being controllable by a weight load signal; and a controller configured to: provide the first weight from the memory to the row of processing elements; set the weight load signal to enable the first weight to propagate through the row to reach the first processing element; and set the weight load signal to store the first weight at the first weight register and a flush value at the second weight register.
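A simplified behavioral model of this weight-load path, sketched in Python: each PE has a shift stage and a weight register, and a shared load signal latches whatever each stage currently holds. The two-PE row and the flush value of 0.0 are illustrative assumptions:

```python
class ProcessingElement:
    """One PE in the row: a stage of the weight shift chain plus the
    weight register it can latch from that stage."""
    def __init__(self):
        self.shift_reg = 0.0
        self.weight_reg = 0.0

def clock(row, shift_in, weight_load):
    # Weights advance one PE per cycle (updated back-to-front so a value
    # is not consumed twice). Asserting the shared weight_load signal
    # makes every PE latch whatever its shift stage currently holds.
    for i in range(len(row) - 1, 0, -1):
        row[i].shift_reg = row[i - 1].shift_reg
    row[0].shift_reg = shift_in
    if weight_load:
        for pe in row:
            pe.weight_reg = pe.shift_reg

FLUSH = 0.0
row = [ProcessingElement() for _ in range(2)]

clock(row, shift_in=0.7, weight_load=False)   # weight propagates into the row
clock(row, shift_in=FLUSH, weight_load=True)  # latch: the weight lands in one
                                              # register, the flush in the other
print([pe.weight_reg for pe in row])          # [0.0, 0.7]
```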
-
Publication No.: US11275661B1
Publication Date: 2022-03-15
Application No.: US16582346
Filing Date: 2019-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease, Ron Diamant
Abstract: A method of generating instructions to be executed by a plurality of execution engines that share a resource is provided. The method comprises, in a first generation step: reading a first engine logical timestamp vector of a first execution engine of the execution engines, the logical timestamp vector representing a history of access operations for the resource; determining whether the first engine logical timestamp vector includes a most-up-to-date logical timestamp of the resource in the first generation step; based on the first engine logical timestamp vector including the most-up-to-date logical timestamp of the resource in the first generation step, generating an access instruction to be executed by the first execution engine to access the resource; and scheduling the first execution engine to execute the access instruction.
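A loose Python sketch of the idea, treating the logical timestamp vector as a per-engine map from resource to last-observed timestamp. The sync-insertion fallback for a stale engine is an assumption, not a detail from the abstract:

```python
def has_latest(engine_vector, resource, latest):
    # The engine's logical timestamp vector records the last access to
    # each shared resource that this engine has observed.
    return engine_vector.get(resource, -1) >= latest[resource]

def generate(engine_vector, resource, latest, program):
    if has_latest(engine_vector, resource, latest):
        # Most-up-to-date view: safe to generate the access instruction.
        latest[resource] += 1
        engine_vector[resource] = latest[resource]
        program.append(("access", resource))
    else:
        # Stale view: assumed fallback -- synchronize before accessing.
        engine_vector[resource] = latest[resource]
        program.append(("sync", resource))

latest = {"sram": 3}
engine_a = {"sram": 3}  # has observed every prior access
engine_b = {"sram": 1}  # missed accesses by other engines

prog_a, prog_b = [], []
generate(engine_a, "sram", latest, prog_a)
generate(engine_b, "sram", latest, prog_b)
print(prog_a, prog_b)  # [('access', 'sram')] [('sync', 'sram')]
```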
-
Publication No.: US11250319B1
Publication Date: 2022-02-15
Application No.: US15714924
Filing Date: 2017-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Randy Huang, Ron Diamant
Abstract: Disclosed herein are techniques for classifying data with a data processing circuit. In one embodiment, the data processing circuit includes a probabilistic circuit configurable to generate a decision at a pre-determined probability, and an output generation circuit including an output node and configured to receive input data and a weight, and generate output data at the output node for approximating a product of the input data and the weight. The generation of the output data includes propagating the weight to the output node according to a first decision of the probabilistic circuit. The probabilistic circuit is configured to generate the first decision at a probability determined based on the input data.
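The trick is that an expected value can replace a multiplier: propagating the weight with probability equal to the input makes the average output approximate input × weight. A Monte Carlo sketch in Python (the normalization of the input to [0, 1] is an assumption):

```python
import random

def stochastic_multiply(input_value, weight, trials=100_000):
    # Approximate input * weight without a hardware multiplier: on each
    # trial the probabilistic circuit decides, at a probability set by
    # the input, whether to propagate the weight to the output node.
    total = 0.0
    for _ in range(trials):
        if random.random() < input_value:  # decision at probability = input
            total += weight                # weight propagates to the output
    return total / trials

x, w = 0.25, 0.8
print(stochastic_multiply(x, w))  # ~0.2, i.e. approximately x * w
```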
-
Publication No.: US11232016B1
Publication Date: 2022-01-25
Application No.: US16138145
Filing Date: 2018-09-21
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh, Ron Diamant, Sundeep Amirineni, Randy Renfu Huang
Abstract: Techniques disclosed herein relate generally to debugging complex computing systems, such as those executing neural networks. A neural network processor includes a processing engine configured to execute instructions to implement multiple layers of a neural network. The neural network processor includes a debugging circuit configured to generate error detection codes for input data to the processing engine or error detection codes for output data generated by the processing engine. The neural network processor also includes an interface to a memory device, where the interface is configured to save the error detection codes generated by the debugging circuit into the memory device. The error detection codes generated by the debugging circuit are compared with expected error detection codes generated using a functional model of the neural network to identify defects of the neural network.
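A Python sketch of the comparison flow, using CRC-32 via zlib as the error detection code (the abstract does not name a particular code, and the byte strings here are placeholders for real layer tensors):

```python
import zlib

def detection_code(tensor_bytes):
    # Stand-in for the debugging circuit: an error detection code
    # (CRC-32 here) over a layer's input or output data.
    return zlib.crc32(tensor_bytes)

# Codes the device-side debugging circuit would save to memory...
device_codes = [detection_code(b"layer0-out"), detection_code(b"layer1-out")]
# ...and expected codes derived from a functional model of the network.
expected_codes = [detection_code(b"layer0-out"), detection_code(b"layer1-bad")]

for layer, (got, want) in enumerate(zip(device_codes, expected_codes)):
    print(f"layer {layer}: {'ok' if got == want else 'MISMATCH'}")
```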
-
Publication No.: US11188302B1
Publication Date: 2021-11-30
Application No.: US16267031
Filing Date: 2019-02-04
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Randy Renfu Huang, Richard John Heaton
Abstract: Top-k is a process by which the largest elements among a set of elements are found. In various implementations, a top-k computation can be executed by a neural network accelerator, where the top-k computation is performed using a process that makes use of the accelerator's memory array. A set of numerical values on which to perform top-k can be stored in the memory array. The accelerator can locate the maximum value among the set of numerical values and store the maximum value back into the memory array. The accelerator can next remove the maximum value from the set of numerical values so that a next-largest value can be found. To remove the maximum value, the accelerator can write a value representing negative infinity to the memory array at each location of the maximum value.
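A NumPy sketch of the loop the abstract describes: find the maximum, store it as a result, then overwrite every location holding that maximum with negative infinity so the next-largest value surfaces:

```python
import numpy as np

def top_k(values, k):
    work = values.astype(np.float64).copy()  # scratch copy in the "memory array"
    result = []
    while len(result) < k:
        current_max = float(work.max())      # locate the maximum value
        count = int((work == current_max).sum())
        result.extend([current_max] * min(count, k - len(result)))
        work[work == current_max] = -np.inf  # erase every location holding it
    return result

print(top_k(np.array([3.0, 9.0, 1.0, 9.0, 5.0]), k=3))  # [9.0, 9.0, 5.0]
```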