Patent search ap:("AMAZON TECHNOLOGIES INC.") AND inv:"Ron Diamant"

51.

Granted invention patent
Neural network processing based on subgraph recognition (in force)

Publication number: US11714992B1

Publication date: 2023-08-01

Application number: US16219760

Application date: 2018-12-13

Applicant: Amazon Technologies, Inc.

Inventor: Richard John Heaton, Randy Renfu Huang, Ron Diamant

IPC: G06F16/00, G06N3/04, G06F9/30, G06F16/901, G06F9/48

CPC classification number: G06N3/04, G06F9/4881, G06F9/30003, G06F16/9024

Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier being associated with a subset of the executable instructions. The system further includes a compiler configured to: identify a computational subgraph from a computational graph of a neural network model; compute a subgraph identifier for the computational subgraph; based on whether the subgraph identifier is included in the plurality of subgraph identifiers, either obtain, from the database, first instructions associated with the subgraph identifier, or generate second instructions representing the computational subgraph; and provide the first instructions or the second instructions for execution by a neural network processor to perform computation operations for the neural network model.
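The compiler flow in this abstract behaves like a cache keyed by subgraph identifier. The Python sketch below is illustrative only: the subgraph representation, the SHA-256 hash used as the identifier, the placeholder code generator, and the step that stores newly generated instructions for reuse are all assumptions, not details taken from the patent.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical representation: a subgraph is an ordered list of
# (operator name, tuple of input node indices) pairs.
Subgraph = List[Tuple[str, Tuple[int, ...]]]


def compute_subgraph_id(subgraph: Subgraph) -> str:
    """Hash a canonical text form of the subgraph into an identifier."""
    canonical = ";".join(
        f"{op}({','.join(str(i) for i in inputs)})" for op, inputs in subgraph
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def generate_instructions(subgraph: Subgraph) -> List[str]:
    """Placeholder code generator: one pseudo-instruction per operator."""
    return [f"EXEC {op} {list(inputs)}" for op, inputs in subgraph]


def get_instructions(
    subgraph: Subgraph, database: Dict[str, List[str]]
) -> Tuple[List[str], bool]:
    """Return (instructions, cache_hit) for the subgraph."""
    key = compute_subgraph_id(subgraph)
    if key in database:
        # Identifier already known: reuse the stored ("first") instructions.
        return database[key], True
    # Unknown identifier: compile fresh ("second") instructions.
    instructions = generate_instructions(subgraph)
    # Assumption: newly generated instructions are stored for later reuse.
    database[key] = instructions
    return instructions, False
```

Recurring subgraphs, such as repeated convolution blocks, then hit the database after the first compilation instead of being compiled again.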

52.

Granted invention patent
Using shared data bus to support systolic array tiling (in force)

Publication number: US11625453B1

Publication date: 2023-04-11

Application number: US16712699

Application date: 2019-12-12

Applicant: Amazon Technologies, Inc.

Inventor: Paul Gilbert Meyer, Ron Diamant

IPC: G06F17/16, G06F13/40, G06F15/80

Abstract: To improve utilization of a systolic array, each row of the array is provided with a number of general purpose row input data buses. Each of the general purpose row input data buses can be operable to transfer either feature map (FMAP) input elements or weight values into the processing elements of the corresponding row of the array. By using such general purpose row input data buses, concurrent matrix multiplications as well as faster background weight loading can be achieved in the array.
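A minimal Python sketch of one array row fed by general-purpose buses, assuming two buses per row and a weight double buffer (an active set and a staged "shadow" set). It deliberately ignores FMAP propagation across columns and partial-sum accumulation; `ArrayRow` and the payload tags are hypothetical names used only to show a single bus type carrying either kind of data in the same cycle.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A bus payload is ("fmap", element), ("weight", row_of_weights), or None (idle).
Payload = Optional[Tuple[str, object]]


@dataclass
class ArrayRow:
    """One row of processing elements fed by general-purpose input buses."""
    active_weights: List[float]
    shadow_weights: Optional[List[float]] = None
    outputs: List[List[float]] = field(default_factory=list)

    def cycle(self, buses: List[Payload]) -> None:
        """Consume one payload per bus in a single cycle."""
        for payload in buses:
            if payload is None:
                continue
            kind, value = payload
            if kind == "weight":
                # Background weight load: stage the next weights without
                # disturbing the weights currently in use.
                self.shadow_weights = list(value)
            elif kind == "fmap":
                # Each PE multiplies the FMAP element by its active weight.
                self.outputs.append([value * w for w in self.active_weights])
            else:
                raise ValueError(f"unknown payload kind: {kind}")

    def swap_weights(self) -> None:
        """Promote the staged weights once the current pass is finished."""
        if self.shadow_weights is not None:
            self.active_weights = self.shadow_weights
            self.shadow_weights = None
```

Because either bus can carry either payload type, the same row can stream FMAP elements on one bus while the next weights arrive on the other.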

53.

Granted invention patent
Tensorized direct memory access descriptors (in force)

Publication number: US11550736B1

Publication date: 2023-01-10

Application number: US17449581

Application date: 2021-09-30

Applicant: Amazon Technologies, Inc.

Inventor: Kun Xu, Ron Diamant, Ilya Minkin, Mohammad El-Shabani, Raymond S. Whiteside, Uday Shilton Udayaselvam

IPC: G06F13/16, G06N3/04, G06F13/30

Abstract: To reduce direct memory access (DMA) overhead, a tensorized descriptor can be used to generate a series of memory descriptors to perform a series of DMA data transfers. The tensorized descriptor may include attributes such as a stride and a memory descriptor template, which can be used to generate the series of memory descriptors. Hence, instead of having to retrieve each of the memory descriptors to perform the series of DMA transfers, a single tensorized descriptor can be retrieved to perform a series of data transfers.
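The expansion step can be sketched in Python as follows, assuming a memory descriptor is just (source address, destination address, length) and that the tensorized descriptor carries one stride per address plus a repeat count. The class and field names are hypothetical; real descriptors would carry more attributes.

```python
from dataclasses import dataclass, replace
from typing import Iterator


@dataclass(frozen=True)
class MemoryDescriptor:
    """One DMA transfer: copy `length` bytes from src_addr to dst_addr."""
    src_addr: int
    dst_addr: int
    length: int


@dataclass(frozen=True)
class TensorizedDescriptor:
    """A template plus strides that expand into `count` memory descriptors."""
    template: MemoryDescriptor
    src_stride: int
    dst_stride: int
    count: int

    def expand(self) -> Iterator[MemoryDescriptor]:
        # Generate each descriptor locally instead of fetching it from memory.
        for i in range(self.count):
            yield replace(
                self.template,
                src_addr=self.template.src_addr + i * self.src_stride,
                dst_addr=self.template.dst_addr + i * self.dst_stride,
            )


def run_dma(memory: bytearray, descriptor: TensorizedDescriptor) -> None:
    """Execute every generated transfer against a flat byte-addressable memory."""
    for d in descriptor.expand():
        memory[d.dst_addr:d.dst_addr + d.length] = memory[d.src_addr:d.src_addr + d.length]
```

Here a single tensorized descriptor gathers three strided 4-byte chunks into a contiguous buffer, work that would otherwise need three separately fetched descriptors.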

54.

Published invention patent application
SYSTOLIC ARRAY WITH EFFICIENT INPUT REDUCTION AND EXTENDED ARRAY PERFORMANCE (in force)

Publication number: US20230004384A1

Publication date: 2023-01-05

Application number: US17363894

Application date: 2021-06-30

Applicant: Amazon Technologies, Inc.

Inventor: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer

IPC: G06F9/30, G06F15/80

Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length, and the reducers may reduce the bit-length of a given input from the first bit-length to a second, shorter bit-length and provide the reduced input to the array. To reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through its processing elements. Each processing element may include a multiplier and/or an adder to perform arithmetic operations based on the reduced input.
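To make the "reduce trailing bits" step concrete, here is a Python sketch that assumes the first bit-length is FP32 and the second is 16 bits, so keeping the top 16 bits yields a bfloat16-style value. The round-to-nearest-even rule and the function names are assumptions for illustration; the abstract does not specify the formats or the rounding mode.

```python
import struct


def fp32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_fp32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def reduce_input(x: float, keep_bits: int = 16, round_nearest: bool = True) -> float:
    """Shorten an FP32 value to its top `keep_bits` bits (NaN/Inf not handled)."""
    drop = 32 - keep_bits
    b = fp32_to_bits(x)
    if round_nearest:
        # Round to nearest, ties to even, based on the bits being dropped.
        lsb = (b >> drop) & 1
        b += (1 << (drop - 1)) - 1 + lsb
    # Clear the trailing bits.
    b &= ~((1 << drop) - 1)
    return bits_to_fp32(b)


def pe_multiply_accumulate(x: float, w: float, acc: float) -> float:
    """A processing element multiplies reduced operands and adds the partial sum."""
    return acc + reduce_input(x) * reduce_input(w)
```

With `round_nearest=False` the reducer simply truncates, which corresponds to the plain "reduced" input; the default path corresponds to the "reduced and rounded" input.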

55.

Granted invention patent
Hardware security accelerator (in force)

Publication number: US11483296B1

Publication date: 2022-10-25

Application number: US16917367

Application date: 2020-06-30

Applicant: Amazon Technologies, Inc.

Inventor: Ron Diamant, Nafea Bshara, Leah Shalev, Erez Izenberg

IPC: H04L9/40, H04L9/32, H04L9/08, H04L9/14, G09C1/00

Abstract: A hardware security accelerator includes a configurable parser that is configured to receive a packet and to extract from the packet headers associated with a set of protocols. The security accelerator also includes a packet type detection unit to determine a type of the packet in response to the set of protocols and to generate a packet type identifier indicative of the type of the packet. A configurable security unit includes a configuration unit and a configurable security engine. The configuration unit configures the configurable security engine according to the type of the packet and to content of at least one of the headers extracted from the packet. The configurable security engine performs security processing of the packet to provide at least one security result.
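A rough Python model of the parse, detect, configure, and process flow follows. The supported protocols (Ethernet, IPv4, UDP, ESP), the packet-type table, the SPI-indexed key table, and the use of HMAC-SHA256 as the "security processing" are all assumptions chosen to keep the example small; none of them is specified by the abstract.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

Header = Tuple[str, Dict[str, int]]

ETH_HDR_LEN = 14
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
IPPROTO_ESP = 50

# Hypothetical packet-type table: parsed protocol stack -> packet type identifier.
PACKET_TYPES = {
    ("eth",): 0,
    ("eth", "ipv4"): 1,
    ("eth", "ipv4", "udp"): 2,
    ("eth", "ipv4", "esp"): 3,
}


def parse_headers(pkt: bytes) -> List[Header]:
    """Stand-in for the configurable parser: extract recognized headers."""
    ethertype = int.from_bytes(pkt[12:14], "big")
    headers: List[Header] = [("eth", {"ethertype": ethertype})]
    if ethertype != ETHERTYPE_IPV4:
        return headers
    ihl = (pkt[ETH_HDR_LEN] & 0x0F) * 4  # IPv4 header length in bytes
    proto = pkt[ETH_HDR_LEN + 9]
    headers.append(("ipv4", {"proto": proto}))
    l4 = ETH_HDR_LEN + ihl
    if proto == IPPROTO_ESP:
        headers.append(("esp", {"spi": int.from_bytes(pkt[l4:l4 + 4], "big")}))
    elif proto == IPPROTO_UDP:
        headers.append(("udp", {"dst_port": int.from_bytes(pkt[l4 + 2:l4 + 4], "big")}))
    return headers


def detect_packet_type(headers: List[Header]) -> int:
    """Map the parsed protocol stack to a packet type identifier (-1 if unknown)."""
    return PACKET_TYPES.get(tuple(name for name, _ in headers), -1)


def configure_engine(packet_type: int, headers: List[Header], keys: Dict[int, bytes]) -> dict:
    """Pick engine settings from the packet type and extracted header content."""
    fields = dict(headers)
    if packet_type == PACKET_TYPES[("eth", "ipv4", "esp")]:
        return {"mode": "hmac-sha256", "key": keys[fields["esp"]["spi"]]}
    return {"mode": "bypass"}


def security_engine(pkt: bytes, config: dict) -> bytes:
    """Apply the configured processing and return a security result."""
    if config["mode"] == "hmac-sha256":
        return hmac.new(config["key"], pkt, hashlib.sha256).digest()
    return b""
```

The configuration depends on both the packet type and header content (here, the ESP SPI selects the key), mirroring the two inputs to the configuration unit described above.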

56.

Granted invention patent
Synchronizing operations in hardware accelerator (in force)

Publication number: US11468304B1

Publication date: 2022-10-11

Application number: US16696377

Application date: 2019-11-26

Applicant: Amazon Technologies, Inc.

Inventor: Ron Diamant

IPC: G06N3/063, G06F17/15, G06F15/80

Abstract: In one example, a hardware accelerator comprises an event register that stores an event; a hardware execution engine; and a controller configured to: extract, from an instruction, parameters of an operation to be performed by the hardware execution engine, and a synchronization primitive of a plurality of synchronization primitives for the event; and based on the synchronization primitive, perform at least one of: controlling a start time of the operation at the hardware execution engine, or determining whether to access the event register. The synchronization primitives include a set operation to set the event and/or a wait operation to suspend the operation at the hardware execution engine until the event is set. The plurality of synchronization primitives define different conditions to be satisfied in order to perform the set operation.
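The set and wait primitives behave much like events shared between threads. In this Python sketch each execution engine is a thread running a hypothetical `Controller`, event registers are `threading.Event` objects, and `set_when` is an assumed way to express different conditions for the set operation (set at operation start versus at completion).

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Instruction:
    operation: Callable[[], None]      # work for the execution engine
    wait_event: Optional[int] = None   # wait primitive: event that must be set first
    set_event: Optional[int] = None    # set primitive: event to set
    set_when: str = "end"              # assumed condition: "start" or "end" of the operation


class Controller:
    """Drives one execution engine and consults the shared event registers."""

    def __init__(self, event_registers: List[threading.Event]):
        self.events = event_registers

    def run(self, program: List[Instruction]) -> None:
        for instr in program:
            if instr.wait_event is not None:
                # Suspend the operation until the event is set.
                self.events[instr.wait_event].wait()
            if instr.set_event is not None and instr.set_when == "start":
                self.events[instr.set_event].set()
            instr.operation()
            if instr.set_event is not None and instr.set_when == "end":
                self.events[instr.set_event].set()
```

A consumer engine that waits on event 0 cannot start until a producer engine that sets event 0 has finished its operation, regardless of which thread is scheduled first.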

57.

Granted invention patent
Fine-grained access memory controller (in force)

Publication number: US11467973B1

Publication date: 2022-10-11

Application number: US16146332

Application date: 2018-09-28

Applicant: Amazon Technologies, Inc.

Inventor: Ron Diamant, Sundeep Amirineni

IPC: G06F12/00, G06F12/10, G06F12/1018, G06F12/1027, G06F12/1081, G06F12/1045, G06F12/1009, G06F12/1036, G06F12/1072

Abstract: Systems and methods are provided to perform fine-grained memory accesses using a memory controller. The memory controller can access elements stored in memory across multiple dimensions of a matrix. The memory controller can perform accesses to non-contiguous memory locations by skipping zero or more elements across any dimension of the matrix.
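A Python sketch of the address generation such a controller might perform, assuming a row-major matrix and a per-dimension (start, count, step) access pattern, where a step of 1 skips zero elements and a step of k skips k - 1. The function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def row_major_strides(shape: Sequence[int]) -> List[int]:
    """Element stride of each dimension for a row-major layout."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def fine_grained_offsets(shape, starts, counts, steps) -> List[int]:
    """Flat offsets of `counts[d]` elements per dimension, taken every `steps[d]`."""
    for n, s, c, st in zip(shape, starts, counts, steps):
        if s < 0 or st < 1 or c < 1 or s + (c - 1) * st >= n:
            raise IndexError("access pattern exceeds matrix bounds")
    strides = row_major_strides(shape)
    per_dim = [range(s, s + c * st, st) for s, c, st in zip(starts, counts, steps)]
    return [sum(i * stride for i, stride in zip(idx, strides)) for idx in product(*per_dim)]


def read_elements(memory, shape, starts, counts, steps) -> list:
    """Fetch the selected, possibly non-contiguous, elements."""
    return [memory[o] for o in fine_grained_offsets(shape, starts, counts, steps)]
```

For a 4x6 matrix, reading every other row starting at row 0 and every other column starting at column 1 touches the non-contiguous offsets 1, 3, 5, 13, 15, and 17.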

58.

Granted invention patent
Dynamically configurable pipeline (in force)

Publication number: US11294841B1

Publication date: 2022-04-05

Application number: US16985056

Application date: 2020-08-04

Applicant: Amazon Technologies, Inc.

Inventor: Adiel Sarusi, Ron Diamant, Ori Weber, Erez Izenberg

IPC: G06F13/40, G06F13/16, G06F13/42

Abstract: Techniques disclosed herein relate to dynamically configurable multi-stage pipeline processing units. In one embodiment, a circuit includes a plurality of processing engines and a plurality of switches. Each of the plurality of processing engines includes an input port and an output port. Each of the plurality of switches comprises two input ports and two output ports. For each processing engine, the input port of the processing engine is electrically coupled to one of the switches, the output port of the processing engine is electrically coupled to another one of the switches, and the input port of the processing engine is electrically coupled to the output port of each of the processing engines by the switches.
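A behavioral Python sketch, assuming the switch network can realize any chain in which each engine appears at most once; the 2x2 switches themselves are abstracted into a configured engine order rather than modeled port by port. `ConfigurablePipeline` and the example engines are hypothetical.

```python
from typing import Callable, Dict, List

Engine = Callable[[int], int]


class ConfigurablePipeline:
    """Processing engines chained in an order chosen at run time."""

    def __init__(self, engines: Dict[str, Engine]):
        self.engines = engines
        self.order: List[str] = []

    def configure(self, order: List[str]) -> None:
        # Assumption: each engine has one input port, so it appears at most once.
        if len(set(order)) != len(order):
            raise ValueError("an engine can appear only once in the pipeline")
        unknown = set(order) - set(self.engines)
        if unknown:
            raise ValueError(f"unknown engines: {sorted(unknown)}")
        self.order = list(order)

    def process(self, data: int) -> int:
        # Routing stand-in for the switch fabric: each engine's output is
        # delivered to the input of the next engine in the configured order.
        for name in self.order:
            data = self.engines[name](data)
        return data
```

Reconfiguring changes both which engines participate and their order without changing the engines themselves, which is the flexibility the switch network provides.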

59.

Granted invention patent
Registers for restricted memory (in force)

Publication number: US11294599B1

Publication date: 2022-04-05

Application number: US16891438

Application date: 2020-06-03

Applicant: Amazon Technologies, Inc.

Inventor: Ron Diamant, Randy Renfu Huang, Sundeep Amirineni, Jeffrey T. Huynh

IPC: G06F15/76, G06F3/06, G06N3/02, G06F13/28

Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
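The gather and scatter behavior described in this abstract can be modeled in Python as follows. The bank count, bank size, and method names are hypothetical, and the per-element loops stand in for what the hardware would do in a single parallel step or a serial sequence.

```python
from typing import List


class BankedMemory:
    """Memory banks plus one register per bank (hypothetical model)."""

    def __init__(self, num_banks: int, bank_size: int):
        self.banks: List[List[int]] = [[0] * bank_size for _ in range(num_banks)]
        self.registers: List[int] = [0] * num_banks

    def parallel_load(self, addr: int) -> None:
        # Every register loads from its own bank at the same address.
        for i, bank in enumerate(self.banks):
            self.registers[i] = bank[addr]

    def parallel_store(self, addr: int) -> None:
        # Every register stores into its own bank at the same address.
        for i, bank in enumerate(self.banks):
            bank[addr] = self.registers[i]

    def serial_load(self, bank_idx: int, addr: int) -> None:
        # Registers are filled one after another from consecutive addresses of one bank.
        for i in range(len(self.registers)):
            self.registers[i] = self.banks[bank_idx][addr + i]

    def serial_store(self, bank_idx: int, addr: int) -> None:
        # Registers are drained one after another into consecutive addresses of one bank.
        for i, value in enumerate(self.registers):
            self.banks[bank_idx][addr + i] = value

    def gather(self, src_addr: int, dst_bank: int, dst_addr: int) -> None:
        """Many banks -> one bank: parallel load, then serial store."""
        self.parallel_load(src_addr)
        self.serial_store(dst_bank, dst_addr)

    def scatter(self, src_bank: int, src_addr: int, dst_addr: int) -> None:
        """One bank -> many banks: serial load, then parallel store."""
        self.serial_load(src_bank, src_addr)
        self.parallel_store(dst_addr)
```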

60.

Granted invention patent
Flexible weight expansion (in force)

Publication number: US11263517B1

Publication date: 2022-03-01

Application number: US15908080

Application date: 2018-02-28

Applicant: Amazon Technologies, Inc.

Inventor: Ron Diamant, Randy Huang

IPC: G06N3/04, G06N3/063, G06N3/02, G06N3/06, G06N20/00

Abstract: Disclosed herein are techniques for obtaining weights for neural network computations. In one embodiment, an integrated circuit may include an arithmetic circuit configured to perform arithmetic operations for a neural network. The integrated circuit may also include a weight processing circuit configured to: acquire data from a memory device; receive configuration information indicating a size of each quantized weight of a set of quantized weights; extract the set of quantized weights from the data based on the size of each quantized weight indicated by the configuration information; perform de-quantization processing on the set of quantized weights to generate a set of de-quantized weights; and provide the set of de-quantized weights to the arithmetic circuit to enable the arithmetic circuit to perform the arithmetic operations. The memory device may be part of or external to the integrated circuit.
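A Python sketch of the weight-processing steps, assuming weights are bit-packed least-significant bits first and de-quantized with an affine scale and zero point. The abstract does not specify the packing order or the de-quantization formula, so both are assumptions, as are the function names.

```python
from typing import List


def extract_quantized_weights(data: bytes, bits: int, count: int) -> List[int]:
    """Unpack `count` unsigned `bits`-wide fields, least-significant bits first."""
    if not 1 <= bits <= 32:
        raise ValueError("unsupported weight size")
    if bits * count > len(data) * 8:
        raise ValueError("not enough data for the requested weights")
    packed = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(packed >> (i * bits)) & mask for i in range(count)]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Affine de-quantization: real = scale * (q - zero_point)."""
    return [scale * (v - zero_point) for v in q]
```

The same two bytes yield four 4-bit weights, two 8-bit weights, or eight 2-bit weights depending only on the configured size, which is the flexibility the configuration information provides.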
