-
Publication Number: US11809953B1
Publication Date: 2023-11-07
Application Number: US17902702
Application Date: 2022-09-02
Applicant: Amazon Technologies, Inc.
Inventor: Samuel Jacob , Ilya Minkin , Mohammad El-Shabani
Abstract: Embodiments include techniques for enabling execution of N inferences on an execution engine of a neural network device. Instruction code for a single inference is stored in a memory that is accessible by a DMA engine, the instruction code forming a regular code block. A NOP code block and a reset code block for resetting an instruction DMA queue are stored in the memory. The instruction DMA queue is generated such that, when it is executed by the DMA engine, it causes the DMA engine to copy, for each of N inferences, both the regular code block and an additional code block to an instruction buffer. The additional code block is the NOP code block for the first N−1 inferences and is the reset code block for the Nth inference. When the reset code block is executed by the execution engine, the instruction DMA queue is reset.
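The abstract describes building one DMA queue that replays a fixed instruction block N times, switching only the trailing block on the final pass. A minimal sketch of that queue construction in C; all names (dma_copy_t, build_instruction_queue) are hypothetical and not taken from the patent:

```c
#include <stddef.h>

/* Hypothetical DMA copy descriptor: copy `len` bytes from `src`
 * (host memory) into the execution engine's instruction buffer. */
typedef struct {
    const void *src;
    size_t      len;
} dma_copy_t;

/* Build the instruction DMA queue for N inferences. Each inference
 * copies the regular code block plus one additional block: a NOP
 * block for the first N-1 inferences, and a reset block (which
 * resets this very queue) for the Nth. */
size_t build_instruction_queue(dma_copy_t *queue,
                               const void *regular, size_t regular_len,
                               const void *nop,     size_t nop_len,
                               const void *reset,   size_t reset_len,
                               int n_inferences)
{
    size_t e = 0;
    for (int i = 0; i < n_inferences; i++) {
        queue[e++] = (dma_copy_t){ regular, regular_len };
        if (i < n_inferences - 1)
            queue[e++] = (dma_copy_t){ nop, nop_len };     /* padding only */
        else
            queue[e++] = (dma_copy_t){ reset, reset_len }; /* re-arms queue */
    }
    return e; /* number of queue entries written */
}
```

Because the Nth entry copies the reset block, executing it re-arms the instruction DMA queue for the next batch of N inferences without host intervention.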
-
Publication Number: US11550736B1
Publication Date: 2023-01-10
Application Number: US17449581
Application Date: 2021-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant , Ilya Minkin , Mohammad El-Shabani , Raymond S. Whiteside , Uday Shilton Udayaselvam
Abstract: To reduce direct memory access (DMA) overhead, a tensorized descriptor can be used to generate a series of memory descriptors for a series of DMA data transfers. The tensorized descriptor may include attributes such as a stride and a memory descriptor template, from which the series of memory descriptors is generated. Hence, instead of retrieving each memory descriptor individually, a single tensorized descriptor can be retrieved to perform the entire series of data transfers.
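As a rough illustration, a tensorized descriptor can be expanded on the fly into the plain memory descriptors it stands for. A hedged C sketch; the struct and field names are hypothetical, not from the patent:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical plain memory descriptor: one DMA transfer. */
typedef struct {
    uint64_t src;
    uint64_t dst;
    size_t   len;
} mem_desc_t;

/* Hypothetical tensorized descriptor: a template plus a stride and
 * a count, standing in for `count` individual memory descriptors. */
typedef struct {
    mem_desc_t tmpl;   /* base addresses and transfer length */
    uint64_t   stride; /* byte offset added per transfer */
    int        count;  /* number of transfers to generate */
} tensorized_desc_t;

/* Expand one tensorized descriptor into the series of memory
 * descriptors it represents, instead of fetching each from memory. */
void expand(const tensorized_desc_t *t, mem_desc_t *out)
{
    for (int i = 0; i < t->count; i++) {
        out[i] = t->tmpl;
        out[i].src += (uint64_t)i * t->stride;
        out[i].dst += (uint64_t)i * t->stride;
    }
}
```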
-
Publication Number: US11182314B1
Publication Date: 2021-11-23
Application Number: US16698761
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic , Ilya Minkin , Vignesh Vivekraja , Richard John Heaton , Randy Renfu Huang
Abstract: An integrated circuit device implementing a neural network accelerator may have a peripheral bus interface to interface with a host memory, and neural network models can be loaded from the host memory onto the state buffer of the neural network accelerator for execution by the array of processing elements. The neural network accelerator may also have a memory interface to interface with a local memory. The local memory may store neural network models from the host memory, and the models can be loaded from the local memory into the state buffer with reduced latency as compared to loading from the host memory. In systems with multiple accelerators, the models in the local memory can also be shared amongst different accelerators.
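A toy illustration of the load path the abstract describes, preferring the local copy when one exists; everything here (model_t, load_into_state_buffer) is a hypothetical stand-in for the actual hardware interfaces:

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical model record: the same model may live in host memory
 * (reached over the peripheral bus) and in accelerator-local memory. */
typedef struct {
    const void *host_copy;   /* model in host memory */
    const void *local_copy;  /* cached copy in local memory, or NULL */
    size_t      size;
} model_t;

/* Copy the model into the accelerator's state buffer, using the
 * lower-latency local copy when present. In a multi-accelerator
 * system the local copy could be shared among accelerators. */
void load_into_state_buffer(const model_t *m, void *state_buffer)
{
    const void *src = m->local_copy ? m->local_copy : m->host_copy;
    memcpy(state_buffer, src, m->size);
}
```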
-
Publication Number: US11868872B1
Publication Date: 2024-01-09
Application Number: US16836493
Application Date: 2020-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Ilya Minkin , Ron Diamant , Kun Xu
CPC classification number: G06N3/063 , G06F12/0292 , G06F12/1081 , G06F13/1605 , G06F13/28 , G06N3/045 , G11C15/04 , G06F2212/152 , G06F2213/2802
Abstract: In one example, an apparatus comprises: a direct memory access (DMA) descriptor queue that stores DMA descriptors, each DMA descriptor including an indirect address; an address translation table that stores an address mapping between indirect addresses and physical addresses; and a DMA engine configured to: fetch a DMA descriptor from the DMA descriptor queue, access the address translation table to translate a first indirect address of the DMA descriptor into a first physical address based on the address mapping, and perform a DMA operation by executing the DMA descriptor to transfer data to or from the first physical address.
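A small C sketch of the translation step, assuming a linear-scan table for clarity (real hardware would more plausibly use a content-addressable memory, consistent with the G11C15/04 classification above); all names are hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical address translation table entry: maps an indirect
 * address (a stable handle carried in DMA descriptors) to the
 * physical address it currently refers to. */
typedef struct {
    uint64_t indirect;
    uint64_t physical;
} xlat_entry_t;

/* Look up the physical address for an indirect address. Returns 0
 * on a miss; a real engine would raise a fault instead. The DMA
 * engine performs this lookup after fetching each descriptor and
 * before issuing the transfer. */
uint64_t translate(const xlat_entry_t *table, size_t n, uint64_t indirect)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].indirect == indirect)
            return table[i].physical;
    return 0;
}
```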
-
Publication Number: US11531578B1
Publication Date: 2022-12-20
Application Number: US16216887
Application Date: 2018-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Ilya Minkin
Abstract: Remote access for debugging or profiling a remotely executing neural network graph can be performed by a client using an in-band application programming interface (API). The client can provide indicator flags for debugging or profiling in an inference request sent to a remote server computer executing the neural network graph using the API. The remote server computer can collect metadata for debugging or profiling during the inference operation using the neural network graph and send it back to the client using the same API. Additionally, the metadata can be collected at various granularity levels also specified in the inference request.
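One way to picture the in-band API: the inference request itself carries the indicator flags and the granularity level, and the collected metadata comes back over the same API with the result. A hypothetical C sketch of such request/response types (none of these names come from the patent):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical indicator flags carried in the inference request. */
enum { REQ_DEBUG = 1u << 0, REQ_PROFILE = 1u << 1 };

/* Hypothetical granularity levels for metadata collection. */
enum { GRAN_GRAPH, GRAN_NODE, GRAN_INSTRUCTION };

/* The same API call that runs the neural network graph also carries
 * the debug/profile flags and the requested granularity. */
typedef struct {
    const void *input;
    size_t      input_len;
    uint32_t    flags;        /* REQ_DEBUG and/or REQ_PROFILE */
    uint32_t    granularity;  /* one of the GRAN_* levels */
} inference_request_t;

/* The response returns the inference result plus any metadata the
 * remote server collected during the inference. */
typedef struct {
    const void *output;
    const void *metadata;     /* NULL unless collection was requested */
    size_t      metadata_len;
} inference_response_t;
```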
-
Publication Number: US12204757B1
Publication Date: 2025-01-21
Application Number: US18067514
Application Date: 2022-12-16
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant , Ilya Minkin , Raymond S. Whiteside
IPC: G06F3/06
Abstract: A technique for processing strong ordered transactions in a direct memory access engine may include retrieving a memory descriptor to perform a strong ordered transaction, and delaying the strong ordered transaction until pending write transactions associated with previous memory descriptors retrieved prior to the memory descriptor are complete. Subsequent transactions associated with memory descriptors following the memory descriptor are allowed to be issued while waiting for the pending write transactions to complete. Upon completion of the pending write transactions, the strong ordered transaction is performed.
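A hedged sketch of the scheduling rule in C: a strong ordered descriptor is skipped (delayed) while writes from earlier descriptors are outstanding, but the scan continues so later descriptors can still issue. The descriptor fields and the writes_pending bookkeeping are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical descriptor with a strong-ordering attribute. */
typedef struct {
    bool strong_ordered;
    bool issued;
} desc_t;

/* One scheduling pass. writes_pending[i] counts write transactions
 * still outstanding for descriptors earlier than i (bookkeeping a
 * real engine would track in hardware). A strong ordered descriptor
 * is delayed until that count reaches zero; descriptors after it
 * may still issue while it waits. */
void schedule_pass(desc_t *q, const int *writes_pending, int n)
{
    for (int i = 0; i < n; i++) {
        if (q[i].issued)
            continue;
        if (q[i].strong_ordered && writes_pending[i] > 0)
            continue;        /* hold back; keep scanning later entries */
        /* ...issue the DMA transaction for q[i] here... */
        q[i].issued = true;
    }
}
```

A real engine would repeat such a pass, or be event-driven on write completions, until every descriptor has issued.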
-
Publication Number: US11175919B1
Publication Date: 2021-11-16
Application Number: US16219610
Application Date: 2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Ilya Minkin , Ron Diamant , Drazen Borkovic , Jindrich Zejda , Dana Michelle Vantrease
Abstract: Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.
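The Set/Wait pairing can be pictured with a shared register: one engine stores a value, the other blocks until a condition on that value holds. A hypothetical C11 sketch, with software atomics standing in for a hardware semaphore register:

```c
#include <stdatomic.h>

/* Hypothetical shared register used to order two execution engines. */
static atomic_int sync_reg;

/* First engine: executes Set after producing whatever the dependent
 * engine is waiting on. */
void set_instruction(int value)
{
    atomic_store(&sync_reg, value);
}

/* Second engine: executes Wait and stalls until the register
 * satisfies the condition (here: reaches at least `value`), then
 * proceeds past the dependency. */
void wait_instruction(int value)
{
    while (atomic_load(&sync_reg) < value)
        ; /* busy-wait; real hardware would stall the engine */
}
```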
-
Publication Number: US11119787B1
Publication Date: 2021-09-14
Application Number: US16368263
Application Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Mohammad El-Shabani , Ron Diamant , Samuel Jacob , Ilya Minkin , Richard John Heaton
IPC: G06F9/44 , G06F8/41 , G06F11/30 , G06F9/38 , G06F11/22 , G06F9/455 , G06F11/36 , G06F9/445 , G06F11/34 , G06F9/30
Abstract: Systems and methods for non-intrusive hardware profiling are provided. In some cases, integrated circuit devices are manufactured without native support for performance measurement or debugging, limiting visibility into the device. Understanding the timing of operations helps determine whether the hardware is operating correctly and, when it is not, provides information that can be used to debug it. To measure the execution time of various tasks performed by the integrated circuit device, program instructions may be inserted to generate notifications that provide tracing information, including timestamps, for the operations the device executes.
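A rough software analogue of the inserted instructions, assuming POSIX clock_gettime; on the actual device the notification would be written to a hardware notification queue rather than printed, and all names here are invented:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical notification record emitted by an inserted instruction. */
typedef struct {
    const char *op;    /* which operation is being traced */
    long long   ts_ns; /* timestamp in nanoseconds */
} notification_t;

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Emit one notification carrying tracing information. */
void notify(const char *op)
{
    notification_t n = { op, now_ns() };
    printf("%s @ %lld ns\n", n.op, n.ts_ns);
}

/* Usage: bracket an operation with notifications so its execution
 * time can be recovered from the trace, e.g.
 *   notify("matmul:start"); run_matmul(); notify("matmul:end");   */
```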
-
Publication Number: US10922146B1
Publication Date: 2021-02-16
Application Number: US16219530
Application Date: 2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Ilya Minkin , Ron Diamant , Drazen Borkovic , Jindrich Zejda , Dana Michelle Vantrease
Abstract: Systems and methods are provided for synchronizing execution of program code for an integrated circuit device having multiple concurrently operating execution engines, where the operation of one execution engine may be dependent on the operation of another execution engine. Data or resource dependencies may be accommodated with a Set instruction to cause a first execution engine to set a register value and a Wait instruction to cause a second execution engine to wait for a condition associated with the register value. Concurrent operation of the execution engines may thus be synchronized.
-
Publication Number: US11983128B1
Publication Date: 2024-05-14
Application Number: US18067109
Application Date: 2022-12-16
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant , Ilya Minkin , Mohammad El-Shabani , Raymond S. Whiteside , Uday Shilton Udayaselvam
CPC classification number: G06F13/30 , G06F13/1621 , G06F13/1642
Abstract: Techniques to reduce overhead in a direct memory access (DMA) engine can include processing descriptors from a descriptor queue to obtain a striding configuration to generate tensorized memory descriptors. The striding configuration can include, for each striding dimension, a stride and a repetition number indicating a number of times to repeat striding in the corresponding striding dimension. One or more sets of tensorized memory descriptors can be generated based on the striding configuration. Data transfers are then performed based on the generated tensorized memory descriptors.
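To make the striding configuration concrete, a hypothetical two-dimensional example in C: each striding dimension contributes a stride and a repetition count, and the nested loops enumerate the addresses of the tensorized memory descriptors that one descriptor-queue entry expands into (names invented for illustration):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical two-dimensional striding configuration: for each
 * striding dimension, a byte stride and a repetition count. */
typedef struct {
    uint64_t stride[2];
    int      reps[2];
} striding_cfg_t;

/* Enumerate the source addresses of the memory descriptors that a
 * single tensorized descriptor expands into. Returns how many were
 * generated (reps[0] * reps[1]). */
size_t gen_addresses(uint64_t base, const striding_cfg_t *c, uint64_t *out)
{
    size_t k = 0;
    for (int i = 0; i < c->reps[0]; i++)       /* outer dimension */
        for (int j = 0; j < c->reps[1]; j++)   /* inner dimension */
            out[k++] = base + (uint64_t)i * c->stride[0]
                            + (uint64_t)j * c->stride[1];
    return k;
}
```

With stride = {4096, 4} and reps = {8, 16}, for example, one tensorized descriptor would replace 128 individually fetched memory descriptors.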