Dynamic code loading for multiple executions on a sequential processor

    Publication (Announcement) Number: US11809953B1

    Publication (Announcement) Date: 2023-11-07

    Application Number: US17902702

    Application Date: 2022-09-02

    CPC classification number: G06N3/063 G06N5/04

    Abstract: Embodiments include techniques for enabling execution of N inferences on an execution engine of a neural network device. Instruction code for a single inference is stored in a memory that is accessible by a DMA engine, the instruction code forming a regular code block. A NOP code block and a reset code block for resetting an instruction DMA queue are stored in the memory. The instruction DMA queue is generated such that, when it is executed by the DMA engine, it causes the DMA engine to copy, for each of N inferences, both the regular code block and an additional code block to an instruction buffer. The additional code block is the NOP code block for the first N−1 inferences and is the reset code block for the Nth inference. When the reset code block is executed by the execution engine, the instruction DMA queue is reset.
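The queue-building pattern in the abstract above can be sketched in a few lines. This is an illustrative simulation only, not the patented implementation; the names (`build_instruction_dma_queue`, the block labels) are hypothetical, and actual DMA descriptors would reference memory addresses rather than strings.

```python
# Labels standing in for the three code blocks stored in DMA-accessible memory.
REGULAR, NOP, RESET = "regular", "nop", "reset"

def build_instruction_dma_queue(n_inferences):
    """Return the sequence of code blocks the DMA engine copies into the
    instruction buffer, in order, for n_inferences executions."""
    queue = []
    for i in range(n_inferences):
        queue.append(REGULAR)
        # The additional block is NOP for the first N-1 inferences and the
        # reset block for the Nth; executing the reset block rewinds the
        # instruction DMA queue so the same queue serves the next N inferences.
        queue.append(RESET if i == n_inferences - 1 else NOP)
    return queue
```

The key point the sketch captures is that every inference copies the same regular code block, and only the final slot differs, so the queue resets itself without host intervention.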

    Low latency neural network model loading

    Publication (Announcement) Number: US11182314B1

    Publication (Announcement) Date: 2021-11-23

    Application Number: US16698761

    Application Date: 2019-11-27

    Abstract: An integrated circuit device implementing a neural network accelerator may have a peripheral bus interface to a host memory, and neural network models can be loaded from the host memory into the state buffer of the neural network accelerator for execution by the array of processing elements. The neural network accelerator may also have a memory interface to a local memory. The local memory may store neural network models copied from the host memory, and the models can be loaded from the local memory into the state buffer with reduced latency compared to loading from the host memory. In systems with multiple accelerators, the models in the local memory can also be shared among the accelerators.
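The two-tier loading path described above can be sketched as a simple cache: models are staged once from slow host memory into fast local memory, and later loads hit the local copy. All names here (`ModelLoader`, the dict-backed memories) are hypothetical stand-ins for hardware interfaces, not the patent's actual design.

```python
class ModelLoader:
    """Illustrative two-tier model loader: host memory is the slow path
    (peripheral bus), local memory is the low-latency path."""

    def __init__(self, host_memory):
        self.host_memory = host_memory   # model name -> weights (slow path)
        self.local_memory = {}           # cached copies (fast path)

    def load_into_state_buffer(self, model_name):
        if model_name not in self.local_memory:
            # First use: stage the model from host memory over the
            # peripheral bus into local memory.
            self.local_memory[model_name] = self.host_memory[model_name]
        # Subsequent loads (and other accelerators sharing this local
        # memory) read the low-latency local copy.
        return self.local_memory[model_name]
```

The design choice the abstract highlights is that the staging cost is paid once, after which every accelerator attached to the local memory loads the model without touching the host.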

    Profiling and debugging for remote neural network execution

    Publication (Announcement) Number: US11531578B1

    Publication (Announcement) Date: 2022-12-20

    Application Number: US16216887

    Application Date: 2018-12-11

    Abstract: Remote access for debugging or profiling a remotely executing neural network graph can be performed by a client using an in-band application programming interface (API). The client can provide indicator flags for debugging or profiling in an inference request sent to a remote server computer executing the neural network graph using the API. The remote server computer can collect metadata for debugging or profiling during the inference operation using the neural network graph and send it back to the client using the same API. Additionally, the metadata can be collected at various granularity levels also specified in the inference request.
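The in-band request/response flow described above can be sketched as follows. This is a minimal illustration, assuming dict-shaped messages; the field names (`profile`, `granularity`, `metadata`) are invented for the example and are not the patent's actual API.

```python
import time

def run_inference(request, graph):
    """Serve one inference request; if the request carries a profiling
    indicator flag, return collected metadata over the same API."""
    start = time.perf_counter()
    result = graph(request["input"])
    elapsed_ms = (time.perf_counter() - start) * 1000
    response = {"result": result}
    if request.get("profile"):
        # Metadata rides back in-band with the inference response, at the
        # granularity level the client specified in the request
        # (e.g. whole-graph vs per-operator timing).
        response["metadata"] = {
            "granularity": request.get("granularity", "graph"),
            "total_ms": elapsed_ms,
        }
    return response
```

Because the flags and metadata travel through the same API as the inference itself, the client needs no separate debugging channel to the remote server.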

    Strong ordered transaction for DMA transfers

    Publication (Announcement) Number: US12204757B1

    Publication (Announcement) Date: 2025-01-21

    Application Number: US18067514

    Application Date: 2022-12-16

    Abstract: A technique for processing strong ordered transactions in a direct memory access engine may include retrieving a memory descriptor to perform a strong ordered transaction, and delaying the strong ordered transaction until pending write transactions associated with previous memory descriptors retrieved prior to the memory descriptor are complete. Subsequent transactions associated with memory descriptors following the memory descriptor are allowed to be issued while waiting for the pending write transactions to complete. Upon completion of the pending write transactions, the strong ordered transaction is performed.
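The ordering rule above can be illustrated with a toy scheduler: a strong ordered transaction is held until writes from earlier descriptors complete, while transactions for later descriptors may still issue in the meantime. This is a deliberately simplified model (completion is assumed to happen after one pass over the descriptors), and `issue_order` is a hypothetical name, not part of the patented design.

```python
def issue_order(descriptors):
    """descriptors: list of 'write' or 'strong', in descriptor order.
    Returns the indices of the transactions in the order they issue."""
    issued, held = [], []
    for i, kind in enumerate(descriptors):
        if kind == "strong":
            # Delay until pending writes from earlier descriptors complete.
            held.append(i)
        else:
            # Transactions for later descriptors may issue while the
            # strong ordered transaction waits.
            issued.append(i)
    # Once the pending writes complete, the held strong ordered
    # transactions are performed, in their original relative order.
    issued.extend(held)
    return issued
```

The point of the technique is that a strong ordered transaction enforces ordering only against *earlier* writes; it does not stall the rest of the queue behind it.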
