DATA VOLUME SCULPTOR FOR DEEP LEARNING ACCELERATION

    公开(公告)号:US20190266784A1

    公开(公告)日:2019-08-29

    申请号:US16280963

    申请日:2019-02-20

    Abstract: Embodiments of a device include on-board memory, an applications processor, a digital signal processor (DSP) cluster, a configurable accelerator framework (CAF), and at least one communication bus architecture. The communication bus communicatively couples the applications processor, the DSP cluster, and the CAF to the on-board memory. The CAF includes a reconfigurable stream switch and a data volume sculpting unit, which has an input and an output coupled to the reconfigurable stream switch. The data volume sculpting unit has a counter, a comparator, and a controller. The data volume sculpting unit is arranged to receive a stream of feature map data that forms a three-dimensional (3D) feature map. The 3D feature map is formed as a plurality of two-dimensional (2D) data planes. The data volume sculpting unit is also arranged to identify a 3D volume within the 3D feature map that is dimensionally smaller than the 3D feature map and isolate data from the 3D feature map that is within the 3D volume for processing in a deep learning algorithm.

    PROGRAMMABLE HARDWARE ACCELERATOR CONTROLLER
    12.
    发明公开

    公开(公告)号:US20240220278A1

    公开(公告)日:2024-07-04

    申请号:US18176323

    申请日:2023-02-28

    CPC classification number: G06F9/44505

    Abstract: A system includes a host processor, a memory, a hardware accelerator and a configuration controller. The host processor, in operation, controls execution of a multi-stage processing task. The memory, in operation, stores data and configuration information. The hardware accelerator, in operation preforms operations associated with stages of the multi-stage processing task. The configuration controller is coupled to the host processor, the hardware accelerator, and the memory. The configuration controller executes a linked list of configuration operations, for example, under control of a finite state machine. The linked list consists of configuration operations selected from a defined set of configuration operations. Executing the linked list of configuration operations configures the plurality of configuration registers of the hardware accelerator to control operations of the hardware accelerator associated with a stage of the multi-stage processing task. The configuration controller may retrieve the linked list from the memory via a high-speed data bus.

    RECONFIGURABLE INTERCONNECT
    15.
    发明申请

    公开(公告)号:US20190340314A1

    公开(公告)日:2019-11-07

    申请号:US16517371

    申请日:2019-07-19

    Abstract: A system on a chip (SoC) includes a plurality of processing cores and a stream switch coupled to two or more of the plurality of processing cores. The stream switch includes a plurality of N multibit input ports, wherein N is a first integer. a plurality of M multibit output ports, wherein M is a second integer, and a plurality of M multibit stream links dedicated to respective output ports of the plurality of M multibit output ports. The M multibit stream links are reconfigurably coupleable at run time to a selectable number of the N multibit input ports, wherein the selectable number is an integer between zero and N.

    TOOL TO CREATE A RECONFIGURABLE INTERCONNECT FRAMEWORK

    公开(公告)号:US20180189424A1

    公开(公告)日:2018-07-05

    申请号:US15423292

    申请日:2017-02-02

    Abstract: Embodiments are directed towards a method to create a reconfigurable interconnect framework in an integrated circuit. The method includes accessing a configuration template directed toward the reconfigurable interconnect framework, editing parameters of the configuration template, functionally combining the configuration template with a plurality of modules from an IP library to produce a register transfer level (RTL) circuit model, generating at least one automated test-bench function, and generating at least one logic synthesis script. Editing parameters of the configuration template includes confirming a first number of output ports of a reconfigurable stream switch and confirming a second number of input ports of the reconfigurable stream switch. Each output port and each input port has a respective architectural composition. The output port architectural composition is defined by a plurality of N data paths including A data outputs and B control outputs. The input port architectural composition is defined by a plurality of M data paths including A data inputs and B control inputs.

    ACCELERATION OF 1X1 CONVOLUTIONS IN CONVOLUTIONAL NEURAL NETWORKS

    公开(公告)号:US20230418559A1

    公开(公告)日:2023-12-28

    申请号:US17847817

    申请日:2022-06-23

    CPC classification number: G06F7/523 G06F7/50

    Abstract: A convolutional accelerator includes a feature line buffer, a kernel buffer, a multiply-accumulate cluster, and mode control circuitry. In a first mode of operation, the mode control circuitry stores feature data in a feature line buffer and stores kernel data in a kernel buffer. The data stored in the buffers is transferred to the MAC cluster of the convolutional accelerator for processing. In a second mode of operation the mode control circuitry stores feature data in the kernel buffer and stores kernel data in the feature line buffer. The data stored in the buffers is transferred to the MAC cluster of the convolutional accelerator for processing. The second mode of operation may be employed to efficiently process 1×N kernels, where N is an integer greater than or equal to 1.

Patent Agency Ranking