HARDWARE ACCELERATOR METHOD, SYSTEM AND DEVICE

    Publication Number: US20200310761A1

    Publication Date: 2020-10-01

    Application Number: US16833340

    Filing Date: 2020-03-27

    Abstract: A system includes an addressable memory array, one or more processing cores, and an accelerator framework coupled to the addressable memory array. The accelerator framework includes a Multiply ACcumulate (MAC) hardware accelerator cluster. The MAC hardware accelerator cluster has a binary-to-residual converter, which, in operation, converts binary inputs to a residual number system. Converting a binary input to the residual number system includes a reduction modulo 2^m and a reduction modulo 2^m − 1, where m is a positive integer. A plurality of MAC hardware accelerators perform modulo 2^m multiply-and-accumulate operations and modulo 2^m − 1 multiply-and-accumulate operations using the converted binary input. A residual-to-binary converter generates a binary output based on the output of the MAC hardware accelerators.
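
    The two-modulus arithmetic described in this abstract can be illustrated in software. The sketch below is a minimal Python analogue (not the patented circuit): it performs the binary-to-residual reductions modulo 2^m and 2^m − 1, runs the multiply-and-accumulate independently in each residue channel, and reconstructs the binary result with the Chinese Remainder Theorem. The value m = 8 and the helper names are illustrative assumptions.

```python
# Minimal software sketch, not the patented circuit: MAC in a residue
# number system with the coprime moduli 2**m and 2**m - 1, followed by
# residual-to-binary conversion via the Chinese Remainder Theorem.
m = 8                                  # illustrative word size
M1, M2 = 2**m, 2**m - 1                # coprime moduli (256 and 255)

def to_residual(x):
    """Binary-to-residual: reduction modulo 2**m and modulo 2**m - 1."""
    return x % M1, x % M2

def mac_residual(pairs):
    """Multiply-and-accumulate independently in each residue channel."""
    acc1 = acc2 = 0
    for a, b in pairs:
        a1, a2 = to_residual(a)
        b1, b2 = to_residual(b)
        acc1 = (acc1 + a1 * b1) % M1   # modulo 2**m channel
        acc2 = (acc2 + a2 * b2) % M2   # modulo 2**m - 1 channel
    return acc1, acc2

def to_binary(r1, r2):
    """Residual-to-binary via CRT; exact while the true sum < M1 * M2."""
    w1 = M2 * pow(M2, -1, M1)          # congruent to 1 mod M1, 0 mod M2
    w2 = M1 * pow(M1, -1, M2)          # congruent to 0 mod M1, 1 mod M2
    return (r1 * w1 + r2 * w2) % (M1 * M2)

pairs = [(17, 23), (200, 5), (99, 42)]
assert to_binary(*mac_residual(pairs)) == sum(a * b for a, b in pairs)
```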

    ACCELERATION UNIT FOR A DEEP LEARNING ENGINE
    Invention Application

    Publication Number: US20190266479A1

    Publication Date: 2019-08-29

    Application Number: US16280991

    Filing Date: 2019-02-20

    Abstract: Embodiments of a device include an integrated circuit, a reconfigurable stream switch formed in the integrated circuit along with a plurality of convolution accelerators and an arithmetic unit coupled to the reconfigurable stream switch. The arithmetic unit has at least one input and at least one output. The at least one input is arranged to receive streaming data passed through the reconfigurable stream switch, and the at least one output is arranged to stream resultant data through the reconfigurable stream switch. The arithmetic unit also has a plurality of data paths. At least one of the plurality of data paths is solely dedicated to performance of operations that accelerate an activation function represented in the form of a piece-wise second order polynomial approximation.
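
    As a rough software analogue of the dedicated data path described above, the sketch below approximates an activation function (tanh here) with piece-wise second order polynomials: the input range is split into segments, one quadratic a*x^2 + b*x + c is fitted per segment, and evaluation selects a segment and computes a single quadratic. The segment count, input range, and the least-squares fitting step are illustrative assumptions, not details taken from the patent.

```python
# Minimal sketch of a piece-wise second-order polynomial activation
# approximation; segment layout and fitting method are illustrative.
import numpy as np

SEGMENTS = np.linspace(-4.0, 4.0, 9)              # 8 segments over [-4, 4]

def fit_segments(fn):
    """Fit a*x^2 + b*x + c to fn on each segment (least squares)."""
    coeffs = []
    for lo, hi in zip(SEGMENTS[:-1], SEGMENTS[1:]):
        xs = np.linspace(lo, hi, 64)
        coeffs.append(np.polyfit(xs, fn(xs), 2))   # [a, b, c] per segment
    return np.array(coeffs)

def piecewise_eval(x, coeffs):
    """Select the segment for each input, then evaluate one quadratic."""
    x = np.clip(x, SEGMENTS[0], SEGMENTS[-1])
    idx = np.clip(np.searchsorted(SEGMENTS, x, side="right") - 1,
                  0, len(SEGMENTS) - 2)
    a, b, c = coeffs[idx].T
    return a * x * x + b * x + c

coeffs = fit_segments(np.tanh)
xs = np.linspace(-4.0, 4.0, 1001)
print("max error:", np.max(np.abs(piecewise_eval(xs, coeffs) - np.tanh(xs))))
```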

    HARDWARE ACCELERATOR ENGINE
    Invention Application

    Publication Number: US20180189641A1

    Publication Date: 2018-07-05

    Application Number: US15423279

    Filing Date: 2017-02-02

    Abstract: Embodiments are directed towards a hardware accelerator engine that supports efficient mapping of convolutional stages of deep neural network algorithms. The hardware accelerator engine includes a plurality of convolution accelerators, and each one of the plurality of convolution accelerators includes a kernel buffer, a feature line buffer, and a plurality of multiply-accumulate (MAC) units. The MAC units are arranged to multiply and accumulate data received from both the kernel buffer and the feature line buffer. The hardware accelerator engine also includes at least one input bus coupled to an output bus port of a stream switch, at least one output bus coupled to an input bus port of the stream switch, or at least one input bus and at least one output bus hard wired to respective output bus and input bus ports of the stream switch.
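
    The buffering and MAC arrangement above can be mirrored in a simple software model. In the sketch below (a functional analogue under assumed names, not the hardware engine), each output row reuses a small "feature line buffer" of input rows, and every output value is produced by multiply-accumulate steps over the kernel-buffer weights.

```python
# Software analogue of a convolution accelerator: MAC units combine
# kernel-buffer weights with values held in a feature line buffer.
import numpy as np

def conv2d_mac(feature, kernel):
    kh, kw = kernel.shape
    fh, fw = feature.shape
    out = np.zeros((fh - kh + 1, fw - kw + 1))
    for oy in range(out.shape[0]):
        # "Feature line buffer": the kh input rows needed for this output row.
        line_buffer = feature[oy:oy + kh, :]
        for ox in range(out.shape[1]):
            acc = 0.0
            for ky in range(kh):               # one MAC per kernel tap
                for kx in range(kw):
                    acc += kernel[ky, kx] * line_buffer[ky, ox + kx]
            out[oy, ox] = acc
    return out

feature = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0                  # 3x3 averaging kernel
print(conv2d_mac(feature, kernel))              # 4x4 output feature map
```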

    CONFIGURABLE STREAM SWITCH WITH VIRTUAL CHANNELS FOR THE SHARING OF I/O PORTS IN STREAM-BASED ARCHITECTURES

    Publication Number: US20240354269A1

    Publication Date: 2024-10-24

    Application Number: US18304938

    Filing Date: 2023-04-21

    CPC classification number: G06F13/374 G06F9/5077 G06F2209/5011

    Abstract: A stream switch includes a data router, configuration registers, and arbitration logic. The data router has a plurality of input ports, each having a plurality of associated virtual input channels, and a plurality of output ports, each having a plurality of associated virtual output channels. The data router transmits data streams from input ports to one or more output ports of the plurality of output ports. The configuration registers store configuration data associated with the virtual output channels of the respective output ports of the plurality of output ports. The stored configuration data identifies a source input port and virtual input channel ID associated with the virtual output channel of the output port. The arbitration logic allocates bandwidth of the data router based on request signals associated with virtual input channels of the input ports and the configuration data associated with the virtual output channels.
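
    The routing and arbitration described above can be sketched at a behavioral level. In the toy model below (the data structures, names, and one-word-per-cycle grant policy are assumptions for illustration), each configured virtual output channel's register holds a (source input port, virtual input channel) pair, a non-empty input FIFO acts as an asserted request signal, and each cycle the router forwards at most one word per configured virtual output channel.

```python
# Behavioral toy model of virtual-channel routing through a stream switch;
# port counts, configuration contents and the grant policy are assumptions.
from collections import deque

N_IN, N_VC = 4, 2

# Configuration registers: (output port, output VC) -> (input port, input VC).
config = {(0, 0): (2, 1), (3, 1): (0, 0)}

# One FIFO per (input port, input VC); a non-empty FIFO asserts its request.
in_fifos = {(p, vc): deque() for p in range(N_IN) for vc in range(N_VC)}
in_fifos[(2, 1)].extend(["A0", "A1"])
in_fifos[(0, 0)].append("B0")

def router_cycle():
    """One cycle: for each configured virtual output channel whose source
    FIFO is requesting, forward a single word through the data router."""
    delivered = []
    for (out_port, out_vc), (in_port, in_vc) in config.items():
        fifo = in_fifos[(in_port, in_vc)]
        if fifo:                                   # request asserted
            delivered.append(((out_port, out_vc), fifo.popleft()))
    return delivered

print(router_cycle())   # [((0, 0), 'A0'), ((3, 1), 'B0')]
print(router_cycle())   # [((0, 0), 'A1')]
```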
