-
Publication No.: US11586907B2
Publication Date: 2023-02-21
Application No.: US16280960
Application Date: 2019-02-20
Abstract: Embodiments of a device include an integrated circuit, a reconfigurable stream switch formed in the integrated circuit, and an arithmetic unit coupled to the reconfigurable stream switch. The arithmetic unit has a plurality of inputs and at least one output, and the arithmetic unit is solely dedicated to performance of a plurality of parallel operations. Each one of the plurality of parallel operations carries out a portion of the formula: output = AX + BY + C.
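The formula in this abstract can be illustrated with a minimal Python sketch. It assumes, for illustration only, that A, B and C are scalar coefficients and X and Y are equal-length input streams. The function name `axbyc` is hypothetical, and splitting the work into two partial products only mimics how the independent terms could be computed in parallel; it is not the patented data path.

```python
from typing import Sequence


def axbyc(a: int, b: int, c: int, xs: Sequence[int], ys: Sequence[int]) -> list[int]:
    """Compute output = A*X + B*Y + C for every (X, Y) pair of two input streams."""
    if len(xs) != len(ys):
        raise ValueError("X and Y streams must have the same length")
    out = []
    for x, y in zip(xs, ys):
        ax = a * x  # partial product A*X (independent of B*Y, so it could run in parallel)
        by = b * y  # partial product B*Y
        out.append(ax + by + c)  # final sum with the offset C
    return out


print(axbyc(2, 3, 1, [1, 2], [4, 5]))  # [15, 20]
```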
-
Publication No.: US11442700B2
Publication Date: 2022-09-13
Application No.: US16833340
Application Date: 2020-03-27
Abstract: A system includes an addressable memory array, one or more processing cores, and an accelerator framework coupled to the addressable memory. The accelerator framework includes a multiply-accumulate (MAC) hardware accelerator cluster. The MAC hardware accelerator cluster has a binary-to-residual converter, which, in operation, converts binary inputs to a residual number system. Converting a binary input to the residual number system includes a reduction modulo 2^m and a reduction modulo 2^m − 1, where m is a positive integer. A plurality of MAC hardware accelerators perform modulo 2^m multiply-and-accumulate operations and modulo 2^m − 1 multiply-and-accumulate operations using the converted binary input. A residual-to-binary converter generates a binary output based on the output of the MAC hardware accelerators.
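Because 2^m and 2^m − 1 are consecutive integers, they are coprime, so any value below 2^m(2^m − 1) is uniquely determined by its two residues. The Python sketch below (all function names are hypothetical) shows binary-to-residue conversion, a MAC carried out separately in each channel, and reconstruction, assuming non-negative integer operands and a true accumulated result below 2^m(2^m − 1). The end-around folding used for the modulo 2^m − 1 reduction and the Chinese-remainder reconstruction are standard techniques chosen for illustration, not details taken from the patent.

```python
def mod_pow2(x: int, m: int) -> int:
    """x mod 2**m: keep only the low m bits."""
    return x & ((1 << m) - 1)


def mod_pow2_minus1(x: int, m: int) -> int:
    """x mod (2**m - 1) by end-around folding, since 2**m = 1 (mod 2**m - 1)."""
    mask = (1 << m) - 1
    while x > mask:
        x = (x & mask) + (x >> m)  # fold the high bits back onto the low bits
    return 0 if x == mask else x


def to_rns(x: int, m: int) -> tuple[int, int]:
    """Binary-to-residue conversion for the moduli pair (2**m, 2**m - 1)."""
    return mod_pow2(x, m), mod_pow2_minus1(x, m)


def rns_mac(a_vals, b_vals, m: int) -> tuple[int, int]:
    """Multiply-accumulate carried out independently in each residue channel."""
    acc1 = acc2 = 0
    for a, b in zip(a_vals, b_vals):
        a1, a2 = to_rns(a, m)
        b1, b2 = to_rns(b, m)
        acc1 = mod_pow2(acc1 + a1 * b1, m)         # modulo 2**m channel
        acc2 = mod_pow2_minus1(acc2 + a2 * b2, m)  # modulo 2**m - 1 channel
    return acc1, acc2


def from_rns(r1: int, r2: int, m: int) -> int:
    """Residue-to-binary via the Chinese remainder theorem for two moduli.

    Write X = r2 + (2**m - 1) * k. Because 2**m - 1 = -1 (mod 2**m),
    X mod 2**m = r2 - k, so k = (r2 - r1) mod 2**m.
    """
    k = (r2 - r1) % (1 << m)
    return r2 + ((1 << m) - 1) * k


print(from_rns(*rns_mac([3, 5], [7, 2], 4), 4))  # 31
```

For m = 4 the representable range is 0 to 239, and the dot product 3*7 + 5*2 = 31 round-trips through the two channels.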
-
Publication No.: US11227086B2
Publication Date: 2022-01-18
Application No.: US15931445
Application Date: 2020-05-13
Inventors: Thomas Boesch, Giuseppe Desoli
IPC Classification: G02B6/35, G06F30/327, G06N20/10, G06N3/04, G06N3/08, G06F30/34, G06N20/00, G06N7/00, G06F115/08, G06N3/063, G06F9/445, G06F13/40, G06F15/78
Abstract: A system on a chip (SoC) includes a plurality of processing cores and a stream switch coupled to two or more of the plurality of processing cores. The stream switch includes a plurality of N multibit input ports, wherein N is a first integer, a plurality of M multibit output ports, wherein M is a second integer, and a plurality of M multibit stream links dedicated to respective output ports of the plurality of M multibit output ports. The M multibit stream links are reconfigurably coupleable at run time to a selectable number of the N multibit input ports, wherein the selectable number is an integer between zero and N.
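The run-time coupling described here can be modelled as a table with one entry per output stream link, each entry holding a subset of the N input ports (a subset always has between zero and N members). This Python sketch covers only that configuration bookkeeping. The class and method names are hypothetical, and because the abstract does not say how data from several coupled inputs would share one link, that behaviour is not modelled.

```python
class StreamSwitchConfig:
    """Hypothetical run-time coupling table: one entry per output stream link."""

    def __init__(self, n_inputs: int, m_outputs: int) -> None:
        self.n_inputs = n_inputs
        # Each of the M output links starts coupled to zero input ports.
        self.links: list[set[int]] = [set() for _ in range(m_outputs)]

    def couple(self, out_port: int, in_ports: list[int]) -> None:
        """Re-couple one output link to a chosen subset of the N input ports."""
        for p in in_ports:
            if not 0 <= p < self.n_inputs:
                raise ValueError(f"input port {p} does not exist")
        # A subset of the N ports always has between 0 and N members.
        self.links[out_port] = set(in_ports)

    def coupled_inputs(self, out_port: int) -> list[int]:
        """Return the input ports currently coupled to an output link, sorted."""
        return sorted(self.links[out_port])


cfg = StreamSwitchConfig(n_inputs=4, m_outputs=2)
cfg.couple(0, [3, 1])
print(cfg.coupled_inputs(0), cfg.coupled_inputs(1))  # [1, 3] []
```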
-
Publication No.: US12073308B2
Publication Date: 2024-08-27
Application No.: US15423279
Application Date: 2017-02-02
Inventors: Thomas Boesch, Giuseppe Desoli
IPC Classification: G06N3/063, G06F30/327, G06F30/34, G06F30/347, G06N3/044, G06N3/045, G06N3/0464, G06N3/047, G06N3/084, G06N20/00, G06N20/10, G06F9/445, G06F13/40, G06F15/78, G06F115/02, G06F115/08, G06N3/04, G06N3/08, G06N7/01
CPC Classification: G06N3/0464, G06F30/327, G06F30/34, G06F30/347, G06N3/044, G06N3/045, G06N3/047, G06N3/084, G06N20/00, G06N20/10, G06F9/44505, G06F13/4022, G06F15/7817, G06F2115/02, G06F2115/08, G06N3/04, G06N3/063, G06N3/08, G06N7/01
Abstract: Embodiments are directed towards a hardware accelerator engine that supports efficient mapping of convolutional stages of deep neural network algorithms. The hardware accelerator engine includes a plurality of convolution accelerators, and each one of the plurality of convolution accelerators includes a kernel buffer, a feature line buffer, and a plurality of multiply-accumulate (MAC) units. The MAC units are arranged to multiply and accumulate data received from both the kernel buffer and the feature line buffer. The hardware accelerator engine also includes at least one input bus coupled to an output bus port of a stream switch, at least one output bus coupled to an input bus port of the stream switch, or at least one input bus and at least one output bus hard wired to respective output bus and input bus ports of the stream switch.
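As an illustration of how a feature line buffer and a kernel buffer can feed MAC operations, the Python sketch below streams a feature map row by row, keeps only the k most recent rows, and computes a valid, stride-1 convolution (the cross-correlation used in CNNs). The square kernel, the absence of padding and the function name `conv2d_line_buffer` are assumptions for illustration; the patented engine's buffer sizes and MAC arrangement are not described in the abstract.

```python
from collections import deque


def conv2d_line_buffer(feature_rows, kernel):
    """Valid, stride-1 2D convolution (CNN-style cross-correlation) of a row stream."""
    k = len(kernel)                # square k x k kernel, playing the role of the kernel buffer
    line_buffer = deque(maxlen=k)  # the k most recent feature rows (feature line buffer)
    out = []
    for row in feature_rows:       # feature data arrives one line at a time
        line_buffer.append(row)    # the oldest line is evicted automatically
        if len(line_buffer) < k:
            continue               # not enough lines buffered yet
        out_row = []
        for col in range(len(row) - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += line_buffer[i][col + j] * kernel[i][j]  # one MAC operation
            out_row.append(acc)
        out.append(out_row)
    return out


print(conv2d_line_buffer([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]]))
# [[6, 8], [12, 14]]
```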
-
Publication No.: US11900240B2
Publication Date: 2024-02-13
Application No.: US17023144
Application Date: 2020-09-16
IPC Classification: G06N3/06, G06F1/32, G06F9/50, G06F1/08, G06N3/063, G06N3/082, G06F1/3228, G06F1/324, G06F1/3296
CPC Classification: G06N3/063, G06F1/08, G06F1/324, G06F1/3228, G06F1/3296, G06F9/5027, G06N3/082
Abstract: Systems and devices are provided to increase computational and/or power efficiency for one or more neural networks via a computationally driven closed-loop dynamic clock control. A clock frequency control word is generated based on information indicative of a current frame execution rate of a processing task of the neural network and a reference clock signal. A clock generator generates the clock signal of the neural network based on the clock frequency control word. A reference frequency may be used to generate the clock frequency control word, and the reference frequency may be based on information indicative of a sparsity of data of a training frame.
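A closed loop of this kind can be sketched as a controller that nudges the clock frequency control word toward a target frame rate. In this hypothetical Python sketch, the integral update rule, the gain, the word limits and the generator model "output clock = control word x reference frequency" are all assumptions, and the sparsity-based choice of reference frequency mentioned in the abstract is not modelled.

```python
def clock_control_step(word: int, measured_fps: float, target_fps: float,
                       gain: float = 1.0, word_min: int = 1, word_max: int = 64) -> int:
    """One update of a hypothetical integral controller acting on the control word."""
    error = target_fps - measured_fps  # > 0: frames run too slowly, so raise the clock
    new_word = round(word + gain * error)
    return max(word_min, min(word_max, new_word))  # stay within the generator's range


def clock_frequency_hz(word: int, f_ref_hz: int) -> int:
    """Assumed generator model: output clock = control word x reference frequency."""
    return word * f_ref_hz


word = clock_control_step(10, measured_fps=25.0, target_fps=30.0)
print(word, clock_frequency_hz(word, 10_000_000))  # 15 150000000
```

Clamping the word keeps a large frame-rate error from driving the generator outside its supported range.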
-
Publication No.: US11687762B2
Publication Date: 2023-06-27
Application No.: US16280991
Application Date: 2019-02-20
Abstract: Embodiments of a device include an integrated circuit, a reconfigurable stream switch formed in the integrated circuit along with a plurality of convolution accelerators and an arithmetic unit coupled to the reconfigurable stream switch. The arithmetic unit has at least one input and at least one output. The at least one input is arranged to receive streaming data passed through the reconfigurable stream switch, and the at least one output is arranged to stream resultant data through the reconfigurable stream switch. The arithmetic unit also has a plurality of data paths. At least one of the plurality of data paths is solely dedicated to performance of operations that accelerate an activation function represented in the form of a piece-wise second order polynomial approximation.
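The Python sketch below shows one way to build and evaluate a piece-wise second-order polynomial approximation of an activation function, here tanh on [-4, 4] with 16 equal segments. Each quadratic interpolates the function at its segment's two endpoints and midpoint and is stored in Newton form. The segment count, the interpolation scheme, saturation outside the range and the function names are illustrative assumptions, not the patent's coefficients or data path.

```python
import math


def build_segments(f, lo: float, hi: float, n: int):
    """Per-segment Newton coefficients of the quadratic through f at x0, x1 (midpoint), x2."""
    width = (hi - lo) / n
    segs = []
    for i in range(n):
        x0 = lo + i * width
        x1 = x0 + width / 2
        x2 = x0 + width
        f0, f1, f2 = f(x0), f(x1), f(x2)
        d1 = (f1 - f0) / (x1 - x0)                     # first divided difference
        d2 = ((f2 - f1) / (x2 - x1) - d1) / (x2 - x0)  # second divided difference
        segs.append((x0, x1, f0, d1, d2))
    return segs


def eval_piecewise(segs, lo: float, hi: float, x: float) -> float:
    """Evaluate p(x) = f0 + (x - x0) * (d1 + d2 * (x - x1)) on the segment containing x."""
    n = len(segs)
    width = (hi - lo) / n
    x = min(max(x, lo), hi)                # saturate outside the approximated range
    i = min(int((x - lo) / width), n - 1)  # segment index; x == hi uses the last segment
    x0, x1, f0, d1, d2 = segs[i]
    return f0 + (x - x0) * (d1 + d2 * (x - x1))


segs = build_segments(math.tanh, -4.0, 4.0, 16)
print(eval_piecewise(segs, -4.0, 4.0, 0.3), math.tanh(0.3))
```

In this nested form each evaluation needs two multiplications plus a few additions and subtractions, which is the kind of fixed operation sequence a dedicated data path could implement.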
-
Publication No.: US11593609B2
Publication Date: 2023-02-28
Application No.: US16794062
Application Date: 2020-02-18
Abstract: Embodiments of an electronic device include an integrated circuit, a reconfigurable stream switch formed in the integrated circuit along with a plurality of convolution accelerators and a decompression unit coupled to the reconfigurable stream switch. The decompression unit decompresses encoded kernel data in real time during operation of the convolutional neural network.
-
Publication No.: US10402527B2
Publication Date: 2019-09-03
Application No.: US15423289
Application Date: 2017-02-02
Inventors: Thomas Boesch, Giuseppe Desoli
Abstract: Embodiments are directed towards a reconfigurable stream switch formed in an integrated circuit. The stream switch includes a plurality of output ports, a plurality of input ports, and a plurality of selection circuits. The output ports each have an output port architectural composition, and each is arranged to unidirectionally pass output data and output control information. The input ports each have an input port architectural composition, and each is arranged to unidirectionally receive first input data and first input control information. Each one of the selection circuits is coupled to an associated one of the output ports. Each selection circuit is further coupled to all of the input ports such that each selection circuit is arranged to reconfigurably couple its associated output port to no more than one input port at any given time.
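Each selection circuit behaves like a multiplexer: it can be pointed at any input port, or at none, but it holds only one selection at a time. The hypothetical Python model below stores a single optional input index per output port and forwards data and control information together. The class names, the use of `None` for an uncoupled output and the one-word-per-step timing are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamWord:
    """One transfer on a port: payload data plus its control information."""
    data: int
    control: int


class StreamSwitch:
    def __init__(self, n_inputs: int, n_outputs: int) -> None:
        self.n_inputs = n_inputs
        # One selection circuit per output port; None means "not coupled".
        # Storing a single index enforces "no more than one input at a time".
        self.select: list[Optional[int]] = [None] * n_outputs

    def configure(self, out_port: int, in_port: Optional[int]) -> None:
        """Reconfigure one output's selection circuit (None uncouples it)."""
        if in_port is not None and not 0 <= in_port < self.n_inputs:
            raise ValueError(f"input port {in_port} does not exist")
        self.select[out_port] = in_port

    def forward(self, inputs: list[StreamWord]) -> list[Optional[StreamWord]]:
        """Pass each selected input (data and control together) to its output."""
        if len(inputs) != self.n_inputs:
            raise ValueError("expected one word per input port")
        return [None if sel is None else inputs[sel] for sel in self.select]


sw = StreamSwitch(n_inputs=3, n_outputs=2)
sw.configure(0, 2)
print(sw.forward([StreamWord(1, 0), StreamWord(2, 0), StreamWord(3, 1)]))
# [StreamWord(data=3, control=1), None]
```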
-
Publication No.: US12118451B2
Publication Date: 2024-10-15
Application No.: US15423272
Application Date: 2017-02-02
Inventors: Giuseppe Desoli, Thomas Boesch, Nitin Chawla, Surinder Pal Singh, Elio Guidetti, Fabio Giuseppe De Ambroggi, Tommaso Majo, Paolo Sergio Zambotti
IPC Classification: G06N3/04, G06F30/327, G06F30/34, G06F30/347, G06N3/044, G06N3/045, G06N3/0464, G06N3/047, G06N3/084, G06N20/00, G06N20/10, G06F9/445, G06F13/40, G06F15/78, G06F115/02, G06F115/08, G06N3/063, G06N3/08, G06N7/01
CPC Classification: G06N3/0464, G06F30/327, G06F30/34, G06F30/347, G06N3/044, G06N3/045, G06N3/047, G06N3/084, G06N20/00, G06N20/10, G06F9/44505, G06F13/4022, G06F15/7817, G06F2115/02, G06F2115/08, G06N3/04, G06N3/063, G06N3/08, G06N7/01
Abstract: Embodiments are directed towards a system on chip (SoC) that implements a deep convolutional network heterogeneous architecture. The SoC includes a system bus, a plurality of addressable memory arrays coupled to the system bus, at least one applications processor core coupled to the system bus, and a configurable accelerator framework coupled to the system bus. The configurable accelerator framework is an image and deep convolutional neural network (DCNN) co-processing system. The SoC also includes a plurality of digital signal processors (DSPs) coupled to the system bus, wherein the plurality of DSPs coordinate functionality with the configurable accelerator framework to execute the DCNN.
-
Publication No.: US11836608B2
Publication Date: 2023-12-05
Application No.: US18056937
Application Date: 2022-11-18
CPC Classification: G06N3/063, G06F9/5027, H03M7/3082, H03M7/6005
Abstract: Techniques and systems are provided for implementing a convolutional neural network. One or more convolution accelerators are provided that each include a feature line buffer memory, a kernel buffer memory, and a plurality of multiply-accumulate (MAC) circuits arranged to multiply and accumulate data. In a first operational mode, the convolutional accelerator stores feature data in the feature line buffer memory and stores kernel data in the kernel data buffer memory. In a second mode of operation, the convolutional accelerator stores kernel decompression tables in the feature line buffer memory.
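The two operating modes can be illustrated with a buffer that is reloaded with either feature data or a decompression table. The abstract does not specify the kernel encoding, so this Python sketch assumes a simple lookup table in which each compressed kernel value is an index into a table of weights. The class, enum and method names are hypothetical.

```python
from enum import Enum


class BufferMode(Enum):
    FEATURE_DATA = 1         # first mode: the buffer holds feature lines
    DECOMPRESSION_TABLE = 2  # second mode: the buffer holds a kernel decompression table


class FeatureLineBuffer:
    """Hypothetical model of one buffer memory reused in two operating modes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.mode = BufferMode.FEATURE_DATA
        self.contents: list[int] = []

    def load(self, mode: BufferMode, values: list[int]) -> None:
        """Switch the buffer's mode and replace its contents."""
        if len(values) > self.capacity:
            raise ValueError("data does not fit in the buffer")
        self.mode = mode
        self.contents = list(values)

    def decompress_kernel(self, indices: list[int]) -> list[int]:
        """Expand compressed kernel indices into weights via the stored table."""
        if self.mode is not BufferMode.DECOMPRESSION_TABLE:
            raise RuntimeError("buffer does not currently hold a decompression table")
        return [self.contents[i] for i in indices]


buf = FeatureLineBuffer(capacity=16)
buf.load(BufferMode.DECOMPRESSION_TABLE, [0, 3, -2, 7])
print(buf.decompress_kernel([1, 1, 3, 0, 2]))  # [3, 3, 7, 0, -2]
```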