-
1.
Publication No.: US20240281646A1
Publication date: 2024-08-22
Application No.: US18192629
Filing date: 2023-03-29
Applicant: STMicroelectronics International N.V.
Inventor: Michele ROSSI, Giuseppe DESOLI, Thomas BOESCH
IPC: G06N3/063, G06F17/15, G06F17/16, G06N3/0464
CPC classification number: G06N3/063, G06F17/153, G06F17/16, G06N3/0464
Abstract: A hardware accelerator includes a plurality of functional circuits, a stream switch, and a plurality of stream engines. The stream engines are coupled to the functional circuits via the stream switch, and in operation, generate data streaming requests to stream data to and from the functional circuits. The functional circuits include at least one convolutional cluster, which includes a plurality of processing elements coupled together via a reconfigurable crossbar switch. The reconfigurable crossbar switch is coupled to the stream switch, and in operation, streams data to, from, and between processing elements of the processing cluster.
-
2.
Publication No.: US20230418559A1
Publication date: 2023-12-28
Application No.: US17847817
Filing date: 2022-06-23
Inventor: Michele ROSSI, Thomas BOESCH, Giuseppe DESOLI
Abstract: A convolutional accelerator includes a feature line buffer, a kernel buffer, a multiply-accumulate cluster, and mode control circuitry. In a first mode of operation, the mode control circuitry stores feature data in a feature line buffer and stores kernel data in a kernel buffer. The data stored in the buffers is transferred to the MAC cluster of the convolutional accelerator for processing. In a second mode of operation the mode control circuitry stores feature data in the kernel buffer and stores kernel data in the feature line buffer. The data stored in the buffers is transferred to the MAC cluster of the convolutional accelerator for processing. The second mode of operation may be employed to efficiently process 1×N kernels, where N is an integer greater than or equal to 1.
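The two modes of operation described in this abstract can be pictured with a toy software model. This is only a sketch under stated assumptions: the class `ConvAcceleratorModel`, its `load` and `run` methods, and the helper `mac_cluster` are hypothetical names, and the abstract does not say how the buffers are sized or why the swap is efficient for 1×N kernels. The model shows just one property: because multiplication commutes, exchanging the roles of the two buffers leaves the multiply-accumulate result unchanged.

```python
def mac_cluster(operand_a, operand_b):
    """Hypothetical MAC cluster: multiply element pairs and accumulate."""
    return sum(a * b for a, b in zip(operand_a, operand_b))


class ConvAcceleratorModel:
    """Toy model of the two buffer-assignment modes (not the patented circuit)."""

    def __init__(self):
        self.feature_line_buffer = []
        self.kernel_buffer = []

    def load(self, feature_data, kernel_data, mode):
        if mode == 1:
            # First mode: each buffer holds the data it is named after.
            self.feature_line_buffer = list(feature_data)
            self.kernel_buffer = list(kernel_data)
        elif mode == 2:
            # Second mode: the buffer roles are swapped.
            self.feature_line_buffer = list(kernel_data)
            self.kernel_buffer = list(feature_data)
        else:
            raise ValueError("mode must be 1 or 2")

    def run(self):
        # Both buffers feed the MAC cluster regardless of mode.
        return mac_cluster(self.feature_line_buffer, self.kernel_buffer)
```

Loading the same feature window and 1×N kernel in mode 1 and in mode 2 and calling `run()` yields the same accumulated value in this model; any efficiency gain in hardware would come from buffer geometry, which the sketch does not represent.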
-
3.
Publication No.: US20240330660A1
Publication date: 2024-10-03
Application No.: US18426128
Filing date: 2024-01-29
Applicant: STMicroelectronics International N.V.
Inventor: Carmine CAPPETTA, Surinder Pal SINGH, Giuseppe DESOLI, Thomas BOESCH, Michele ROSSI
IPC: G06N3/0464, G06N3/063
CPC classification number: G06N3/0464, G06N3/063
Abstract: A neural network includes an internal storage unit. The internal storage unit stores feature data received from a memory external to the neural network. The internal storage unit reads the feature data to a hardware accelerator of the neural network. The internal storage unit adapts a storage pattern of the feature data and a read pattern of the feature data to enhance the efficiency of the hardware accelerator.
-
4.
Publication No.: US20240281397A1
Publication date: 2024-08-22
Application No.: US18192631
Filing date: 2023-03-29
Applicant: STMicroelectronics International N.V.
Inventor: Michele ROSSI, Giuseppe DESOLI, Thomas BOESCH
CPC classification number: G06F13/4022, G06F13/1668
Abstract: A hardware accelerator includes processing elements of a neural network, each processing element having a memory; a stream switch; stream engines coupled to functional circuits via the stream switch, wherein the stream engines, in operation, generate data streaming requests to stream data to and from functional circuits of the plurality of functional circuits; a first system bus interface coupled to the stream engines; a second system bus interface coupled to the processing elements; and mode control circuitry, which, in operation, sets respective modes of operation for the plurality of processing elements. The modes of operation include: a compute mode of operation in which the processing element performs computing operations using the memory associated with the processing element; and a memory mode of operation in which the memory associated with the processing element performs memory operations, bypassing the stream switch, via the second system bus interface.
-
5.
Publication No.: US20200310761A1
Publication date: 2020-10-01
Application No.: US16833340
Filing date: 2020-03-27
Inventor: Michele ROSSI, Giuseppe DESOLI, Thomas BOESCH, Carmine CAPPETTA
Abstract: A system includes an addressable memory array, one or more processing cores, and an accelerator framework coupled to the addressable memory. The accelerator framework includes a Multiply ACcumulate (MAC) hardware accelerator cluster. The MAC hardware accelerator cluster has a binary-to-residual converter, which, in operation, converts binary inputs to a residual number system. Converting a binary input to the residual number system includes a reduction modulo 2^m and a reduction modulo 2^m − 1, where m is a positive integer. A plurality of MAC hardware accelerators perform modulo 2^m multiply-and-accumulate operations and modulo 2^m − 1 multiply-and-accumulate operations using the converted binary input. A residual-to-binary converter generates a binary output based on the output of the MAC hardware accelerators.
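The arithmetic described in this abstract can be sketched in software. The following is a minimal illustration, assuming non-negative integer inputs and a true result smaller than 2^m · (2^m − 1); the function names `to_rns`, `rns_mac`, and `from_rns` are hypothetical, and the recombination step uses the standard Chinese remainder theorem, which the abstract does not confirm as the converter's actual method.

```python
def to_rns(x, m):
    """Binary-to-residual step: reduce x modulo 2**m and modulo 2**m - 1."""
    return x % (1 << m), x % ((1 << m) - 1)


def rns_mac(pairs, m):
    """Multiply-accumulate independently in each of the two residue channels."""
    mod_a, mod_b = 1 << m, (1 << m) - 1
    acc_a = acc_b = 0
    for x, w in pairs:
        xa, xb = to_rns(x, m)
        wa, wb = to_rns(w, m)
        acc_a = (acc_a + xa * wa) % mod_a
        acc_b = (acc_b + xb * wb) % mod_b
    return acc_a, acc_b


def from_rns(ra, rb, m):
    """Residual-to-binary step via the Chinese remainder theorem (assumed)."""
    mod_a, mod_b = 1 << m, (1 << m) - 1
    # 2**m is congruent to 1 modulo 2**m - 1, so its modular inverse is 1
    # and the correction term reduces to a single subtraction.
    k = (rb - ra) % mod_b
    return ra + mod_a * k
```

For example, with `m = 8` the pairs `(3, 5)`, `(7, 11)`, and `(100, 200)` accumulate to residues that `from_rns` recombines into the ordinary sum of products, 20092. Each channel only ever handles values narrower than the full-width result, which is the usual motivation for residue arithmetic.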