-
Publication No.: US20210303267A1
Publication Date: 2021-09-30
Application No.: US17203591
Filing Date: 2021-03-16
Applicant: STMICROELECTRONICS S.r.l.
Inventor: Xiao Kang JIAO, Fabio Giuseppe DE AMBROGGI, Loris LUISE
Abstract: A method includes retrieving a plurality of datasets from respective memory registers of a memory and storing the retrieved plurality of datasets in respective register portions of a first register. A dataset of data-processing coefficients is stored in a second register. First processing is applied using, as the first operand, a first sub-set of dataset elements stored in the first register, and using, as the second operand, the data-processing coefficients, obtaining a first result. Second processing is applied using, as the first operand, a second sub-set of dataset elements stored in the first register comprised in a second window having a size equal to the dataset size, and using, as the second operand, a replica of the dataset of data-processing coefficients, obtaining a second result. An output is generated based on the first and second results. The first and second processing may perform multiply-accumulate (MAC) operations.
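The windowed processing described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `windowed_mac`, the list-based "registers", the sample values, and the choice of a second window offset by one element (so that it straddles two datasets) are all assumptions made for illustration.

```python
def windowed_mac(first_register, coefficients, offset):
    """Multiply-accumulate over a window of len(coefficients) elements.

    The window starts at `offset` inside `first_register` (hypothetical helper).
    """
    window = first_register[offset:offset + len(coefficients)]
    if len(window) != len(coefficients):
        raise ValueError("window extends past the end of the register")
    return sum(x * c for x, c in zip(window, coefficients))


# Two hypothetical 4-element datasets, each read from its own memory register
# and packed into adjacent portions of one wide "first register".
dataset_a = [1, 2, 3, 4]
dataset_b = [5, 6, 7, 8]
first_register = dataset_a + dataset_b

coefficients = [1, 0, -1, 2]   # contents of the "second register"
replica = list(coefficients)   # replica used as operand of the second processing

# First processing: window over the first sub-set of elements.
first_result = windowed_mac(first_register, coefficients, 0)
# Second processing: window of the same size, shifted by one element (assumed).
second_result = windowed_mac(first_register, replica, 1)

output = [first_result, second_result]
print(output)  # [6, 8]
```

Packing several datasets into one register lets a shifted window span a dataset boundary without a further memory access, which is the property the sketch is meant to show.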
-
Publication No.: US20190147338A1
Publication Date: 2019-05-16
Application No.: US16189264
Filing Date: 2018-11-13
Applicant: STMICROELECTRONICS S.r.l.
Inventor: Danilo Pietro PAU, Emanuele PLEBANI, Fabio Giuseppe DE AMBROGGI, Floriana GUIDO, Angelo BOSCO
Abstract: A neural network classifies an input signal. For example, an accelerometer signal may be classified to detect human activity. In a first convolutional layer, two-valued weights are applied to the input signal. In a first two-valued function layer coupled at input to an output of the first convolutional layer, a two-valued function is applied. In a second convolutional layer coupled at input to an output of the first two-valued function layer, weights of the second convolutional layer are applied. In a fully connected layer coupled at input to an output of the second convolutional layer, two-valued weights of the fully connected layer are applied. In a second two-valued function layer coupled at input to an output of the fully connected layer, a two-valued function of the second two-valued function layer is applied. A classifier classifies the input signal based on an output signal of the second two-valued function layer.
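The layer ordering in this abstract can be traced with a small Python sketch. Everything concrete here is an assumption for illustration: the two-valued function is taken to be a sign function with outputs +1/-1, the convolutions are one-dimensional, the classifier is a simple arg-max, and the sample signal and weights are invented.

```python
def sign(values):
    """Two-valued function: map each value to +1 or -1 (assumed choice)."""
    return [1 if v >= 0 else -1 for v in values]


def conv1d(signal, kernel):
    """Valid 1-D convolution (correlation form) without padding."""
    n = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(n))
            for i in range(len(signal) - n + 1)]


def fully_connected(values, weights):
    """One output per weight row: dot product of the row with the input."""
    return [sum(v * w for v, w in zip(values, row)) for row in weights]


def classify(signal, k1, k2, fc):
    h = conv1d(signal, k1)       # first convolutional layer, two-valued weights
    h = sign(h)                  # first two-valued function layer
    h = conv1d(h, k2)            # second convolutional layer
    h = fully_connected(h, fc)   # fully connected layer, two-valued weights
    h = sign(h)                  # second two-valued function layer
    # Classifier: index of the largest output (first index wins ties).
    return max(range(len(h)), key=lambda i: h[i])


# Hypothetical accelerometer samples and weights, all made up for the example.
signal = [1, 5, -3, 8, -2, 4]
k1 = [1, -1, 1]
k2 = [1, -1]
fc = [[1, 1, 1], [-1, 1, -1]]

label = classify(signal, k1, k2, fc)
print(label)  # 1
```

With weights restricted to two values, each multiplication reduces to a sign flip, which is what makes this kind of network attractive on small microcontrollers.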
-
Publication No.: US20220414420A1
Publication Date: 2022-12-29
Application No.: US17360986
Filing Date: 2021-06-28
Inventor: Loris LUISE, Surinder Pal SINGH, Fabio Giuseppe DE AMBROGGI
Abstract: A data structure and microcontroller architecture perform binary multiply-accumulate operations using multiple partial copies of weights. A destination-register location, a source-register location, and a weight-register location are received. Using the weight-register location, a sub-set of the weight bits is copied a select number of times based on a filter index value that is received. Each copy of the sub-set of weights is executed in parallel. Using the source-register location, a sub-set of the input bits is selected based on the size of the sub-set of weights, wherein the sub-set of input bits is shifted one bit from a previous sub-set of input bits. An XOR operation is performed on each bit in the copy of the sub-set of weights with the corresponding bit in the selected sub-set of input bits. In a corresponding destination sub-location, the outputs of the XOR operations are aggregated with each other and with the current value of the corresponding destination sub-location.
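A rough Python sketch of the XOR-based accumulation described in this abstract follows. It makes several assumptions that the abstract does not state: registers are modelled as Python integers, the number of weight copies is passed directly as `copies` rather than derived from a filter index value, "aggregated" is read as counting the set bits of the XOR result, and the copies are processed in a loop instead of in parallel. The function name `binary_mac` is hypothetical.

```python
def binary_mac(dest, source, weights, subset_size, copies):
    """Accumulate XOR bit counts of shifted input windows into `dest`.

    dest        -- list of current destination sub-location values
    source      -- input bits packed into an integer
    weights     -- weight bits packed into an integer
    subset_size -- number of weight bits in the copied sub-set
    copies      -- how many copies of the weight sub-set to apply
    """
    mask = (1 << subset_size) - 1
    weight_subset = weights & mask
    out = list(dest)
    for i in range(copies):
        # Each window is shifted one bit from the previous one.
        window = (source >> i) & mask
        # XOR bit by bit, then aggregate by counting the set bits.
        out[i] += bin(window ^ weight_subset).count("1")
    return out


result = binary_mac(dest=[0, 1, 0, 0], source=0b101101,
                    weights=0b011, subset_size=3, copies=4)
print(result)  # [2, 3, 0, 2]
```

The second destination value starts at 1, so its final value of 3 shows the XOR count being added to the existing contents rather than overwriting them.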
-
Publication No.: US20180189229A1
Publication Date: 2018-07-05
Application No.: US15423272
Filing Date: 2017-02-02
Inventor: Giuseppe DESOLI, Thomas BOESCH, Nitin CHAWLA, Surinder Pal SINGH, Elio GUIDETTI, Fabio Giuseppe DE AMBROGGI, Tommaso MAJO, Paolo Sergio ZAMBOTTI
Abstract: Embodiments are directed towards a system on chip (SoC) that implements a deep convolutional network heterogeneous architecture. The SoC includes a system bus, a plurality of addressable memory arrays coupled to the system bus, at least one applications processor core coupled to the system bus, and a configurable accelerator framework coupled to the system bus. The configurable accelerator framework is an image and deep convolutional neural network (DCNN) co-processing system. The SoC also includes a plurality of digital signal processors (DSPs) coupled to the system bus, wherein the plurality of DSPs coordinate functionality with the configurable accelerator framework to execute the DCNN.