ELIMINATING MEMORY BOTTLENECKS FOR DEPTHWISE CONVOLUTIONS

    Publication number: US20230086802A1

    Publication date: 2023-03-23

    Application number: US17478609

    Application date: 2021-09-17

    Abstract: Certain aspects of the present disclosure provide techniques for efficient depthwise convolution. A convolution is performed with a compute-in-memory (CIM) array to generate CIM output, and at least a portion of the CIM output corresponding to a first output data channel, of a plurality of output data channels in the CIM output, is written to a digital multiply-accumulate (DMAC) activation buffer. A patch of the CIM output is read from the DMAC activation buffer, and weight data is read from a DMAC weight buffer. Multiply-accumulate (MAC) operations are performed with the patch of CIM output and the weight data to generate a DMAC output.
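The dataflow in this abstract (a per-channel CIM output buffered, then consumed patch-by-patch by a digital MAC unit) can be sketched as follows. This is a minimal illustrative model, not the patented circuit; the 3x3 patch size, function names, and shapes are assumptions.

```python
import numpy as np

def dmac_depthwise(cim_output, weights, k=3):
    """Sketch of the DMAC stage: slide a k x k window over one output data
    channel of the CIM output (read from the activation buffer) and perform
    multiply-accumulate (MAC) operations against the kernel weights."""
    h, w = cim_output.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = cim_output[i:i + k, j:j + k]  # patch read from the buffer
            out[i, j] = np.sum(patch * weights)   # MAC operations
    return out

channel = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for CIM output
kernel = np.ones((3, 3))                            # stand-in weight data
print(dmac_depthwise(channel, kernel))              # 2x2 DMAC output
```

Keeping the depthwise stage in a digital MAC unit while the pointwise work stays in the CIM array is the memory-bottleneck trade-off the title refers to.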

    COMPUTE IN MEMORY-BASED MACHINE LEARNING ACCELERATOR ARCHITECTURE

    Publication number: US20220414443A1

    Publication date: 2022-12-29

    Application number: US17359297

    Application date: 2021-06-25

    Inventor: Ren LI

    Abstract: Certain aspects of the present disclosure provide techniques for processing machine learning model data with a machine learning task accelerator, including: configuring one or more signal processing units (SPUs) of the machine learning task accelerator to process a machine learning model; providing model input data to the one or more configured SPUs; processing the model input data with the machine learning model using the one or more configured SPUs; and receiving output data from the one or more configured SPUs.

    SYNCHRONIZED SENSORS AND SYSTEMS

    Publication number: US20250088402A1

    Publication date: 2025-03-13

    Application number: US18466721

    Application date: 2023-09-13

    Abstract: Synchronized sensors and systems are disclosed. Techniques involving a synchronized sensor system for determining a physiological parameter of a user may include: obtaining one or more first measurements at a first location of the user via a first sensor; obtaining one or more second measurements at a second location of the user via a second sensor; and determining the physiological parameter of the user based on the one or more first measurements, the one or more second measurements, and a distance between the first location and the second location, the distance between the first location and the second location determined based on acoustic communication between the first sensor and the second sensor. In some implementations, acoustic communication may include ultrasound signals between the first sensor and the second sensor, which may be time synchronized by exchanging timestamps.
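The distance step described above can be sketched numerically: with time-synchronized clocks, a one-way ultrasound time-of-flight gives the inter-sensor distance, which then feeds a parameter such as pulse transit speed. The propagation speed, function names, and the choice of pulse wave velocity as the physiological parameter are illustrative assumptions, not details from the patent.

```python
# Assumed speed of ultrasound in soft tissue (illustrative constant).
SPEED_OF_SOUND_TISSUE_M_S = 1540.0

def distance_from_timestamps(t_tx, t_rx):
    """Distance from a one-way ultrasound time-of-flight; assumes the two
    sensors have already synchronized clocks by exchanging timestamps."""
    return (t_rx - t_tx) * SPEED_OF_SOUND_TISSUE_M_S

def pulse_wave_velocity(distance_m, t_pulse_a, t_pulse_b):
    """One possible physiological parameter: speed of a pulse wave travelling
    between the two measurement locations."""
    return distance_m / (t_pulse_b - t_pulse_a)

d = distance_from_timestamps(0.0, 200e-6)   # 200 us flight time, ~0.308 m
print(d)
print(pulse_wave_velocity(d, 0.00, 0.04))   # ~7.7 m/s for a 40 ms transit
```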

    COMPUTATION IN MEMORY (CIM) ARCHITECTURE AND DATAFLOW SUPPORTING A DEPTH-WISE CONVOLUTIONAL NEURAL NETWORK (CNN)

    Publication number: US20220414444A1

    Publication date: 2022-12-29

    Application number: US17361784

    Application date: 2021-06-29

    Inventor: Ren LI

    Abstract: Certain aspects provide an apparatus for signal processing in a neural network. The apparatus generally includes a first set of computation in memory (CIM) cells configured as a first kernel for a neural network computation, the first set of CIM cells comprising one or more first columns and a first plurality of rows of a CIM array, and a second set of CIM cells configured as a second kernel for the neural network computation, the second set of CIM cells comprising one or more second columns and a second plurality of rows of the CIM array. In some aspects, the one or more first columns are different from the one or more second columns, and the first plurality of rows is different from the second plurality of rows.

    SPARSITY-AWARE COMPUTE-IN-MEMORY

    Publication number: US20250124354A1

    Publication date: 2025-04-17

    Application number: US18989865

    Application date: 2024-12-20

    Abstract: Certain aspects of the present disclosure provide techniques for performing machine learning computations in a compute in memory (CIM) array comprising a plurality of bit cells, including: determining that a sparsity of input data to a machine learning model exceeds an input data sparsity threshold; disabling one or more bit cells in the CIM array based on the sparsity of the input data prior to processing the input data; processing the input data with bit cells not disabled in the CIM array to generate an output value; applying a compensation to the output value based on the sparsity to generate a compensated output value; and outputting the compensated output value.
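The sequence in this abstract (measure sparsity, compare against a threshold, disable cells, compute, compensate) can be sketched in a few lines. The threshold value, the zero-row disabling model, and the additive compensation model below are all illustrative assumptions, not the patented circuit.

```python
import numpy as np

def sparse_cim_mac(inputs, weights, sparsity_threshold=0.5, offset_per_cell=0.0):
    """Sketch of sparsity-aware CIM: skip bit cells driven by zero inputs
    when the input is sparse enough, then compensate the output."""
    sparsity = float(np.mean(inputs == 0))
    if sparsity > sparsity_threshold:
        active = inputs != 0                           # disable zero-fed cells
        raw = float(np.dot(inputs[active], weights[active]))
        n_disabled = int(np.sum(~active))
        # An analog array can lose a fixed per-cell bias when cells are
        # disabled; model the compensation as restoring that bias
        # (zero by default in this digital stand-in).
        return raw + n_disabled * offset_per_cell
    return float(np.dot(inputs, weights))

x = np.array([0.0, 0.0, 0.0, 2.0, 0.0, 3.0])  # sparsity 4/6 exceeds threshold
w = np.arange(6.0)
print(sparse_cim_mac(x, w))                   # 2*3 + 3*5 = 21.0
```

In a digital model the skipped zero terms contribute nothing, so the compensated and dense results match exactly; in an analog array the compensation term is what restores that equivalence.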

    ACTIVATION BUFFER ARCHITECTURE FOR DATA-REUSE IN A NEURAL NETWORK ACCELERATOR

    Publication number: US20240256827A1

    Publication date: 2024-08-01

    Application number: US18565414

    Application date: 2021-07-27

    CPC classification number: G06N3/04

    Abstract: Certain aspects provide an apparatus for signal processing in a neural network. The apparatus generally includes computation circuitry configured to perform a convolution operation, the computation circuitry having multiple input rows, and an activation buffer having multiple buffer segments coupled to the multiple input rows of the computation circuitry, respectively. In some aspects, each of the multiple buffer segments comprises a first multiplexer having a plurality of multiplexer inputs, and each of the plurality of multiplexer inputs of the first multiplexer on one of the multiple buffer segments is coupled to a data output of the activation buffer on another one of the multiple buffer segments.
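The data-reuse idea behind this buffer can be sketched behaviorally: each segment drives one input row, and a per-segment multiplexer can select another segment's output instead of a fresh memory fetch, so activations shift between rows. The segment count and the simple neighbor-shift mux wiring below are illustrative assumptions.

```python
class ActivationBuffer:
    """Behavioral sketch of a segmented activation buffer with cross-segment
    multiplexing for data reuse (not the patented circuit)."""

    def __init__(self, n_segments):
        self.segments = [0] * n_segments

    def load(self, values):
        # Fresh activations fetched from memory into all segments.
        self.segments = list(values)

    def shift_reuse(self):
        # Each segment's mux selects the neighbouring segment's data output,
        # reusing activations already held in the buffer (no memory traffic).
        self.segments = self.segments[1:] + [0]

    def rows(self):
        # Values driven onto the computation circuitry's input rows.
        return list(self.segments)

buf = ActivationBuffer(4)
buf.load([10, 11, 12, 13])
print(buf.rows())      # [10, 11, 12, 13]
buf.shift_reuse()      # e.g. sliding a convolution window by one row
print(buf.rows())      # [11, 12, 13, 0]
```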

    CONFIGURABLE NONLINEAR ACTIVATION FUNCTION CIRCUITS

    Publication number: US20230185533A1

    Publication date: 2023-06-15

    Application number: US18165802

    Application date: 2023-02-07

    CPC classification number: G06F7/556 G06F7/50

    Abstract: Certain aspects of the present disclosure provide a method for processing input data by a set of configurable nonlinear activation function circuits, including generating an exponent output by processing input data using one or more first configurable nonlinear activation function circuits configured to perform an exponential function, summing the exponent output of the one or more first configurable nonlinear activation function circuits, and generating an approximated log softmax output by processing the summed exponent output using a second configurable nonlinear activation function circuit configured to perform a natural logarithm function.
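The pipeline in this abstract computes log softmax in three stages: exponentials from a first set of circuits, a summation, then a natural-log circuit, yielding x_i - ln(sum_j exp(x_j)). A pure-Python stand-in for the hardware stages:

```python
import math

def log_softmax(xs):
    """Stage-by-stage model of the circuit pipeline described above."""
    exps = [math.exp(x) for x in xs]   # first circuits: exponential function
    total = sum(exps)                  # summation of the exponent outputs
    log_sum = math.log(total)          # second circuit: natural logarithm
    return [x - log_sum for x in xs]   # approximated log softmax

vals = log_softmax([1.0, 2.0, 3.0])
print(vals)
print(sum(math.exp(v) for v in vals))  # ~1.0: exponentiating recovers softmax
```

Splitting the computation this way means only exp and ln circuits are needed; no divider is required, which is presumably why the circuits are shared and configurable.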

    COMPUTATION IN MEMORY ARCHITECTURE FOR PHASED DEPTH-WISE CONVOLUTIONAL

    Publication number: US20220414454A1

    Publication date: 2022-12-29

    Application number: US17361807

    Application date: 2021-06-29

    Inventor: Ren LI

    Abstract: Certain aspects provide an apparatus for signal processing in a neural network. The apparatus generally includes a first set of computation in memory (CIM) cells configured as a first kernel for a neural network computation, the first set of CIM cells comprising one or more first columns and a first plurality of rows of a CIM array. The apparatus also includes a second set of CIM cells configured as a second kernel for the neural network computation, the second set of CIM cells comprising the one or more first columns and a second plurality of rows of the CIM array. The first plurality of rows may be different from the second plurality of rows.
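The layout this abstract describes, two kernels sharing the same columns but occupying disjoint row groups, implies the kernels are activated in separate phases on the shared column readout. An illustrative numerical mapping (an assumption, not the patented circuit):

```python
import numpy as np

CIM = np.zeros((8, 4))        # stand-in CIM array: 8 rows x 4 columns
k1 = np.arange(4.0)           # kernel 1 weights
k2 = np.arange(4.0) + 10.0    # kernel 2 weights
CIM[0:4, 0] = k1              # kernel 1: rows 0-3, column 0
CIM[4:8, 0] = k2              # kernel 2: rows 4-7, the SAME column 0

def phase_mac(cim, rows, activations, col=0):
    """Activate only one row group (one phase) and read the shared column."""
    return float(np.dot(activations, cim[rows, col]))

a = np.array([1.0, 1.0, 1.0, 1.0])
print(phase_mac(CIM, slice(0, 4), a))  # phase 1: kernel 1 result, 6.0
print(phase_mac(CIM, slice(4, 8), a))  # phase 2: kernel 2 result, 46.0
```

Because the two row groups are disjoint, the shared column accumulates only one kernel's products per phase, which is what makes the phased depthwise dataflow possible.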

    CONFIGURABLE NONLINEAR ACTIVATION FUNCTION CIRCUITS

    Publication number: US20230078203A1

    Publication date: 2023-03-16

    Application number: US17467079

    Application date: 2021-09-03

    Abstract: Certain aspects of the present disclosure provide a method for processing input data by a configurable nonlinear activation function circuit, including determining a nonlinear activation function for application to input data; determining, based on the determined nonlinear activation function, a set of parameters for a configurable nonlinear activation function circuit; and processing input data with the configurable nonlinear activation function circuit based on the set of parameters to generate output data.
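The flow in this abstract (pick a nonlinear function, derive a parameter set for a generic circuit, then process data with those parameters) can be sketched if we assume the parameter set is piecewise-linear segment coefficients; the segment count, range, and function names below are illustrative, not from the patent.

```python
import math

def pwl_params(fn, lo=-4.0, hi=4.0, n=64):
    """Derive (x0, slope, y0) coefficients approximating the chosen
    nonlinear activation function over [lo, hi] with n linear segments."""
    step = (hi - lo) / n
    params = []
    for i in range(n):
        x0, x1 = lo + i * step, lo + (i + 1) * step
        slope = (fn(x1) - fn(x0)) / step
        params.append((x0, slope, fn(x0)))
    return params

def pwl_eval(params, x):
    """Generic datapath: select the segment, apply y = y0 + slope * (x - x0)."""
    for x0, slope, y0 in reversed(params):
        if x >= x0:
            return y0 + slope * (x - x0)
    x0, slope, y0 = params[0]          # below range: extrapolate first segment
    return y0 + slope * (x - x0)

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
cfg = pwl_params(sigmoid)              # "configure" the circuit for sigmoid
print(abs(pwl_eval(cfg, 0.7) - sigmoid(0.7)))  # small approximation error
```

Reconfiguring for a different activation (tanh, GELU, and so on) only swaps the parameter set; the evaluation datapath is unchanged, which matches the configurable-circuit framing of the abstract.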

    SPARSITY-AWARE COMPUTE-IN-MEMORY

    Publication number: US20230049323A1

    Publication date: 2023-02-16

    Application number: US17397653

    Application date: 2021-08-09

    Inventor: Ren LI

    Abstract: Certain aspects of the present disclosure provide techniques for performing machine learning computations in a compute in memory (CIM) array comprising a plurality of bit cells, including: determining that a sparsity of input data to a machine learning model exceeds an input data sparsity threshold; disabling one or more bit cells in the CIM array based on the sparsity of the input data prior to processing the input data; processing the input data with bit cells not disabled in the CIM array to generate an output value; applying a compensation to the output value based on the sparsity to generate a compensated output value; and outputting the compensated output value.
