REDUCED LATENCY TENSOR TRANSPOSITION WITHOUT REDUNDANT BUFFER

    Publication No.: US20240362296A1

    Publication Date: 2024-10-31

    Application No.: US18308175

    Filing Date: 2023-04-27

    IPC Classification: G06F9/30 G06F16/22 G11C11/54

    Abstract: Techniques for reduced latency tensor transposition without using a redundant buffer are enabled. Reads from and writes to a buffer array may occur in different dimensions in a neural processing unit (NPU). For example, a set of tensor vectors may be written to a buffer in columnar format and read from the buffer in row format. As vectors of a first tensor are read from the buffer, incoming vectors from a second tensor may be transposed for storage in the dimension of already-read vectors without overwriting unread vectors. Write and read operations may alternately transpose vectors for continuous buffering in a single buffer with reduced latency.

REDUCED POWER CONSUMPTION ANALOG OR HYBRID MAC NEURAL NETWORK

    Publication No.: US20230244921A1

    Publication Date: 2023-08-03

    Application No.: US17588657

    Filing Date: 2022-01-31

    IPC Classification: G06N3/063 G06J1/00

    CPC Classification: G06N3/0635 G06J1/00

    Abstract: Power-efficient performance may be implemented in a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply-and-accumulate (MAC) processing elements (PEs). For example, power consumption may be reduced in neural networks with a rectified linear unit (ReLU) activation layer. A hybrid or analog MAC circuit may be configured with a look-ahead sign detector to dynamically stop computations prior to completion, for example, based on detection of a negative value, which a ReLU activation layer may (e.g., subsequently) convert to zero. The sign of a value may be indicated by a most significant bit (MSB). A controller may provide power and/or clock cycles to an analog-to-digital converter (ADC) to determine a sign of a value being computed. The sign may be used to selectively complete computations for positive values and selectively terminate computations for negative values, thereby reducing power consumption of the MAC circuit.
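
The early-termination idea in the abstract can be modeled behaviorally. This is a rough sketch under stated assumptions, not the claimed circuit: the function `relu_mac` and the parameter `adc_bits` are hypothetical, the ADC is assumed to resolve one bit per step starting with the sign bit (MSB), and the step count stands in for the power and clock cycles spent. Quantization of the result is not modeled.

```python
def relu_mac(weights, activations, adc_bits=8):
    """Return (relu_output, adc_steps_used) for one MAC followed by ReLU.

    Assumption: the sign (MSB) is resolved in the first ADC step, and the
    remaining adc_bits - 1 steps are spent only if the value is non-negative.
    """
    # Stand-in for the analog or hybrid accumulation of products.
    acc = sum(w * x for w, x in zip(weights, activations))

    steps = 1  # look-ahead: one step to determine the sign
    if acc < 0:
        # ReLU would convert this value to zero, so stop the conversion here.
        return 0, steps

    steps += adc_bits - 1  # complete the conversion for a non-negative value
    return acc, steps


neg_out, neg_steps = relu_mac([1, -2], [3, 4])   # 3 - 8 is negative
pos_out, pos_steps = relu_mac([1, 2], [3, 4])    # 3 + 8 is positive
```

In this model a negative result costs a single step instead of `adc_bits` steps, which is where the saving would come from when many pre-activation values are negative.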