INPUT AND OUTPUT SPATIAL CROPPING OPERATIONS IN NEURAL PROCESSOR CIRCUITS

    Publication No.: US20240330217A1

    Publication Date: 2024-10-03

    Application No.: US18616772

    Filing Date: 2024-03-26

    Applicant: Apple Inc.

    IPC Classification: G06F13/28

    CPC Classification: G06F13/28 G06F2213/2806

    Abstract: An SoC circuit includes a neural processor circuit coupled to a CPU. The neural processor circuit includes neural engines, a data processor DMA circuit, a system memory, and a data processor circuit. The CPU is configured to execute a compiler, which is in turn configured to determine a mode of spatial cropping to perform and the associated crop offset. The neural processor circuit is configured to support arbitrary cropping in the x and y dimensions. The compiler is configured to generate task descriptor(s), which are distributed to components of the neural processor circuit. The data processor DMA circuit is configured to fetch and format data corresponding to the crop from a source to the buffer. The buffer is configured to realign the data according to the crop origin for broadcast to the neural engines. The neural engines are configured to perform a computation operation which uses the cropped data.
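    The spatial-cropping operation described above can be illustrated with a minimal sketch: a crop origin (x, y) and a crop size select a window of the source tensor, and the selected data is realigned to start at the crop origin. The function name, 2D layout, and bounds checks below are illustrative assumptions, not the patented interface.

```python
def crop_tensor(source, crop_y, crop_x, crop_h, crop_w):
    """Return the (crop_h x crop_w) window of a 2D source starting at (crop_y, crop_x)."""
    h = len(source)
    w = len(source[0]) if h else 0
    # The compiler-chosen crop must lie fully inside the source tensor.
    assert 0 <= crop_y and crop_y + crop_h <= h, "crop exceeds source height"
    assert 0 <= crop_x and crop_x + crop_w <= w, "crop exceeds source width"
    # Realign: the output's (0, 0) corresponds to the crop origin in the source.
    return [row[crop_x:crop_x + crop_w] for row in source[crop_y:crop_y + crop_h]]

# Example: a 4x6 source with arbitrary x/y crop offsets.
src = [[y * 10 + x for x in range(6)] for y in range(4)]
print(crop_tensor(src, 1, 2, 2, 3))  # [[12, 13, 14], [22, 23, 24]]
```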

    TEXTURE UNIT CIRCUIT IN NEURAL NETWORK PROCESSOR

    Publication No.: US20240037399A1

    Publication Date: 2024-02-01

    Application No.: US18484203

    Filing Date: 2023-10-10

    Applicant: Apple Inc.

    IPC Classification: G06N3/08

    CPC Classification: G06N3/08

    Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
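    The indexed fetch described above amounts to a gather operation: an index tensor holds positions into the source tensor, and the gathered values form the output version sent to the neural engines. The flat indexing scheme and the names below are illustrative assumptions.

```python
def gather(source, index_tensor):
    """Gather source elements at the flat positions given by index_tensor."""
    return [[source[i] for i in row] for row in index_tensor]

# Example: the index tensor selects and reorders elements of the source.
source = [5, 8, 13, 21, 34]
indices = [[0, 2], [4, 1]]
print(gather(source, indices))  # [[5, 13], [34, 8]]
```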

    REDUCTION MODE OF PLANAR ENGINE IN NEURAL PROCESSOR

    Publication No.: US20210158135A1

    Publication Date: 2021-05-27

    Application No.: US16695782

    Filing Date: 2019-11-26

    Applicant: Apple Inc.

    IPC Classification: G06N3/063 G06F9/30 G06F9/54

    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured to operate in multiple modes. In a reduction mode, the planar engine circuit may process values arranged in one or more dimensions of input data to generate a reduced value. The reduced values across multiple input data may be accumulated. The planar engine circuit may program a filter circuit as a reduction tree to gradually reduce the data into a reduced value. The reduction operation reduces the size of one or more dimensions of a tensor.
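    The reduction tree mentioned above can be sketched as pairwise combination over successive stages, so the data is gradually reduced to a single value rather than folded in one long sequential chain. The pairwise scheme, the pass-through handling of odd elements, and the operator parameter are illustrative assumptions.

```python
def tree_reduce(values, op):
    """Reduce values to a single result by combining pairs stage by stage."""
    level = list(values)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(op(level[i], level[i + 1]))
        if len(level) % 2:           # odd element passes through to the next stage
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Example: sum reduction over one dimension of input data.
print(tree_reduce([3, 1, 4, 1, 5, 9, 2, 6], lambda a, b: a + b))  # 31
```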

    SPLITTING OF INPUT DATA FOR PROCESSING IN NEURAL NETWORK PROCESSOR

    Publication No.: US20240028894A1

    Publication Date: 2024-01-25

    Application No.: US18360136

    Filing Date: 2023-07-27

    Applicant: Apple Inc.

    Abstract: Embodiments of the present disclosure relate to splitting input data into smaller units for loading into a data buffer and neural engines in a neural processor circuit for performing neural network operations. Input data of a large size is split into slices, and each slice is again split into tiles. Each tile is uploaded from an external source to a data buffer inside the neural processor circuit but outside the neural engines. Each tile is again split into work units sized for storing in an input buffer circuit inside each neural engine. The input data stored in the data buffer and the input buffer circuit is reused by the neural engines to reduce re-fetching of input data. Operations of splitting the input data are performed at various components of the neural processor circuit under the management of rasterizers provided in these components.
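    The hierarchical splitting described above can be sketched as three nested levels: input data is divided into slices, each slice into tiles sized for the data buffer, and each tile into work units sized for a neural engine's input buffer. All sizes and names below are illustrative assumptions.

```python
def split(seq, size):
    """Split a sequence into consecutive chunks of at most `size` elements."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

# Example: 16 elements -> slices of 8 -> tiles of 4 -> work units of 2.
data = list(range(16))
slices = split(data, 8)
tiles = [split(s, 4) for s in slices]
work_units = [[split(t, 2) for t in ts] for ts in tiles]
print(work_units[0][0])  # [[0, 1], [2, 3]]
```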