EFFICIENT ZERO PADDING IN CONVOLUTION AT NEURAL PROCESSOR

    Publication No.: US20240220764A1

    Publication date: 2024-07-04

    Application No.: US18093051

    Filing date: 2023-01-04

    Applicant: Apple Inc.

    CPC classification number: G06N3/02

    Abstract: Embodiments relate to a method of efficient zero-padding in convolution. The method includes accessing a partition among a plurality of partitions of an input tensor. The input tensor is divided into the plurality of partitions in a raster-scan direction. For each row of a kernel for performing convolution on the input tensor, a register is populated with a set of values indicating a zero-padding pattern. For a compute cycle in the row of the kernel, computations associated with the convolution are performed based in part on the zero-padding pattern. After that, an updated zero-padding pattern representing a zero-padding pattern for a next cycle in the row of the kernel is generated. The set of values in the register is updated to the updated zero-padding pattern.
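
The per-row padding register described in the abstract can be illustrated with a minimal Python sketch. It is not the patented hardware design: the function name `conv2d_same_masked`, the list-based "register", the odd kernel size, and the "same" padding convention are all assumptions made for illustration. The sketch skips padded positions by consulting a mask instead of materializing a zero-padded tensor, and it derives each cycle's mask from the previous one with a single shift.

```python
def conv2d_same_masked(inp, kernel):
    """'Same' 2D convolution (cross-correlation form) without a padded copy.

    Hypothetical sketch: `reg` stands in for the register holding the
    zero-padding pattern; True means "this position reads padding".
    Assumes odd kernel height and width.
    """
    H, W = len(inp), len(inp[0])
    KH, KW = len(kernel), len(kernel[0])
    ph, pw = KH // 2, KW // 2
    out = [[0] * W for _ in range(H)]

    for y in range(H):
        for ky in range(KH):
            iy = y + ky - ph
            row_is_pad = iy < 0 or iy >= H
            # Populate the register for the first cycle of this kernel row (kx = 0).
            reg = [row_is_pad or not (0 <= x - pw < W) for x in range(W)]
            for kx in range(KW):
                # One compute cycle: accumulate only where the pattern says "real data".
                for x in range(W):
                    if not reg[x]:
                        out[y][x] += kernel[ky][kx] * inp[iy][x + kx - pw]
                # Update the pattern for the next cycle: every output column now
                # reads one input column further right, so shift by one and
                # compute only the single incoming flag.
                incoming_col = W + kx - pw
                reg = reg[1:] + [row_is_pad or not (0 <= incoming_col < W)]
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(conv2d_same_masked(image, ones))
```

The point of the shift is that only one new flag has to be computed per cycle, rather than re-deriving the whole pattern from coordinates.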

    SUBTASK STORAGE FOR STREAMING CONVOLUTIONS IN NEURAL NETWORK PROCESSOR

    Publication No.: US20230394276A1

    Publication date: 2023-12-07

    Application No.: US17833476

    Filing date: 2022-06-06

    Applicant: Apple Inc.

    CPC classification number: G06N3/04 G06F9/4881 G06F9/5016

    Abstract: Embodiments relate to streaming convolution operations in a neural processor circuit that includes a neural engine circuit and a neural task manager. The neural task manager obtains multiple task descriptors and multiple subtask descriptors. Each task descriptor identifies a respective set of the convolution operations of a respective layer of a set of layers. Each subtask descriptor identifies a corresponding task descriptor and a subset of the convolution operations on a portion of a layer of the set of layers identified by the corresponding task descriptor. The neural processor circuit configures the neural engine circuit for execution of the subset of the convolution operations using the corresponding task descriptor. The neural engine circuit performs the subset of the convolution operations to generate output data that correspond to input data of another subset of the convolution operations identified by another subtask descriptor from the list of subtask descriptors.
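
The relationship between task descriptors and subtask descriptors can be sketched in Python as follows. This is a simplified software model, not the circuit described in the application: the class names, the fields (`layer`, `kernel`, `out_start`, `out_stop`), the 1D "valid" convolution, and the `run_subtasks` function are all hypothetical. Each subtask points at a task descriptor, the "engine" is configured from that task, and the subtask's output lands in a buffer that a later subtask of the next layer reads as input.

```python
from dataclasses import dataclass


@dataclass
class TaskDescriptor:
    layer: int      # index of the layer this task configures (hypothetical field)
    kernel: list    # 1D kernel weights for that layer (hypothetical field)


@dataclass
class SubtaskDescriptor:
    task_index: int  # which TaskDescriptor supplies the configuration
    out_start: int   # first output element this subtask produces
    out_stop: int    # one past the last output element


def run_subtasks(tasks, subtasks, input_data):
    """Execute subtasks in list order; buffers[n] holds the input of layer n."""
    buffers = {0: list(input_data)}
    for sub in subtasks:
        task = tasks[sub.task_index]          # configure from the task descriptor
        k = task.kernel
        src = buffers[task.layer]
        # The output buffer of this layer is the input buffer of the next one.
        dst = buffers.setdefault(task.layer + 1, [None] * (len(src) - len(k) + 1))
        for i in range(sub.out_start, sub.out_stop):
            # A None in src means a dependency was not produced yet; the
            # multiplication then raises TypeError instead of computing garbage.
            dst[i] = sum(k[j] * src[i + j] for j in range(len(k)))
    return buffers


if __name__ == "__main__":
    tasks = [TaskDescriptor(layer=0, kernel=[1, 1, 1]),
             TaskDescriptor(layer=1, kernel=[1, 0, -1])]
    # Interleave portions of layer 0 and layer 1 so that layer 1 starts
    # before layer 0 has finished.
    subtasks = [SubtaskDescriptor(0, 0, 4), SubtaskDescriptor(1, 0, 2),
                SubtaskDescriptor(0, 4, 6), SubtaskDescriptor(1, 2, 4)]
    print(run_subtasks(tasks, subtasks, range(8))[2])
```

Interleaving the subtasks of consecutive layers means only a portion of the intermediate layer has to exist at any moment, which is the motivation for streaming in this simplified model.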
