-
Publication No.: US20250077865A1
Publication date: 2025-03-06
Application No.: US18821669
Filing date: 2024-08-30
Applicant: Apple Inc.
Inventor: Christopher L. MILLS
IPC: G06N3/08
Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
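The index-driven fetch described in this abstract resembles a gather followed by a convolution. The sketch below is only an illustration of that idea in software: the function names `gather_rows` and `conv1d_valid`, the row-wise indexing, and the 1-D kernel are all assumptions and are not taken from the patent.

```python
def gather_rows(source, index):
    """Fetch rows of `source` in the order given by `index` (a gather)."""
    return [source[i] for i in index]


def conv1d_valid(row, kernel):
    """1-D 'valid' convolution in the cross-correlation form common in neural networks."""
    k = len(kernel)
    return [sum(row[j + t] * kernel[t] for t in range(k))
            for j in range(len(row) - k + 1)]


# Hypothetical data: a 3x4 source tensor and an index tensor selecting rows 2 and 0.
source_tensor = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
index_tensor = [2, 0]

# Step 1: the "texture unit" role, fetching source data through the index tensor.
gathered = gather_rows(source_tensor, index_tensor)

# Step 2: the "neural engine" role, convolving each unit of input data with a kernel.
output = [conv1d_valid(row, [1, -1]) for row in gathered]
```

Here `gathered` holds rows 2 and 0 of the source, in that order, and `output` holds one convolved row per gathered row.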
-
Publication No.: US20240037399A1
Publication date: 2024-02-01
Application No.: US18484203
Filing date: 2023-10-10
Applicant: Apple Inc.
Inventor: Christopher L. MILLS
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
-
Publication No.: US20250165784A1
Publication date: 2025-05-22
Application No.: US19030424
Filing date: 2025-01-17
Applicant: APPLE INC.
Inventor: Christopher L. MILLS
Abstract: Embodiments of the present disclosure relate to splitting input data into smaller units for loading into a data buffer and neural engines in a neural processor circuit for performing neural network operations. The input data of a large size is split into slices and each slice is again split into tiles. The tile is uploaded from an external source to a data buffer inside the neural processor circuit but outside the neural engines. Each tile is again split into work units sized for storing in an input buffer circuit inside each neural engine. The input data stored in the data buffer and the input buffer circuit is reused by the neural engines to reduce re-fetching of input data. Operations of splitting the input data are performed at various components of the neural processor circuit under the management of rasterizers provided in these components.
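The slice, tile, and work-unit hierarchy in this abstract can be pictured as nested range splitting. The following is a minimal sketch under stated assumptions: slices are taken along rows, tiles and work units along columns, and the names `split` and `rasterize` and all sizes are hypothetical. The abstract does not specify the dimensions or sizes actually used.

```python
def split(length, size):
    """Return (start, stop) ranges covering [0, length) in chunks of at most `size`."""
    return [(s, min(s + size, length)) for s in range(0, length, size)]


def rasterize(height, width, slice_h, tile_w, unit_w):
    """Yield (slice_rows, tile_cols, unit_cols) for every work unit.

    Assumed layout: input -> slices (rows) -> tiles (columns) -> work units (columns).
    """
    for rows in split(height, slice_h):
        for t0, t1 in split(width, tile_w):
            for u0, u1 in split(t1 - t0, unit_w):
                # Work-unit columns are expressed in input coordinates.
                yield rows, (t0, t1), (t0 + u0, t0 + u1)


units = list(rasterize(height=4, width=10, slice_h=2, tile_w=4, unit_w=2))

# Every input element should be covered by exactly one work unit.
covered = sum((r1 - r0) * (c1 - c0) for (r0, r1), _, (c0, c1) in units)
```

With these sizes the 4x10 input yields 2 slices, 3 tiles per slice, and 10 work units in total, and `covered` equals the 40 elements of the input.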
-
Publication No.: US20240330217A1
Publication date: 2024-10-03
Application No.: US18616772
Filing date: 2024-03-26
Applicant: Apple Inc.
Inventor: Christopher L. MILLS
IPC: G06F13/28
CPC classification number: G06F13/28 , G06F2213/2806
Abstract: An SoC circuit includes a neural processor circuit coupled to a CPU. The neural processor circuit includes neural engines, a data processor DMA circuit, a system memory, and a data processor circuit. The CPU is configured to execute a compiler, which is in turn configured to determine to perform a mode of spatial cropping and the associated crop offset. The neural processor circuit is configured to support arbitrary cropping in the x and y dimensions. The compiler is configured to generate task descriptor(s), the task descriptor(s) distributed to components of the neural processor circuit. The data processor DMA circuit is configured to fetch and format data corresponding to the crop from a source to the buffer. The buffer is configured to realign the data according to the crop origin for broadcast to the neural engines. The neural engines are configured to perform a computation operation which uses the cropped data.
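As a rough software analogy for the cropping described above, a crop offset and size can be applied to a 2-D array so that the result is realigned to the crop origin. The `task_descriptor` dictionary and its field names are purely hypothetical; the abstract does not describe the descriptor layout.

```python
def crop(data, x_offset, y_offset, width, height):
    """Return the sub-array starting at (x_offset, y_offset), realigned to index (0, 0)."""
    return [row[x_offset:x_offset + width]
            for row in data[y_offset:y_offset + height]]


# Hypothetical descriptor, standing in for what a compiler might emit.
task_descriptor = {"x_offset": 1, "y_offset": 2, "width": 2, "height": 2}

source = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

cropped = crop(source, **task_descriptor)
```

After the call, `cropped[0][0]` is the element that sat at row 2, column 1 of the source, which is the sense in which the data is realigned to the crop origin.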
-
Publication No.: US20250165747A1
Publication date: 2025-05-22
Application No.: US19030867
Filing date: 2025-01-17
Applicant: APPLE INC.
Inventor: Erik Norden , Liran FISHEL , Sung Hee PARK , Jaewon SHIN , Christopher L. MILLS , Seungjin LEE , Fernando A. MUJICA
IPC: G06N3/04 , G06F1/3296 , G06N3/08
Abstract: Embodiments relate to a neural processor circuit with scalable architecture for instantiating one or more neural networks. The neural processor circuit includes a data buffer coupled to a memory external to the neural processor circuit, and a plurality of neural engine circuits. To execute tasks that instantiate the neural networks, each neural engine circuit generates output data using input data and kernel coefficients. A neural processor circuit may include multiple neural engine circuits that are selectively activated or deactivated according to configuration data of the tasks. Furthermore, an electronic device may include multiple neural processor circuits that are selectively activated or deactivated to execute the tasks.
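One way to picture selective activation is a per-task mask that decides which engines receive work. This is a sketch only: the `engine_mask` field, the round-robin assignment, and the dot-product "engine" are assumptions made for illustration and are not described in the abstract.

```python
def execute(task, inputs, kernel):
    """Distribute input chunks round-robin over the engines enabled in the task's mask."""
    active = [e for e, enabled in enumerate(task["engine_mask"]) if enabled]
    if not active:
        raise ValueError("task must activate at least one engine")
    outputs = {}
    for n, chunk in enumerate(inputs):
        engine = active[n % len(active)]
        # Each engine produces output data from input data and kernel coefficients.
        result = sum(x * w for x, w in zip(chunk, kernel))
        outputs.setdefault(engine, []).append(result)
    return outputs


# Hypothetical configuration: engines 0 and 2 active, engines 1 and 3 deactivated.
task = {"engine_mask": [True, False, True, False]}
outputs = execute(task, inputs=[[1, 2], [3, 4], [5, 6]], kernel=[1, 1])
```

Only engines 0 and 2 appear as keys in `outputs`; the deactivated engines never receive a chunk.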
-
Publication No.: US20240028894A1
Publication date: 2024-01-25
Application No.: US18360136
Filing date: 2023-07-27
Applicant: Apple Inc.
Inventor: Christopher L. MILLS
Abstract: Embodiments of the present disclosure relate to splitting input data into smaller units for loading into a data buffer and neural engines in a neural processor circuit for performing neural network operations. The input data of a large size is split into slices and each slice is again split into tiles. The tile is uploaded from an external source to a data buffer inside the neural processor circuit but outside the neural engines. Each tile is again split into work units sized for storing in an input buffer circuit inside each neural engine. The input data stored in the data buffer and the input buffer circuit is reused by the neural engines to reduce re-fetching of input data. Operations of splitting the input data are performed at various components of the neural processor circuit under the management of rasterizers provided in these components.
-
Publication No.: US20250139424A1
Publication date: 2025-05-01
Application No.: US18902699
Filing date: 2024-09-30
Applicant: Apple Inc.
Inventor: Christopher L. MILLS , Sung Hee Park
Abstract: Embodiments relate to a neural engine circuit of a neural network processor circuit that performs a convolution operation on input data in a first mode and a parallel sorting operation on input data in a second mode. The neural engine circuit includes a plurality of operation circuits and an accumulator circuit coupled to the plurality of operation circuits. The plurality of operation circuits receives input data. In the first mode, the plurality of operation circuits performs multiply-add operations of a convolution on the input data using a kernel. In the second mode, the plurality of operation circuits performs a portion of a parallel sorting operation on the input data. In the first mode, the accumulator circuit receives and stores first results of the multiply-add operations. In the second mode, the accumulator circuit receives and stores second results of the parallel sorting operation.
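The two modes can be sketched in software as one entry point that either performs multiply-add operations or runs compare-and-swap steps of a sorting network. The abstract does not name the sorting algorithm; odd-even transposition sort is swapped in here only because its compare-and-swap pairs within a phase are independent and so could run in parallel. All function names are hypothetical.

```python
def multiply_add(inputs, kernel):
    """Multiply-add of one input window with the kernel."""
    return sum(x * w for x, w in zip(inputs, kernel))


def odd_even_sort(values):
    """Odd-even transposition sort: n phases of independent compare-and-swap pairs."""
    data = list(values)
    n = len(data)
    for phase in range(n):
        # Pairs (i, i + 1) within one phase do not overlap, so they could run in parallel.
        for i in range(phase % 2, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


def neural_engine(mode, inputs, kernel=None):
    """Dispatch on mode: 'convolution' (first mode) or 'sort' (second mode)."""
    if mode == "convolution":
        k = len(kernel)
        return [multiply_add(inputs[j:j + k], kernel)
                for j in range(len(inputs) - k + 1)]
    if mode == "sort":
        return odd_even_sort(inputs)
    raise ValueError(f"unknown mode: {mode}")


conv_result = neural_engine("convolution", [1, 2, 3, 4], kernel=[1, 1])
sort_result = neural_engine("sort", [4, 1, 3, 2])
```

In both modes the returned list plays the role of what the accumulator circuit would store: multiply-add results in the first mode, sorted values in the second.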
-
Publication No.: US20250124272A1
Publication date: 2025-04-17
Application No.: US19002281
Filing date: 2024-12-26
Applicant: Apple Inc.
Inventor: Christopher L. MILLS , Kenneth W. Waters , Youchang Kim
Abstract: Embodiments relate to a neural processor that includes a plurality of neural engine circuits and one or more planar engine circuits. The plurality of neural engine circuits can perform convolution operations of input data of the neural engine circuits with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. The planar engine circuit generates an output from input data that corresponds to output of the neural engine circuits or a version of input data of the neural processor. The planar engine circuit can be configured in multiple modes. In a pooling mode, the planar engine circuit reduces a spatial size of a version of the input data. In an elementwise mode, the planar engine circuit performs an elementwise operation on the input data. In a reduction mode, the planar engine circuit reduces the rank of a tensor.
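The three planar engine modes map onto familiar tensor operations. The sketch below assumes max pooling over non-overlapping windows, addition as the elementwise operation, and a row sum as the rank reduction; the abstract leaves the specific operations open, so these choices and the function names are illustrative only.

```python
import operator


def pooling(data, size):
    """Pooling mode: max over non-overlapping size x size windows (reduces spatial size)."""
    return [[max(data[r + dr][c + dc] for dr in range(size) for dc in range(size))
             for c in range(0, len(data[0]) - size + 1, size)]
            for r in range(0, len(data) - size + 1, size)]


def elementwise(a, b, op):
    """Elementwise mode: apply `op` to matching elements of two equally shaped tensors."""
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def reduction(data):
    """Reduction mode: sum along rows, turning a rank-2 tensor into a rank-1 tensor."""
    return [sum(row) for row in data]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
small = [[1, 2], [3, 4]]

pooled = pooling(image, 2)
summed = elementwise(small, small, operator.add)
reduced = reduction(small)
```

Pooling shrinks the 4x4 input to 2x2, the elementwise result keeps the input shape, and the reduction returns a flat list, one rank lower than its input.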
-
Publication No.: US20250165282A1
Publication date: 2025-05-22
Application No.: US19001159
Filing date: 2024-12-24
Applicant: Apple Inc.
Inventor: Christopher L. MILLS , Kenneth W. Waters
Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor also includes a data processor circuit coupled to external system memory. The data processor circuit includes a buffer for storing the output data from the neural engines. The neural processor further includes a task manager coupled to the data processor circuit. The task manager receives a context-switch task. The context-switch task specifies a switch of the data processor circuit from handling an outgoing task to an incoming task. The task manager sends configuration data of the context-switch task to cause the data processor circuit to transmit the output data corresponding to the outgoing task from the buffer to the external system memory. The data processor circuit also fetches data corresponding to the incoming task from the external system memory to the buffer.
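The context switch described here amounts to a save and a restore around a shared buffer. A minimal sketch, assuming the buffer and external memory can be modeled as dictionaries keyed by hypothetical task names; the class and function names are not from the patent.

```python
class DataProcessorBuffer:
    """Stand-in for the buffer inside the data processor circuit."""

    def __init__(self):
        self.contents = {}


def context_switch(buffer, system_memory, outgoing, incoming):
    """Save the outgoing task's data to external memory, then load the incoming task's data."""
    # Transmit the output data of the outgoing task from the buffer to external memory.
    system_memory[outgoing] = dict(buffer.contents)
    # Fetch the data of the incoming task from external memory into the buffer.
    buffer.contents = dict(system_memory.get(incoming, {}))


# Hypothetical state: task_a is running, task_b has data waiting in external memory.
system_memory = {"task_b": {"input": [7, 8]}}
buffer = DataProcessorBuffer()
buffer.contents = {"output": [1, 2, 3]}

context_switch(buffer, system_memory, outgoing="task_a", incoming="task_b")
```

After the call, the output of `task_a` is preserved in `system_memory` and the buffer holds only the data of `task_b`, so `task_a` could later be resumed by switching in the opposite direction.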
-
Publication No.: US20250103870A1
Publication date: 2025-03-27
Application No.: US18973857
Filing date: 2024-12-09
Applicant: Apple Inc.
Inventor: Kenneth W. WATERS , Christopher L. MILLS
Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor circuit also includes a data processor circuit that is coupled to one or more neural engines. The data processor circuit receives the output data from the neural engine and generates a branching command from the output data. The neural processor circuit further includes a task manager that is coupled to the data processor circuit. The task manager receives the branching command from the data processor circuit. The task manager enqueues one of two or more segment branches according to the received branching command. The two or more segment branches are subsequent to a pre-branch task segment that includes the pre-branch task. The task manager transmits a task from the selected one of the segment branches to the data processor circuit to perform the task.
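The data-dependent branching in this abstract can be illustrated with a task queue whose next segment is chosen from the output of the pre-branch segment. In this sketch the tasks are plain functions on integers, and the branch names and the sign-based branching rule are hypothetical stand-ins for whatever the hardware derives from its output data.

```python
from collections import deque


def double(x):
    return x * 2


def increment(x):
    return x + 1


def negate(x):
    return -x


def make_command(output):
    """Hypothetical rule for deriving a branching command from output data."""
    return "positive" if output > 0 else "non_positive"


def run(pre_branch_segment, segment_branches, data):
    """Run the pre-branch segment, then enqueue and run exactly one segment branch."""
    queue = deque(pre_branch_segment)
    while queue:
        data = queue.popleft()(data)
    # The branching command selects which segment branch is enqueued next.
    queue.extend(segment_branches[make_command(data)])
    while queue:
        data = queue.popleft()(data)
    return data


branches = {"positive": [increment], "non_positive": [negate]}
result_a = run([double], branches, 3)
result_b = run([double], branches, -3)
```

The two runs share the same pre-branch segment but follow different branches, because the intermediate output differs in sign.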