-
Publication No.: US11630991B2
Publication Date: 2023-04-18
Application No.: US16781824
Filing Date: 2020-02-04
Applicant: Apple Inc.
Inventor: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured into multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations on two tensors of different sizes and ranks. The planar engine circuit may perform a broadcasting operation that duplicates one or more values across one or more channels so that the smaller tensor matches the size of the larger tensor.
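The broadcasting idea described in the abstract can be sketched in a few lines of NumPy. This is an illustration of the general technique only, not the patented circuit; the function name and the choice of addition as the elementwise operation are assumptions for the example.

```python
import numpy as np

def broadcast_elementwise(a, b, op=np.add):
    # Find the common shape both operands can expand to
    shape = np.broadcast_shapes(a.shape, b.shape)
    # Duplicate values along size-1 dimensions so both operands match,
    # then combine the two tensors element by element
    return op(np.broadcast_to(a, shape), np.broadcast_to(b, shape))

# A (2, 3, 4) tensor combined with a smaller per-channel (3, 1) operand:
x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
bias = np.array([[0.0], [10.0], [20.0]], dtype=np.float32)
y = broadcast_elementwise(x, bias)   # bias values duplicated across channels
```

Here the smaller tensor's three values are duplicated across the remaining dimensions so the elementwise addition operates on two equal-sized operands.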
-
Publication No.: US11604975B2
Publication Date: 2023-03-14
Application No.: US16844964
Filing Date: 2020-04-09
Applicant: Apple Inc.
Inventor: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
Abstract: A neural processor includes one or more neural engine circuits and a planar engine circuit. The neural engine circuits can perform convolution operations of first input data with one or more kernels to generate a first output. The planar engine circuit receives second input data that corresponds to a version of the first input data. The planar engine circuit also receives third input data that includes fourth input data and fifth input data stored together in a dimension of the third input data. The planar engine circuit performs a first elementwise operation between a version of the second input data and a version of the fourth input data to generate intermediate data. The planar engine circuit performs a second elementwise operation between the intermediate data and a version of the fifth input data to generate a second output.
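The two-step dataflow in the abstract can be mimicked in NumPy: the "third" input packs two operands along one dimension, which are unpacked and applied in sequence. The function name, the packing along the leading axis, and the choice of add/multiply as the two operations are illustrative assumptions, not the patent's.

```python
import numpy as np

def two_step_elementwise(second, third, op1=np.add, op2=np.multiply):
    # 'third' packs two operands along its leading dimension
    fourth, fifth = third[0], third[1]
    intermediate = op1(second, fourth)   # first elementwise operation
    return op2(intermediate, fifth)      # second elementwise operation

x = np.ones((2, 2), dtype=np.float32)
packed = np.stack([np.full((2, 2), 3.0), np.full((2, 2), 2.0)])
out = two_step_elementwise(x, packed)   # (1 + 3) * 2 elementwise
```

Packing both operands into one tensor lets the engine fetch a single input stream and still perform two chained elementwise operations.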
-
Publication No.: US11599780B2
Publication Date: 2023-03-07
Application No.: US16806798
Filing Date: 2020-03-02
Applicant: Apple Inc.
Inventor: Christopher L. Mills, Kenneth W. Waters
Abstract: A neural processor circuit includes one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.
-
Publication No.: US11580353B2
Publication Date: 2023-02-14
Application No.: US15971868
Filing Date: 2018-05-04
Applicant: Apple Inc.
Inventor: Christopher L. Mills
Abstract: Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural networking operations on the received input data and kernel coefficients. MAD circuits are configured to support fixed-point precision (e.g., INT8) and floating-point precision (FP16) of operands. In floating-point mode, each MAD circuit multiplies the integer bits of input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, input data and kernel coefficients are multiplied. In both operation modes, the output data is stored in an accumulator, and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles.
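The two MAD modes can be sketched with small arithmetic helpers: in fixed-point mode the operands are simply multiplied and accumulated; in floating-point mode the integer (mantissa) parts are multiplied while the exponent parts are added, and the operands are aligned at a common binary point before accumulation. This is a simplification of the scheme in the abstract; real hardware also handles signs, normalization, and rounding.

```python
def mad_fixed(x, w, acc):
    # Fixed-point mode: multiply integer operands and accumulate
    return acc + x * w

def mad_float(x_mant, x_exp, w_mant, w_exp, acc_mant, acc_exp):
    # Floating-point mode: multiply mantissas, add exponents
    prod_mant = x_mant * w_mant
    prod_exp = x_exp + w_exp            # exponents add under multiplication
    # Align the smaller-exponent operand to the larger binary point
    if prod_exp >= acc_exp:
        acc_mant >>= (prod_exp - acc_exp)
        acc_exp = prod_exp
    else:
        prod_mant >>= (acc_exp - prod_exp)
    return acc_mant + prod_mant, acc_exp

# (3 * 2^1) * (2 * 2^0) accumulated onto (1 * 2^1) -> (7 * 2^1) = 14
result = mad_float(3, 1, 2, 0, 1, 1)
```

The accumulated value can then feed back into further multiply-add operations in subsequent processing cycles, as the abstract describes.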
-
Publication No.: US20230018248A1
Publication Date: 2023-01-19
Application No.: US17944889
Filing Date: 2022-09-14
Applicant: Apple Inc.
Inventor: Christopher L. Mills, Sung Hee Park
Abstract: Embodiments of the present disclosure relate to a neural engine of a neural processor circuit having multiple multiply-add circuits and an accumulator circuit coupled to the multiply-add circuits. The multiply-add circuits perform multiply-add operations of a three dimensional convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle. The accumulator circuit includes multiple batches of accumulators. Each batch of accumulators receives and stores, after the processing cycle, the portion of the output data for each output depth plane of multiple output depth planes. A corresponding batch of accumulators stores, after the processing cycle, the portion of the output data for a subset of the output channels and for each output depth plane.
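A direct three-dimensional convolution where each output depth plane is built up in its own accumulator can be sketched as below. The valid padding, stride of 1, and single-channel layout are assumptions for the example; the patented circuit partitions work across accumulator batches and output channels in hardware.

```python
import numpy as np

def conv3d_accumulate(inp, kernel):
    D, H, W = inp.shape
    kd, kh, kw = kernel.shape
    od, oh, ow = D - kd + 1, H - kh + 1, W - kw + 1
    # One accumulator plane per output depth plane
    acc = np.zeros((od, oh, ow), dtype=inp.dtype)
    for z in range(od):                  # output depth plane
        for dz in range(kd):             # accumulate kernel depth slices
            for dy in range(kh):
                for dx in range(kw):
                    acc[z] += kernel[dz, dy, dx] * \
                        inp[z + dz, dy:dy + oh, dx:dx + ow]
    return acc

inp = np.ones((3, 3, 3), dtype=np.float32)
kernel = np.ones((2, 2, 2), dtype=np.float32)
out = conv3d_accumulate(inp, kernel)   # every output element sums 8 ones
```

Keeping a separate accumulator per output depth plane lets partial products from different kernel depth slices accumulate independently before the final result is read out.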
-
Publication No.: US11537838B2
Publication Date: 2022-12-27
Application No.: US15971882
Filing Date: 2018-05-04
Applicant: Apple Inc.
Inventor: Erik K. Norden, Liran Fishel, Sung Hee Park, Jaewon Shin, Christopher L. Mills, Seungjin Lee, Fernando A. Mujica
IPC: G06N3/04, G06F1/3296, G06N3/08
Abstract: Embodiments relate to a neural processor circuit with scalable architecture for instantiating one or more neural networks. The neural processor circuit includes a data buffer coupled to a memory external to the neural processor circuit, and a plurality of neural engine circuits. To execute tasks that instantiate the neural networks, each neural engine circuit generates output data using input data and kernel coefficients. A neural processor circuit may include multiple neural engine circuits that are selectively activated or deactivated according to configuration data of the tasks. Furthermore, an electronic device may include multiple neural processor circuits that are selectively activated or deactivated to execute the tasks.
-
Publication No.: US20210319290A1
Publication Date: 2021-10-14
Application No.: US16844964
Filing Date: 2020-04-09
Applicant: Apple Inc.
Inventor: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
Abstract: A neural processor includes one or more neural engine circuits and a planar engine circuit. The neural engine circuits can perform convolution operations of first input data with one or more kernels to generate a first output. The planar engine circuit receives second input data that corresponds to a version of the first input data. The planar engine circuit also receives third input data that includes fourth input data and fifth input data stored together in a dimension of the third input data. The planar engine circuit performs a first elementwise operation between a version of the second input data and a version of the fourth input data to generate intermediate data. The planar engine circuit performs a second elementwise operation between the intermediate data and a version of the fifth input data to generate a second output.
-
Publication No.: US20190340489A1
Publication Date: 2019-11-07
Application No.: US15971868
Filing Date: 2018-05-04
Applicant: Apple Inc.
Inventor: Christopher L. Mills
Abstract: Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural networking operations on the received input data and kernel coefficients. MAD circuits are configured to support fixed-point precision (e.g., INT8) and floating-point precision (FP16) of operands. In floating-point mode, each MAD circuit multiplies the integer bits of input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, input data and kernel coefficients are multiplied. In both operation modes, the output data is stored in an accumulator, and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles.
-
Publication No.: US20190340486A1
Publication Date: 2019-11-07
Application No.: US15971444
Filing Date: 2018-05-04
Applicant: Apple Inc.
Inventor: Christopher L. Mills, Erik K. Norden, Sung Hee Park
Abstract: Embodiments relate to a neural processor circuit including a plurality of neural engine circuits, a data buffer, and a kernel fetcher circuit. At least one of the neural engine circuits is configured to receive matrix elements of a matrix as at least a portion of the input data from the data buffer over multiple processing cycles. The at least one neural engine circuit further receives vector elements of a vector from the kernel fetcher circuit, where each vector element is extracted as a corresponding kernel for the at least one neural engine circuit in each processing cycle. The at least one neural engine circuit performs multiplication between the matrix and the vector as a convolution operation to produce at least one output channel of the output data.
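The matrix-vector-as-convolution idea can be illustrated in a few lines: each vector element acts as a 1x1 kernel applied to one column of the matrix, with the partial products accumulated into a single output channel. The function name and the column-per-cycle loop are illustrative assumptions; the engine's actual dataflow spans many processing cycles in hardware.

```python
import numpy as np

def matvec_as_convolution(matrix, vector):
    rows, cols = matrix.shape
    acc = np.zeros(rows, dtype=matrix.dtype)  # one output channel
    for c in range(cols):          # one vector element per "processing cycle"
        acc += vector[c] * matrix[:, c]   # vector element as a 1x1 kernel
    return acc

m = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([10.0, 1.0])
out = matvec_as_convolution(m, v)   # equals m @ v
```

Reusing the convolution datapath this way lets the same multiply-accumulate hardware serve both convolution layers and fully-connected (matrix-vector) layers.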
-
Publication No.: US20160110843A1
Publication Date: 2016-04-21
Application No.: US14977384
Filing Date: 2015-12-21
Applicant: Apple Inc.
Inventor: Christopher L. Mills, Sheng Lin, David R. Pope, D. Amnon Silverstein, Suk Hwan Lim
CPC classification number: H04N9/04511, G06T3/4015, H04N5/2628, H04N9/07, H04N9/646, H04N2209/046
Abstract: An input rescale module that performs cross-color correlated downscaling of sensor data in the horizontal and vertical dimensions. The module may perform a first-pass demosaic of sensor data, apply horizontal and vertical scalers to resample and downsize the data in the horizontal and vertical dimensions, and then remosaic the data to provide horizontally and vertically downscaled sensor data as output for additional image processing. The module may, for example, act as a front end scaler for an image signal processor (ISP). The demosaic performed by the module may be a relatively simple demosaic, for example a demosaic function that works on 3×3 blocks of pixels. The front end of the module may receive and process sensor data at two pixels per clock (ppc); the horizontal filter component reduces the sensor data to one ppc for downstream components of the input rescale module and for the ISP pipeline.
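The demosaic → horizontal scale → vertical scale → remosaic pipeline order can be sketched with crude stand-ins for each stage. Every filter here is a deliberate simplification and an assumption for the example: the replicate demosaic, 2x box downscalers, and RGGB remosaic are not the ISP's actual 3×3 demosaic or resampling filters; only the stage ordering mirrors the abstract.

```python
import numpy as np

def simple_demosaic(bayer):
    # Stand-in for the first-pass demosaic: replicate each Bayer
    # sample into all three color planes
    return np.repeat(bayer[..., None], 3, axis=-1)

def downscale_2x(rgb, axis):
    # Stand-in for the horizontal/vertical scalers: average pixel pairs
    if axis == 1:   # horizontal
        return (rgb[:, 0::2] + rgb[:, 1::2]) / 2.0
    return (rgb[0::2] + rgb[1::2]) / 2.0

def remosaic(rgb):
    # Collapse RGB back to a single plane with RGGB sampling
    out = np.empty(rgb.shape[:2], dtype=rgb.dtype)
    out[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R
    out[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G
    out[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G
    out[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B
    return out

bayer = np.arange(64, dtype=np.float32).reshape(8, 8)
rgb = simple_demosaic(bayer)
rgb = downscale_2x(rgb, axis=1)    # horizontal scaler runs first
rgb = downscale_2x(rgb, axis=0)    # then the vertical scaler
mosaic = remosaic(rgb)             # downscaled sensor data for the ISP
```

Running the horizontal scaler first matches the abstract's note that it is the horizontal filter that brings the 2 ppc front-end stream down to 1 ppc for the rest of the pipeline.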