-
Publication number: US20230394276A1
Publication date: 2023-12-07
Application number: US17833476
Filing date: 2022-06-06
Applicant: Apple Inc.
Inventor: Sayyed Karen Khatamifard , Chenfan Sun , Alon Yaakov , Husam Khashiboun , Jeffrey D. Marker , Saman Naderiparizi , Ramana V. Rachakonda , Rohit K. Gupta
CPC classification number: G06N3/04 , G06F9/4881 , G06F9/5016
Abstract: Embodiments relate to streaming convolution operations in a neural processor circuit that includes a neural engine circuit and a neural task manager. The neural task manager obtains multiple task descriptors and multiple subtask descriptors. Each task descriptor identifies a respective set of the convolution operations of a respective layer of a set of layers. Each subtask descriptor identifies a corresponding task descriptor and a subset of the convolution operations on a portion of a layer of the set of layers identified by the corresponding task descriptor. The neural processor circuit configures the neural engine circuit for execution of the subset of the convolution operations using the corresponding task descriptor. The neural engine circuit performs the subset of the convolution operations to generate output data that correspond to input data of another subset of the convolution operations identified by another one of the multiple subtask descriptors.
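The task/subtask descriptor hierarchy described in this abstract could be sketched roughly as follows. This is a minimal illustration, not the patented implementation; all class names, fields, and the toy "convolution" arithmetic are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskDescriptor:
    # Identifies the convolution operations of one layer (hypothetical field).
    layer_id: int


@dataclass
class SubtaskDescriptor:
    # Points back to its parent task descriptor and covers only a
    # subset of that layer's convolution operations.
    task: TaskDescriptor
    op_range: range


def run_subtasks(subtasks):
    """Execute subtasks in order: the output of one subtask serves as
    input to the next, mimicking the streaming flow in the abstract.
    The arithmetic is a stand-in for real convolution work."""
    outputs = []
    data = 0  # stand-in for the streamed tensor data
    for st in subtasks:
        # "Configure" the engine from the parent task descriptor,
        # then run this subtask's slice of the layer's operations.
        data = data + st.task.layer_id * len(st.op_range)
        outputs.append(data)
    return outputs
```

The key structural point the sketch preserves is that subtasks do not stand alone: each one resolves its configuration through its parent task descriptor, and its output feeds the next subtask.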
-
Publication number: US11907823B2
Publication date: 2024-02-20
Application number: US17860031
Filing date: 2022-07-07
Applicant: Apple Inc.
Inventor: Saman Naderiparizi , Mohammad Rastegari , Sayyed Karen Khatamifard
CPC classification number: G06N3/02 , G06F3/0604 , G06F3/0676 , G06F3/0677 , G06N3/045 , G06N3/063
Abstract: In one embodiment, a computing device includes an input sensor providing input data; a programmable logic device (PLD) implementing a convolutional neural network (CNN), wherein: each compute block of the PLD corresponds to one of multiple convolutional layers of the CNN, each compute block of the PLD is placed in proximity to at least two memory blocks, a first one of the memory blocks serves as a buffer for the corresponding layer of the CNN, and a second one of the memory blocks stores model-specific parameters for the corresponding layer of the CNN.
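The per-layer pairing of a compute block with an activation buffer and a parameter memory could be sketched as below. This is an assumption-laden software analogy of a hardware layout; the class, field names, and the toy "forward" arithmetic are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeBlock:
    """One compute block per CNN layer, paired with two nearby memories
    as in the abstract: an activation buffer and a parameter store."""
    buffer: list = field(default_factory=list)   # first memory block: layer buffer
    params: dict = field(default_factory=dict)   # second memory block: layer weights

    def forward(self, x):
        # Toy stand-in for the layer's convolution: scale inputs by a
        # weight held in this block's local parameter memory.
        self.buffer = x  # buffer the layer's input locally
        w = self.params.get("weight", 1)
        return [w * v for v in x]


def pipeline(blocks: List[ComputeBlock], x):
    """Pass sensor data through each layer's compute block in turn."""
    for b in blocks:
        x = b.forward(x)
    return x
```

The design point the sketch mirrors is locality: each layer reads its weights and buffers its activations from memories dedicated to that layer, rather than from one shared store.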
-
Publication number: US11651192B2
Publication date: 2023-05-16
Application number: US16788261
Filing date: 2020-02-11
Applicant: Apple Inc.
Inventor: James C. Gabriel , Mohammad Rastegari , Hessam Bagherinezhad , Saman Naderiparizi , Anish Prabhu , Sophie Lebrecht , Jonathan Gelsey , Sayyed Karen Khatamifard , Andrew L. Chronister , David Bakin , Andrew Z. Luo
Abstract: Systems and processes for training and compressing a convolutional neural network model include the use of quantization and layer fusion. Quantized training data is passed through a convolutional layer of a neural network model to generate convolutional results during a first iteration of training the neural network model. The convolutional results are passed through a batch normalization layer of the neural network model to update normalization parameters of the batch normalization layer. The convolutional layer is fused with the batch normalization layer to generate a first fused layer and the fused parameters of the fused layer are quantized. The quantized training data is passed through the fused layer using the quantized fused parameters to generate output data, which may be quantized for a subsequent layer in the training iteration.
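The fusion step this abstract describes folds batch-normalization parameters into the preceding convolution's weight and bias; the standard per-channel algebra, plus a simple uniform quantizer, can be sketched as follows. The quantization scheme here is illustrative only, not the one claimed in the patent, and scalars stand in for full tensors.

```python
import math


def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm (gamma, beta, running mean/var) into the conv's
    weight w and bias b, using the standard fusion identity:
        scale   = gamma / sqrt(var + eps)
        w_fused = w * scale
        b_fused = beta + (b - mean) * scale
    Scalar per-channel form, for illustration."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, beta + (b - mean) * scale


def quantize(x, step=0.25):
    """Uniform quantization: snap a value to the nearest multiple of
    `step` (a placeholder scheme, not the patented quantizer)."""
    return round(x / step) * step
```

In the training loop the abstract outlines, the fused parameters would themselves be quantized before the quantized training data is passed through the fused layer, e.g. `quantize(fuse_conv_bn(...)[0])`.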
-
Publication number: US20230368008A1
Publication date: 2023-11-16
Application number: US17745032
Filing date: 2022-05-16
Applicant: Apple Inc.
Inventor: Sayyed Karen Khatamifard , Alexander J. Kirchhoff , Rohit K. Gupta , Jeffrey D. Marker , Thomas G. Anderl , Saman Naderiparizi , Chenfan Sun , Alon Yaakov , Husam Khashiboun , Ramana V. Rachakonda
IPC: G06N3/063
CPC classification number: G06N3/063
Abstract: Embodiments relate to streaming operations in a neural processor circuit that includes a neural engine circuit and a data processor circuit. The neural engine circuit performs first operations on a first input tensor of a first layer to generate a first output tensor, and second operations on a second input tensor of a second layer at a higher hierarchy than the first layer, the second input tensor corresponding to the first output tensor. The data processor circuit stores a portion of the first input tensor for access by the neural engine circuit to perform a subset of the first operations and generate a portion of the first output tensor. The data processor circuit stores the portion of the first output tensor for access by the neural engine circuit as a portion of the second input tensor to perform a subset of the second operations.
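The streaming pattern in this abstract, where a portion of the first layer's output is buffered only long enough to serve as a portion of the second layer's input, could be sketched as a tile-wise loop. This is a rough software analogy under stated assumptions: the tile granularity, the function names, and the layer operations are all hypothetical.

```python
def stream_two_layers(tiles, layer1, layer2):
    """Process input tile by tile: each first-layer output portion is
    held (here, in the local variable `mid`, standing in for the data
    processor circuit's buffer) and immediately consumed as the second
    layer's input portion, rather than materializing the full first
    output tensor."""
    out = []
    for t in tiles:
        mid = layer1(t)       # portion of the first output tensor
        out.append(layer2(mid))  # reused as a second-layer input portion
    return out
```

Compared with running all of layer 1 before any of layer 2, this ordering keeps only one intermediate portion live at a time, which is the memory-footprint benefit such streaming aims at.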