Abstract:
An image processing pipeline may process image data at multiple rates. A stream of raw pixel data collected from an image sensor for an image frame may be processed through one or more pipeline stages of an image signal processor. The stream of raw pixel data may then be converted into a full-color domain and scaled to a data size that is less than an initial data size for the image frame. The converted pixel data may be processed through one or more other pipeline stages and output for storage, further processing, or display. In some embodiments, a back-end interface may be implemented as part of the image signal processor via which image data collected from sources other than the image sensor may be received and processed through various pipeline stages at the image signal processor.
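A minimal sketch of the front-end conversion described above: raw sensor data is turned into full-color pixels and then scaled down so later stages process a smaller frame. The function name and the naive channel-replication "demosaic" are illustrative placeholders, not the actual ISP algorithms.

```python
def demosaic_and_scale(raw, scale):
    # Toy stand-in for raw-to-full-color conversion: replicate each raw
    # sample into an (R, G, B) triple. A real pipeline would demosaic
    # the color filter array pattern instead.
    rgb = [[(v, v, v) for v in row] for row in raw]
    # Decimating scaler: keep every `scale`-th row and column, so the
    # downstream pipeline stages see a frame smaller than the initial size.
    return [row[::scale] for row in rgb[::scale]]
```

Scaling early in the pipeline is what lets the later stages run at a lower data rate than the sensor interface.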
Abstract:
An image signal processor of a device, apparatus, or computing system that includes a camera capable of capturing image data may apply piecewise perspective transformations to image data received from the camera's image sensor. A scaling unit of an Image Signal Processor (ISP) may perform piecewise perspective transformations on a captured image to correct for rolling shutter artifacts and to provide video image stabilization. Image data may be divided into a series of horizontal slices and perspective transformations may be applied to each slice. The transformations may be based on motion data determined in any of various manners, such as by using gyroscopic data and/or optical-flow calculations. The piecewise perspective transforms may be encoded as Digital Difference Analyzer (DDA) steppers and may be implemented using separable scalar operations. The image signal processor may not write the received image data to system memory until after the transformations have been performed.
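The slice-wise scheme above can be sketched as follows: the image is divided into horizontal slices, and each slice gets its own 3x3 perspective (homography) matrix derived from motion data. The function names and the per-slice matrix representation here are assumptions for illustration; the hardware encodes the transforms as DDA steppers rather than applying matrices per pixel.

```python
def apply_homography(h, x, y):
    # Apply a 3x3 perspective matrix (nested lists) to point (x, y).
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def piecewise_transform(height, num_slices, homographies, points):
    # Assign each point to a horizontal slice by its row, then apply
    # that slice's perspective transform (e.g., to counter the rolling
    # shutter skew accumulated by the time that slice was read out).
    rows_per_slice = height // num_slices
    out = []
    for (x, y) in points:
        s = min(int(y) // rows_per_slice, num_slices - 1)
        out.append(apply_homography(homographies[s], x, y))
    return out
```

Because each slice covers rows captured at nearly the same instant, a single homography per slice is a good local approximation of the camera motion during readout.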
Abstract:
An output rescale module may determine an estimated set of lines to hold in vertical support for use when performing image transformations. For example, an output rescale module may monitor input Y coordinates (in terms of input pixel lines) computed over previous lines and compute a set of lines to hold in a set of line buffers. As each output pixel line is generated, the output rescale module may compute the minimum and maximum values of Y generated by the transform across that line. The minimum and maximum input Y coordinates may then be averaged to determine the center value (the centermost input line) for that output line. The difference (in terms of input pixel lines) between centerlines for two adjacent output lines may be added to the centerline value for the current output line to estimate a centerline for the next (not yet generated) output pixel line.
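The estimation step above reduces to a few arithmetic operations: average the min/max input Y for the current output line, take the step from the previous line's center, and extrapolate linearly. This is a sketch under that reading of the abstract; the function name is illustrative.

```python
def estimate_next_centerline(prev_center, curr_min_y, curr_max_y):
    # Center of the input lines touched by the current output line:
    # average of the min and max input Y coordinates across that line.
    curr_center = (curr_min_y + curr_max_y) / 2.0
    # Inter-line advance, in input pixel lines, between adjacent
    # output lines' centers.
    step = curr_center - prev_center
    # Linear extrapolation: assume the next (not yet generated) output
    # line advances by the same amount.
    next_center = curr_center + step
    return curr_center, next_center
```

The predicted centerline tells the module which input lines to keep in the line buffers before the next output line is actually computed.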
Abstract:
Embodiments of the present disclosure relate to a tensor access operation circuit in a neural processor circuit. The neural processor circuit further includes a data processor circuit and at least one neural engine circuit. The tensor access operation circuit indirectly accesses at least a region of a source tensor of one rank in a system memory, and maps one or more source components of the source tensor into an input tensor having another rank. The data processor circuit stores an output version of the input tensor obtained from the tensor access operation circuit and sends the output version of the input tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
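The rank-remapping idea can be illustrated in miniature: a region of a rank-2 source tensor is gathered and flattened into a rank-1 input tensor. This is only an analogy for the tensor access operation circuit's behavior; the function and its bounds parameters are hypothetical.

```python
def map_region(source, r0, r1, c0, c1):
    # Gather the components of a rank-2 source tensor lying in the
    # region [r0, r1) x [c0, c1) and emit them as a rank-1 input
    # tensor, in row-major order.
    return [source[r][c] for r in range(r0, r1) for c in range(c0, c1)]
```

In hardware the access is indirect (the circuit computes source addresses itself), so the rest of the neural processor only ever sees the remapped input tensor.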
Abstract:
A neural processor circuit includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor circuit also includes a data processor circuit that is coupled to one or more of the neural engines. The data processor circuit receives the output data from the neural engines and generates a branching command from the output data. The neural processor circuit further includes a task manager that is coupled to the data processor circuit. The task manager receives the branching command from the data processor circuit. The task manager enqueues one of two or more segment branches according to the received branching command. The two or more segment branches are subsequent to a pre-branch task segment that includes the pre-branch task. The task manager transmits a task from the selected one of the segment branches to the data processor circuit to perform the task.
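A toy software model of the branching behavior: after the pre-branch segment completes, a branching command (here just an index) selects which segment branch's tasks get enqueued. The class and method names are illustrative, not terms from the disclosure.

```python
class TaskManager:
    # Minimal model: each branch is a list of tasks; a branching
    # command selects one branch to enqueue after the pre-branch
    # task segment has run.
    def __init__(self, branches):
        self.branches = branches
        self.queue = []

    def on_branch_command(self, branch_index):
        # Enqueue every task of the selected segment branch.
        self.queue.extend(self.branches[branch_index])

    def next_task(self):
        # Hand the next task to the data processor circuit, if any.
        return self.queue.pop(0) if self.queue else None
```

The point of the scheme is that the branch decision is made from the output data in hardware, with no software intervention between segments.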
Abstract:
Embodiments relate to a neural processor circuit with scalable architecture for instantiating one or more neural networks. The neural processor circuit includes a data buffer coupled to a memory external to the neural processor circuit, and a plurality of neural engine circuits. To execute tasks that instantiate the neural networks, each neural engine circuit generates output data using input data and kernel coefficients. A neural processor circuit may include multiple neural engine circuits that are selectively activated or deactivated according to configuration data of the tasks. Furthermore, an electronic device may include multiple neural processor circuits that are selectively activated or deactivated to execute the tasks.
Abstract:
Embodiments of the present disclosure relate to a neural engine of a neural processor circuit having multiple multiply-add circuits and an accumulator circuit coupled to the multiply-add circuits. The multiply-add circuits perform multiply-add operations of a three dimensional convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle. The accumulator circuit includes multiple batches of accumulators. After the processing cycle, each batch of accumulators receives and stores the portion of the output data for a corresponding subset of the output channels and for each output depth plane of multiple output depth planes.
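A minimal sketch of one processing cycle under one reading of the abstract: the work unit is multiply-added against a per-depth-plane kernel value, and the partial sum is added into that depth plane's accumulator. Real hardware does this across many multiply-add circuits in parallel and across channel subsets; the names here are illustrative.

```python
def mac_cycle(work_unit, kernel, accumulators):
    # One processing cycle: for each output depth plane d, multiply-add
    # the work unit against that plane's kernel value and accumulate
    # into that plane's batch of accumulators.
    for d, k in enumerate(kernel):
        accumulators[d] += sum(x * k for x in work_unit)
    return accumulators
```

Keeping one accumulator batch per depth plane lets successive work units add into the same partial sums without writing intermediate results out of the engine.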
Abstract:
Embodiments of the present disclosure relate to splitting input data into smaller units for loading into a data buffer and neural engines in a neural processor circuit for performing neural network operations. The input data of a large size is split into slices, and each slice is again split into tiles. Each tile is uploaded from an external source to a data buffer inside the neural processor circuit but outside the neural engines. Each tile is again split into work units sized for storing in an input buffer circuit inside each neural engine. The input data stored in the data buffer and the input buffer circuit is reused by the neural engines to reduce re-fetching of input data. Operations of splitting the input data are performed at various components of the neural processor circuit under the management of rasterizers provided in these components.
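The slice → tile → work-unit hierarchy can be sketched in one dimension: an extent is chunked at three nested granularities, each chosen to fit the corresponding buffer. The function names and the 1-D simplification are assumptions; actual rasterizers walk multi-dimensional tensors.

```python
def chunk(start, end, size):
    # Split [start, end) into consecutive half-open ranges of at most `size`.
    return [(i, min(i + size, end)) for i in range(start, end, size)]

def split_input(length, slice_len, tile_len, unit_len):
    # Nested splitting: slices (sized for external memory transfers),
    # tiles within each slice (sized for the data buffer), and work
    # units within each tile (sized for a neural engine's input buffer).
    return [
        [chunk(t0, t1, unit_len) for (t0, t1) in chunk(s0, s1, tile_len)]
        for (s0, s1) in chunk(0, length, slice_len)
    ]
```

Because work units within a tile share the tile's data already resident in the buffers, the same input bytes are reused instead of being re-fetched from external memory.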
Abstract:
Embodiments of the present disclosure relate to binary comparison operations (e.g., Boolean operations) and reduction operations in a neural processor circuit to enable implementation of conditional operations without software control. The neural processor circuit includes a neural engine circuit and a planar engine circuit coupled to the neural engine circuit. The neural engine circuit performs a convolution operation to generate output data. The planar engine circuit includes a binary comparator circuit and a filter circuit coupled to the binary comparator circuit. The binary comparator circuit performs a binary comparison operation on a tensor from the output data to generate a conditional tensor. The filter circuit performs a reduction operation for each patch of the conditional tensor to generate a respective reduced value of multiple reduced values associated with a corresponding channel of multiple channels of the conditional tensor.
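The compare-then-reduce flow above can be sketched as: a binary comparison turns each element into 0 or 1 (the conditional tensor), then a reduction collapses each patch to a single value. Greater-than as the comparison and sum as the reduction are illustrative choices; a single channel stands in for the multi-channel case.

```python
def compare_and_reduce(tensor, threshold, patch):
    # Binary comparator: elementwise greater-than produces a 0/1
    # conditional tensor.
    cond = [[1 if v > threshold else 0 for v in row] for row in tensor]
    h, w = len(cond), len(cond[0])
    # Filter circuit: reduce each patch x patch block of the conditional
    # tensor to one value (here, the count of elements passing the test).
    reduced = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            reduced.append(sum(cond[rr][cc]
                               for rr in range(r, min(r + patch, h))
                               for cc in range(c, min(c + patch, w))))
    return reduced
```

Producing reduced values in the planar engine lets later stages branch on data-dependent conditions without handing control back to software.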