Abstract:
An apparatus is described. The apparatus includes an execution lane array coupled to a two dimensional shift register array structure. Locations in the execution lane array are coupled to same locations in the two-dimensional shift register array structure such that different execution lanes have different dedicated registers.
Abstract:
A method is described that includes translating higher level program code including higher level instructions having an instruction format that identifies pixels to be accessed from a memory with first and second coordinates from an orthogonal coordinate system into lower level instructions that target a hardware architecture having an array of execution lanes and a shift register array structure that is able to shift data along two different axis. The translating includes replacing the higher level instructions having the instruction format with lower level shift instructions that shift data within the shift register array structure.
Abstract:
An image processor unit is described. The image processor unit includes a plurality of inputs to receive at least one input image. The image processor unit includes a plurality of outputs to provide at least one output image. The image processor unit includes a network coupled to the plurality of inputs and the plurality of outputs. The network is to couple at least one of the inputs to at least one of the outputs. The image processor unit includes an image processor circuit coupled to the network. The network to route an input image that is received at one of the inputs to the image processor circuit. The image processor circuit is to execute image signal processing program code to generate a processed output image from the input image. The network is to route the processed output image to at least one of the outputs.
Abstract:
A method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, repeatedly shifting first content of multiple rows or columns of the two dimensional shift register array and repeatedly executing at least one instruction between shifts that operates on the shifted first content and/or second content that is resident in respective locations of the two dimensional shift register array that the shifted first content has been shifted into.
Abstract:
A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing of the convolutional neural network also includes performing a two-dimensional convolution of the plane of image data with an array of coefficient values by sequentially: concurrently multiplying within the execution lanes respective pixel and coefficient values to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two dimensional register for different stencils within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift register array.
Abstract:
A method is described that includes loading an array of content into a two-dimensional shift register. The two-dimensional shift register is coupled to an execution lane array. The method includes repeatedly performing a first sequence including: shifting with the shift register first content residing along a particular row or column into another parallel row or column where second content resides and performing operations with a particular corresponding row or column of the execution lane array on the first and second content. The method also includes repeatedly performing a second sequence including: shifting with the shift register content from a set of first locations along a resultant row or column that is parallel with the rows or columns of the first sequence into a corresponding set of second locations along the resultant row or column. The resultant row or column has values determined from the operations of the first sequence.
Abstract:
A method is described that includes translating higher level program code including higher level instructions having an instruction format that identifies pixels to be accessed from a memory with first and second coordinates from an orthogonal coordinate system into lower level instructions that target a hardware architecture having an array of execution lanes and a shift register array structure that is able to shift data along two different axis. The translating includes replacing the higher level instructions having the instruction format with lower level shift instructions that shift data within the shift register array structure.
Abstract:
In a general aspect, an apparatus can include image processing logic (IPL) configured to perform an image processing operation on pixel data corresponding with an image having a width of W pixels and a height of H pixels to produce output pixel data in vertical slices of K pixels using K vertically overlapping stencils of S×S pixels, K being greater than 1 and less than H, S being greater than or equal to 2, and W being greater than S. The apparatus can also include a linebuffer operationally coupled with the IPL, the linebuffer configured to buffer the pixel data for the IPL. The linebuffer can include a full-size buffer having a width of W and a height of (S−1). The linebuffer can also include a sliding buffer having a width of SB and a height of K, SB being greater than or equal to S and less than W.
Abstract:
An apparatus is described that includes an execution unit having a multiply add computation unit, a first ALU logic unit and a second ALU logic unit. The ALU unit is to perform first, second, third and fourth instructions. The first instruction is a multiply add instruction. The second instruction is to perform parallel ALU operations with the first and second ALU logic units operating simultaneously to produce different respective output resultants of the second instruction. The third instruction is to perform sequential ALU operations with one of the ALU logic units operating from an output of the other of the ALU logic units to determine an output resultant of the third instruction. The fourth instruction is to perform an iterative divide operation in which the first ALU logic unit and the second ALU logic unit operate during to determine first and second division resultant digit values.
Abstract:
An apparatus is described. The apparatus includes a program controller to fetch and issue instructions. The apparatus includes an execution lane having at least one execution unit to execute the instructions. The execution lane is part of an execution lane array that is coupled to a two dimensional shift register array structure, wherein, execution lanes of the execution lane array are located at respective array locations and are coupled to dedicated registers at same respective array locations in the two-dimensional shift register array.