Abstract:
An apparatus is described. The apparatus includes an execution lane array coupled to a two dimensional shift register array structure. Locations in the execution lane array are coupled to same locations in the two-dimensional shift register array structure such that different execution lanes have different dedicated registers.
Abstract:
A method is described that includes instantiating, within an application software development environment, a virtual processor having an instruction set architecture and memory model that contemplate first and second regions of reserved memory. The first reserved region is to keep data of an input image array. The second reserved region is to keep data of an output image array. The method also includes simulating execution of a memory load instruction of the instruction set architecture by automatically targeting the first reserved region and identifying desired input data with first and second coordinates relative to the virtual processor's position within an orthogonal coordinate system and expressed in the instruction format of the memory load instruction.
Abstract:
An apparatus is described. The apparatus includes an image processing unit. The image processing unit includes a network. The image processing unit includes a plurality of stencil processor circuits each comprising an array of execution unit lanes coupled to a two-dimensional shift register array structure to simultaneously process multiple overlapping stencils through execution of program code. The image processing unit includes a plurality of sheet generators respectively coupled between the plurality of stencil processors and the network. The sheet generators are to parse input line groups of image data into input sheets of image data for processing by the stencil processors, and, to form output line groups of image data from output sheets of image data received from the stencil processors. The image processing unit includes a plurality of line buffer units coupled to the network to pass line groups in a direction from producing stencil processors to consuming stencil processors to implement an overall program flow.
Abstract:
A method is described that includes instantiating, within an application software development environment, a virtual processor having an instruction set architecture and memory model that contemplate first and second regions of reserved memory. The first reserved region is to keep data of an input image array. The second reserved region is to keep data of an output image array. The method also includes simulating execution of a memory load instruction of the instruction set architecture by automatically targeting the first reserved region and identifying desired input data with first and second coordinates relative to the virtual processor's position within an orthogonal coordinate system and expressed in the instruction format of the memory load instruction.
Abstract:
A method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, doubling a simultaneous shift amount of multiple rows or columns of the two dimensional shift register array with each next iteration. The method also includes executing one or more instructions within respective lanes of the two dimensional execution lane array in between shifts of iterations. Another method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, repeatedly executing one or more instructions within respective lanes of the execution lane array that select between content in different registers of a same array location in between repeated simultaneous shifts of multiple rows or columns of data in the two dimensional shift register array.
Abstract:
A method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, repeatedly shifting first content of multiple rows or columns of the two dimensional shift register array and repeatedly executing at least one instruction between shifts that operates on the shifted first content and/or second content that is resident in respective locations of the two dimensional shift register array that the shifted first content has been shifted into.
Abstract:
A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing of the convolutional neural network also includes performing a two-dimensional convolution of the plane of image data with an array of coefficient values by sequentially: concurrently multiplying within the execution lanes respective pixel and coefficient values to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two dimensional register for different stencils within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift register array.
Abstract:
An apparatus is described. The apparatus includes an image processing unit. The image processing unit includes a plurality of stencil processor circuits each comprising an array of execution unit lanes coupled to a two-dimensional shift register array structure to simultaneously process multiple overlapping stencils through execution of program code. The image processing unit includes a plurality of sheet generators respectively coupled between the plurality of stencil processors and the network. The sheet generators are to parse input line groups of image data into input sheets of image data for processing by the stencil processors, and, to form output line groups of image data from output sheets of image data received from the stencil processors. The image processing unit includes a plurality of line buffer units coupled to the network to pass line groups in a direction from producing stencil processors to consuming stencil processors to implement an overall program flow.
Abstract:
An apparatus is described. The apparatus includes an execution lane array coupled to a two dimensional shift register array structure. Locations in the execution lane array are coupled to same locations in the two-dimensional shift register array structure such that different execution lanes have different dedicated registers.
Abstract:
An apparatus is described. The apparatus includes an execution lane array coupled to a two dimensional shift register array structure. Locations in the execution lane array are coupled to same locations in the two-dimensional shift register array structure such that different execution lanes have different dedicated registers.