Abstract:
An apparatus is described that includes an execution unit having a multiply add computation unit, a first ALU logic unit and a second ALU logic unit. The execution unit is to perform first, second, third and fourth instructions. The first instruction is a multiply add instruction. The second instruction is to perform parallel ALU operations with the first and second ALU logic units operating simultaneously to produce different respective output resultants of the second instruction. The third instruction is to perform sequential ALU operations with one of the ALU logic units operating from an output of the other of the ALU logic units to determine an output resultant of the third instruction. The fourth instruction is to perform an iterative divide operation in which the first ALU logic unit and the second ALU logic unit operate during an iteration to determine first and second division resultant digit values.
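The four instruction types can be modelled in software to make the dataflow concrete. The following is a minimal sketch, not the patented circuit: all function names are hypothetical, and the divide uses plain radix-2 restoring division on unsigned integers as an assumed stand-in, since the abstract does not name the division algorithm. Each loop iteration performs two steps, one attributed to each ALU, and so yields two quotient digits.

```python
from typing import Callable, Tuple

AluOp = Callable[[int, int], int]


def multiply_add(a: int, b: int, c: int) -> int:
    """First instruction: a * b + c, as computed by the multiply add unit."""
    return a * b + c


def parallel_alu(op1: AluOp, op2: AluOp,
                 a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Second instruction: both ALUs work side by side on separate operands
    and produce two independent results."""
    return op1(a, b), op2(c, d)


def sequential_alu(op1: AluOp, op2: AluOp, a: int, b: int, c: int) -> int:
    """Third instruction: the second ALU consumes the first ALU's output,
    giving a single result."""
    return op2(op1(a, b), c)


def _divide_step(remainder: int, dividend: int, divisor: int,
                 bit: int) -> Tuple[int, int]:
    """One restoring-division step (assumed algorithm): bring down one
    dividend bit, try to subtract the divisor, return (remainder, digit)."""
    remainder = (remainder << 1) | ((dividend >> bit) & 1)
    if remainder >= divisor:
        return remainder - divisor, 1
    return remainder, 0


def iterative_divide(dividend: int, divisor: int,
                     bits: int = 8) -> Tuple[int, int]:
    """Fourth instruction: each iteration yields two quotient digits,
    the first from ALU 1 and the second from ALU 2."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if bits % 2 or not 0 <= dividend < (1 << bits):
        raise ValueError("bits must be even and dividend must fit in bits")
    quotient = remainder = 0
    for bit in range(bits - 1, 0, -2):
        remainder, d1 = _divide_step(remainder, dividend, divisor, bit)      # ALU 1
        remainder, d2 = _divide_step(remainder, dividend, divisor, bit - 1)  # ALU 2
        quotient = (quotient << 2) | (d1 << 1) | d2
    return quotient, remainder
```

For example, `iterative_divide(200, 7)` returns the quotient and remainder `(28, 4)` after four iterations instead of eight single-digit steps.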
Abstract:
An apparatus is described. The apparatus includes a program controller to fetch and issue instructions. The apparatus includes an execution lane having at least one execution unit to execute the instructions. The execution lane is part of an execution lane array that is coupled to a two-dimensional shift register array structure, wherein execution lanes of the execution lane array are located at respective array locations and are coupled to dedicated registers at the same respective array locations in the two-dimensional shift register array.
Abstract:
In a general aspect, an apparatus can include image processing logic (IPL) configured to perform an image processing operation on pixel data corresponding with an image having a width of W pixels and a height of H pixels to produce output pixel data in vertical slices of K pixels using K vertically overlapping stencils of S×S pixels, K being greater than 1 and less than H, S being greater than or equal to 2, and W being greater than S. The apparatus can also include a linebuffer operationally coupled with the IPL, the linebuffer configured to buffer the pixel data for the IPL. The linebuffer can include a full-size buffer having a width of W and a height of (S−1). The linebuffer can also include a sliding buffer having a width of SB and a height of K, SB being greater than or equal to S and less than W.
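The buffer dimensions stated above translate directly into a storage budget. The sketch below, with hypothetical names `LinebufferShape`, `linebuffer_shape` and `rows_spanned`, checks the stated constraints on W, H, S, K and SB and returns the two buffer shapes. It also shows why the heights fit together: K vertically overlapping S×S stencils span K + S − 1 rows, which equals the (S − 1) rows of the full-size buffer plus the K rows of the sliding buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinebufferShape:
    full_width: int      # W
    full_height: int     # S - 1
    sliding_width: int   # SB
    sliding_height: int  # K

    @property
    def total_pixels(self) -> int:
        """Pixels held by the full-size buffer plus the sliding buffer."""
        return (self.full_width * self.full_height
                + self.sliding_width * self.sliding_height)


def linebuffer_shape(W: int, H: int, S: int, K: int, SB: int) -> LinebufferShape:
    """Validate the constraints from the abstract and return the shapes."""
    if not 1 < K < H:
        raise ValueError("K must be greater than 1 and less than H")
    if S < 2:
        raise ValueError("S must be at least 2")
    if W <= S:
        raise ValueError("W must be greater than S")
    if not S <= SB < W:
        raise ValueError("SB must be at least S and less than W")
    return LinebufferShape(W, S - 1, SB, K)


def rows_spanned(S: int, K: int) -> int:
    """Rows covered by K vertically overlapping S x S stencils."""
    return K + S - 1
```

With illustrative values W = 1920, H = 1080, S = 3, K = 4 and SB = 3 (chosen here, not taken from the source), the full-size buffer holds 1920 × 2 pixels and the sliding buffer 3 × 4, for 3852 pixels in total.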
Abstract:
A method is described. The method includes repeatedly loading a next sheet of image data from a first location of a memory into a two-dimensional shift register array. The memory is locally coupled to the two-dimensional shift register array and an execution lane array having a smaller dimension than the two-dimensional shift register array along at least one array axis. The loaded next sheet of image data stays within an image area of the two-dimensional shift register array. The method also includes repeatedly determining output values for the next sheet of image data through execution of program code instructions along respective lanes of the execution lane array, wherein a stencil size used in determining the output values encompasses only pixels that reside within the two-dimensional shift register array.
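The size relationship between the register array, the lane array and the stencil can be illustrated with a small model. This is a sketch under assumptions the abstract does not state: the function name `process_sheet` is hypothetical, the lane array is assumed to sit `radius` registers in from the top-left corner of the register array, the stencil is a square box sum, and the register contents are simply passed in rather than loaded from memory.

```python
from typing import List


def process_sheet(registers: List[List[int]], lane_rows: int,
                  lane_cols: int, radius: int) -> List[List[int]]:
    """Compute one output per execution lane as the sum of a
    (2 * radius + 1)-square stencil read from the register array.

    The check below enforces the condition from the text: every stencil
    must touch only registers that exist in the register array.
    """
    reg_rows, reg_cols = len(registers), len(registers[0])
    if lane_rows + 2 * radius > reg_rows or lane_cols + 2 * radius > reg_cols:
        raise ValueError("stencil would reach outside the register array")

    output = []
    for y in range(lane_rows):
        row = []
        for x in range(lane_cols):
            cy, cx = y + radius, x + radius  # register under this lane
            row.append(sum(registers[cy + dy][cx + dx]
                           for dy in range(-radius, radius + 1)
                           for dx in range(-radius, radius + 1)))
        output.append(row)
    return output
```

A 4×4 register array can therefore serve a 2×2 lane array with a 3×3 stencil, whereas a 3×3 lane array with the same stencil would be rejected.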
Abstract:
An apparatus is described. The apparatus includes an execution lane array coupled to a two-dimensional shift register array structure. Locations in the execution lane array are coupled to the same locations in the two-dimensional shift register array structure such that different execution lanes have different dedicated registers.
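One way to picture this coupling is that each lane reads only the register at its own location, and neighbouring data reaches it by shifting the whole register array. The sketch below is illustrative only: `shift` and `horizontal_sum3` are hypothetical names, and filling vacated edge registers with zero is an assumption, since the abstract does not describe edge behaviour.

```python
from typing import List

Grid = List[List[int]]


def shift(regs: Grid, dx: int, dy: int, fill: int = 0) -> Grid:
    """Return a shifted copy in which location (y, x) holds the value that
    was at (y + dy, x + dx); locations with no source get `fill`."""
    rows, cols = len(regs), len(regs[0])
    return [[regs[y + dy][x + dx]
             if 0 <= y + dy < rows and 0 <= x + dx < cols else fill
             for x in range(cols)]
            for y in range(rows)]


def horizontal_sum3(regs: Grid) -> Grid:
    """Every lane adds its own register, then the values shifted in from
    its right and left neighbours. Each lane only ever reads the register
    at its own array location."""
    rows, cols = len(regs), len(regs[0])
    acc = [[0] * cols for _ in range(rows)]
    for dx in (0, 1, -1):
        shifted = shift(regs, dx, 0)
        for y in range(rows):
            for x in range(cols):
                acc[y][x] += shifted[y][x]  # lane (y, x) reads register (y, x)
    return acc
```

For the input `[[1, 2, 3], [4, 5, 6]]`, `horizontal_sum3` yields `[[3, 6, 5], [9, 15, 11]]`.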
Abstract:
An apparatus is described that includes a line buffer unit composed of a plurality of line buffer interface units. Each line buffer interface unit is to handle one or more requests by a respective producer to store a respective line group in a memory and handle one or more requests by a respective consumer to fetch and provide the respective line group from memory. The line buffer unit has programmable storage space whose information establishes line group size so that different line group sizes for different image sizes are storable in memory.
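The producer and consumer roles can be modelled as a small queue per interface unit, with the line group size set at configuration time. This is a sketch only: the class and method names are hypothetical, the programmable storage space is reduced to two constructor parameters, and first-in first-out delivery is an assumption the abstract does not state.

```python
from collections import deque
from typing import Deque, Dict, List

LineGroup = List[List[int]]


class LineBufferInterfaceUnit:
    """Handles store requests from one producer and fetch requests from
    its consumer, for line groups of one configured size."""

    def __init__(self, image_width: int, rows_per_group: int) -> None:
        # Stand-in for the programmable storage that sets line group size.
        self.image_width = image_width
        self.rows_per_group = rows_per_group
        self._memory: Deque[LineGroup] = deque()

    def store(self, line_group: LineGroup) -> None:
        """Producer request: write one line group into memory."""
        if (len(line_group) != self.rows_per_group
                or any(len(row) != self.image_width for row in line_group)):
            raise ValueError("line group does not match the configured size")
        self._memory.append([list(row) for row in line_group])

    def fetch(self) -> LineGroup:
        """Consumer request: read the oldest stored line group (assumed FIFO)."""
        if not self._memory:
            raise LookupError("no line group available")
        return self._memory.popleft()


class LineBufferUnit:
    """A collection of interface units, each with its own line group size."""

    def __init__(self) -> None:
        self.interfaces: Dict[str, LineBufferInterfaceUnit] = {}

    def configure(self, name: str, image_width: int,
                  rows_per_group: int) -> LineBufferInterfaceUnit:
        unit = LineBufferInterfaceUnit(image_width, rows_per_group)
        self.interfaces[name] = unit
        return unit
```

Two interface units configured with different widths and row counts can then coexist in one `LineBufferUnit`, which mirrors the point that different line group sizes for different image sizes are storable in memory.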
Abstract:
A shift register is described. The shift register includes a plurality of cells and register space. The shift register includes circuitry having inputs to receive shifted data and outputs to transmit shifted data, wherein: i) circuitry of cells physically located between first and second logically ordered cells is configured to not perform any logical shift; ii) circuitry of cells coupled to receive shifted data transmitted by an immediately preceding logically ordered cell comprises circuitry for writing into local register space data received at an input assigned an amount of shift specified in a shift command being executed by the shift register; and iii) circuitry of cells coupled to transmit shifted data to an immediately following logically ordered cell comprises circuitry to transmit data from an output assigned an incremented shift amount from a shift amount of an input that the data was received on.