摘要:
A parallel processing technique is described for performing parallel processing operations upon N-dimensional arrays of data elements for which a corresponding N-dimensional Scoreboard of status data is held. Hazard checking for data dependencies upon data elements within the N-dimensional array of data elements is performed by looking up the corresponding status value within the Scoreboard. The status data for a given data element within the Scoreboard is located at a position which can be derived from the position of the data elements within its N-dimensional array. Thus, a two-dimensional array of video macroblocks can have a corresponding two-dimensional Scoreboard of status data indicating whether individual macroblocks have, for example, either already been deblocked or have not already been deblocked.
摘要:
A computer implemented tool is provided for assisting in the mapping of a computer program to an asymmetric multiprocessing apparatus 2 incorporating an asymmetric memory hierarchy formed of a plurality of memories 12, 14. An at least partial architectural description 22, 40 is provided as an input variable to the tool and used to infer missing annotations within a source computer program 24, such as which functions are to be executed by which execution mechanisms 4, 6, 8 and which variables are to be stored within which memories 12, 14. The tool also adds mapping support commands, such as cache flush commands, cache invalidate commands, DMA move commands and the like as necessary to support the mapping of the computer program to the asymmetric multiprocessing apparatus 2.
摘要:
The present invention provides a data processing apparatus and method for performing data processing operations on floating point data elements. The data processing apparatus has processing logic for performing data processing operations on the floating point data elements, and decode logic operable to decode a data processing instruction in order to determine a corresponding data processing operation to be performed by the processing logic. The data processing instruction has an m-bit immediate value encoded therein. Further, constant generation logic is provided to perform a logical operation on the m-bit immediate value in order to generate an n-bit floating point constant for use as at least one input floating point data element for the processing logic when performing the corresponding data processing operation. The values “n” and “m” are integers, and n is greater than m. This approach provides a particularly efficient technique for generating floating point constants.
摘要:
A data processing apparatus and method. The data processing apparatus comprising: a register data store operable to store data elements; an instruction decoder operable to decode a shift instruction; a data processor operable to perform data processing operations controlled by said instruction decoder wherein: in response to said decoded shift instruction, said data processor is operable to specify within said register data store, one or more source registers operable to store a plurality of source data elements of a first size, and one or more destination registers operable to store a corresponding plurality of resultant data elements of a second size, said second size not being equal to said first size; and to perform the following operations in parallel on said plurality of source data elements to produce said corresponding plurality of resultant data elements: shift each of said plurality of source data elements a specified number of places; form at least a part of each of said resultant data elements from information derived from at least a portion of a corresponding one of said plurality of source data elements; store said resultant data elements in said destination register.
摘要:
A data processing apparatus operate to process floating point operands is disclosed. The data processing apparatus comprises: an instruction decoder operable to decode an instruction for processing floating point operands; and a data processor operable to perform data processing operations controlled by the instruction decoder wherein: in response to the decoded instruction indicating operation according to a flush-to-zero semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as zero operands; and in response to the decoded instruction indicating operation according to a denormal semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as denormal operands.
摘要:
Leakage current from a circuit for handling data is reduced using leakage control circuit operable in a leakage reduction mode. The data handling circuit comprises data handling logic operable to receive an input data value and to output and output data value. The data handling circuit also comprises a latch operable to latch the output data value in response to a clock signal having a clock period. Both the leakage control circuitry and the latch are controlled dependent upon the same clock signal and the leakage control circuitry is controlled such that it is in a leakage reduction mode for a time less than the clock period. This approach enables leakage reduction to be provided in circuits which are still operational and is particularly suited to data handling circuits that employ frequency scaling.
摘要:
A table lookup extension instruction is provided in which index values stored within an index register D2 are used to select data elements stored within one or more table registers D0, D1 for storage into corresponding positions within a result register D3. Out-of-range index values result in the corresponding locations within the result register being left unchanged U. In this way, an offset can be applied to index values held and then those index values reused with the table registers D0, D1 being reloaded with a different portion of a table so as to give the effect of a larger table than can be directly supported by the number of table registers available.
摘要:
A data processing apparatus, method and computer program product. The apparatus comprising: a register data store comprising at least three general purpose registers each operable to store a plurality of data elements; an instruction decoder operable to decode a multiplex instruction; a data processor operable to process said plurality of data elements in parallel, said data processor being controlled by said instruction decoder; and in response to said decoded multiplex instruction, said data processor being operable to specify: two of said at least three general-purpose registers as source registers, each operable to store a plurality of source data elements; a further one of said at least three registers as a control register operable to store a plurality of control values; and one of said control, or said two source registers as a destination register operable to store a plurality of resultant data elements; wherein in response to each of said plurality of control values said data processor is operable to select a corresponding data element from one of said two source registers, and to store said corresponding data element as a resultant data element in said destination register.
摘要:
A data processing apparatus and method are provided for moving data between registers and memory. The data processing apparatus comprises a register data store having a plurality of registers operable to store data elements. A processor is operable to perform in parallel a data processing operation on multiple data elements occupying different lanes of parallel processing in at least one of the registers. Access logic is provided which is responsive to a single access instruction to move a plurality of data elements between a chosen one of the lanes in specified registers and a structure within memory having a structure format, the structure format having a plurality of components. The single access instruction identifies the number of components in the structure format, and the access logic is operation to arrange the plurality of data elements as they are moved such that data elements of different components are stored in different specified registers within the chosen lane whilst in memory the data elements are stored as the structure.
摘要:
A data processing system 2 is provided including a scalar register store 4 and a SIMD register store 20. Dedicated register transfer instructions are provided which serve to move a data value between a selected data element position/lane within a SIMD register of the SIMD register data store 20 and a scalar register within the scalar register store 4. This type of register transfer instruction allows particular data elements to be picked out of and inserted into a SIMD register in a manner which advantageously improves overall efficiency. A further type of register transfer instruction is provided which copies a data value taken from a scalar register into all positions of a specified SIMD register.