摘要:
There is disclosed a data processor that uses bypass circuitry to transfer result data from late pipeline stages to earlier pipeline stages in an efficient manner and with a minimum amount of wiring. The data processor comprises: 1) an instruction execution pipeline comprising a) a read stage; b) a write stage; and c) a first execution stage comprising E execution units that produce data results from data operands. The data processor also comprises: 2) a register file comprising a plurality of data registers, each of the data registers being read by the read stage of the instruction pipeline via at least one of R read ports of the register file and each of the data registers being written by the write stage of the instruction pipeline via at least one of W write ports of the register file; and 3) bypass circuitry for receiving data results from output channels of source devices in at least one of the write stage and the first execution stage, the bypass circuitry comprising a first plurality of bypass tristate line drivers having input channels coupled to first output channels of a first plurality of source devices and tristate output channels coupled to a first common read data channel in the read stage.
摘要:
There is disclosed bundle alignment and dispersal circuitry for use in a data processor. The data processor comprises: 1) C execution clusters, each of the C execution clusters comprising an instruction execution pipeline having N processing stages for executing instruction bundles comprising from one to S syllables, wherein each the instruction execution pipelines is L lanes wide, each of the L lanes for receiving one of the one to S syllables of the instruction bundles; 2) an instruction cache for storing a plurality of cache lines, each of the cache lines comprising C*L syllables; 3) an instruction issue unit for receiving fetched ones of the plurality of cache lines and issuing complete instruction bundles toward the C execution clusters; and 4) alignment and dispersal circuitry for receiving the complete instruction bundles from the instruction issue unit and routing each the received complete instruction bundles to a correct one of the C execution clusters as a function of at least one address bit associated with each of the complete instruction bundles.
摘要:
There is disclosed a data processor that executes variable latency load operations using bypass circuitry that allows load word operations to avoid stalls caused by shifting circuitry. The processor comprises: 1) an instruction execution pipeline comprising N processing stages, each of the N processing stages for performing one of a plurality of execution steps associated with a pending instruction being executed by the instruction execution pipeline; 2) a data cache for storing data values used by the pending instruction; 3) a plurality of registers for receiving the data values from the data cache; 4) a load store unit for transferring a first one of the data values from the data cache to a target one of the plurality of registers during execution of a load operation; 5) a shifter circuit associated with the load store unit for shifting the first data value prior to loading the first data value into the target register; and 6) bypass circuitry associated with the load store unit for transferring the first data value from the data cache directly to the target register without processing the first data value in the shifter circuit.
摘要:
There is disclosed a data processor that executes variable latency load operations using bypass circuitry that allows load word operations to avoid stalls caused by shifting circuitry. The processor comprises: 1) an instruction execution pipeline comprising N processing stages, each of the N processing stages for performing one of a plurality of execution steps associated with a pending instruction being executed by the instruction execution pipeline; 2) a data cache for storing data values used by the pending instruction; 3) a plurality of registers for receiving the data values from the data cache; 4) a load store unit for transferring a first one of the data values from the data cache to a target one of the plurality of registers during execution of a load operation; 5) a shifter circuit associated with the load store unit for shifting the first data value prior to loading the first data value into the target register; and 6) bypass circuitry associated with the load store unit for transferring the first data value from the data cache directly to the target register without processing the first data value in the shifter circuit.
摘要:
There is disclosed bundle alignment and dispersal circuitry for use in a data processor. The data processor comprises: 1) C execution clusters, each of the C execution clusters comprising an instruction execution pipeline having N processing stages for executing instruction bundles comprising from one to S syllables, wherein each the instruction execution pipelines is L lanes wide, each of the L lanes for receiving one of the one to S syllables of the instruction bundles; 2) an instruction cache for storing a plurality of cache lines, each of the cache lines comprising C*L syllables; 3) an instruction issue unit for receiving fetched ones of the plurality of cache lines and issuing complete instruction bundles toward the C execution clusters; and 4) alignment and dispersal circuitry for receiving the complete instruction bundles from the instruction issue unit and routing each the received complete instruction bundles to a correct one of the C execution clusters as a function of at least one address bit associated with each of the complete instruction bundles.