Abstract:
The parallelism of a multi-pipelined digital computer is enhanced by detection of branch instructions from the execution pipelines and concurrent processing of up to two of the detected instructions in parallel with the operations of the execution pipelines. Certain branch instructions, when detected, are removed altogether from the pipeline, but still processed. The processing is synchronized with the execution pipeline to, first, predict an outcome for detected branch instructions, second, test the conditions for branch instructions at their proper place in the execution sequence to determine whether the predicted outcome was correct, and third, fetch a corrected target instruction if the prediction proves wrong.
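The three synchronized steps described above (predict, test in sequence, refetch on a wrong prediction) can be sketched as follows. This is a minimal illustration, not the patented mechanism: the names `BranchOutcome` and `resolve_branch` are hypothetical, and how the prediction itself is produced is left out because the abstract does not specify it.

```python
from dataclasses import dataclass


@dataclass
class BranchOutcome:
    speculative_pc: int  # address fetched on the strength of the prediction
    resolved_pc: int     # address execution must actually continue from
    refetch: bool        # True when a corrected target has to be fetched


def resolve_branch(predicted_taken: bool, condition_true: bool,
                   target_pc: int, fall_through_pc: int) -> BranchOutcome:
    """Model one branch passing through predict / test / correct."""
    # Step 1: predict an outcome and fetch along the predicted path.
    speculative_pc = target_pc if predicted_taken else fall_through_pc
    # Step 2: test the real condition at the branch's place in the sequence.
    resolved_pc = target_pc if condition_true else fall_through_pc
    # Step 3: a corrected fetch is needed only if the prediction was wrong.
    return BranchOutcome(speculative_pc, resolved_pc,
                         refetch=(predicted_taken != condition_true))
```

For instance, `resolve_branch(True, False, 0x200, 0x104)` models a branch predicted taken whose condition turns out false, so the fall-through address must be fetched as the corrected target.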
Abstract:
An instruction processor system for decoding compound instructions created from a series of base instructions of a scalar machine. The processor generates a series of compound instructions whose instruction format text carries appended control bits. These bits enable execution of the compound instruction format text in the instruction processor, which has a compounding facility that fetches and decodes compound instructions. The instructions can be executed either compounded or singly by the arithmetic and logic units of the instruction processor, while the scalar execution of the base instructions originally in storage is preserved intact. The system nullifies execution of a member instruction unit of a compound instruction when a condition, such as a branch, occurs that would affect the correctness of recording the results of that member unit, based on the interrelationship of the member units of the compound instruction with other instructions. Because the compounded instruction stream executes in parallel, the resulting series of compounded instructions generally executes faster than the original format, which is preserved.
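The nullification rule can be illustrated with a small sketch. It assumes the simplest case only: a taken branch inside a compound instruction causes every later member unit to be nullified. The function name `commit_compound` and the tuple layout are hypothetical, and the abstract covers other conditions that this sketch does not model.

```python
def commit_compound(members, branch_taken):
    """Return the names of member units whose results are recorded.

    members: list of (name, is_branch) tuples in program order.
    branch_taken: whether a branch member, if present, is taken.
    """
    committed = []
    for name, is_branch in members:
        committed.append(name)
        if is_branch and branch_taken:
            # Later members lie past a taken branch, so their results
            # must not be recorded: they are nullified.
            break
    return committed
```

Calling it with a branch followed by an add, and `branch_taken=True`, records only the branch; with `branch_taken=False`, both member units are recorded.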
Abstract:
A digital computer system capable of processing two or more computer instructions in parallel and having a cache storage unit for temporarily storing machine-level computer instructions in their journey from a higher-level storage unit of the computer system to the functional units which process the instructions. The computer system includes an instruction compounding unit located intermediate to the higher-level storage unit and the cache storage unit for analyzing the instructions and adding to each instruction a tag field which indicates whether or not that instruction may be processed in parallel with one or more neighboring instructions in the instruction stream. These tagged instructions are then stored in the cache unit. The computer system further includes a plurality of functional instruction processing units which operate in parallel with one another. The instructions supplied to these functional units are obtained from the cache storage unit. At instruction issue time, the tag fields of the instructions are examined and those tagged for parallel processing are sent to different ones of the functional units in accordance with the codings of their operation code fields.
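A rough sketch of the tagging and issue steps follows. The pairing rule used here (different functional units and no register dependency between neighbors) is an assumption chosen for illustration, as are the opcode table `UNIT_FOR_OPCODE` and the names `Instr`, `can_pair`, `tag_instructions`, and `issue_groups`. The abstract does not state the actual compounding rules, and this sketch pairs at most two instructions.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    opcode: str
    dest: Optional[str]     # destination register, or None if there is none
    srcs: Tuple[str, ...]   # source registers


# Hypothetical mapping from operation code to functional unit.
UNIT_FOR_OPCODE = {"ADD": "alu", "SUB": "alu", "LOAD": "mem", "BR": "branch"}


def can_pair(a: Instr, b: Instr) -> bool:
    """Assumed rule: neighbors pair if they use different units and b does not depend on a."""
    different_units = UNIT_FOR_OPCODE[a.opcode] != UNIT_FOR_OPCODE[b.opcode]
    independent = a.dest is None or (a.dest not in b.srcs and a.dest != b.dest)
    return different_units and independent


def tag_instructions(stream):
    """Compounding step: tag 1 means 'issue together with the next instruction'."""
    tags = [0] * len(stream)
    i = 0
    while i + 1 < len(stream):
        if can_pair(stream[i], stream[i + 1]):
            tags[i] = 1
            i += 2  # the partner closes the pair and keeps tag 0
        else:
            i += 1
    return tags


def issue_groups(stream, tags):
    """Issue step: read only the tags, never re-analyze the instructions."""
    groups, i = [], 0
    while i < len(stream):
        width = 2 if tags[i] == 1 else 1
        groups.append(stream[i:i + width])
        i += width
    return groups
```

The point of the split is that the dependency analysis runs once, before the instructions enter the cache, so issue logic only has to inspect a tag bit.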
Abstract:
Apparatus for retaining the branch prediction bits of a line displaced from an integrated cache/branch history table and using the retained bits to initialize the prediction bits should that line be brought back into the cache, the operation of which may be overlapped with the activities normally associated with displacing a cache line with one fetched from memory, thus imposing no instruction processing penalty. The apparatus consists of an associative memory that provides storage for branch prediction bits associated with cache lines and comparison means for matching stored prediction bits with their corresponding cache lines.
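The retention idea can be modeled as a small associative store keyed by cache line address. The sketch below is only an illustration under stated assumptions: the class name `PredictionRetainer`, its fixed capacity, the oldest-first replacement, and the default of zeroed prediction bits are all hypothetical choices that the abstract does not specify.

```python
from collections import OrderedDict


class PredictionRetainer:
    """Keeps the prediction bits of displaced cache lines (illustrative model)."""

    def __init__(self, capacity=4):
        self.capacity = capacity    # assumed size of the associative memory
        self.saved = OrderedDict()  # line address -> retained prediction bits

    def on_displace(self, line_addr, prediction_bits):
        """Called when a line leaves the integrated cache/branch history table."""
        self.saved.pop(line_addr, None)
        self.saved[line_addr] = prediction_bits
        if len(self.saved) > self.capacity:
            self.saved.popitem(last=False)  # assumed policy: drop the oldest entry

    def on_fill(self, line_addr, default_bits=0):
        """Called when a line is brought in; returns bits to initialize it with."""
        # A matching address restores the retained bits, otherwise use the default.
        return self.saved.pop(line_addr, default_bits)
```

In hardware the lookup would overlap with the line fetch from memory, which is why the abstract claims no instruction processing penalty; the sequential model above does not capture that timing.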
Abstract:
A digital computer system capable of processing two or more computer instructions in parallel and having a cache storage unit for temporarily storing machine-level computer instructions in their journey from a higher-level storage unit of the computer system to the functional units which process the instructions. The computer system includes an instruction compounding unit located intermediate to the higher-level storage unit and the cache storage unit for analyzing the instructions and generating for each instruction compounding information which indicates whether or not that instruction may be processed in parallel with one or more neighboring instructions in the instruction stream. The instructions are then stored in the cache unit together with their compounding information. The computer system further includes a plurality of functional instruction processing units which operate in parallel with one another. The instructions supplied to these functional units are obtained from the cache storage unit. At instruction issue time, the compounding information for the instructions is examined and those instructions indicated for parallel processing are sent to different ones of the functional units in accordance with the codings of their operation code fields.
Abstract:
A digital computer system capable of processing two or more computer instructions in parallel and having a main memory unit for storing information blocks including the computer instructions. The computer system includes an instruction compounding unit for analyzing the instructions and adding to each instruction a tag field which indicates whether or not that instruction may be processed in parallel with another neighboring instruction. Tagged instructions are stored in the main memory. The computer system further includes a plurality of functional instruction processing units which operate in parallel with one another. The instructions supplied to the functional units are obtained from the memory by way of a cache storage unit. At instruction issue time, the tag fields of the instructions are examined and those tagged for parallel processing are sent to different ones of the functional units in accordance with the codings of their operation code fields.
Abstract:
A multi-function ALU (arithmetic/logic unit) for use in digital data processing facilitates the execution of instructions in parallel, thereby enhancing processor performance. The proposed apparatus reduces the instruction execution latency that results from data dependency hazards in a pipelined machine. This latency reduction is accomplished by collapsing the interlocks due to these hazards. The proposed apparatus achieves performance improvement while maintaining compatibility with previous implementations designed using an identical architecture.
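One way to picture interlock collapsing is a three-operand ALU that produces the result of an instruction and of the instruction that depends on it in a single pass, instead of stalling the second until the first completes. The sketch below assumes that form; the function `collapsed_alu`, the operation table `OPS`, and the 32-bit word width are hypothetical and not taken from the abstract.

```python
import operator

# Hypothetical set of ALU functions.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}
MASK = 0xFFFFFFFF  # assumed 32-bit word width


def collapsed_alu(op1, a, b, op2, c):
    """Evaluate r1 = a op1 b and r2 = r1 op2 c as one combined operation.

    Returns (r1, r2). In a conventional pipeline the second instruction
    would wait for r1; here both results come out together.
    """
    first = OPS[op1](a, b) & MASK
    second = OPS[op2](first, c) & MASK
    return first, second
```

Both results are returned so that the architected register state matches what sequential execution would have produced, which is what keeps the approach compatible with earlier implementations of the same architecture.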