摘要:
An instruction processor system for decoding compound instructions created from a series of base instructions of a scalar machine, the processor generating a series of compound instructions with an instruction format text having appended control bits in the instruction format text enabling the execution of the compound instruction format text in said instruction processor with a compounding facility which fetches and decodes compound instructions which can be executed as compounded and single instructions by the arithmetic and logic units of the instruction processor while preserving intact the scalar execution of the base instructions of a scalar machine which were originally in storage. The system nullifies any execution of a member instruction unit of a compound instruction upon occurrence of possible conditions, such as branch, which would affect the correctness of recording results of execution of the member instruction unit portion based upon the interrelationship of member units of the compound instruction with other instructions. The resultant series of compounded instructions generally executes in a faster manner than the original format which is preserved due to the parallel nature of the compounded instruction stream which is executed.
摘要:
The parallelism of a multi-pipelined digital computer is enhanced by detection of branch instructions from the execution pipelines and concurrent processing of up to two of the detected instructions in parallel with the operations of the execution pipelines. Certain branch instructions, when detected, are removed altogether from the pipeline, but still processed. The processing is synchronized with the execution pipeline to, first, predict an outcome for detected branch instructions, second, test the conditions for branch instructions at their proper place in the execution sequence to determine whether the predicted outcome was correct, and third, fetch a corrected target instruction if the prediction proves wrong.
摘要:
A digital computer system capable of processing two or more computer instructions in parallel and having a cache storage unit for temporarily storing machine-level computer instructions in their journey from a higher-level storage unit of the computer system to the functional units which process the instructions. The computer system includes an instruction compounding unit located intermediate to the higher-level storage unit and the cache storage unit for analyzing the instructions and adding to each instruction a tag field which indicates whether or not that instruction may be processed in parallel with one or more neighboring instructions in the instruction stream. These tagged instructions are then stored in the cache unit. The computer system further includes a plurality of functional instruction processing units which operate in parallel with one another. The instructions supplied to these functional units are obtained from the cache storage unit. At instruction issue time, the tag fields of the instructions are examined and those tagged for parallel processing are sent to different ones of the functional units in accordance with the codings of their operation code fields.
摘要:
A system for processing a sequence of instructions has a set of compounding rules based on an analysis of existing instructions to separate them into different classes. The analysis determines which instructions qualify, either with instructions in their own class or with instructions in other classes, for parallel execution in a particular hardware configuration. Such compounding rules are used as a standard for pre-processing an instruction stream in order to look for groups of two or more adjacent scalar instructions that can be executed in parallel.
摘要:
A digital computer system is described which is capable of processing 2 or more computer instructions in parallel and which has the capability of generating compounding tag information for those instructions, the compounding tag information being associated with instructions for the purpose of indicating groups of instructions which are to be concurrently executed. A compounding tag has a value which indicates the size of the group of instructions which are to be concurrently executed. The computer system includes a hierarchially-arranged memory which provides instructions to a CPU for execution. The instructions are compounded in the memory, and provision is made in the memory for storage of their compounding tags. In the event of modification of an instruction in memory, the invention provides for reduction of the value of the compounding tags for the modified instruction and instructions which are capable of being compounded with the modified instruction or for generation of new tag values for the modified instruction and instructions which are adjacent it in memory.
摘要:
Generation of functional status followed by the use of the status to control the sequencing of microinstructions is a well known critical path in processor designs. The delay associated with the path is exacerbated in superscalar machines by the additional statuses that are produced by multiple functional units from which the appropriate status must be selected for controlling the sequencing of microinstructions. This is especially true in horizontally microcoded machines. The adverse affects on the delay can be reduced by using a staged multiplexor design. For the staged multiplexor to be useful, all functional unit status should be produced as early as possible. In this invention, a status predictor is described that allows the status associated with the shifter to be generated directly from the inputs to the shifter. As a result, the status is available early in the pipeline cycle in which the shift is actually performed and made available to the multiplexor producing the controls for microinstruction sequencing. In addition, the invention allows the early generation of all shifter status used to set condition codes. The predictor has been implemented in an ESA/390 processor implementation where it was instrumental in achieving the desired cycle time.
摘要:
Scalable compound instruction set machine and method which provides for processing a set of instructions or program to be executed by a computer to determine statically which instructions may be combined into compound instructions which are executed in parallel by a scalar machine. Such processing looks for classes of instructions that can be executed in parallel without data-dependent or hardware-dependent interlocks. Without regard to their original sequence the individual instructions are combined with one or more other individual instructions to form a compound instruction which eliminates interlocks. Control information is appended to identify information relevant to the execution of the compound instructions. The result is a stream of scalar instructions compounded or grouped together before instruction decode time so that they are already flagged and identified for selective simultaneous parallel execution by execution units. The compounding does not change the object code results and existing programs realize performance improvements while maintaining compatibility with previously implemented systems for which the original set of instructions was provided.
摘要:
A digital computer system capable of processing two or more computer instructions in parallel and having a cache storage unit for temporarily storing machine-level computer instructions in their journey from a higher-level storage unit of the computer system to the functional units which process the instructions. The computer system includes an instruction compounding unit located intermediate to the higher-level storage unit and the cache storage unit for analyzing the instructions and generating for to each instruction a compounding information which indicates whether or not that instruction may be processed in parallel with one or more neighboring instructions in the instruction stream. These tagged instructions are then stored in the cache unit with the compounding information. The computer system further includes a plurality of functional instruction processing units which operate in parallel with one another. The instructions supplied to these functional units are obtained from the cache storage unit. At instruction issue time, the compounding information for the instructions is examined and those instructions indicated for parallel processing are sent to different ones of the functional units in accordance with the codings of their operation code fields.
摘要:
A digital computer system capable of processing two or more computer instructions in parallel and having a cache storage unit for temporarily storing machine-level computer instructions in their journey from a higher-level storage unit of the computer system to the functional units which process the instructions. The computer system includes an instruction compounding unit located intermediate to the higher-level storage unit and the cache storage unit for analyzing the instructions and generating for to each instruction a compounding information which indicates whether or not that instruction may be processed in parallel with one or more neighboring instructions in the instruction stream. These tagged instructions are then stored in the cache unit with the compounding information. The computer system further includes a plurality of functional instruction processing units which operate in parallel with one another. The instructions supplied to these functional units are obtained from the cache storage unit. At instruction issue time, the compounding information for the instructions is examined and those instructions indicated for parallel processing are sent to different ones of the functional units in accordance with the codings of their operation code fields.
摘要:
A computer system having a plurality of processing resources, including a sub-system for scheduling and dispatching processing jobs to a plurality of hardware accelerators, the subsystem further comprising a job requestor, for requesting jobs having bounded and varying latencies to be executed on the hardware accelerators; a queue controller to manage processing job requests directed to a plurality of hardware accelerators; and multiple hardware queues for dispatching jobs to the plurality of hardware acceleration engines, each queue having a dedicated head of queue entry, dynamically sharing a pool of queue entries, having configurable queue depth limits, and means for removing one or more jobs across all queues.