摘要:
To increase the performance of a pipelined processor executing various classes of instructions, the classes of instructions are executed by respective functional units which are independently controlled and operated in parallel. The classes of instructions include integer instructions, floating point instructions, multiply instructions, and divide instructions. The integer unit, which also performs shift operations, is controlled by the microcode execution unit to handle the wide variety of integer and shift operations included in a complex, variable-length instruction set. The other functional units need only accept a control command to initiate the operation to be performed by the functional unit. The retiring of the results of the instructions need not be controlled by the microcode execution unit, but instead is delegated to a separate retire unit that services a result queue. When the microcode execution unit determines that a new operation is required, an entry is inserted into the result queue. The entry includes all the information needed by the retire unit to retire the result once the result is available from the respective functional unit. The retire unit services the result queue by reading a tag in the entry at the head of the queue to determine the functional unit that is to provide the result. Once the result is available and the destination specified by the entry is also available, the result is retired in accordance with the entry, and the entry is removed from the queue.
摘要:
To execute variable-length instructions independently of instruction preprocessing, a central processing unit is provided with a set of queues in the data and control paths between an instruction unit and an execution unit. The queues include a "fork" queue, a source queue, a destination queue, and a program counter queue. The fork queue contains an entry of control information for each instruction processed by the instruction unit. This control information corresponds to the opcode for the instruction, and preferably it is a microcode "fork" address at which a microcode execution unit begins execution to execute the instruction. The source queue specifies the source operands for the instruction. Preferably the source queue stores source pointers and the operands themselves are included in a separate "source list" in the case of operands fetched from memory or immediate data from the instruction stream, or are the contents of a set of general purpose registers in the execution unit. The destination queue specifies the destination for the instruction, for example, either memory or general purpose registers. The program counter queue contains the starting value of the program counter for each of the instructions passed from the instruction unit to the execution unit. Preferably the queues are large enough to hold control information and data for up to six instructions. The queues therefore shield the execution unit and the instruction unit from each others complexities and provide a buffer to allow for an uneven processing rate in either of them.
摘要:
In a multiprocessor system, an error occurring in any one of the CPUs may have an impact upon the operation of the remaining CPUs, and therefore these errors must be handled quickly. The errors are grouped into two categories: synchronous errors (those that must be corrected immediately to allow continued processing of the current instruction); and asynchronous errors (those errors that do not affect execution of the current instruction and may be handled upon completing execution of the current instruction). Since synchronous errors prevent continued execution of the current instruction, it is preferable that the last stable state conditions of the faulting CPU be restored and the faulting instruction reexecuted. These stable state conditions advantageously occur between the execution of each instruction. However, in a pipelined computer system, it is difficult to identify the beginning and ending of a selected instruction since multiple instructions are in process at the same time. Accordingly, the execution unit is selected to be the point of synchronization between error handling and instruction execution. Once the error is indentified as asynchronous or synchronous and the execution unit allows the instruction to complete or rolls back the state conditions to their preinstruction values, error analyzing software examines the condition of the suspect data latches in the CPU. A serial diagnostic link stops the system clock of the CPU and serially loads the CPU data latches into the System Processor Unit for error determination. Thereafter, the CPU system clock is restarted and the CPU resumes execution.
摘要:
A system for subtracting two floating-point binary numbers in a pipelined floating-point adder/subtractor by aligning the two fractions for sustraction; arbitrarily designating the fraction of one of the two floating-point numbers as the subtrahend, and producing the complement of that designated fraction; adding that complement to the other fraction, normalizing the result; determining whether the result is negative and, if it is, producing the complement of the normalized result; and selecting the larger of the exponents of the two floating-point numbers, and adjusting the value of the selected exponent in accordance with the normalization of the result. The preferred system produces a sticky bit signal by aligning the two fractions for subtraction by shifting one of the two fractions to the right; determining the number of consecutive zeros in the one fraction, prior to the shifting thereof, beginning at the least significant bit position; comparing (1) the number of positions the one fraction is shifted in the aligning step, with (2) the number of consecutive zeros in the one fraction; and producing a sticky bit signal when the number of consecutive zeros is less than the number of positions the one fraction is shifted in the aligning stgep, ther sticky bit signal indicating the truncation of at least one set bit during the aligning step.
摘要:
A microcode control system for a digital data processor. The processor sequentially processes data in response to a microinstruction in a data processing path including a plurality of successive processing stages. A control path parallels the data processing path and includes a plurality of stage which transfer the microinstruction in synchronism with the transfer of data through the data processing path. At each stage in the control path, the microinstruction is decoded to determine the operation to be performed in response thereto on the data by the stage in the data processing path, and control signals are generated to control the processing by the stage in the data processing path.
摘要:
A self time register (STREG) 44 is constructed on a single custom ECL integrated circuit and has provisions for generating its own internal clock signal. The STREG 44 includes a set of latches 80a-80q for temporarily storing the data delivered thereto concurrent with the system clock pulse. Thereafter, the internally generated clock pulse (W.sub.PULS) controls the write operation of the temporary latches into the STREG 44. The STREG has data storage registers including bit storage cells which receive the data in response to the internally generated clock pulse. To selectively output the data, the bit storage cells have emitter-coupled output selectors, and the output selectors for common bits share a common current sink and a common pull-up resistor at which a single-bit output signal is provided from a selected register. Preferably, each bit storage cell has a first output selector for a first data output port, and a second output selector for a second data output port. By sharing of a common pull-up resistor and a current sink for each bit position of each output port, an economy of components can be realized.
摘要:
In a digital processor performing division, quotient accumulation apparatus is formed of a set of muxes and a single carry save adder. Partial quotients are accumulated in carry-save form with proper sign extension. Delay of partial quotient bit fragments from one iteration to a following iteration enables the apparatus to limit use to one carry save adder. By enlarging minimal logic, the quotient accumulation apparatus operates at a rate fast enough to support the rate of fast dividers.