摘要:
A technique for processing memory access exceptions along with pre-fetched instructions in a pipelined instruction processing computer system is based upon the concept of pipelining exception information along with other parts of the instruction being executed. In response to the detection of access exceptions at a pipeline stage, corresponding fault information is generated and transferred along the pipeline. The fault information is acted upon only when the instruction reaches the execution stage of the pipeline. Each stage of the instruction pipeline is ported into the front end of a memory unit adapted to perform the virtual-to-physical address translation; each port being provided with storage for virtual addresses accompanying an instruction as well as storage for corresponding fault information. When a memory access exception is encountered at the front end of the memory unit, the fault information generated therefrom is loaded into the storage and the port is prevented from accepting further references.
摘要:
A method is provided for preprocessing multiple instructions prior to execution of such instructions in a digital computer having an instruction decoder, an instruction execution unit, and multiple general purpose registers which are read to produce memory addresses during the preprocessing. The method comprises: (1) avoiding the preprocessing of a current instruction to read a general purpose register to produce a memory address prior to the modification of the contents of that register by a preceding instruction by (a) generating a composite write mask having a bit set for each general purpose register whose contents are to be modified by at least one of a plurality of decoded by not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register to be read by the current instruction is a register having a bit set in the write mask, and/or (2) avoiding the preprocessing of a current instruction which modifies the contents of a general purpose register that is to be read by a preceding instruction by (a) generating a composite read mask having a bit set for each general purpose register to be read by at least one of a plurality of decoded but not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register whose contents are to be modified by the current instruction is a register having a bit set in the read mask.
摘要:
To execute variable-length instructions independently of instruction preprocessing, a central processing unit is provided with a set of queues in the data and control paths between an instruction unit and an execution unit. The queues include a "fork" queue, a source queue, a destination queue, and a program counter queue. The fork queue contains an entry of control information for each instruction processed by the instruction unit. This control information corresponds to the opcode for the instruction, and preferably it is a microcode "fork" address at which a microcode execution unit begins execution to execute the instruction. The source queue specifies the source operands for the instruction. Preferably the source queue stores source pointers and the operands themselves are included in a separate "source list" in the case of operands fetched from memory or immediate data from the instruction stream, or are the contents of a set of general purpose registers in the execution unit. The destination queue specifies the destination for the instruction, for example, either memory or general purpose registers. The program counter queue contains the starting value of the program counter for each of the instructions passed from the instruction unit to the execution unit. Preferably the queues are large enough to hold control information and data for up to six instructions. The queues therefore shield the execution unit and the instruction unit from each others complexities and provide a buffer to allow for an uneven processing rate in either of them.
摘要:
A branch prediction is made by searching a cache memory for branch history information associated with a branch instruction. If associated information is not found in the cache, then the branch is predicted based on a predetermined branch bias for the branch instruction's opcode; otherwise, the branch is predicted based upon the associated information from the cache. The associated information in the cache preferably includes a length, displacement, and target address in addition to a prediction bit. If the cache includes associated information predicting that the branch will be taken, the target address from cache is used so long as the associated length and displacement match and the length and displacement for the branch instruction; otherwise, the target address must be computed.
摘要:
In a computer system, the flow of data from the execution unit to the cache 28 is enhanced by pairing individual, sequential longword write operations into a simultaneous quadword write operation. Primary and secondary writebuffers 50, 52 sequentially receive the individual longwords during first and second clock cycles and simultaneously present the individual longwords over a quadword wide bus to the cache 28. During the first clock cycle, when the cache 28 is not performing the quadword write operation, the cache 28 is free to perform the requisite lookup routine on the address of the first longword of data to determine if the quadword of address space is available in the cache. Thus, the flow of data to the cache 28 is maximized.
摘要:
In a multiprocessor system, an error occurring in any one of the CPUs may have an impact upon the operation of the remaining CPUs, and therefore these errors must be handled quickly. The errors are grouped into two categories: synchronous errors (those that must be corrected immediately to allow continued processing of the current instruction); and asynchronous errors (those errors that do not affect execution of the current instruction and may be handled upon completing execution of the current instruction). Since synchronous errors prevent continued execution of the current instruction, it is preferable that the last stable state conditions of the faulting CPU be restored and the faulting instruction reexecuted. These stable state conditions advantageously occur between the execution of each instruction. However, in a pipelined computer system, it is difficult to identify the beginning and ending of a selected instruction since multiple instructions are in process at the same time. Accordingly, the execution unit is selected to be the point of synchronization between error handling and instruction execution. Once the error is indentified as asynchronous or synchronous and the execution unit allows the instruction to complete or rolls back the state conditions to their preinstruction values, error analyzing software examines the condition of the suspect data latches in the CPU. A serial diagnostic link stops the system clock of the CPU and serially loads the CPU data latches into the System Processor Unit for error determination. Thereafter, the CPU system clock is restarted and the CPU resumes execution.
摘要:
In a pipeline processor, simultaneous decoding of multiple specifiers in a variable-length instruction causes a peculiar problem of an intra-instruction read conflict that occurs whenever an instruction includes an autoincrement or an autodecrement specifier which references either directly or indirectly a register specified by a previously occurring specifier for the current instruction. To avoid stalls during the preprocessing of instructions by the instruction unit, register pointers rather than register data are usually passed to the excellent unit because register data is not always available at the time of instruction decoding. If an intra-instruction read conflict exists, however, the operand value specified by the conflicting register specifier is the initial value of the register being incremented or decremented, and this initial value will have been changed by the time that the execution unit executes the instruction. Preferably, the proper initial value is obtained prior to the incrementing or decrementing of the conflicting register by putting the instruction decoder into a special IRC mode in which only one specifier is decoded per cycle, and if a specifier being decoded is a register specifier, the content of the specified register is transmitted to the execution unit. Circuitry for detecting an intra-instruction read conflict is disclosed as well as an efficient method for handling interrupts, exceptions and flushes that may occur during the processing of an instruction having an intra-instruction read conflict.
摘要:
To increase the performance of a pipelined processor executing various classes of instructions, the classes of instructions are executed by respective functional units which are independently controlled and operated in parallel. The classes of instructions include integer instructions, floating point instructions, multiply instructions, and divide instructions. The integer unit, which also performs shift operations, is controlled by the microcode execution unit to handle the wide variety of integer and shift operations included in a complex, variable-length instruction set. The other functional units need only accept a control command to initiate the operation to be performed by the functional unit. The retiring of the results of the instructions need not be controlled by the microcode execution unit, but instead is delegated to a separate retire unit that services a result queue. When the microcode execution unit determines that a new operation is required, an entry is inserted into the result queue. The entry includes all the information needed by the retire unit to retire the result once the result is available from the respective functional unit. The retire unit services the result queue by reading a tag in the entry at the head of the queue to determine the functional unit that is to provide the result. Once the result is available and the destination specified by the entry is also available, the result is retired in accordance with the entry, and the entry is removed from the queue.
摘要:
A method and apparatus for providing data communication between stations on a network which optimizes the amount of resources required for a network switch. A first data frame is encoded with a source station identifier for the first station and a source switch identifier for the first switch. The first data frame is sent from the first switch to the second switch. A station list in the second switch is updated to indicate that the first station is associated with the first switch. Subsequent data frames having the same destination as the first switch are sent directly to the second switch. Any switch on the network need only identify the local ports attached to the switch, plus the number of switches on the network. The task of identifying all of the ports on the network is distributed across all switches on the network.
摘要:
A method and apparatus for providing data communication between a source station having multiple connections to a first switch and a destination station having multiple connections to a second switch. A trunk identifier to each port on the first switch and each port on the second switch. A data frame is encoded with the trunk identifier for an ingress port on the first switch. The data frame is sent to the second switch from the first switch. A list of egress ports for the destination station is obtained from a station list contained in the second switch. An egress port is selected from the list of egress ports based upon the source address, destination address and trunk identifier. The data frame is sent to the destination station through the selected egress port.