摘要:
An instruction decoder generates implied specifiers for certain predefined instructions, and an operand processing unit preprocess most of the implied specifiers in the same fashion as express operand specifiers. For instructions having an implied autoincrement or autodecrement of the stack pointer, an implied read or write access type is assigned to the instruction and the decode logic is configured accordingly. When an opcode is decoded and is found to have an implied write specifier, a destination operand is created for autodecrementing the stack pointer. If an opcode is decoded and found to have an implied read specifier, a source operand is created for autoincrementing the stack pointer. A register or short literal specifier can be decoded simultaneously with the generation of the implied operand. Therefore some common instructions such as "PUSH Rx" can be decoded in a single cycle. The preprocessing of implied specifiers in addition permits more complex instructions such as "BSR DEST" to be executed in a single cycle. Conflicts created by the implied specifiers are handled in the same manner as conflicts for express specifiers. Moreover, by using the same data paths for both the implied specifiers and the express specifiers, and by inserting queues between the instruction unit and the execution unit, performance gains are realized for instructions having implied specifiers as well as just express specifiers.
摘要:
In a pipeline processor, simultaneous decoding of multiple specifiers in a variable-length instruction causes a peculiar problem of an intra-instruction read conflict that occurs whenever an instruction includes an autoincrement or an autodecrement specifier which references either directly or indirectly a register specified by a previously occurring specifier for the current instruction. To avoid stalls during the preprocessing of instructions by the instruction unit, register pointers rather than register data are usually passed to the excellent unit because register data is not always available at the time of instruction decoding. If an intra-instruction read conflict exists, however, the operand value specified by the conflicting register specifier is the initial value of the register being incremented or decremented, and this initial value will have been changed by the time that the execution unit executes the instruction. Preferably, the proper initial value is obtained prior to the incrementing or decrementing of the conflicting register by putting the instruction decoder into a special IRC mode in which only one specifier is decoded per cycle, and if a specifier being decoded is a register specifier, the content of the specified register is transmitted to the execution unit. Circuitry for detecting an intra-instruction read conflict is disclosed as well as an efficient method for handling interrupts, exceptions and flushes that may occur during the processing of an instruction having an intra-instruction read conflict.
摘要:
A method is provided for preprocessing multiple instructions prior to execution of such instructions in a digital computer having an instruction decoder, an instruction execution unit, and multiple general purpose registers which are read to produce memory addresses during the preprocessing. The method comprises: (1) avoiding the preprocessing of a current instruction to read a general purpose register to produce a memory address prior to the modification of the contents of that register by a preceding instruction by (a) generating a composite write mask having a bit set for each general purpose register whose contents are to be modified by at least one of a plurality of decoded by not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register to be read by the current instruction is a register having a bit set in the write mask, and/or (2) avoiding the preprocessing of a current instruction which modifies the contents of a general purpose register that is to be read by a preceding instruction by (a) generating a composite read mask having a bit set for each general purpose register to be read by at least one of a plurality of decoded but not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register whose contents are to be modified by the current instruction is a register having a bit set in the read mask.
摘要:
A technique for processing memory access exceptions along with pre-fetched instructions in a pipelined instruction processing computer system is based upon the concept of pipelining exception information along with other parts of the instruction being executed. In response to the detection of access exceptions at a pipeline stage, corresponding fault information is generated and transferred along the pipeline. The fault information is acted upon only when the instruction reaches the execution stage of the pipeline. Each stage of the instruction pipeline is ported into the front end of a memory unit adapted to perform the virtual-to-physical address translation; each port being provided with storage for virtual addresses accompanying an instruction as well as storage for corresponding fault information. When a memory access exception is encountered at the front end of the memory unit, the fault information generated therefrom is loaded into the storage and the port is prevented from accepting further references.
摘要:
A branch prediction is made by searching a cache memory for branch history information associated with a branch instruction. If associated information is not found in the cache, then the branch is predicted based on a predetermined branch bias for the branch instruction's opcode; otherwise, the branch is predicted based upon the associated information from the cache. The associated information in the cache preferably includes a length, displacement, and target address in addition to a prediction bit. If the cache includes associated information predicting that the branch will be taken, the target address from cache is used so long as the associated length and displacement match and the length and displacement for the branch instruction; otherwise, the target address must be computed.
摘要:
To execute variable-length instructions independently of instruction preprocessing, a central processing unit is provided with a set of queues in the data and control paths between an instruction unit and an execution unit. The queues include a "fork" queue, a source queue, a destination queue, and a program counter queue. The fork queue contains an entry of control information for each instruction processed by the instruction unit. This control information corresponds to the opcode for the instruction, and preferably it is a microcode "fork" address at which a microcode execution unit begins execution to execute the instruction. The source queue specifies the source operands for the instruction. Preferably the source queue stores source pointers and the operands themselves are included in a separate "source list" in the case of operands fetched from memory or immediate data from the instruction stream, or are the contents of a set of general purpose registers in the execution unit. The destination queue specifies the destination for the instruction, for example, either memory or general purpose registers. The program counter queue contains the starting value of the program counter for each of the instructions passed from the instruction unit to the execution unit. Preferably the queues are large enough to hold control information and data for up to six instructions. The queues therefore shield the execution unit and the instruction unit from each others complexities and provide a buffer to allow for an uneven processing rate in either of them.
摘要:
An operand processing unit delivers a specified address and at least one read/write signal in response to an instruction being a source of destination operand, and delivers the source operand to an execution unit in response to completion of the preprocessing. The execution unit receives the source operand, executes it and delivers the resultant data to memory. A "write queue" receives the write addresses of the destination operands from the operand processing unit, stores the write addresses, and delivers the stored preselected addresses to memory in response to receiving the resultant data corresponding to the preselected address. The addresses of the source operand is compared to the write addresses stored in the write queue, and the operand processing unit is stalled whenever at least one of the write addresses in the write queue is equivalent to the read address. Therefore, fetching of the operand is delayed until the corresponding resultant data has been delivered by the execution unit.
摘要:
An instruction buffer of a high speed digital computer controls the flow of instruction stream to an instruction decoder. The buffer provides the decoder with nine bytes of sequential instruction stream. The instruction set used by the computer is of the variable length type, such that the decoder consumes a variable number of the instruction stream bytes, depending upon the type of instruction being decoded. As each instruction is consumed, a shifter removes the consumed bytes and repositions the remaining bytes into the lowest order positions. The byte positions left empty by the shifter are filled by instruction stream retrieved from one of a pair of prefetch buffers (IBEX, IBEX2) or from a virtual instruction cache. These prefetch buffers are arranged to hold the next two subsequent quadwords of instruction stream and provide the desired missing bytes. The IBEX prefetch buffer is filled from the instruction cache after being emptied, but prior to those particular bytes being requested to fill the instruction decoder. This two level prefetching allows the relatively slow process of cache access to be performed during noncritical time. The instruction decoder is not stalled, waiting for a cache refill, but can ordinarily obtain the desired bytes of instruction stream from the prefetch buffer.
摘要:
A main memory and cache suitable for scalar processing are used in connection with a vector processor by issuing prefetch requests in response to the recognition of a vector load instruction. A respective prefetch request is issued for each block containing an element of the vector to be loaded from memory. In response to a prefetch request, the cache is checked for a "miss" and if the cache does not include the required block, a refill request is sent to the main memory. The main memory is configured into a plurality of banks and has a capability of processing multiple references. Therefore the different banks can be referenced simultaneously to prefetch multiple blocks of vector data. Preferably a cache bypass is provided to transmit data directly to the vector processor as the data from the main memory are being stored in the cache. In a preferred embodiment, a vector processor is added to a digital computing system including a scalar processor, a virtual address translation buffer, a main memory and a cache. The scalar processor includes a microcode interpreter which sends a vector load command to the vector processing unit and which also generates vector prefetch requests. The addresses for the data blocks to be prefetched are computed based upon the vector address, the length of the vector and the "stride" or spacing between the addresses of the elements of the vector.
摘要:
In the field of high speed computers it is common for a central processing unit to reference memory locations via a virtual addressing scheme, rather than by the actual physical memory addresses. In a multi-tasking environment, this virtual addressing scheme reduces the possibility of different programs accessing the same physical memory location. Thus, to maintain computer processing speed, a high speed translation buffer cache is employed to perform the necessary virtual-to-physical conversions for memory reference instructions. The translation buffer cache stores a number of previously translated virtual addresses and their corresponding physical addresses. A memory management processor is employed to update the translation buffer cache with the most recently accessed physical memory locations. The memory management processor consists of a state machine controlling hardware specifically designed for the purpose of updating the translation buffer cache. The memory management processor calculates an address of a location in the memory where the physical address is stored concurrently with the translation buffer cache comparing the virtual address with already stored virtual addresses. With this arrangement the memory management unit can immediately access memory to retrieve the physical address upon a "miss" by the translation buffer cache.