摘要:
A method is provided for preprocessing multiple instructions prior to execution of such instructions in a digital computer having an instruction decoder, an instruction execution unit, and multiple general purpose registers which are read to produce memory addresses during the preprocessing. The method comprises: (1) avoiding the preprocessing of a current instruction to read a general purpose register to produce a memory address prior to the modification of the contents of that register by a preceding instruction by (a) generating a composite write mask having a bit set for each general purpose register whose contents are to be modified by at least one of a plurality of decoded by not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register to be read by the current instruction is a register having a bit set in the write mask, and/or (2) avoiding the preprocessing of a current instruction which modifies the contents of a general purpose register that is to be read by a preceding instruction by (a) generating a composite read mask having a bit set for each general purpose register to be read by at least one of a plurality of decoded but not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register whose contents are to be modified by the current instruction is a register having a bit set in the read mask.
摘要:
An instruction buffer of a high speed digital computer controls the flow of instruction stream to an instruction decoder. The buffer provides the decoder with nine bytes of sequential instruction stream. The instruction set used by the computer is of the variable length type, such that the decoder consumes a variable number of the instruction stream bytes, depending upon the type of instruction being decoded. As each instruction is consumed, a shifter removes the consumed bytes and repositions the remaining bytes into the lowest order positions. The byte positions left empty by the shifter are filled by instruction stream retrieved from one of a pair of prefetch buffers (IBEX, IBEX2) or from a virtual instruction cache. These prefetch buffers are arranged to hold the next two subsequent quadwords of instruction stream and provide the desired missing bytes. The IBEX prefetch buffer is filled from the instruction cache after being emptied, but prior to those particular bytes being requested to fill the instruction decoder. This two level prefetching allows the relatively slow process of cache access to be performed during noncritical time. The instruction decoder is not stalled, waiting for a cache refill, but can ordinarily obtain the desired bytes of instruction stream from the prefetch buffer.
摘要:
In the field of high speed computers it is common for a central processing unit to reference memory locations via a virtual addressing scheme, rather than by the actual physical memory addresses. In a multi-tasking environment, this virtual addressing scheme reduces the possibility of different programs accessing the same physical memory location. Thus, to maintain computer processing speed, a high speed translation buffer cache is employed to perform the necessary virtual-to-physical conversions for memory reference instructions. The translation buffer cache stores a number of previously translated virtual addresses and their corresponding physical addresses. A memory management processor is employed to update the translation buffer cache with the most recently accessed physical memory locations. The memory management processor consists of a state machine controlling hardware specifically designed for the purpose of updating the translation buffer cache. The memory management processor calculates an address of a location in the memory where the physical address is stored concurrently with the translation buffer cache comparing the virtual address with already stored virtual addresses. With this arrangement the memory management unit can immediately access memory to retrieve the physical address upon a "miss" by the translation buffer cache.
摘要:
A technique for processing memory access exceptions along with pre-fetched instructions in a pipelined instruction processing computer system is based upon the concept of pipelining exception information along with other parts of the instruction being executed. In response to the detection of access exceptions at a pipeline stage, corresponding fault information is generated and transferred along the pipeline. The fault information is acted upon only when the instruction reaches the execution stage of the pipeline. Each stage of the instruction pipeline is ported into the front end of a memory unit adapted to perform the virtual-to-physical address translation; each port being provided with storage for virtual addresses accompanying an instruction as well as storage for corresponding fault information. When a memory access exception is encountered at the front end of the memory unit, the fault information generated therefrom is loaded into the storage and the port is prevented from accepting further references.
摘要:
A main memory and cache suitable for scalar processing are used in connection with a vector processor by issuing prefetch requests in response to the recognition of a vector load instruction. A respective prefetch request is issued for each block containing an element of the vector to be loaded from memory. In response to a prefetch request, the cache is checked for a "miss" and if the cache does not include the required block, a refill request is sent to the main memory. The main memory is configured into a plurality of banks and has a capability of processing multiple references. Therefore the different banks can be referenced simultaneously to prefetch multiple blocks of vector data. Preferably a cache bypass is provided to transmit data directly to the vector processor as the data from the main memory are being stored in the cache. In a preferred embodiment, a vector processor is added to a digital computing system including a scalar processor, a virtual address translation buffer, a main memory and a cache. The scalar processor includes a microcode interpreter which sends a vector load command to the vector processing unit and which also generates vector prefetch requests. The addresses for the data blocks to be prefetched are computed based upon the vector address, the length of the vector and the "stride" or spacing between the addresses of the elements of the vector.
摘要:
A branch prediction is made by searching a cache memory for branch history information associated with a branch instruction. If associated information is not found in the cache, then the branch is predicted based on a predetermined branch bias for the branch instruction's opcode; otherwise, the branch is predicted based upon the associated information from the cache. The associated information in the cache preferably includes a length, displacement, and target address in addition to a prediction bit. If the cache includes associated information predicting that the branch will be taken, the target address from cache is used so long as the associated length and displacement match and the length and displacement for the branch instruction; otherwise, the target address must be computed.
摘要:
In a pipelined computer system 10, memory access functions (requests) are simultaneously generated from a plurality of different locations. These multiple requests are passed through a multiplexer 50 according to a prioritization scheme based upon the operational proximity of the request to the instruction currently being executed. In this manner, the complex task of converting virtual-to-physical addresses is accomplished for all memory access requests by a single translation buffer 30. The physical address output from the translation buffer 30 are passed to a cache 28 through a second multiplexer 40 according to a second prioritization scheme based upon the operational proximity of the request to the instruction currently being executed. The first and second prioritization schemes differ, in that the memory is capable of handling other requests while a higher priority "miss" is pending. Thus, the prioritization scheme temporarily suspends the higher priority request while the desired data is being retrieved from main memory 14, but continues to operate on a lower priority request so that the overall operation will be enhanced if the lower priority request hits in the cache 28.
摘要:
In a computer system, the flow of data from the execution unit to the cache 28 is enhanced by pairing individual, sequential longword write operations into a simultaneous quadword write operation. Primary and secondary writebuffers 50, 52 sequentially receive the individual longwords during first and second clock cycles and simultaneously present the individual longwords over a quadword wide bus to the cache 28. During the first clock cycle, when the cache 28 is not performing the quadword write operation, the cache 28 is free to perform the requisite lookup routine on the address of the first longword of data to determine if the quadword of address space is available in the cache. Thus, the flow of data to the cache 28 is maximized.
摘要:
A vector register provides the capability for simultaneously writing through at least two write ports and simultaneous reading from at least two read ports. In addition, a barber pole technique for storing words from logical vector registers into banks is provided to minimize conflicts.
摘要:
A vector register provides the capability for simultaneously writing through at least two write ports and simultaneously reading from at least two read ports. In addition, a barber pole technique for storing words from logical vector registers into banks is provided to minimize conflicts.