Abstract:
An operand processing unit delivers a specified address and at least one read/write signal in response to an instruction having a source or destination operand, and delivers the source operand to an execution unit in response to completion of the preprocessing. The execution unit receives the source operand, executes the instruction, and delivers the resultant data to memory. A "write queue" receives the write addresses of the destination operands from the operand processing unit, stores the write addresses, and delivers a stored preselected address to memory in response to receiving the resultant data corresponding to that preselected address. The read address of each source operand is compared to the write addresses stored in the write queue, and the operand processing unit is stalled whenever at least one of the write addresses in the write queue is equivalent to the read address. Fetching of the operand is therefore delayed until the corresponding resultant data has been delivered by the execution unit.
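As a rough illustration, the hazard check can be modeled as a queue of pending write addresses against which each read address is compared (a minimal sketch in Python; the class and method names and the dict-backed memory are assumptions, not taken from the abstract):

    from collections import deque

    class WriteQueue:
        """Sketch of the write-queue hazard check described above."""

        def __init__(self):
            self.pending_writes = deque()   # destination addresses, in order

        def enqueue_write(self, address):
            # Operand processing unit queues a destination (write) address.
            self.pending_writes.append(address)

        def must_stall_read(self, address):
            # Source-operand fetch stalls while any queued write address
            # matches the read address (a read-after-write hazard).
            return address in self.pending_writes

        def retire_write(self, data, memory):
            # Execution unit delivers resultant data; the oldest queued
            # address is dequeued and the data is written to memory.
            address = self.pending_writes.popleft()
            memory[address] = data

Here must_stall_read models the stall condition: the operand fetch waits until retire_write has drained the matching address.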
Abstract:
A branch prediction is made by searching a cache memory for branch history information associated with a branch instruction. If the associated information is not found in the cache, the branch is predicted based on a predetermined branch bias for the branch instruction's opcode; otherwise, the branch is predicted based upon the associated information from the cache. The associated information in the cache preferably includes a length, displacement, and target address in addition to a prediction bit. If the cache includes associated information predicting that the branch will be taken, the target address from the cache is used so long as the associated length and displacement match the length and displacement of the branch instruction; otherwise, the target address must be computed.
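The lookup-then-fallback policy can be sketched as follows (illustrative Python; the dictionary-backed cache, the opcode_bias table, and the PC-relative target computation pc + length + displacement are assumptions, not from the abstract):

    def predict_branch(pc, opcode, length, displacement,
                       history_cache, opcode_bias):
        entry = history_cache.get(pc)
        if entry is None:
            # No cached history: fall back to the opcode's fixed bias.
            taken = opcode_bias[opcode]
            target = pc + length + displacement if taken else None
            return taken, target
        if not entry['taken']:
            # History predicts not-taken: no target address is needed.
            return False, None
        # Reuse the cached target only when the cached length and
        # displacement match this instance; otherwise recompute it.
        if entry['length'] == length and entry['displacement'] == displacement:
            return True, entry['target']
        return True, pc + length + displacement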
Abstract:
To execute variable-length instructions independently of instruction preprocessing, a central processing unit is provided with a set of queues in the data and control paths between an instruction unit and an execution unit. The queues include a "fork" queue, a source queue, a destination queue, and a program counter queue. The fork queue contains an entry of control information for each instruction processed by the instruction unit. This control information corresponds to the opcode for the instruction, and preferably it is a microcode "fork" address at which a microcode execution unit begins execution to execute the instruction. The source queue specifies the source operands for the instruction. Preferably the source queue stores source pointers; the operands themselves are held in a separate "source list" in the case of operands fetched from memory or immediate data from the instruction stream, or are the contents of a set of general purpose registers in the execution unit. The destination queue specifies the destination for the instruction, for example, either memory or general purpose registers. The program counter queue contains the starting value of the program counter for each of the instructions passed from the instruction unit to the execution unit. Preferably the queues are large enough to hold control information and data for up to six instructions. The queues therefore shield the execution unit and the instruction unit from each other's complexities and provide a buffer to allow for an uneven processing rate in either of them.
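A minimal model of the four decoupling queues, assuming a depth of six entries as stated above (all names are illustrative, not from the abstract):

    from collections import deque

    QUEUE_DEPTH = 6   # the abstract sizes the queues for up to six instructions

    class InstructionQueues:
        def __init__(self):
            self.fork = deque()          # microcode "fork" address per instruction
            self.source = deque()        # source pointers into the source list
            self.destination = deque()   # memory or general-purpose-register dest
            self.pc = deque()            # starting program counter value

        def issue(self, fork_addr, source_ptrs, dest, pc):
            # The instruction unit stalls (returns False) when the queues are full.
            if len(self.fork) == QUEUE_DEPTH:
                return False
            self.fork.append(fork_addr)
            self.source.append(source_ptrs)
            self.destination.append(dest)
            self.pc.append(pc)
            return True

        def dispatch(self):
            # The execution unit drains one entry from each queue per instruction.
            return (self.fork.popleft(), self.source.popleft(),
                    self.destination.popleft(), self.pc.popleft())

Because issue and dispatch touch opposite ends of the queues, either unit can run ahead of the other until the buffer fills or empties, which is the decoupling the abstract describes.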
Abstract:
A method is provided for preprocessing multiple instructions prior to execution of such instructions in a digital computer having an instruction decoder, an instruction execution unit, and multiple general purpose registers which are read to produce memory addresses during the preprocessing. The method comprises: (1) avoiding the preprocessing of a current instruction to read a general purpose register to produce a memory address prior to the modification of the contents of that register by a preceding instruction by (a) generating a composite write mask having a bit set for each general purpose register whose contents are to be modified by at least one of a plurality of decoded but not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register to be read by the current instruction is a register having a bit set in the write mask, and/or (2) avoiding the preprocessing of a current instruction which modifies the contents of a general purpose register that is to be read by a preceding instruction by (a) generating a composite read mask having a bit set for each general purpose register to be read by at least one of a plurality of decoded but not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register whose contents are to be modified by the current instruction is a register having a bit set in the read mask.
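The two stall rules reduce to bitmask tests, sketched below (illustrative Python; the mask-building helper and the argument shapes are assumptions):

    def composite_mask(register_sets):
        # One bit per general purpose register, set if any decoded but
        # not-yet-executed instruction uses that register.
        mask = 0
        for registers in register_sets:
            for r in registers:
                mask |= 1 << r
        return mask

    def must_stall(current_reads, current_writes, pending_writes, pending_reads):
        write_mask = composite_mask(pending_writes)   # rule (1): read-after-write
        read_mask = composite_mask(pending_reads)     # rule (2): write-after-read
        raw = any(write_mask & (1 << r) for r in current_reads)
        war = any(read_mask & (1 << r) for r in current_writes)
        return raw or war

For example, must_stall([2], [], [[2, 3]], []) returns True because register 2 is still awaiting modification by a preceding instruction.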
Abstract:
A main memory and cache suitable for scalar processing are used in connection with a vector processor by issuing prefetch requests in response to the recognition of a vector load instruction. A respective prefetch request is issued for each block containing an element of the vector to be loaded from memory. In response to a prefetch request, the cache is checked for a "miss" and if the cache does not include the required block, a refill request is sent to the main memory. The main memory is configured into a plurality of banks and has a capability of processing multiple references. Therefore, different banks can be referenced simultaneously to prefetch multiple blocks of vector data. Preferably a cache bypass is provided to transmit data directly to the vector processor as the data from the main memory are being stored in the cache. In a preferred embodiment, a vector processor is added to a digital computing system including a scalar processor, a virtual address translation buffer, a main memory and a cache. The scalar processor includes a microcode interpreter which sends a vector load command to the vector processing unit and which also generates vector prefetch requests. The addresses for the data blocks to be prefetched are computed based upon the vector address, the length of the vector and the "stride" or spacing between the addresses of the elements of the vector.
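The per-block prefetch requests can be derived from the vector address, length, and stride as sketched below (illustrative Python; BLOCK_SIZE and the byte-addressed stride are assumptions, not from the abstract):

    BLOCK_SIZE = 64   # bytes per cache block; an assumed value

    def vector_prefetch_blocks(base, length, stride):
        # One prefetch request per distinct block containing an element;
        # 'stride' is the byte spacing between consecutive element addresses.
        blocks, seen = [], set()
        for i in range(length):
            block = (base + i * stride) // BLOCK_SIZE * BLOCK_SIZE
            if block not in seen:
                seen.add(block)
                blocks.append(block)
        return blocks

Each returned block address would be issued as a prefetch request, allowing refills for misses to proceed in parallel across the memory banks.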
Abstract:
In the field of high speed computers it is common for a central processing unit to reference memory locations via a virtual addressing scheme, rather than by the actual physical memory addresses. In a multi-tasking environment, this virtual addressing scheme reduces the possibility of different programs accessing the same physical memory location. Thus, to maintain computer processing speed, a high speed translation buffer cache is employed to perform the necessary virtual-to-physical conversions for memory reference instructions. The translation buffer cache stores a number of previously translated virtual addresses and their corresponding physical addresses. A memory management processor is employed to update the translation buffer cache with the most recently accessed physical memory locations. The memory management processor consists of a state machine controlling hardware specifically designed for the purpose of updating the translation buffer cache. While the translation buffer cache compares the virtual address with the already stored virtual addresses, the memory management processor concurrently calculates the address of the memory location where the physical address is stored. With this arrangement the memory management processor can immediately access memory to retrieve the physical address upon a "miss" by the translation buffer cache.
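The concurrency between the translation-buffer probe and the page-table-entry address calculation might be modeled as follows (a sketch; the page size, the 4-byte PTE, and the flat page table are assumptions, not from the abstract):

    def translate(virtual_addr, tlb, page_table_base, page_size=512):
        vpn = virtual_addr // page_size
        # The PTE address is calculated concurrently with the probe below,
        # so a miss can reference memory immediately (4-byte PTE assumed).
        pte_addr = page_table_base + vpn * 4
        frame = tlb.get(vpn)              # probe the translation buffer cache
        if frame is not None:             # hit: translation already cached
            return frame * page_size + virtual_addr % page_size, None
        return None, pte_addr             # miss: fetch the PTE at pte_addr

In hardware both computations happen in the same cycle; on a miss, the precomputed pte_addr is already available, which is the latency saving the abstract claims.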
Abstract:
An instruction decoder for a pipelined data processing unit simultaneously decodes two source specifiers and one destination specifier. All three of the specifiers can be register specifiers in which the specified operand is the content of a specified register. Any one of the specifiers can be a complex specifier designating an index register, a base register, and a displacement. Any one of the source specifiers can specify short literal data. Data for locating the two source operands and the destination operand are transmitted over parallel buses to an execution unit, so that most instructions are executed at a rate of one instruction per clock cycle. The complex specifier can have a variable length determined by its data type as well as its addressing mode. In particular, the complex specifier may specify a long length of extended immediate data that is received through the instruction buffer over a number of clock cycles.
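The three-specifier decode can be sketched as a per-cycle classification step (illustrative Python; the dict-based specifier encoding is an assumption, not from the abstract):

    def decode_specifiers(src1, src2, dst):
        def classify(spec, is_destination):
            if spec['mode'] == 'register':
                return ('register', spec['reg'])
            if spec['mode'] == 'short_literal':
                if is_destination:
                    raise ValueError('short literal allowed for sources only')
                return ('literal', spec['value'])
            # Complex specifier: index register, base register, displacement;
            # its length varies with data type and addressing mode.
            return ('complex', spec['index'], spec['base'], spec['displacement'])

        # All three results travel to the execution unit over parallel buses.
        return classify(src1, False), classify(src2, False), classify(dst, True)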
Abstract:
A technique for processing memory access exceptions along with pre-fetched instructions in a pipelined instruction processing computer system is based upon the concept of pipelining exception information along with other parts of the instruction being executed. In response to the detection of access exceptions at a pipeline stage, corresponding fault information is generated and transferred along the pipeline. The fault information is acted upon only when the instruction reaches the execution stage of the pipeline. Each stage of the instruction pipeline is ported into the front end of a memory unit adapted to perform the virtual-to-physical address translation, each port being provided with storage for the virtual addresses accompanying an instruction as well as storage for the corresponding fault information. When a memory access exception is encountered at the front end of the memory unit, the fault information generated therefrom is loaded into this storage and the port is prevented from accepting further references.
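Carrying fault information down the pipeline instead of raising it at the point of detection might look like this (a sketch; all names are illustrative, not from the abstract):

    class MemoryAccessException(Exception):
        """An access exception detected during address translation."""

    class PipelineEntry:
        def __init__(self, instruction, virtual_addr):
            self.instruction = instruction
            self.virtual_addr = virtual_addr
            self.fault = None             # fault info rides along the pipeline

    def port_reference(entry, translate):
        # Front end of the memory unit: on an exception, latch the fault
        # with the entry instead of raising it here.
        try:
            return translate(entry.virtual_addr)
        except MemoryAccessException as fault:
            entry.fault = fault           # stored in the port's fault storage
            return None                   # port accepts no further references

    def execute_stage(entry):
        # Fault information is acted upon only at the execution stage.
        if entry.fault is not None:
            raise entry.fault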
Abstract:
In a multiprocessor system, an error occurring in any one of the CPUs may have an impact upon the operation of the remaining CPUs, and therefore these errors must be handled quickly. The errors are grouped into two categories: synchronous errors (those that must be corrected immediately to allow continued processing of the current instruction); and asynchronous errors (those errors that do not affect execution of the current instruction and may be handled upon completing execution of the current instruction). Since synchronous errors prevent continued execution of the current instruction, it is preferable that the last stable state conditions of the faulting CPU be restored and the faulting instruction reexecuted. These stable state conditions advantageously occur between the execution of each instruction. However, in a pipelined computer system, it is difficult to identify the beginning and ending of a selected instruction since multiple instructions are in process at the same time. Accordingly, the execution unit is selected to be the point of synchronization between error handling and instruction execution. Once the error is identified as asynchronous or synchronous and the execution unit allows the instruction to complete or rolls back the state conditions to their preinstruction values, error-analyzing software examines the condition of the suspect data latches in the CPU. A serial diagnostic link stops the system clock of the CPU and serially loads the CPU data latches into the System Processor Unit for error determination. Thereafter, the CPU system clock is restarted and the CPU resumes execution.
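The synchronization policy can be summarized as follows (a sketch; every method name on execution_unit and diagnostic_link is hypothetical):

    def handle_cpu_error(error, execution_unit, diagnostic_link):
        if error.is_synchronous:
            # The current instruction cannot complete: restore the last
            # stable state (an instruction boundary) for re-execution.
            execution_unit.roll_back_to_preinstruction_state()
        else:
            # Asynchronous: handled after the current instruction completes.
            execution_unit.complete_current_instruction()
        # Error analysis then proceeds over the serial diagnostic link:
        diagnostic_link.stop_system_clock()
        latches = diagnostic_link.scan_out_latches()   # serially, to the SPU
        diagnostic_link.restart_system_clock()
        return latches                                 # examined by software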
Abstract:
In a pipelined computer system 10, memory access functions (requests) are simultaneously generated from a plurality of different locations. These multiple requests are passed through a multiplexer 50 according to a prioritization scheme based upon the operational proximity of the request to the instruction currently being executed. In this manner, the complex task of converting virtual-to-physical addresses is accomplished for all memory access requests by a single translation buffer 30. The physical addresses output from the translation buffer 30 are passed to a cache 28 through a second multiplexer 40 according to a second prioritization scheme based upon the operational proximity of the request to the instruction currently being executed. The first and second prioritization schemes differ in that the memory is capable of handling other requests while a higher priority "miss" is pending. Thus, the prioritization scheme temporarily suspends the higher priority request while the desired data is being retrieved from main memory 14, but continues to operate on a lower priority request so that the overall operation will be enhanced if the lower priority request hits in the cache 28.
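The second-stage arbitration might be sketched as follows (illustrative Python; the priority field, where a lower value means operationally closer to the executing instruction, and the misses_pending set are assumptions, not from the abstract):

    from collections import namedtuple

    Request = namedtuple('Request', ['id', 'priority', 'address'])

    def grant(requests, misses_pending):
        # The request closest to the executing instruction wins, but one
        # whose cache miss is still being refilled from main memory is
        # suspended so a lower-priority request can proceed and perhaps
        # hit in the cache in the meantime.
        for req in sorted(requests, key=lambda r: r.priority):
            if req.id in misses_pending:
                continue
            return req
        return None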