摘要:
A vector register provides the capability for simultaneously writing through at least two write ports and simultaneously reading from at least two read ports. In addition, a barber pole technique for storing words from logical vector registers into banks is provided to minimize conflicts.
摘要:
A vector register provides the capability for simultaneously writing through at least two write ports and simultaneous reading from at least two read ports. In addition, a barber pole technique for storing words from logical vector registers into banks is provided to minimize conflicts.
摘要:
A main memory and cache suitable for scalar processing are used in connection with a vector processor by issuing prefetch requests in response to the recognition of a vector load instruction. A respective prefetch request is issued for each block containing an element of the vector to be loaded from memory. In response to a prefetch request, the cache is checked for a "miss" and if the cache does not include the required block, a refill request is sent to the main memory. The main memory is configured into a plurality of banks and has a capability of processing multiple references. Therefore the different banks can be referenced simultaneously to prefetch multiple blocks of vector data. Preferably a cache bypass is provided to transmit data directly to the vector processor as the data from the main memory are being stored in the cache. In a preferred embodiment, a vector processor is added to a digital computing system including a scalar processor, a virtual address translation buffer, a main memory and a cache. The scalar processor includes a microcode interpreter which sends a vector load command to the vector processing unit and which also generates vector prefetch requests. The addresses for the data blocks to be prefetched are computed based upon the vector address, the length of the vector and the "stride" or spacing between the addresses of the elements of the vector.
摘要:
A technique for processing memory access exceptions along with pre-fetched instructions in a pipelined instruction processing computer system is based upon the concept of pipelining exception information along with other parts of the instruction being executed. In response to the detection of access exceptions at a pipeline stage, corresponding fault information is generated and transferred along the pipeline. The fault information is acted upon only when the instruction reaches the execution stage of the pipeline. Each stage of the instruction pipeline is ported into the front end of a memory unit adapted to perform the virtual-to-physical address translation; each port being provided with storage for virtual addresses accompanying an instruction as well as storage for corresponding fault information. When a memory access exception is encountered at the front end of the memory unit, the fault information generated therefrom is loaded into the storage and the port is prevented from accepting further references.
摘要:
A data processing system handles memory management exceptions caused by a faulting vector instruction in a vector processor by halting the execution of the faulting vector instruction being executed when the exception occurred and by setting the state information for the vector processor to acknowledge the presence of the exception and to include information about the suboperation of the vector instruction being executed when the exception occurred. The scalar processor is not interrupted at this time, however. Any other vector instructions executing simutaneously with the faulting vector instruction are allowed to continue so long as those instructions do not require data from the faulting instruction. The faulting partially completed vector instruction resumes execution after the operating system has processed the memory management exception.
摘要:
To increase the performance of a pipelined processor executing various classes of instructions, the classes of instructions are executed by respective functional units which are independently controlled and operated in parallel. The classes of instructions include integer instructions, floating point instructions, multiply instructions, and divide instructions. The integer unit, which also performs shift operations, is controlled by the microcode execution unit to handle the wide variety of integer and shift operations included in a complex, variable-length instruction set. The other functional units need only accept a control command to initiate the operation to be performed by the functional unit. The retiring of the results of the instructions need not be controlled by the microcode execution unit, but instead is delegated to a separate retire unit that services a result queue. When the microcode execution unit determines that a new operation is required, an entry is inserted into the result queue. The entry includes all the information needed by the retire unit to retire the result once the result is available from the respective functional unit. The retire unit services the result queue by reading a tag in the entry at the head of the queue to determine the functional unit that is to provide the result. Once the result is available and the destination specified by the entry is also available, the result is retired in accordance with the entry, and the entry is removed from the queue.
摘要:
A hierarchical cache memory includes a high-speed primary cache memory and a lower speed secondary cache memory of greater storage capacity than the primary cache memory. To manage a huge number of data lines interconnecting the primary and secondary cache memories, the hierarchical cache memory is integrated on a plurality of integrated circuits which include all of the interconnecting data lines. Each integrated circuit includes a primary memory and a secondary memory for storing and retrieving data transferred over a first data input line and a first data output line that link the primary memory to a central processing unit. At any given time, a multi-bit word is addressed in the secondary memory, and a corresponding multi-bit word is addressed in the primary memory. The primary and secondary memories are interconnected by a first multi-line bus for transferring a multi-bit word read from the secondary memory to the primary memory, and by a second multi-line bus for transferring a multi-bit word read from the primary memory to the secondary memory. The secondary memory is linked to a main memory by a second data output line and a second data input line for sequential transmission of bits to exchange multi-bit words during a writeback and refill operation. In a preferred embodiment, data inputs of the primary memory and the secondary memory are wired in parallel to a serial-parallel shift register that is used as a common write buffer.
摘要:
To execute variable-length instructions independently of instruction preprocessing, a central processing unit is provided with a set of queues in the data and control paths between an instruction unit and an execution unit. The queues include a "fork" queue, a source queue, a destination queue, and a program counter queue. The fork queue contains an entry of control information for each instruction processed by the instruction unit. This control information corresponds to the opcode for the instruction, and preferably it is a microcode "fork" address at which a microcode execution unit begins execution to execute the instruction. The source queue specifies the source operands for the instruction. Preferably the source queue stores source pointers and the operands themselves are included in a separate "source list" in the case of operands fetched from memory or immediate data from the instruction stream, or are the contents of a set of general purpose registers in the execution unit. The destination queue specifies the destination for the instruction, for example, either memory or general purpose registers. The program counter queue contains the starting value of the program counter for each of the instructions passed from the instruction unit to the execution unit. Preferably the queues are large enough to hold control information and data for up to six instructions. The queues therefore shield the execution unit and the instruction unit from each others complexities and provide a buffer to allow for an uneven processing rate in either of them.
摘要:
A method is provided for preprocessing multiple instructions prior to execution of such instructions in a digital computer having an instruction decoder, an instruction execution unit, and multiple general purpose registers which are read to produce memory addresses during the preprocessing. The method comprises: (1) avoiding the preprocessing of a current instruction to read a general purpose register to produce a memory address prior to the modification of the contents of that register by a preceding instruction by (a) generating a composite write mask having a bit set for each general purpose register whose contents are to be modified by at least one of a plurality of decoded by not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register to be read by the current instruction is a register having a bit set in the write mask, and/or (2) avoiding the preprocessing of a current instruction which modifies the contents of a general purpose register that is to be read by a preceding instruction by (a) generating a composite read mask having a bit set for each general purpose register to be read by at least one of a plurality of decoded but not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register whose contents are to be modified by the current instruction is a register having a bit set in the read mask.
摘要:
In the field of high speed computers it is common for a central processing unit to reference memory locations via a virtual addressing scheme, rather than by the actual physical memory addresses. In a multi-tasking environment, this virtual addressing scheme reduces the possibility of different programs accessing the same physical memory location. Thus, to maintain computer processing speed, a high speed translation buffer cache is employed to perform the necessary virtual-to-physical conversions for memory reference instructions. The translation buffer cache stores a number of previously translated virtual addresses and their corresponding physical addresses. A memory management processor is employed to update the translation buffer cache with the most recently accessed physical memory locations. The memory management processor consists of a state machine controlling hardware specifically designed for the purpose of updating the translation buffer cache. The memory management processor calculates an address of a location in the memory where the physical address is stored concurrently with the translation buffer cache comparing the virtual address with already stored virtual addresses. With this arrangement the memory management unit can immediately access memory to retrieve the physical address upon a "miss" by the translation buffer cache.