摘要:
A hierarchical cache memory includes a high-speed primary cache memory and a lower speed secondary cache memory of greater storage capacity than the primary cache memory. To manage a huge number of data lines interconnecting the primary and secondary cache memories, the hierarchical cache memory is integrated on a plurality of integrated circuits which include all of the interconnecting data lines. Each integrated circuit includes a primary memory and a secondary memory for storing and retrieving data transferred over a first data input line and a first data output line that link the primary memory to a central processing unit. At any given time, a multi-bit word is addressed in the secondary memory, and a corresponding multi-bit word is addressed in the primary memory. The primary and secondary memories are interconnected by a first multi-line bus for transferring a multi-bit word read from the secondary memory to the primary memory, and by a second multi-line bus for transferring a multi-bit word read from the primary memory to the secondary memory. The secondary memory is linked to a main memory by a second data output line and a second data input line for sequential transmission of bits to exchange multi-bit words during a writeback and refill operation. In a preferred embodiment, data inputs of the primary memory and the secondary memory are wired in parallel to a serial-parallel shift register that is used as a common write buffer.
摘要:
A main memory and cache suitable for scalar processing are used in connection with a vector processor by issuing prefetch requests in response to the recognition of a vector load instruction. A respective prefetch request is issued for each block containing an element of the vector to be loaded from memory. In response to a prefetch request, the cache is checked for a "miss" and if the cache does not include the required block, a refill request is sent to the main memory. The main memory is configured into a plurality of banks and has a capability of processing multiple references. Therefore the different banks can be referenced simultaneously to prefetch multiple blocks of vector data. Preferably a cache bypass is provided to transmit data directly to the vector processor as the data from the main memory are being stored in the cache. In a preferred embodiment, a vector processor is added to a digital computing system including a scalar processor, a virtual address translation buffer, a main memory and a cache. The scalar processor includes a microcode interpreter which sends a vector load command to the vector processing unit and which also generates vector prefetch requests. The addresses for the data blocks to be prefetched are computed based upon the vector address, the length of the vector and the "stride" or spacing between the addresses of the elements of the vector.
摘要:
A technique for processing memory access exceptions along with pre-fetched instructions in a pipelined instruction processing computer system is based upon the concept of pipelining exception information along with other parts of the instruction being executed. In response to the detection of access exceptions at a pipeline stage, corresponding fault information is generated and transferred along the pipeline. The fault information is acted upon only when the instruction reaches the execution stage of the pipeline. Each stage of the instruction pipeline is ported into the front end of a memory unit adapted to perform the virtual-to-physical address translation; each port being provided with storage for virtual addresses accompanying an instruction as well as storage for corresponding fault information. When a memory access exception is encountered at the front end of the memory unit, the fault information generated therefrom is loaded into the storage and the port is prevented from accepting further references.
摘要:
A method is provided for preprocessing multiple instructions prior to execution of such instructions in a digital computer having an instruction decoder, an instruction execution unit, and multiple general purpose registers which are read to produce memory addresses during the preprocessing. The method comprises: (1) avoiding the preprocessing of a current instruction to read a general purpose register to produce a memory address prior to the modification of the contents of that register by a preceding instruction by (a) generating a composite write mask having a bit set for each general purpose register whose contents are to be modified by at least one of a plurality of decoded by not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register to be read by the current instruction is a register having a bit set in the write mask, and/or (2) avoiding the preprocessing of a current instruction which modifies the contents of a general purpose register that is to be read by a preceding instruction by (a) generating a composite read mask having a bit set for each general purpose register to be read by at least one of a plurality of decoded but not-yet-executed instructions preceding the current instruction, and (b) stalling the preprocessing of the current instruction when a general purpose register whose contents are to be modified by the current instruction is a register having a bit set in the read mask.
摘要:
In a multiprocessor system, an error occurring in any one of the CPUs may have an impact upon the operation of the remaining CPUs, and therefore these errors must be handled quickly. The errors are grouped into two categories: synchronous errors (those that must be corrected immediately to allow continued processing of the current instruction); and asynchronous errors (those errors that do not affect execution of the current instruction and may be handled upon completing execution of the current instruction). Since synchronous errors prevent continued execution of the current instruction, it is preferable that the last stable state conditions of the faulting CPU be restored and the faulting instruction reexecuted. These stable state conditions advantageously occur between the execution of each instruction. However, in a pipelined computer system, it is difficult to identify the beginning and ending of a selected instruction since multiple instructions are in process at the same time. Accordingly, the execution unit is selected to be the point of synchronization between error handling and instruction execution. Once the error is indentified as asynchronous or synchronous and the execution unit allows the instruction to complete or rolls back the state conditions to their preinstruction values, error analyzing software examines the condition of the suspect data latches in the CPU. A serial diagnostic link stops the system clock of the CPU and serially loads the CPU data latches into the System Processor Unit for error determination. Thereafter, the CPU system clock is restarted and the CPU resumes execution.
摘要:
An operand processing unit delivers a specified address and at least one read/write signal in response to an instruction being a source of destination operand, and delivers the source operand to an execution unit in response to completion of the preprocessing. The execution unit receives the source operand, executes it and delivers the resultant data to memory. A "write queue" receives the write addresses of the destination operands from the operand processing unit, stores the write addresses, and delivers the stored preselected addresses to memory in response to receiving the resultant data corresponding to the preselected address. The addresses of the source operand is compared to the write addresses stored in the write queue, and the operand processing unit is stalled whenever at least one of the write addresses in the write queue is equivalent to the read address. Therefore, fetching of the operand is delayed until the corresponding resultant data has been delivered by the execution unit.
摘要:
In the operation of high-speed computers, it is frequently advantageous to employ a high speed cache memory within each CPU of a multiple CPU computer system. A standard, slower memory configuration remains in use for the large, common main memory, but those portions of main memory which are expected to be used heavily are copied into the cache memory. Thus, on many memory references, the faster cache memory is exploited, while only infrequent references to the slower main memory are necessary. This configuration generally speeds the overall operation of the computer system; however, memory integrity problems arise by maintaining two separate copies of selected portions of main memory. Accordingly, the memory access unit of the CPU uses error correction code (ECC) hardware to ensure the integrity of the data delivered between the cache and main memory. The prevent the ECC hardware from slowing the overall operation of the CPU, the error correction is performed underneath a write back operation. Data contained in the cache, which will be displaced by data received from main memory 10, is transferred to a write back buffer (WBB) during that period of time between the request for data from the main memory and actual delivery of the requested data. Further, the ECC hardware also operates on the cache data being written to the WBB. Accordingly, a performance penalty is avoided by performing error correction and preremoving the cache data during that idle period of time.
摘要:
In the field of high speed computers it is common for a central processing unit to reference memory locations via a virtual addressing scheme, rather than by the actual physical memory addresses. In a multi-tasking environment, this virtual addressing scheme reduces the possibility of different programs accessing the same physical memory location. Thus, to maintain computer processing speed, a high speed translation buffer cache is employed to perform the necessary virtual-to-physical conversions for memory reference instructions. The translation buffer cache stores a number of previously translated virtual addresses and their corresponding physical addresses. A memory management processor is employed to update the translation buffer cache with the most recently accessed physical memory locations. The memory management processor consists of a state machine controlling hardware specifically designed for the purpose of updating the translation buffer cache. The memory management processor calculates an address of a location in the memory where the physical address is stored concurrently with the translation buffer cache comparing the virtual address with already stored virtual addresses. With this arrangement the memory management unit can immediately access memory to retrieve the physical address upon a "miss" by the translation buffer cache.
摘要:
A method for insuring data consistency between a plurality of individual processor cache memories and the main memory in a multi-processor computer system is provided which is capable of (1) detecting when one of a set of predefined data inconsistency states occurs as a data transaction request is being processed, and (2) correcting the data inconsistency states so that the operation may be executed in a correct and consistent manner. In particular, the method is adapted to address two kinds of data inconsistency states: (1) A request for a operation from a system unit to main memory when the location to be written to is present in the cache of some processor unit-in such a case, data in the cache is "stale" and the data inconsistency is avoided by preventing the associated processor from using the "stale" data; and (2) when a read operation is requested of main memory by a system unit and the location to be read may be written or has already been written in the cache of some processor--in this case, the data in main memory is "stale" and the data inconsistency is avoided by insuring that the data returned to the requesting unit is the updated data in the cache. The presence of one of the above-described data inconsistency states is detected in a SCU-based multi-processing system by providing the SCU with means for maintaining a copy of the cache directories for each of the processor caches. The SCU continually compares address data accompanying memory access requests with what is stored in the SCU cache directories in order to determine the presence of predefined conditions indicative of data inconsistencies, and subsequently executes corresponding predefined fix-up sequences.
摘要:
In a pipelined computer system 10, memory access functions (requests) are simultaneously generated from a plurality of different locations. These multiple requests are passed through a multiplexer 50 according to a prioritization scheme based upon the operational proximity of the request to the instruction currently being executed. In this manner, the complex task of converting virtual-to-physical addresses is accomplished for all memory access requests by a single translation buffer 30. The physical address output from the translation buffer 30 are passed to a cache 28 through a second multiplexer 40 according to a second prioritization scheme based upon the operational proximity of the request to the instruction currently being executed. The first and second prioritization schemes differ, in that the memory is capable of handling other requests while a higher priority "miss" is pending. Thus, the prioritization scheme temporarily suspends the higher priority request while the desired data is being retrieved from main memory 14, but continues to operate on a lower priority request so that the overall operation will be enhanced if the lower priority request hits in the cache 28.