摘要:
A plurality of processors in a data processing system share a common memory through which they communicate and share resources. When sharing resources, one processor needs to wait for another processor to modify a specified location in memory, such as unlocking a lock. Memory and bus traffic are minimized during this waiting by first reading and testing the memory location. Then, the memory location is not read and tested again until the local copy of the cache line containing that memory location is invalidated by another processor. This feature is utilized both for a Lock instruction and a Wait for Change instruction, both of which utilize a timer parameter for specifying a maximum number of cycles to wait for another processor to modify the specified location in memory.
摘要:
In a NUMA architecture, processors in the same CPU module with a processor opening a spin gate tend to have preferential access to a spin gate in memory when attempting to close the spin gate. This “unfair” memory access to the desired spin gate can result in starvation of processors from other CPU modules. This problem is solved by “balking” or delaying a specified period of time before attempting to close a spin gate whenever either one of the processors in the same CPU module just opened the desired spin gate, or when a processor in another CPU module is spinning trying to close the spin gate. Each processor detects when it is spinning on a spin gate. It then transmits that information to the processors in other CPU modules, allowing them to balk when opening spin gates.
摘要:
An input/output processing system includes a plurality of active modules, a plurality of passive modules, at least one memory module and a system interface unit having a plurality of ports, each of which connect to a different one of the modules. The active modules include an input/output processing unit which processes interrupts and executes command sequences and a multiplexer unit which directly controls transfers between the memory module and any one of the peripheral devices coupled to different ones of a plurality of ports of the multiplexer unit. The system interface unit which operatively provides connections between the different modules includes apparatus for generating steering codes defining the physical location of each module requiring service by another module of the system. The system interface unit appends information provided by the particular module generating a requesting request for attention to the steering code generated. The generation of steering code information by the system interface unit and the module included in such requests insures that only authorized accesses are made to the different modules during the input/output processing unit's execution of programs during the running of processes associated therewith.
摘要:
In a NUMA architecture, processors in the same CPU module with a processor opening a spin gate tend to have preferential access to a spin gate in memory when attempting to close the spin gate. This “unfair” memory access to the desired spin gate can result in starvation of processors from other CPU modules. This problem is solved by “balking” or delaying a specified period of time before attempting to close a spin gate whenever either one of the processors in the same CPU module just opened the desired spin gate, or when a processor in another CPU module is spinning trying to close the spin gate. Each processor detects when it is spinning on a spin gate. It then transmits that information to the processors in other CPU modules, allowing them to balk when opening spin gates.
摘要:
Two instructions are provided to synchronize multiple processors (92) in a data processing system (80). A Transmit Sync instruction (TSYNC) transmits a synchronize processor interrupt (276) to all of the active processors (92) in the system (80). Processors (92) wait for receipt of the synchronize signal (278) by executing a Wait for Sync (WSYNC) instruction. Each of the processors waiting for such a signal (278) is activated at the next clock cycle after receipt of the interrupt signal (278). An optional timeout value is provided to protect against hanging a waiting processor (92) that misses the interrupt (278). Whenever the WSYNC instruction is activated by receipt of the interrupt (278), a trace is started to trace a fixed number of events to an internal Trace Cache (58).
摘要:
A distrbutor for the central execution pipeline unit of a central processor of a data processing system, which central processor has a plurality of execution units. The distributor serves as a communications center by which machine words, such as operands, are transmitted primarily from the cache unit of the central processor unit to execution units and the instruction fetch unit of the central processor unit. Some machine words are transmitted directly from the collector unit to selected units and others are transmitted after being stored in the data register of the distributor. Machine words stored in the data register can be realigned if required by an instruction by character or word alignment switches. The aligned words are then stored in the data register means prior to their being transmitted to units of the central processor. Other sources of signals transmitted by the distributor are the collector unit and registers of the distributor containing the effective address of a target word as calculated by the central execution pipeline unit, as well as copies of machine words in key registers, the A/Q registers, of certain of the execution units.
摘要:
A data processing system comprises a data processing unit coupled to a cache unit which couples to a main store. The cache unit includes a cache store organized into a plurality of levels, each for storing a number of blocks of information in the form of data and instructions. Directories associated with the cache store contain addresses and level control information for indicating which blocks of information reside in the cache store. The cache unit further includes control apparatus and a transit block buffer comprising a number of sections each having a plurality of locations for storing read commands and transit block addresses associated therewith. A corresponding number of valid bit storage elements are included, each of which is set to a binary ONE state when a read command and the associated transit block address are loaded into a corresponding one of the buffer locations. Comparison circuits, coupled to the transit block buffer, compare the transit block address of each outstanding read command stored in the transit block buffer section with the address of each read command or write command received from the processing unit. When there is a conflict, the comparison circuits generate an output signal which conditions the control apparatus to hold or stop further processing of the command by the cache unit and the operation of the processing unit. Holding lasts until the valid bit storage element of the location storing the outstanding read command is reset to a binary ZERO indicating that execution of the read command is completed.
摘要:
A processor (40) in a data processing system simultaneously loads multiple registers (60) with a single value for fast domain switching. A domain switch instruction asserts a register block write signal (112) along with the register write signal (116) when block writing the single value to the set of registers (60). Register address lines (110, 111) are decoded in two sets: a first set of decoded address lines (110) specifying a block of registers; and the second set (111) specifying one register in the block of registers. When the register block write signal (112) is asserted during a register write, the second set of decoded address lines (111) are ignored, and all registers in the block of registers (60) selected by the first set of decoded address lines (110) are simultaneously loaded with a common value. Additional drive requirements are solved either by adding a buffer (226) to each register bit, or by disabling (228) the feedback path (215) in each register bit during block writes.
摘要:
A computer system including a group of CPUs, each having a private cache which communicates with its CPU to receive requests for information blocks and for servicing such requests includes a CPU bus coupled to all the private caches and to a shared cache. Each private cache includes a cache memory and a cache controller having: a processor directory for identifying information blocks resident in the cache memory, logic for identifying cache misses on requests from the CPU, a cache miss output buffer for storing the identifications of a missed block and a block to be moved out of cache memory to make room for the requested block and for selectively sending the identifications onto the CPU bus, a cache miss input buffer stack for storing the identifications of all recently missed blocks and blocks to be swapped from all the CPUs in the group, a comparator for comparing the identifications in the cache miss output buffer stack with the identifications in the cache miss input buffer stack and control logic, responsive to the first comparator sensing a compare (indicating a request by another CPU for the block being swapped), for inhibiting the broadcast of the swap requirement onto the CPU bus and converting the swap operation to a "siphon" operation to service the request of the other CPU.
摘要:
A fault tolerant computer system includes at least two central processing units each having a cache memory and a parity error detector adapted to sense parity errors in blocks of information read from and write to cache and to issue a cache parity read or write error flag if a parity error is sensed. A system bus couples the CPU to a System Control Unit having a parity error correction facility, and a memory bus couples the SCU to a main memory. An error recovery control feature distributed across the CPU, including a Service Processor and the operating system software, is responsive to the sensing of a read parity error flag in a sending CPU and a write parity error flag in a receiving CPU in conjunction with a siphon operation for transferring the faulting block from the sending CPU to main memory via the SCU (in which given faulting block is corrected) and for subsequently transferring the corrected memory block from main memory to the receiving CPU when a retry is instituted.