Abstract:
A processor (92) contains a Trace RAM (210) for tracing internal processor signals and operands. A first trace mode separately traces microcode instruction execution and cache controller execution. Selectable groups of signals are traced from both the cache controller (256) and the arithmetic (AX) processor (260). A second trace mode selectively traces the full operand words that result from executing microcode instructions (242). Each microcode instruction word (242) has a trace enable bit (244) that, when set, causes the results of that microcode instruction (242) to be recorded in the Trace RAM (210).
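The second trace mode can be illustrated with a minimal software sketch. Everything here is hypothetical modelling rather than the patented hardware: the names `MicroInstruction`, `TraceRAM`, and `run_microcode` are invented for illustration, and the assumption that a full Trace RAM overwrites its oldest entry is not stated in the abstract.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class MicroInstruction:
    """Hypothetical model of a microcode instruction word (242)."""
    name: str
    operation: Callable[[], int]  # produces the result operand word
    trace_enable: bool = False    # models the trace enable bit (244)


class TraceRAM:
    """Hypothetical model of the Trace RAM (210)."""

    def __init__(self, depth: int) -> None:
        # Assumption: once full, the oldest entry is overwritten.
        self.entries = deque(maxlen=depth)

    def record(self, name: str, result: int) -> None:
        self.entries.append((name, result))


def run_microcode(program: Iterable[MicroInstruction], trace_ram: TraceRAM) -> None:
    """Execute each instruction; record its result only if its trace bit is set."""
    for instr in program:
        result = instr.operation()
        if instr.trace_enable:
            trace_ram.record(instr.name, result)
```

Only instructions whose trace bit is set leave an entry, so a microcode author can trace a few operands of interest without filling the buffer.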
Abstract:
Two instructions are provided to synchronize multiple processors (92) in a data processing system (80). A Transmit Sync instruction (TSYNC) transmits a synchronize processor interrupt (276) to all of the active processors (92) in the system (80). Processors (92) wait for receipt of the synchronize signal (278) by executing a Wait for Sync (WSYNC) instruction. Each of the processors waiting for such a signal (278) is activated at the next clock cycle after receipt of the interrupt signal (278). An optional timeout value is provided to protect against hanging a waiting processor (92) that misses the interrupt (278). Whenever the WSYNC instruction is activated by receipt of the interrupt (278), a trace is started to trace a fixed number of events to an internal Trace Cache (58).
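As a rough software analogy for the TSYNC/WSYNC pair, the sketch below uses Python's `threading.Event`. The function names `tsync`, `wsync`, and `demo` are hypothetical. Note one simplification: an `Event` stays set once signalled, so a late waiter cannot miss it, whereas the abstract describes an interrupt that can be missed (hence the optional timeout).

```python
import threading
from typing import List, Optional


def tsync(sync_signal: threading.Event) -> None:
    """Analogy for TSYNC: broadcast the synchronize signal to all waiters."""
    sync_signal.set()


def wsync(sync_signal: threading.Event, timeout: Optional[float] = None) -> bool:
    """Analogy for WSYNC: block until the signal arrives.

    Returns True if the signal was received, False if the optional
    timeout expired first (the guard against hanging a waiter).
    """
    return sync_signal.wait(timeout)


def demo(worker_count: int) -> List[bool]:
    """Start workers that wait for the signal, then release them together."""
    sync_signal = threading.Event()
    results: List[bool] = [False] * worker_count

    def worker(index: int) -> None:
        results[index] = wsync(sync_signal, timeout=5.0)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for thread in threads:
        thread.start()
    tsync(sync_signal)
    for thread in threads:
        thread.join()
    return results
```

Calling `wsync` on a signal that never arrives returns `False` once the timeout elapses, which corresponds to the timeout protection described above.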
Abstract:
A data processing system comprises a data processing unit coupled to a cache unit which couples to a main store. The cache unit includes a cache store organized into a plurality of levels, each for storing a number of blocks of information in the form of data and instructions. Directories associated with the cache store contain addresses and level control information for indicating which blocks of information reside in the cache store. The cache unit further includes control apparatus and a transit block buffer comprising a number of sections each having a plurality of locations for storing read commands and transit block addresses associated therewith. A corresponding number of valid bit storage elements are included, each of which is set to a binary ONE state when a read command and the associated transit block address are loaded into a corresponding one of the buffer locations. Comparison circuits, coupled to the transit block buffer, compare the transit block address of each outstanding read command stored in the transit block buffer section with the address of each read command or write command received from the processing unit. When there is a conflict, the comparison circuits generate an output signal which conditions the control apparatus to hold or stop further processing of the command by the cache unit and the operation of the processing unit. Holding lasts until the valid bit storage element of the location storing the outstanding read command is reset to a binary ZERO indicating that execution of the read command is completed.
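The conflict check described above can be sketched in software as follows. This is an illustrative model only: the class `TransitBlockBuffer` and its method names are hypothetical, a single buffer section is modelled, and the hold is represented simply by the boolean that `conflicts` returns.

```python
from typing import List, Optional


class TransitBlockBuffer:
    """Hypothetical model of one transit block buffer section with valid bits."""

    def __init__(self, size: int) -> None:
        self.addresses: List[Optional[int]] = [None] * size
        self.valid: List[int] = [0] * size  # one valid bit per location

    def load_read(self, block_address: int) -> int:
        """Store an outstanding read command and set its valid bit to ONE."""
        for slot, bit in enumerate(self.valid):
            if bit == 0:
                self.addresses[slot] = block_address
                self.valid[slot] = 1
                return slot
        raise RuntimeError("no free transit block buffer location")

    def conflicts(self, block_address: int) -> bool:
        """Compare a new command's address against every valid outstanding read."""
        return any(
            bit == 1 and address == block_address
            for bit, address in zip(self.valid, self.addresses)
        )

    def complete(self, slot: int) -> None:
        """Reset the valid bit to ZERO when the read command has finished."""
        self.valid[slot] = 0
```

A caller would hold the incoming command while `conflicts` returns `True` and retry after `complete` clears the matching location.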
Abstract:
A method and apparatus prevent hogging of ownership of a gateword in a multiprocessor write-into-cache data processing system that includes: a memory; at least first and second shared caches; a system bus coupling the memory and the shared caches; and at least one processor, having a private cache, coupled to each shared cache. The gateword is stored in the memory and governs access to common code/data shared by processes running in the processors. A given processor obtains a read copy of the gateword by performing successive swap operations between the memory and the given processor's shared cache, and between that shared cache and the processor's private cache. If the gateword is found to be OPEN, it is CLOSEd by the given processor, and successive swap operations are performed between the given processor's private cache and shared cache, and between the shared cache and memory, to write the gateword CLOSEd in memory such that the given processor obtains exclusive access to the governed common code/data. When the given processor completes use of the common code/data, it writes the gateword OPEN in its private cache, and successive swap operations are performed between the given processor's private cache and shared cache, and between the shared cache and memory, to write the gateword OPEN in memory.
Abstract:
Cache memory, and thus computer system, reliability is increased by duplicating cache tag entries. Each cache tag has a primary entry and a duplicate entry. Then, when cache tags are associatively searched, both the primary and the duplicate entry are compared to the search value. At the same time, they are also parity checked and compared against each other. If a match is made on either the primary entry or the duplicate entry, and that entry does not have a parity error, a cache “hit” is indicated. All single bit cache tag parity errors are detected and compensated for. Almost all multiple bit cache tag parity errors are detected.
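The lookup rule above (hit if either copy matches and that copy has no parity error) can be sketched as follows. This is a minimal model under stated assumptions: one parity bit per tag, and the hypothetical names `TagEntry`, `make_entry`, and `lookup`; the abstract does not specify the parity scheme.

```python
from dataclasses import dataclass
from typing import Tuple


def parity(value: int) -> int:
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


@dataclass
class TagEntry:
    tag: int
    parity: int  # parity bit stored alongside the tag (assumed: one bit per tag)


def make_entry(tag: int) -> TagEntry:
    """Build an entry whose stored parity bit is consistent with its tag."""
    return TagEntry(tag=tag, parity=parity(tag))


def lookup(primary: TagEntry, duplicate: TagEntry, search_tag: int) -> Tuple[bool, bool]:
    """Return (hit, error_detected) for one primary/duplicate tag pair."""
    primary_ok = parity(primary.tag) == primary.parity
    duplicate_ok = parity(duplicate.tag) == duplicate.parity
    # A copy contributes a hit only if it matches and passes its parity check.
    hit = (primary_ok and primary.tag == search_tag) or (
        duplicate_ok and duplicate.tag == search_tag
    )
    # An error is flagged on any parity failure or on disagreement between copies.
    error_detected = (not primary_ok) or (not duplicate_ok) or primary.tag != duplicate.tag
    return hit, error_detected
```

With a single flipped bit in the primary copy, the duplicate still produces the hit while the error is reported, which is the compensation for single-bit errors that the abstract describes.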
Abstract:
Interactions among multiple processors (92) are exhaustively tested. A master processor (92) retrieves test information for a set of tests from a test table (148). It then enters a series of nested loops, with one loop for each of the tested processors (92). A cycle delay count for each of the tested processors (92) is incremented (152, 162, 172) through a range specified in the test table entry. For each combination of cycle delay count loop indices, a single test is executed (176). In each such test (176), the master processor (92) sets up (182) each of the other processors (92) being tested. This setup (182) specifies the delay count and the code for that processor (92) to execute. Once a processor (92) has been set up (182), it waits (192) for a synchronize interrupt (278). When all processors (92) have been set up (182), the master processor (92) issues (191) the synchronize interrupt signal (276). Each processor (92) then starts traces (193) and delays (194) the specified number of cycles. After the delay, the processor (92) executes its test code (195).
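The one-loop-per-processor structure amounts to iterating over the Cartesian product of the per-processor delay ranges. A minimal sketch, assuming inclusive `(low, high)` ranges and a caller-supplied test function; the names `delay_combinations` and `run_all_tests` are hypothetical:

```python
from itertools import product
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def delay_combinations(ranges: Sequence[Tuple[int, int]]) -> Iterator[Tuple[int, ...]]:
    """Yield every combination of delay counts.

    ranges holds one inclusive (low, high) pair per tested processor,
    standing in for the range given in a test table entry.
    """
    return product(*(range(low, high + 1) for low, high in ranges))


def run_all_tests(
    ranges: Sequence[Tuple[int, int]],
    run_single_test: Callable[[Tuple[int, ...]], object],
) -> List[object]:
    """Run one test per combination of cycle delay counts and collect results."""
    results = []
    for delays in delay_combinations(ranges):
        # In the abstract, this step sets up each processor with its delay
        # count, issues the synchronize interrupt, and runs the test code.
        results.append(run_single_test(delays))
    return results
```

For two processors with ranges `(0, 1)` and `(0, 2)` this executes six tests, from `(0, 0)` through `(1, 2)`.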
Abstract:
A processor (92) in a data processing system (80) provides a DELAY instruction. Executing the DELAY instruction causes the processor (92) to delay for a specified integral number of clock (98) cycles before continuing. The actual delay is guaranteed to be a linear function, with constant slope, of the specified number of clock cycles. Incrementing the specified delay through a range allows exhaustive testing of interactions among multiple processors.
Abstract:
Apparatus and method for providing an improved instruction buffer associated with a cache memory unit. The instruction buffer is utilized to transmit to the control unit of the central processing unit a requested sequence of data groups. In the current invention, the instruction buffer can store two sequences of data groups. The instruction buffer can store the data group sequence for the procedure currently in execution by the data processing unit and can simultaneously store data groups to which transfer, either conditional or unconditional, has been identified in the sequence currently being executed. In addition, the instruction buffer provides signals for use by the central processing unit defining the status of the instruction buffer.
Abstract:
A data processing system includes a cache store to provide an interface with a main storage unit for a central processing unit. The central processing unit includes a microprogram control unit in addition to control circuits for establishing the sequencing of the processing unit during the execution of program instructions. Both the microprogram control unit and control circuits include means for generating pre-read commands to the cache store in conjunction with normal processing operations during the processing of certain types of instructions. In response to pre-read commands, the cache store, during predetermined points of the processing of each such instruction, fetches information which is required by such instruction at a later point in the processing thereof.
Abstract:
In a NUMA architecture, processors in the same CPU module as a processor opening a spin gate tend to have preferential access to that spin gate in memory when attempting to close it. This “unfair” memory access to the desired spin gate can result in starvation of processors from other CPU modules. This problem is solved by “balking”, or delaying for a specified period of time, before attempting to close a spin gate whenever either one of the processors in the same CPU module has just opened the desired spin gate or a processor in another CPU module is spinning trying to close the spin gate. Each processor detects when it is spinning on a spin gate. It then transmits that information to the processors in other CPU modules, allowing them to balk when opening spin gates.
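The balking decision can be expressed as a small predicate. This is a sketch under assumptions, not the patented logic: the function names, the use of integer module identifiers, and the fixed `balk_cycles` delay are all hypothetical.

```python
def should_balk(
    my_module: int,
    last_opener_module: int,
    remote_module_spinning: bool,
) -> bool:
    """Decide whether to delay before attempting to close a spin gate.

    Balk if a processor in this CPU module just opened the gate, or if a
    processor in another CPU module has reported that it is spinning on it.
    """
    return last_opener_module == my_module or remote_module_spinning


def close_attempt_delay(
    my_module: int,
    last_opener_module: int,
    remote_module_spinning: bool,
    balk_cycles: int,
) -> int:
    """Return the number of cycles to wait before trying to close the gate."""
    if should_balk(my_module, last_opener_module, remote_module_spinning):
        return balk_cycles  # assumed: a fixed, configurable balk period
    return 0
```

The delay gives processors in other CPU modules a window in which to close the gate, offsetting the locality advantage described above.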