摘要:
A parallel processing computer is disclosed in which a plurality of memory elements (e.g., caches) are accessable by a plurality of processors, and in which a fixed access priority for the processors is varied periodically to reduce differences in processing times between the processors in applications where memory access conflicts occur. The variation in priority is done infrequently enough so as not to disturb the ability of the system to avoid memory access conflicts by falling into a "lockstep" condition, in which the fixed priority combined with a selected interleaving of the memory elements produces a memory access pattern that, for certain memory strides, produces no memory access conflicts.
摘要:
A cache memory capable of concurrently accepting and working on completion of more than one cache access from a plurality of processors connected in parallel. Current accesses to the cache are handled by current-access-completion circuitry which determines whether the current access is capable of immediate completion and either completes the access immediately if so capable or transfers the access to pending-access-completion circuitry if not so capable. The latter circuitry works on completion of pending accesses; it determines and stores for each pending access status information prescribing the steps required to complete the access and redetermines that status information as conditions change. In working on completion of current and pending accesses, the addresses of the accesses are compared to those of memory accesses in progress on the system.
摘要:
A digital computer including a plurality of memory elements, the memory elements being interleaved (i.e., each is assigned memory addresses on the basis of a low order portion of the memory address), a plurality of processors connected in parallel, the processors each having means for initiating an access of data from any of the memory elements simultaneously with accesses of other processors, the memory elements each being capable of accepting an access from just one of the processors during a given cycle, and the memory elements being interleaved so that the memory access patterns generated at a stride of one and a stride of two each meet the conditions that (1) the pattern will tolerate being offset with respect to an identical pattern by a desired offset and any multiple of the offset (wherein tolerating means that no memory access conflicts arise, i.e., more than one processor simultaneously attempting to access the same memory element) and (2) the pattern includes sufficient conflicts at offsets other than the desired offset to force the processors to assume a relationship wherein the desired offset is achieved, so that the processor is able to access a different memory element simultaneously without creating access conflicts.
摘要:
Techniques are provided for performing a delay. A request for a delay may be received. A plurality of delay queues may be provided, with each delay queue spanning a range of delay remaining. The request may be assigned to a delay queue based on the delay remaining. The request may be moved to a different delay queue as the delay remaining decreases.
摘要:
Techniques described herein provide for sending packets to nodes based on distribution trees with stages. A packet may be received at a node. The stage of the node may be determined. A distribution tree may be selected. Based on the stage and the selected distribution tree, subsequent stage nodes may be determined. The packet may be sent to the subsequent stage nodes.
摘要:
Techniques for taking corrective action based on probabilities are provided. Request messages may include a size of a data packet and a stated issue interval. A probability of taking corrective action based on the size of the data packet, the stated issue interval, and a target issue interval may be retrieved. Corrective action may be taken with the retrieved probability.
摘要:
Techniques for delays based on packet sizes are provided. Request messages may identify the size of a data packet. Delays may be initiated based in part on a portion of the size of the data packet. The delays may also be based in part on target issue intervals. Request messages may be sent after the delays.
摘要:
Techniques described herein provide for sending request messages. The request messages may be sent in order. The request messages may be sent over a designated communications channel.
摘要:
Embodiments involve an embedded processing element that fetches at least two possible next instructions (control words) in parallel in one cycle, and executes one of them during the following cycle based on the result of a conditional branch test. Embodiments reduce or avoid branch penalties (zero penalty branches).
摘要:
Compiler-based checkpointing for error recovery. In various embodiments, a compiler is adapted to identify checkpoints in program code. Sets of data objects are associated with the checkpoints, and checkpoint code is generated by the compiler for execution at the checkpoints. The checkpoint code stores state information of the associated data objects for recovery if execution of the program is interrupted.