Abstract:
A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in an asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.
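As a rough illustration of a collective reduction, the sketch below combines one value per node up a tree and then hands the result back to every node. The abstract does not specify a topology or an API, so the tree layout, the function name `tree_allreduce`, and the `children` mapping are all assumptions made for this example.

```python
from typing import Callable, Dict, List


def tree_allreduce(
    values: Dict[int, int],
    children: Dict[int, List[int]],
    root: int,
    op: Callable[[int, int], int],
) -> Dict[int, int]:
    """Reduce every node's value up a tree, then broadcast the result.

    `values` maps node id -> local contribution.
    `children` maps node id -> ids of its child nodes (hypothetical layout).
    """

    def reduce_up(node: int) -> int:
        # Each node combines its own value with its children's partial results.
        acc = values[node]
        for child in children.get(node, []):
            acc = op(acc, reduce_up(child))
        return acc

    total = reduce_up(root)
    # Broadcast phase: every node receives the same reduced value.
    return {node: total for node in values}


if __name__ == "__main__":
    vals = {0: 3, 1: 5, 2: 7, 3: 1}
    tree = {0: [1, 2], 1: [3]}
    print(tree_allreduce(vals, tree, 0, lambda a, b: a + b))
```

In hardware the combining would happen in the router devices as data flows through them; the recursion here only models the data flow, not the latency characteristics.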
Abstract:
Methods and integrated circuits for reconfiguration in a multi-core processor system with configurable isolation are described. According to one embodiment, a processor configuration method includes determining that a first module is faulty. A second module is configured to communicate with the first module when the first module is not faulty. The method also includes analyzing a third module with respect to a substitution criterion, selecting the third module based on the analyzing determining that the third module satisfies the substitution criterion, and subsequent to the selecting, configuring the second module to communicate with the third module instead of the first module. Additional embodiments are described in the disclosure.
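The substitution flow described above can be sketched as follows. This is a minimal model, not the patented method: the `Module` class, the `links` table, and the `healthy_spare` criterion are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Module:
    name: str
    faulty: bool = False
    in_use: bool = False


def substitute_if_faulty(
    links: Dict[str, str],
    modules: Dict[str, Module],
    second: str,
    criterion: Callable[[Module], bool],
) -> Optional[str]:
    """Re-point `second` at a substitute if its current partner is faulty.

    Returns the substitute's name, or None if nothing was changed.
    """
    first = modules[links[second]]
    if not first.faulty:
        return None  # partner is healthy, keep the existing link
    for candidate in modules.values():
        if candidate.name in (first.name, second):
            continue
        if criterion(candidate):  # analyze the candidate against the criterion
            candidate.in_use = True
            links[second] = candidate.name  # configure only after selecting
            return candidate.name
    return None


def healthy_spare(m: Module) -> bool:
    # Hypothetical substitution criterion: not faulty and not already in use.
    return not m.faulty and not m.in_use
```

A real criterion might also consider distance, isolation domain, or capability; the abstract leaves that open.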
Abstract:
A multistage interconnect network (MIN) capable of supporting massive parallel processing, including point-to-point and multicast communications between processor modules (PMs) which are connected to the input and output ports of the network. The network is built using interconnected switch nodes arranged in 2⌈log_b N⌉ stages, wherein b is the number of switch node input/output ports, N is the number of network input/output ports, and ⌈log_b N⌉ indicates a ceiling function providing the smallest integer not less than log_b N. The additional stages provide additional paths between network input ports and network output ports, thereby enhancing fault tolerance and lessening contention.
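The stage count can be checked with a short calculation. The sketch below computes the ceiling of the base-b logarithm with integer arithmetic, which avoids floating-point rounding at exact powers of b; the function names are chosen for this example only.

```python
def ceil_log(n: int, b: int) -> int:
    """Smallest integer k such that b**k >= n, using integers only."""
    if n < 1 or b < 2:
        raise ValueError("need n >= 1 and b >= 2")
    k, power = 0, 1
    while power < n:
        power *= b
        k += 1
    return k


def stage_count(n_ports: int, b: int) -> int:
    """Number of switch stages: twice the ceiling of log base b of N."""
    return 2 * ceil_log(n_ports, b)


if __name__ == "__main__":
    # 64 network ports built from 8-port switch nodes.
    print(stage_count(64, 8))
```

For example, N = 64 and b = 8 give 2 × 2 = 4 stages, while N = 65 with the same switch nodes needs 2 × 3 = 6.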
Abstract:
A method and a computing system for performing the method. Microstates of components of a computing system are organized into macrostates of the computing system. Each microstate represents a state that a component of the computing system is able to individually enter. Each macrostate represents a state that the computing system is able to enter as a whole. The macrostates of the computing system are organized into meta-dynamic states of the computing system. The computing system is monitored such that perturbations of the computing system are detected, wherein a perturbation of the computing system will result in movement thereof to a new meta-dynamic state. It is determined that the new meta-dynamic state is undesirable. A path is determined. The path causes the computing system to move back to a desirable meta-dynamic state. The computing system is caused to move on the path to the desirable meta-dynamic state.
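One way to picture the path-determination step is a search over a graph of meta-dynamic states. The abstract does not say how the path is found, so the breadth-first search below, the `transitions` mapping, and the state names are assumptions made purely for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def path_to_desirable(
    transitions: Dict[str, List[str]],
    start: str,
    desirable: Set[str],
) -> Optional[List[str]]:
    """Return a shortest sequence of states from `start` to any desirable state.

    `transitions` maps a state to the states reachable from it in one step
    (a hypothetical model of the moves the system is able to make).
    """
    if start in desirable:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in transitions.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt in desirable:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # no desirable state is reachable


if __name__ == "__main__":
    moves = {
        "degraded": ["overloaded", "recovering"],
        "overloaded": ["degraded"],
        "recovering": ["healthy"],
    }
    print(path_to_desirable(moves, "degraded", {"healthy"}))
```

A monitoring loop would call such a function after a perturbation lands the system in an undesirable state, then drive the system along the returned path.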
Abstract:
A storage system which includes a first storage device, and a storage control device connected to a higher level device and the first storage device. The storage control device controls reading or writing of data from or to the higher level device to or from the storage control device. The storage control device can be connected to another storage control device, and can acquire device information of the other storage control device.
Abstract:
A storage system which includes a first storage device, and a storage control device connected to a higher level device and the first storage device. The storage control device controls reading or writing of data from or to the higher level device to or from the storage control devices. The storage control device can be connected to another storage control device, and can acquire device information of the other storage control device.
Abstract:
A method and apparatus for dynamically reconfiguring a computing system are disclosed. The method comprises detecting a predetermined condition triggering a reconfiguration of the computing system, and dynamically reconfiguring a signal path affected by the condition from a first mode to a second mode responsive to detecting the condition. The apparatus is a computing system comprising a plurality of I/O switches, a crossbar switch, a plurality of signal paths, and a system controller. Each signal path is defined by an I/O switch and the crossbar switch. The system controller is capable of detecting a predetermined condition triggering a reconfiguration and dynamically reconfiguring at least one of the signal paths affected by the condition from a first mode to a second mode.
Abstract:
A process and system for flushing high-speed buffers in a serial link used between a mover circuit (4) that executes data move operations and at least two memories through at least two channels (400, 401), the data move operations each being constituted by a move request followed by the return of a response or acknowledgement of the request, cyclically with interlacing, the responses following the same pair of serial channels (400, 401) as the requests for which they constitute the acknowledgements. The process comprises: a step for placing the mover circuit (4) into a so-called "absorption" mode of operation; a step for generating a specific write request and a specific read request, each of which comprises a so-called "barrier" marker contained in a control character preceding or following the request; a step for accumulating the responses received; and a step for comparing the responses received.
Abstract:
In an apparatus having a network including successive stages of cross-point switches which collectively interconnect a plurality of nodes external to said network, wherein at least one message is carried between one of the nodes and one of the cross-point switches over a route through said network, a method for preventing routing deadlocks from occurring in the network which comprises the steps of: creating a graphical representation of the network; searching for the existence of cycles within the graphical representation; partitioning the graphical representation into a first subgraph and a second subgraph if cycles exist in the graphical representation; searching for the existence of edges directed from the first subgraph to the second subgraph; and removing the edges directed from the first subgraph to the second subgraph. Preferably, the step of partitioning the network into a first subgraph and a second subgraph is performed such that the first subgraph and the second subgraph have an equal number of vertices, the number of directed edges from the first subgraph to the second subgraph is minimized so as to minimize the number of routes prohibited, and a set of partition constraints is satisfied. The method is recursively applied to the first subgraph and then the second subgraph, thereby removing all of the deadlock-prone cycles in the network while minimizing the number of routes prohibited due to removed edges.
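The recursive procedure above can be sketched on a plain directed graph. This is a simplified illustration under stated assumptions: vertices are strings, the graph has no self-loops, and the partition is a naive split of the vertex list into two halves rather than the edge-minimizing, constraint-satisfying partition the abstract prefers. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def has_cycle(vertices: List[str], edges: Set[Edge]) -> bool:
    """Detect a directed cycle by repeatedly removing zero in-degree vertices."""
    indeg: Dict[str, int] = {v: 0 for v in vertices}
    succ: Dict[str, List[str]] = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in vertices if indeg[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    # Vertices that could never be removed lie on or behind a cycle.
    return removed < len(vertices)


def break_cycles(vertices: List[str], edges: Set[Edge]) -> Set[Edge]:
    """Return the edges that remain after recursive bisection removes cycles.

    Every edge in `edges` must have both endpoints in `vertices`.
    """
    if len(vertices) < 2 or not has_cycle(vertices, edges):
        return set(edges)
    half = len(vertices) // 2  # naive partition (assumption, not minimized)
    first, second = vertices[:half], vertices[half:]
    fs, ss = set(first), set(second)
    # Edges from the first subgraph to the second are dropped;
    # edges from the second back to the first are kept.
    kept_cross = {(u, v) for u, v in edges if u in ss and v in fs}
    in_first = {(u, v) for u, v in edges if u in fs and v in fs}
    in_second = {(u, v) for u, v in edges if u in ss and v in ss}
    return kept_cross | break_cycles(first, in_first) | break_cycles(second, in_second)


if __name__ == "__main__":
    ring = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")}
    print(sorted(break_cycles(["a", "b", "c", "d"], ring)))
```

Once the first-to-second edges are gone, no cycle can span both halves, so any remaining cycle lies wholly inside one subgraph and is handled by the recursive calls.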
Abstract:
A multi-processing device includes three or more processing systems, each having a processor and a corresponding main memory connected to each other by means of an individual memory bus. The multi-processing device also includes a common memory bus connectable to all the processors and all the main memories of the respective systems, an asynchronism detection circuit connected to the respective processors to produce an asynchronism detection signal indicating which system or systems are in an asynchronous state, and a device control circuit responsive to the asynchronism detection signal to send a common memory bus select signal to the main memory of each failed system to change its bus connection from the individual memory bus to the common memory bus. The device control circuit also generates a master designation signal for allowing an arbitrary processor of the normal non-faulty systems to be designated as a master processor, and a copy request signal to the respective processors. The copy request signal causes the master processor to copy the content of the main memory of the normal system to the main memory of each failed system. When the synchronization between the respective systems is established, the device control circuit outputs a restart request signal to the respective processors, thus initiating the execution from a fixed, stored address in a control memory of each processor to enable synchronous starting of all of the processors. The multi-processing device further includes a communication control circuit connected to the common memory bus, thus permitting parallel loading of an initial program to the main memories of the respective systems for achieving recovery in the case where all the systems are asynchronous with each other.