Abstract:
A method, system, and computer program product containing instructions for establishing and maintaining multiple connections over different communication fabrics between two processes. The slowest, most reliable connection may be established first and then complemented by progressively faster connections between the same pair of processes. Each of these multiple connections is maintained throughout the duration of the communication session between the processes. These multiple connections may include connections made via network interfaces and, when available, direct connections such as a shared memory connection or a point-to-point processor interconnection. This connection strategy provides one or more failback communication paths that can be used with no startup costs in the event of failure of one of the other communication paths. These failback communication paths can be used to exchange failover protocol information needed to resend messages that were undelivered due to failure of one of the communication connections.
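The connection strategy described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Path` and `MultiPathChannel` classes, the path names, and the `speed_rank` field are hypothetical and do not come from the abstract. The sketch only models the idea that all paths stay open for the whole session, so a slower path can take over without any setup cost when a faster one fails.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """One connection between the same pair of processes (hypothetical model)."""
    name: str
    speed_rank: int                      # higher means faster
    alive: bool = True
    delivered: list = field(default_factory=list)

    def send(self, msg):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.delivered.append(msg)


class MultiPathChannel:
    """Keeps every path open for the session and prefers the fastest live one."""

    def __init__(self, paths):
        # Order paths slowest first, mirroring the order of establishment.
        self.paths = sorted(paths, key=lambda p: p.speed_rank)

    def send(self, msg):
        # Try the fastest path first, then fall back to slower open paths.
        for path in reversed(self.paths):
            try:
                path.send(msg)
                return path.name
            except ConnectionError:
                continue
        raise ConnectionError("all paths failed")


tcp = Path("tcp", speed_rank=0)
fabric = Path("fast_fabric", speed_rank=1)
shm = Path("shared_memory", speed_rank=2)
channel = MultiPathChannel([shm, tcp, fabric])

first = channel.send("m1")     # carried by the fastest path
shm.alive = False              # simulate failure of the fastest path
second = channel.send("m2")    # carried by the next fastest path, no new setup
```

A fuller model would also use the surviving path to exchange the failover protocol information needed to resend undelivered messages; that step is omitted here.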
Abstract:
In one embodiment, the present invention includes a method for receiving an application linked against a first application binary interface (ABI), providing an ABI wrapper associated with the application, and binding the application to a native message passing interface (MPI) library using the ABI wrapper and the profiling message passing interface (PMPI). Other embodiments are described and claimed.
Abstract:
The present disclosure provides a method for virtual processing. According to one exemplary embodiment, the method may include partitioning a plurality of cores of an integrated circuit (IC) into a plurality of virtual processors, the plurality of virtual processors having a framework dependent upon a programming application. The method may further include performing at least one task using the plurality of cores. Of course, additional embodiments, variations and modifications are possible without departing from this embodiment.
Abstract:
A method, system, and computer program product containing instructions for automatically converting an MPI source code program into an MPI thread-based program. In response to inputs in the form of an MPI source code program and a command, a converter declares a global variable of the MPI source code program as a thread private variable to create a first private variable for a first thread and a second private variable for a second thread. A library is identified to support converting processes to threads during execution of the MPI thread-based program, and the identified library is used to build an executable version of the MPI thread-based program. The identified library may include code to identify instantiation of a new process when the MPI thread-based program is executing, and in response, to cause a corresponding thread for the MPI thread-based program to be instantiated.
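The reason a global variable has to become thread private can be shown with a small Python analogy. This is not the converter described above; it is a sketch that assumes `threading.local` as a stand-in for a thread private declaration, and the names `state`, `rank_main`, and `results` are hypothetical. In a process-based program each process has its own copy of a global; once processes become threads, each thread needs its own copy to preserve that behavior.

```python
import threading

# Stand-in for a global that was private to each process before conversion.
# threading.local gives every thread its own independent attribute values.
state = threading.local()

# Shared dictionary used only to report each thread's final private value.
results = {}


def rank_main(rank):
    state.counter = 0                 # private to the calling thread
    for _ in range(rank + 1):
        state.counter += 1
    results[rank] = state.counter


threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread sees only its own `state.counter`, so the two "ranks" do not overwrite one another even though they share one address space.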
Abstract:
In one embodiment, the present invention includes a method to obtain topology information regarding a system including at least one multicore processor, provide the topology information to a plurality of parallel processes, generate a topological map based on the topology information, access the topological map to determine a topological relationship between a sender process and a receiver process, and select a given memory copy routine to pass a message from the sender process to the receiver process based at least in part on the topological relationship. Other embodiments are described and claimed.
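The selection step described above might look roughly like the following sketch. The topology table, the three relationship categories, and the routine names are all hypothetical; the abstract does not specify which relationships or copy routines are used. The sketch only shows the flow: build a map once from topology information, then look up the sender and receiver pair to pick a routine.

```python
# Hypothetical topology information: rank -> (socket id, shared-cache id).
topology = {
    0: (0, 0),
    1: (0, 0),
    2: (0, 1),
    3: (1, 2),
}


def relationship(sender, receiver):
    """Classify how two ranks are related in the (assumed) topology."""
    s_socket, s_cache = topology[sender]
    r_socket, r_cache = topology[receiver]
    if s_cache == r_cache:
        return "shared_cache"
    if s_socket == r_socket:
        return "same_socket"
    return "different_socket"


# Topological map built once and consulted for every message.
topo_map = {
    (s, r): relationship(s, r)
    for s in topology
    for r in topology
    if s != r
}

# Illustrative routine names only; real routines would be copy functions.
COPY_ROUTINES = {
    "shared_cache": "cache_friendly_copy",
    "same_socket": "standard_copy",
    "different_socket": "non_temporal_copy",
}


def select_copy_routine(sender, receiver):
    return COPY_ROUTINES[topo_map[(sender, receiver)]]
```

For example, `select_copy_routine(0, 1)` picks the routine for ranks that share a cache, while `select_copy_routine(0, 3)` picks the one for ranks on different sockets.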
Abstract:
A method or device may optimize applications on a parallel computing system. Environment variables data may be used as well as a test kernel of an application to optimize communication protocol performance according to a set of predefined tuning rules. The tuning rules may specify the output parameters to be optimized, and may include a ranking or hierarchy of such output parameters. Optimization may be achieved through use of a tuning unit, which may execute the test kernel on the parallel computing system, and may monitor the output parameters for a series of input parameters. The input parameters may be varied over a range of values and combinations. Input parameters corresponding to optimized output parameters may be stored for future use. This information may be used to adjust the application's communication protocol performance “on the fly” by changing the input parameters for a given usage scenario.
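A parameter sweep of the kind described above can be sketched as follows. The parameter names (`eager_threshold`, `num_buffers`), the synthetic `run_test_kernel` function, and the ranking rule are assumptions made for illustration; a real tuning unit would execute the application's test kernel on the parallel system and measure the outputs.

```python
from itertools import product


def run_test_kernel(eager_threshold, num_buffers):
    """Stand-in for executing the test kernel; returns synthetic outputs."""
    latency = abs(eager_threshold - 4096) / 1024 + 1.0 / num_buffers
    bandwidth = num_buffers * 10 - abs(eager_threshold - 8192) / 1024
    return {"latency": latency, "bandwidth": bandwidth}


def rule_key(outputs):
    # Assumed tuning rule: latency ranks first (lower is better),
    # bandwidth ranks second (higher is better).
    return (outputs["latency"], -outputs["bandwidth"])


search_space = {
    "eager_threshold": [1024, 4096, 8192],
    "num_buffers": [1, 2, 4],
}


def tune(space):
    """Try every combination of input parameters and keep the best one."""
    names = list(space)
    best_params, best_outputs = None, None
    for values in product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        outputs = run_test_kernel(**params)
        if best_outputs is None or rule_key(outputs) < rule_key(best_outputs):
            best_params, best_outputs = params, outputs
    return best_params


# Stored for future use, for example keyed by usage scenario.
tuned = tune(search_space)
```

An exhaustive sweep is the simplest choice; for larger parameter ranges a tuning unit would likely sample or prune the space instead.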
Abstract:
A message passing interface (“MPI”) cluster may be initialized and configured by reading a list of node identifiers from a file, starting a process on each node whose identifier was listed, and providing a second list of node identifiers to the process.
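The initialization flow above can be sketched in a few lines. The sketch makes several assumptions that the abstract does not state: the file holds one node identifier per line, the second list given to each process is the list of the other nodes, and `node_daemon` with its `--peers` flag is a hypothetical command. The code only builds a launch plan and does not start remote processes.

```python
import os
import tempfile


def read_node_list(path):
    """Read one node identifier per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def plan_launch(nodes, daemon="node_daemon"):
    """Return (node, argv) pairs; argv carries the second list of nodes."""
    plan = []
    for node in nodes:
        # Assumption: the second list is every node except the one launched.
        peers = [n for n in nodes if n != node]
        plan.append((node, [daemon, "--peers", ",".join(peers)]))
    return plan


# Write a small example node file, read it back, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".hosts", delete=False) as f:
    f.write("node01\nnode02\n\nnode03\n")
    hostfile = f.name

nodes = read_node_list(hostfile)
os.remove(hostfile)

launch_plan = plan_launch(nodes)
```

In a real launcher each `argv` would be handed to a remote-execution mechanism for the corresponding node.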
Abstract:
A method or device may optimize applications on a parallel computing system using protocols such as Message Passing Interface (MPI). Environment variables data may be used as well as a test kernel of an application to optimize communication protocol performance according to a set of predefined tuning rules. The tuning rules may specify the output parameters to be optimized, and may include a ranking or hierarchy of such output parameters. Optimization may be achieved through use of a tuning unit, which may execute the test kernel on the parallel computing system, and may monitor the output parameters for a series of input parameters. The input parameters may be varied over a range of values and combinations. Input parameters corresponding to optimized output parameters may be stored for future use. This information may be used to adjust the application's communication protocol performance “on the fly” by changing the input parameters for a given usage scenario.
Abstract:
A processing system features random access memory (RAM) and a processor. The processor features cache memory and multiple processing cores. The processor also features cache unmapping logic that can receive an unmap request calling for creation of a memory segment to be used as a shared memory segment to reside in the cache memory of the processor. The shared memory segment may facilitate interprocess communication (IPC). After receiving the unmap request, the cache unmapping logic may cause the processing system to omit the shared memory segment when writing data from the cache memory to the RAM. Other embodiments are described and claimed.
Abstract:
Disclosed herein are systems, methods, and storage media associated with native cloud computing. In embodiments, a system may include a number of clusters of computing nodes, and a data communication network configured to couple the clusters of computing nodes. The system may further include a control node configured to segment or cause segmentation of the data communication network to isolate a cluster of the computing nodes from other clusters of the computing nodes, for allocation for native execution of a computation task. The system may further include a control network coupled to the data communication network and the control node. Other embodiments may be disclosed and claimed.