Abstract:
Software development for a hybrid computing environment that includes a host computer and an accelerator, the host computer and the accelerator adapted to one another for data communications by a system level message passing module and by two or more data communications fabrics of at least two different fabric types, where the software development includes: creating, by a programmer, a computer program for execution in the hybrid computing environment, the computer program including directives for generation of computer program code that moves contents of memory among host computers and accelerators in the hybrid computing environment; generating, by a code generator application, source code in accordance with the directives; analyzing, by the code generator application, operation of the generated code for data movement and utilization of moved data; and regenerating, by the code generator application, the source code in accordance with the directives and further in accordance with results of the analysis.
Abstract:
Data processing in a hybrid computing environment that includes a host computer and an accelerator, the host and the accelerator adapted to one another for data communications by a system level message passing module and a plurality of data communications fabrics of at least two different fabric types, the data processing including: monitoring data communications performance for a plurality of data communications modes; receiving, from an application program on the host computer, a request to transmit data according to a data communications mode from the host computer to the accelerator; determining, in dependence upon the monitored performance, whether to transmit the data according to the requested data communications mode; and if the data is not to be transmitted according to the requested data communications mode: selecting, in dependence upon the monitored performance, another data communications mode for transmitting the data and transmitting the data according to the selected data communications mode.
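The mode-selection steps in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `ModeMonitor` class, the `choose_mode` function, the mode names, and the latency threshold are all hypothetical, and average latency is assumed to stand in for "monitored performance".

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ModeMonitor:
    """Hypothetical monitor: keeps latency samples (seconds) per communications mode."""
    samples: dict = field(default_factory=dict)

    def record(self, mode: str, latency: float) -> None:
        self.samples.setdefault(mode, []).append(latency)

    def average(self, mode: str) -> float:
        data = self.samples.get(mode)
        # Assumption: a mode with no samples is treated as unusable.
        return mean(data) if data else float("inf")


def choose_mode(monitor: ModeMonitor, requested: str, threshold: float) -> str:
    """Keep the requested mode if its monitored latency is acceptable;
    otherwise select the monitored mode with the lowest average latency."""
    if monitor.average(requested) <= threshold:
        return requested
    return min(monitor.samples, key=monitor.average, default=requested)


monitor = ModeMonitor()
monitor.record("dma", 0.004)
monitor.record("dma", 0.006)
monitor.record("message_passing", 0.001)

# The application requests "dma", but its monitored latency exceeds the threshold.
selected = choose_mode(monitor, requested="dma", threshold=0.002)
```

With a looser threshold, such as `0.01`, the same call would keep the requested `"dma"` mode.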
Abstract:
Methods, apparatuses, and computer program products for processing unexpected messages at a compute node of a parallel computer are provided. Embodiments include receiving, by the compute node, a portion of a message from another compute node of the parallel computer, the message comprising a plurality of separate portions; in response to receiving the portion of the message, determining, by the compute node, whether one of the applications executing on the compute node has indicated that the message is expected; if one of the applications executing on the compute node has not indicated that the message is expected, storing, by the compute node, the portion of the message in an unexpected message buffer within the compute node; and if one of the applications executing on the compute node has indicated that the message is expected, storing the portion of the message at a storage destination indicated by the message.
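The expected/unexpected decision in this abstract can be sketched as follows. The `Portion` and `ComputeNode` names, the `expect` method, and the use of a string key as the "storage destination indicated by the message" are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Portion:
    msg_id: int      # identifies the message this portion belongs to
    dest: str        # hypothetical storage destination carried by the message
    payload: bytes


class ComputeNode:
    def __init__(self):
        self.expected_ids = set()     # messages an application has said it expects
        self.unexpected_buffer = []   # portions of messages nobody has asked for yet
        self.storage = {}             # destination name -> list of stored payloads

    def expect(self, msg_id: int) -> None:
        """Called by an application to indicate that a message is expected."""
        self.expected_ids.add(msg_id)

    def receive(self, portion: Portion) -> None:
        if portion.msg_id in self.expected_ids:
            # Expected: store at the destination indicated by the message.
            self.storage.setdefault(portion.dest, []).append(portion.payload)
        else:
            # Not expected: hold the portion in the unexpected message buffer.
            self.unexpected_buffer.append(portion)


node = ComputeNode()
node.expect(1)
node.receive(Portion(1, "app_buffer", b"part-0"))
node.receive(Portion(2, "other_buffer", b"part-0"))
```

Here the portion of message 1 lands in `node.storage["app_buffer"]`, while the portion of message 2 stays in `node.unexpected_buffer`.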
Abstract:
Methods, systems, and computer program products for configurable alert delivery in a distributed processing system are provided. Embodiments include, for each alert generated by an incident analyzer, applying active alert filters to the alert, wherein applying the active alert filters to the alert includes: creating a list of all active alert filters and a set of all active listeners; and for each active alert filter, running the active alert filter; if the active alert filter indicates that the alert should not go to one or more of the active listeners, removing the one or more active listeners from the set of all active listeners; if the active listeners set is empty, stopping processing of the alert; and if the active listeners set is not empty, selecting the next active alert filter from the active alert filter list.
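The filter loop in this abstract can be sketched in a few lines. This is an illustration under stated assumptions: filters are modeled as hypothetical callables that return the set of listeners the alert should not reach, and the filter and listener names are invented for the example.

```python
def deliver(alert, active_filters, active_listeners):
    """Run each active filter in turn and return the listeners still eligible."""
    filters = list(active_filters)      # list of all active alert filters
    listeners = set(active_listeners)   # set of all active listeners
    for alert_filter in filters:
        blocked = alert_filter(alert, listeners)
        listeners -= blocked            # remove listeners the filter excludes
        if not listeners:
            break                       # empty set: stop processing the alert
    return listeners


# Hypothetical filters for illustration.
def no_info_to_pager(alert, listeners):
    return {"pager"} if alert["severity"] == "info" else set()


def suppress_test_alerts(alert, listeners):
    return set(listeners) if alert.get("test") else set()


filters = [no_info_to_pager, suppress_test_alerts]
remaining = deliver({"severity": "info"}, filters, {"pager", "email"})
```

An alert marked `"test": True` would leave no listeners, so processing would stop at the second filter.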
Abstract:
Methods, apparatuses, and computer program products for selected alert delivery in a distributed processing system are provided. Embodiments include receiving, by an incident analyzer, one or more events from one or more resources, each event identifying a location of the resource producing the event; creating, by the incident analyzer, potential alerts in dependence upon a location of the resource producing the event and location scoping rules; selecting for consolidation, by the incident analyzer, one or more of the potential alerts based on consolidation rules; and creating, by the incident analyzer, a consolidated alert based on the consolidation rules and the selected one or more potential alerts.
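One way to picture the flow from events to a consolidated alert is the sketch below. The abstract does not specify the rule formats, so everything concrete here is an assumption: locations are slash-separated paths, a scoping rule is the number of leading path components to keep, and the consolidation rule merges potential alerts that share a type and scope.

```python
from collections import defaultdict

# Hypothetical location scoping rules: event type -> location components to keep.
SCOPING_RULES = {"link_error": 1, "temp_high": 2}


def scoped_location(event):
    parts = event["location"].split("/")
    depth = SCOPING_RULES.get(event["type"], len(parts))
    return "/".join(parts[:depth])


def potential_alerts(events):
    """Create one potential alert per event, located at the scoped location."""
    return [{"type": e["type"], "scope": scoped_location(e)} for e in events]


def consolidate(alerts):
    """Hypothetical consolidation rule: merge alerts sharing type and scope."""
    groups = defaultdict(int)
    for alert in alerts:
        groups[(alert["type"], alert["scope"])] += 1
    return [{"type": t, "scope": s, "count": n} for (t, s), n in groups.items()]


events = [
    {"type": "link_error", "location": "rack1/node3"},
    {"type": "link_error", "location": "rack1/node7"},
    {"type": "temp_high", "location": "rack2/node1"},
]
alerts = consolidate(potential_alerts(events))
```

In this example the two link errors are scoped to `rack1` and merged into a single alert, while the temperature event keeps its node-level scope.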
Abstract:
Administering incident pools in a distributed processing system, including: receiving, by an incident analyzer from an incident queue, a plurality of incidents from one or more components of the distributed processing system; assigning, by the incident analyzer, each received incident to a pool of incidents; assigning, by the incident analyzer, to each incident a particular combined minimum time for inclusion in one or more pools, each particular combined minimum time corresponding to a particular incident; in response to the pool closing, determining, by the incident analyzer, for each incident in the pool whether the incident has met its combined minimum time for inclusion in one or more pools; and if the incident has been in the pool for its combined minimum time, including, by the incident analyzer, the incident in the closed pool; and if the incident has not been in the pool for its combined minimum time, including the incident in a next pool.
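The pool-closing check can be sketched as below. The `Incident` fields and the `close_pool` function are hypothetical, and the sketch assumes that time spent in pools can be measured as the difference between the closing time and the incident's arrival time.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    arrived_at: float   # time the incident was first assigned to a pool
    min_time: float     # combined minimum time for inclusion in one or more pools


def close_pool(pool, now):
    """Split a closing pool into (incidents kept in the closed pool,
    incidents carried over to the next pool)."""
    closed, carried = [], []
    for incident in pool:
        if now - incident.arrived_at >= incident.min_time:
            closed.append(incident)    # minimum time met: stays in the closed pool
        else:
            carried.append(incident)   # minimum time not met: moves to the next pool
    return closed, carried


pool = [
    Incident("fan_failure", arrived_at=0.0, min_time=5.0),
    Incident("link_down", arrived_at=8.0, min_time=5.0),
]
closed, next_pool = close_pool(pool, now=10.0)
```

Because the arrival time travels with the incident, an incident carried into the next pool keeps accumulating time toward its combined minimum.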
Abstract:
Initiating a collective operation in a parallel computer that includes compute nodes coupled for data communications and organized in an operational group for collective operations with one compute node assigned as a root node, including: identifying, by a non-root compute node, a collective operation to execute in the operational group of compute nodes; initiating, by the non-root compute node, execution of the collective operation amongst the compute nodes of the operational group including: sending, by the non-root compute node to one or more of the other compute nodes in the operational group, an active message, the active message including information configured to initiate execution of the collective operation amongst the compute nodes of the operational group; and executing, by the compute nodes of the operational group, the collective operation.
Abstract:
Methods, systems, and computer program products for event management in a distributed processing system are provided. Embodiments include receiving, by an incident analyzer, one or more events from one or more resources, each event identifying a location of the resource producing the event; identifying, by the incident analyzer, an action in dependence upon the one or more events and the location of the one or more resources producing the one or more events; identifying, by the incident analyzer, a location scope for the action in dependence upon the one or more events; and executing, by the incident analyzer, the identified action.