摘要:
Synchronizing access to resources in a hybrid computing environment that includes a host computer, a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, where synchronizing access to resources includes providing in a registry, to processes executing on the accelerators and the host computer, a key associated with a resource, the key having a value; attempting, by a process, to access the resource including determining whether a current value of the key represents an unlocked state for the resource; if the current value represents an unlocked state, attempting to lock access to the resource including setting the value to a unique identification of the process; determining whether the current value is the unique identification of the process; if the current value is the unique identification accessing the resource by the process.
摘要:
Direct memory access (‘DMA’) in a hybrid computing environment that includes a host computer, an accelerator, the host computer and the accelerator adapted to one another for data communications by a system level message passing module, where DMA includes identifying, by the system level message passing module, a buffer of data to be transferred from the host computer to the accelerator according to a DMA protocol; segmenting, by the system level message passing module, the buffer of data into a predefined number of memory segments; pinning, by the system level message passing module, the memory segments against paging; and asynchronously with respect to pinning the memory segments, effecting, by the system level message passing module, DMA transfers of the pinned memory segments from the host computer to the accelerator.
摘要:
Developing collective operations for a parallel computer that includes compute nodes includes: presenting, by a collective development tool, a graphical user interface (‘GUI’) to a collective developer; receiving, by the collective development tool from the collective developer through the GUI, a selection of one or more collective primitives; receiving, by the collective development tool from the collective developer through the GUI, a specification of a serial order of the collective primitives and a specification of input and output buffers for each collective primitive; and generating, by the collective development tool in dependence upon the selection of collective primitives, the serial order of the collective primitives, and the input and output buffers for each collective primitive, executable code that carries out the collective operation specified by the collective primitives.
摘要:
Developing a collective operation for execution in a parallel computer that includes compute nodes coupled for data communications, including: receiving, by a collective development tool, a specification of a target collective operation to develop; receiving, by the collective development tool, a specification of computer hardware characteristics of the parallel computer within which the target collective operation will be executed; selecting, by the collective development tool automatically without user interaction, iteratively for each stage of the target collective operation, a collective primitive in dependence upon the specification of computer hardware characteristics and a predefined set of rules specifying selection criteria of collective primitives based on computer hardware characteristics; and generating, by the collective development tool, the target collective operation in dependence upon the selected collective primitives.
摘要:
Methods, apparatuses, and computer program products for prioritized alert delivery in a distributed processing system are provided. Embodiments include receiving a plurality of events from a plurality of tiered event producing components in a distributed computing system; identifying a plurality of potential alerts in dependence upon the events; wherein each potential alert includes a priority and one or more condition events describing the event producing component creating the event; comparing the potential alerts and their condition events and priorities; identifying the highest priority potential alert having condition events whose event producing component is in a tier that is higher than the tier of condition events of one or more lower priority potential alerts; and creating an alert in dependence upon the identified highest priority potential alert and the condition events corresponding to the identified highest priority potential alert.
摘要:
Administering incident pools including receiving, by an incident analyzer from an incident queue, a plurality of incidents from one or more components of the distributed processing system; assigning, by the incident analyzer, each received incident to a pool of incidents; assigning, by the incident analyzer, to each incident a particular combined minimum time for inclusion in one or more pools, each particular combined minimum time corresponding to a particular incident; in response to the pool closing, determining, by the incident analyzer, for each incident in the pool whether the incident has met its combined minimum time for inclusion in one or more pools; and if the incident has been in the pool for its combined minimum time, including, by the incident analyzer, the incident in the closed pool; and if the incident has not been in the pool for its combined minimum time, including the incident in a next pool.
摘要:
Methods, systems, and computer program products for administering event pools for relevant event analysis are provided. Embodiments include assigning, by an incident analyzer, a plurality of events to an events pool; determining, by the incident analyzer, an event suppression duration; determining, by the incident analyzer in dependence upon event analysis rules, to suppress events having particular attributes indicating the events occurred during the event suppression duration; and suppressing, by the incident analyzer, each event assigned to the events pool having the particular attributes indicating the events occurred during the event suppression duration.
摘要:
Synchronizing access to resources in a hybrid computing environment that includes a host computer, a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, where synchronizing access to resources includes providing in a registry, to processes executing on the accelerators and the host computer, a key associated with a resource, the key having a value; attempting, by a process, to access the resource including determining whether a current value of the key represents an unlocked state for the resource; if the current value represents an unlocked state, attempting to lock access to the resource including setting the value to a unique identification of the process; determining whether the current value is the unique identification of the process; if the current value is the unique identification accessing the resource by the process.
摘要:
Methods, systems, and computer program products for event management in a distributed processing system are provided. Embodiments include receiving, by the incident analyzer, one or more events from one or more resources, each event identifying a location of the resource producing the event; identifying, by the incident analyzer, an action in dependence upon the one or more events and the location of the one or more resources producing the one or more events; identifying, by the incident analyzer, a location scope for the action in dependence upon the one or more events; and executing, by the incident analyzer, the identified action.
摘要:
Administering incident pools including assigning an incident received from one or more components of the distributed processing system to a pool of incidents; assigning to each incident a particular combined minimum time for inclusion of the incident in the pool; in response to the pool closing, determining for each incident in the pool whether the incident has met its combined minimum time for inclusion in the pool; if the incident has been in the pool for its combined minimum time, including the incident in the closed pool; if the incident has not been in the pool for its combined minimum time, moving the incident from the closed pool to a next pool; applying incident suppression rules using the incidents assigned to the next pool; and applying incident creation rules to the incidents that were assigned to the next pool, while omitting any duplicate incidents caused by the assignment.