Abstract:
A distributed information processing system comprises first and second sites, which may comprise respective production and replica sites. A snapshot of a first portion of a complex asset is generated at the first site and sent to the second site, and a second portion of the complex asset is replicated at the second site. The complex asset includes one or more virtual machines provided by one or more hypervisors of a virtualization platform of the first site and at least one storage element surfaced through a storage platform of the first site, with the storage platform being external to the virtualization platform. Recovery of the complex asset is implemented at the second site utilizing, for example, a ghost complex asset preconfigured in accordance with current complex asset state information based on the snapshot of the first portion of the complex asset and the replicated second portion of the complex asset.
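As an illustration only, the recovery flow described above can be sketched as a ghost asset at the replica site that is kept current from two inputs: a snapshot of the virtual machine portion and replicated writes for the storage portion. The class name `GhostAsset`, its fields, and its methods are hypothetical and are not drawn from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GhostAsset:
    """Hypothetical replica-site placeholder for a complex asset."""
    # State of the VM portion, taken from the most recent snapshot received.
    vm_state: Optional[Dict[str, str]] = None
    # Storage portion, kept current by replication (block id -> data).
    storage: Dict[str, bytes] = field(default_factory=dict)

    def apply_snapshot(self, snapshot: Dict[str, str]) -> None:
        # Copy so later changes at the sender do not alter the ghost.
        self.vm_state = dict(snapshot)

    def apply_replicated_write(self, block_id: str, data: bytes) -> None:
        self.storage[block_id] = data

    def recover(self) -> Dict[str, object]:
        # Recovery needs both portions; refuse if no snapshot has arrived.
        if self.vm_state is None:
            raise RuntimeError("no snapshot received for the VM portion")
        return {"vm_state": dict(self.vm_state), "storage": dict(self.storage)}


ghost = GhostAsset()
ghost.apply_snapshot({"vm1": "running"})
ghost.apply_replicated_write("lun0:0", b"payload")
recovered = ghost.recover()
```

The sketch keeps the two portions separate because the abstract treats them differently: one arrives as a snapshot, the other through replication.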
Abstract:
Improved multi-tier storage techniques are provided for storing data, such as checkpoints or other bursty data, in parallel computing environments. A burst buffer appliance is provided for use in a first storage tier of a multi-tier storage system comprising at least the first storage tier and a second storage tier. The exemplary burst buffer appliance comprises a memory for storing data and at least one processing device that transforms at least a portion of the data for storage on the second storage tier based on one or more performance characteristics of the second storage tier. In at least one embodiment, the at least one processing device is further configured to perform at least one function on the at least a portion of the data on behalf of the second storage tier. The performance characteristics of the second storage tier comprise, for example, a stripe size and/or network topology information.
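One possible transformation based on a stripe size is to re-chunk buffered data so that every write to the second tier is stripe-aligned. The following is a minimal sketch under that assumption; the function name `align_to_stripes` and the zero-byte padding policy are illustrative and are not taken from the abstract.

```python
from typing import List


def align_to_stripes(data: bytes, stripe_size: int, pad: bytes = b"\0") -> List[bytes]:
    """Split data into chunks of exactly stripe_size bytes.

    The final chunk is right-padded with `pad` (a single byte) so that
    every chunk matches the assumed stripe size of the second tier.
    """
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    chunks = [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]
    if chunks and len(chunks[-1]) < stripe_size:
        chunks[-1] = chunks[-1].ljust(stripe_size, pad)
    return chunks


chunks = align_to_stripes(b"abcdefghij", 4)
# chunks == [b"abcd", b"efgh", b"ij\x00\x00"]
```

A real appliance would also need to record the original length so the padding can be removed on read; that bookkeeping is omitted here.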
Abstract:
Example embodiments of the present invention relate to a method and system for immediate recovery of replicated virtual machines. The method includes replicating a complex asset from a first site of a distributed information processing system to a second site of the distributed information processing system. The replicated complex asset may be configured at a first time in an active operational state but in a disconnected communicative state at the second site of the distributed information processing system. At a second time, the replicated complex asset may be configured in a connected communicative state at the second site of the distributed information processing system to facilitate recovery at the second site from a failure in the complex asset at the first site.
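The two-step configuration described above can be pictured as a small state model: the replica is first made active while disconnected, and is connected only at recovery time. The sketch below assumes that reading; the names `ReplicaAsset`, `prestart`, and `failover` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Operational(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active"


class Connectivity(Enum):
    DISCONNECTED = "disconnected"
    CONNECTED = "connected"


@dataclass
class ReplicaAsset:
    """Hypothetical model of a replicated complex asset at the second site."""
    name: str
    operational: Operational = Operational.INACTIVE
    connectivity: Connectivity = Connectivity.DISCONNECTED

    def prestart(self) -> None:
        # First time: run the replica, but keep it isolated from the network.
        self.operational = Operational.ACTIVE
        self.connectivity = Connectivity.DISCONNECTED

    def failover(self) -> None:
        # Second time: connect the already-running replica for recovery.
        if self.operational is not Operational.ACTIVE:
            raise RuntimeError("replica must be active before it is connected")
        self.connectivity = Connectivity.CONNECTED


replica = ReplicaAsset("app-tier")
replica.prestart()
replica.failover()
```

Because the replica is already running when `failover` is called, recovery in this model reduces to a change of connectivity rather than a full start-up.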
Abstract:
An information processing system in an illustrative embodiment comprises a sync point coordinator providing resilient high throughput job processing via coordinated resource scheduling across a distributed virtual infrastructure. In one aspect, a processing device of the information processing system comprises a processor coupled to a memory. The processing device implements a controller configured to coordinate interaction of each of multiple sync point components of the information processing system with distributed virtual infrastructure of the information processing system. The controller is coupled between each of the sync point components and the distributed virtual infrastructure. The controller may comprise, for example, a sync point coordinator having a schedule optimization module, and the sync point components may include, for example, a throughput scheduler, a resource manager, a job management system and a snapshot management system.
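The arrangement in which the controller sits between every sync point component and the infrastructure resembles a mediator. The following sketch assumes that interpretation; the classes `Infrastructure` and `SyncPointCoordinator`, the string-based requests, and the component identifiers are illustrative only, and the schedule optimization module is not modeled.

```python
from typing import List, Set


class Infrastructure:
    """Stand-in for the distributed virtual infrastructure (hypothetical)."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def execute(self, request: str) -> str:
        self.log.append(request)
        return f"done:{request}"


class SyncPointCoordinator:
    """Mediator: components reach the infrastructure only through this class."""

    def __init__(self, infra: Infrastructure) -> None:
        self._infra = infra
        self._components: Set[str] = set()

    def register(self, component: str) -> None:
        self._components.add(component)

    def submit(self, component: str, request: str) -> str:
        # Reject requests from components the coordinator does not know.
        if component not in self._components:
            raise KeyError(f"unregistered component: {component}")
        return self._infra.execute(f"{component}:{request}")


infra = Infrastructure()
coordinator = SyncPointCoordinator(infra)
for name in ("throughput_scheduler", "resource_manager",
             "job_management", "snapshot_management"):
    coordinator.register(name)
result = coordinator.submit("snapshot_management", "take_snapshot")
```

Routing every request through one object gives a single place where ordering or scheduling decisions could later be applied.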