Abstract:
A virtual machine (VM) runs on system hardware, which includes a physical network interface device that enables transfer of packets between the VM and a destination over a network. A virtual machine monitor (VMM) exports a hardware interface to the VM and runs on a kernel, which forms a system software layer between the VMM and the system hardware. Pending packets (both transmit and receive) issued by the VM are stored in a memory region that is shared by, that is, addressable by, the VM, the VMM, and the kernel. Rather than always transferring each packet as it is issued, packets are clustered in the shared memory region until a trigger event occurs, whereupon the cluster of packets is passed as a group to the physical network interface device. Optional mechanisms are included to prevent packets from waiting too long in the shared memory space before being transferred to the network. An interrupt offloading mechanism is also disclosed for use in multiprocessor systems such that it is in most cases unnecessary to interrupt the VM in order to request a VMM action, and the need for VMM-to-kernel context transitions is reduced.
Abstract:
To generate a checkpoint for a virtual machine (VM), first, while the VM is still running, a copy-on-write (COW) disk file is created pointing to a parent disk file that the VM is using. Next, the VM is stopped, the VM's memory is marked COW, the device state of the VM is saved to memory, the VM is switched to use the COW disk file, and the VM begins running again for substantially the remainder of the checkpoint generation. Next, the device state that was stored in memory and the unmodified VM memory pages are saved to a checkpoint file. Also, a copy may be made of the parent disk file for retention as part of the checkpoint, or the original parent disk file may be retained as part of the checkpoint. If a copy of the parent disk file was made, then the COW disk file may be committed to the original parent disk file.
Abstract:
A virtualization platform provides fault tolerance for a primary virtual machine by continuously transmitting checkpoint information of the primary virtual machine to a collector process, such as a backup virtual machine. When implemented on a hardware platform comprising a multi-processor that supports nested page tables, the virtualization platform leverages the nested page table support to quickly identify memory pages that have been modified between checkpoints. The backup virtual machine provides feedback information to assist the virtualization platform in identifying candidate memory pages for transmitting actual modifications to the memory pages rather than the entire memory page as part of the checkpoint information. The virtualization platform further maintains a modification history data structure to identify memory pages that can be transmitted simultaneous with the execution of the primary virtual machine rather than while the primary virtual machine has been stunned.
Abstract:
A source virtual machine (VM) hosted on a source server is migrated to a destination VM on a destination server without first powering down the source VM. After optional pre-copying of the source VM's memory to the destination VM, the source VM is suspended and its non-memory state is transferred to the destination VM; the destination VM is then resumed from the transferred state. The source VM memory is either paged into the destination VM on demand, or is transferred asynchronously by pre-copying and write-protecting the source VM memory, and then later transferring only the modified pages after the destination VM is resumed. The source and destination servers preferably share common storage, in which the source VM's virtual disk is stored; this avoids the need to transfer the virtual disk contents.
Abstract:
To generate a checkpoint for a virtual machine (VM), first, while the VM is still running, a copy-on-write (COW) disk file is created pointing to a parent disk file that the VM is using. Next, the VM is stopped, the VM's memory is marked COW, the device state of the VM is saved to memory, the VM is switched to use the COW disk file, and the VM begins running again for substantially the remainder of the checkpoint generation. Next, the device state that was stored in memory and the unmodified VM memory pages are saved to a checkpoint file. Also, a copy may be made of the parent disk file for retention as part of the checkpoint, or the original parent disk file may be retained as part of the checkpoint. If a copy of the parent disk file was made, then the COW disk file may be committed to the original parent disk file.
Abstract:
An elastic filesystem for temporary data provides storage space for virtual machines (VMs) in a distributed computing system. The filesystem redirects accesses to virtual disks in VMs to a common pool file. The system provides performance and storage efficiency at least on par with local, direct attached virtual disks, while providing a single pool of shared storage that is provisioned and managed independently of the VMs. The system provides storage isolation between VMs storing temporary data in that shared pool. Also, storage space for temporary data may be allocated on demand and reclaimed when no longer needed, thereby supporting a wide variety of temporary space requirements for different Hadoop jobs.
Abstract:
A source virtual machine (VM) hosted on a source server is migrated to a destination VM on a destination server without first powering down the source VM. After optional pre-copying of the source VM's memory to the destination VM, the source VM is suspended and its non-memory state is transferred to the destination VM; the destination VM is then resumed from the transferred state. In one embodiment, the source VM memory is either paged into the destination VM on demand, or is transferred asynchronously by pre-copying and write-protecting the source VM memory, and then later transferring only the modified pages after the destination VM is resumed. In one embodiment, the source and destination servers share common storage, in which the source VM's virtual disk is stored; this avoids the need to transfer the virtual disk contents.
Abstract:
A source virtual machine (VM) hosted on a source server is migrated to a destination VM on a destination server without first powering down the source VM. After optional pre-copying of the source VM's memory to the destination VM, the source VM is suspended and its non-memory state is transferred to the destination VM; the destination VM is then resumed from the transferred state. In one embodiment, the source VM memory is either paged into the destination VM on demand, or is transferred asynchronously by pre-copying and write-protecting the source VM memory, and then later transferring only the modified pages after the destination VM is resumed. In one embodiment, the source and destination servers share common storage, in which the source VM's virtual disk is stored; this avoids the need to transfer the virtual disk contents.
Abstract:
A distributed computing application is described that provides a highly elastic and multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Data and compute nodes are separated into different virtual machines (VM). Compute VMs are used to launch containers from different tenants. Compute VMs are organized in pools of hot spare VMs that are immediately available for launching a container and executing a task, and pools of cold spare VMs. Each compute VM may include a mounted network filesystem provided by a node manager to share intermediate outputs across VMs executing on the same host.
Abstract:
Techniques for creating a fault tolerant system in a virtual machine environment utilize a primary VM and a backup VM. To initialize the fault tolerant system, the backup VM and primary VM start from the same state. To achieve this in one embodiment, the primary VM is suspended and the state of the primary VM is copied to the backup VM. Once the backup VM has received all the primary VM's state, the primary VM is resumed. Subsequent state changes of the primary VM are buffered until the backup VM resumes, connects to the primary VM, and starts consuming the buffered content. Thereafter, synchronization is maintained by the primary VM's writing relevant state changes to a log and the backup VM's reading such relevant state changes from the log.