Abstract:
An elastic filesystem for temporary data provides storage space for virtual machines (VMs) in a distributed computing system. The filesystem redirects accesses to virtual disks in VMs to a common pool file. The system provides performance and storage efficiency at least on par with local, direct attached virtual disks, while providing a single pool of shared storage that is provisioned and managed independently of the VMs. The system provides storage isolation between VMs storing temporary data in that shared pool. Also, storage space for temporary data may be allocated on demand and reclaimed when no longer needed, thereby supporting a wide variety of temporary space requirements for different Hadoop jobs.
Abstract:
A source virtual machine (VM) hosted on a source server is migrated to a destination VM on a destination server without first powering down the source VM. After optional pre-copying of the source VM's memory to the destination VM, the source VM is suspended and its non-memory state is transferred to the destination VM; the destination VM is then resumed from the transferred state. In one embodiment, the source VM memory is either paged into the destination VM on demand, or is transferred asynchronously by pre-copying and write-protecting the source VM memory, and then later transferring only the modified pages after the destination VM is resumed. In one embodiment, the source and destination servers share common storage, in which the source VM's virtual disk is stored; this avoids the need to transfer the virtual disk contents.
Abstract:
A virtual machine (VM) runs on system hardware, which includes a physical network interface device that enables transfer of packets between the VM and a destination over a network. A virtual machine monitor (VMM) exports a hardware interface to the VM and runs on a kernel, which forms a system software layer between the VMM and the system hardware. Pending packets (both transmit and receive) issued by the VM are stored in a memory region that is shared by, that is, addressable by, the VM, the VMM, and the kernel. Rather than always transferring each packet as it is issued, packets are clustered in the shared memory region until a trigger event occurs, whereupon the cluster of packets is passed as a group to the physical network interface device. Optional mechanisms are included to prevent packets from waiting too long in the shared memory space before being transferred to the network. An interrupt offloading mechanism is also disclosed for use in multiprocessor systems such that it is in most cases unnecessary to interrupt the VM in order to request a VMM action, and the need for VMM-to-kernel context transitions is reduced.
Abstract:
To generate a checkpoint for a virtual machine (VM), first, while the VM is still running, a copy-on-write (COW) disk file is created pointing to a parent disk file that the VM is using. Next, the VM is stopped, the VM's memory is marked COW, the device state of the VM is saved to memory, the VM is switched to use the COW disk file, and the VM begins running again for substantially the remainder of the checkpoint generation. Next, the device state that was stored in memory and the unmodified VM memory pages are saved to a checkpoint file. Also, a copy may be made of the parent disk file for retention as part of the checkpoint, or the original parent disk file may be retained as part of the checkpoint. If a copy of the parent disk file was made, then the COW disk file may be committed to the original parent disk file.
Abstract:
A virtualization platform provides fault tolerance for a primary virtual machine by continuously transmitting checkpoint information of the primary virtual machine to a collector process, such as a backup virtual machine. When implemented on a hardware platform comprising a multi-processor that supports nested page tables, the virtualization platform leverages the nested page table support to quickly identify memory pages that have been modified between checkpoints. The backup virtual machine provides feedback information to assist the virtualization platform in identifying candidate memory pages for transmitting actual modifications to the memory pages rather than the entire memory page as part of the checkpoint information. The virtualization platform further maintains a modification history data structure to identify memory pages that can be transmitted simultaneous with the execution of the primary virtual machine rather than while the primary virtual machine has been stunned.
Abstract:
In a virtualized computer system in which a guest operating system runs on a virtual machine of a virtualized computer system, a computer-implemented method of providing the guest operating system with direct access to a hardware device coupled to the virtualized computer system via a communication interface, the method including: (a) obtaining first configuration register information corresponding to the hardware device, the hardware device connected to the virtualized computer system via the communication interface; (b) creating a passthrough device by copying at least part of the first configuration register information to generate second configuration register information corresponding to the passthrough device; and (c) enabling the guest operating system to directly access the hardware device corresponding to the passthrough device by providing access to the second configuration register information of the passthrough device.
Abstract:
A source virtual machine (VM) hosted on a source server is migrated to a destination VM on a destination server without first powering down the source VM. After optional pre-copying of the source VM's memory to the destination VM, the source VM is suspended and its non-memory state is transferred to the destination VM; the destination VM is then resumed from the transferred state. In one embodiment, the source VM memory is either paged into the destination VM on demand, or is transferred asynchronously by pre-copying and write-protecting the source VM memory, and then later transferring only the modified pages after the destination VM is resumed. In one embodiment, the source and destination servers share common storage, in which the source VM's virtual disk is stored; this avoids the need to transfer the virtual disk contents.
Abstract:
In a virtualized computer system in which a guest operating system runs on a virtual machine of a virtualized computer system, a computer-implemented method of providing the guest operating system with direct access to a hardware device coupled to the virtualized computer system via a communication interface, the method including: (a) obtaining first configuration register information corresponding to the hardware device, the hardware device connected to the virtualized computer system via the communication interface; (b) creating a passthrough device by copying at least part of the first configuration register information to generate second configuration register information corresponding to the passthrough device; and (c) enabling the guest operating system to directly access the hardware device corresponding to the passthrough device by providing access to the second configuration register information of the passthrough device.
Abstract:
In a virtualized computer system in which a guest operating system runs on a virtual machine of a virtualized computer system, a computer-implemented method of providing the guest operating system with direct access to a hardware device coupled to the virtualized computer system via a communication interface, the method including: (a) obtaining first configuration register information corresponding to the hardware device, the hardware device connected to the virtualized computer system via the communication interface; (b) creating a passthrough device by copying at least part of the first configuration register information to generate second configuration register information corresponding to the passthrough device; and (c) enabling the guest operating system to directly access the hardware device corresponding to the passthrough device by providing access to the second configuration register information of the passthrough device.
Abstract:
A virtualization technique, in accordance with one embodiment of the present invention, includes emulating the small computing system interface (SCSI) protocol to access a virtual SCSI storage device backed by a file stored on network attached storage (NAS).