Abstract:
Multiple computers are connected to a data storage unit that includes a file system, which further includes multiple data entities, including files, directories and the file system itself. The file system also includes, for each data entity, an owner field for indicating which computer, if any, has exclusive or shared access to the data entity, along with a time field for indicating when a lease of the data entity began. When a computer wants to lease a data entity, the computer uses a disk reservation capability to temporarily lock the data storage unit, and, if the data entity is not currently leased, the computer writes its own identification value into the owner field and a current time into the time field for the data entity, to claim the data entity for a renewable lease period. If a prior lease of a data entity has expired, another computer may break the lease and claim ownership for itself.
Abstract:
Techniques for creating a fault tolerant system in a virtual machine environment utilize a primary VM and a backup VM. To initialize the fault tolerant system, the backup VM and primary VM start from the same state. To achieve this in one embodiment, the primary VM is suspended and the state of the primary VM is copied to the backup VM. Once the backup VM has received all the primary VM's state, the primary VM is resumed. Subsequent state changes of the primary VM are buffered until the backup VM resumes, connects to the primary VM, and starts consuming the buffered content. Thereafter, synchronization is maintained by the primary VM's writing relevant state changes to a log and the backup VM's reading such relevant state changes from the log.
Abstract:
To generate a checkpoint for a virtual machine (VM), first, while the VM is still running, a copy-on-write (COW) disk file is created pointing to a parent disk file that the VM is using. Next, the VM is stopped, the VM's memory is marked COW, the device state of the VM is saved to memory, the VM is switched to use the COW disk file, and the VM begins running again for substantially the remainder of the checkpoint generation. Next, the device state that was stored in memory and the unmodified VM memory pages are saved to a checkpoint file. Also, a copy may be made of the parent disk file for retention as part of the checkpoint, or the original parent disk file may be retained as part of the checkpoint. If a copy of the parent disk file was made, then the COW disk file may be committed to the original parent disk file.
Abstract:
An elastic filesystem for temporary data provides storage space for virtual machines (VMs) in a distributed computing system. The filesystem redirects accesses to virtual disks in VMs to a common pool file. The system provides performance and storage efficiency at least on par with local, direct attached virtual disks, while providing a single pool of shared storage that is provisioned and managed independently of the VMs. The system provides storage isolation between VMs storing temporary data in that shared pool. Also, storage space for temporary data may be allocated on demand and reclaimed when no longer needed, thereby supporting a wide variety of temporary space requirements for different Hadoop jobs.
Abstract:
To generate a checkpoint for a virtual machine (VM), first, while the VM is still running, a copy-on-write (COW) disk file is created pointing to a parent disk file that the VM is using. Next, the VM is stopped, the VM's memory is marked COW, the device state of the VM is saved to memory, the VM is switched to use the COW disk file, and the VM begins running again for substantially the remainder of the checkpoint generation. Next, the device state that was stored in memory and the unmodified VM memory pages are saved to a checkpoint file. Also, a copy may be made of the parent disk file for retention as part of the checkpoint, or the original parent disk file may be retained as part of the checkpoint. If a copy of the parent disk file was made, then the COW disk file may be committed to the original parent disk file.
Abstract:
A virtualization platform provides fault tolerance for a primary virtual machine by continuously transmitting checkpoint information of the primary virtual machine to a collector process, such as a backup virtual machine. When implemented on a hardware platform comprising a multi-processor that supports nested page tables, the virtualization platform leverages the nested page table support to quickly identify memory pages that have been modified between checkpoints. The backup virtual machine provides feedback information to assist the virtualization platform in identifying candidate memory pages for transmitting actual modifications to the memory pages rather than the entire memory page as part of the checkpoint information. The virtualization platform further maintains a modification history data structure to identify memory pages that can be transmitted simultaneous with the execution of the primary virtual machine rather than while the primary virtual machine has been stunned.
Abstract:
A method of acquiring a lock by a node, on a shared resource in a system of a plurality of interconnected nodes, is disclosed. Each node that competes for a lock on the shared resource maintains a list of locks currently owned by the node. A lock metadata is maintained on a shared storage that is accessible to all nodes that may compete for locks on shared resources. A heartbeat region is maintained on a shared resource corresponding to each node so nodes can register their liveness. A lock state is maintained in the lock metadata in the shared storage. A lock state may indicate lock held exclusively, lock free or lock in managed mode. If the lock is held in the managed mode, the ownership of the lock can be transferred to another node without a use of a mutual exclusion primitive such as the SCSI reservation.
Abstract:
A virtualized computer system provides fault tolerant operation of a primary virtual machine. In one embodiment, this system includes a backup computer system that stores a snapshot of the primary virtual machine and a log file containing non-deterministic events occuring in the instruction stream of the primary virtual machine. The primary virtual machine periodically updates the snapshot and the log file. Upon a failure of the primary virtual machine, the backup computer can instantiate a failover backup virtual machine by consuming the stored snapshot and log file.
Abstract:
A virtualized computer system provides fault tolerant operation of a primary virtual machine. In one embodiment, this system includes a backup computer system that stores a snapshot of the primary virtual machine and a log file containing non-deterministic events occuring in the instruction stream of the primary virtual machine. The primary virtual machine periodically updates the snapshot and the log file. Upon a failure of the primary virtual machine, the backup computer can instantiate a failover backup virtual machine by consuming the stored snapshot and log file.