Abstract:
A distributed computing application is described that provides a highly elastic and multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Production, test, and development deployments of a Hadoop application may be executed using multiple compute clusters and a shared instance of a distributed filesystem, or in other cases, multiple instances of the distributed filesystem. Data nodes executing as virtual machines (VMs) for test and development deployments can be linked clones of data nodes executing as VMs for a production deployment to reduce duplicated data and provide a shared storage space.
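The linked-clone idea can be illustrated with a toy copy-on-write model. This is a minimal sketch under stated assumptions: a disk is modeled as a hypothetical block map, and `VirtualDisk` and `linked_clone` are illustrative names, not a hypervisor or Hadoop API. A test or development data node created as a linked clone reads unmodified blocks from the production data node's disk and stores only the blocks it writes itself, which is where the reduction in duplicated data comes from.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VirtualDisk:
    """Toy block-addressed disk (hypothetical model, not a hypervisor API)."""
    name: str
    parent: Optional["VirtualDisk"] = None  # None: a full disk, not a clone
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def read(self, block: int) -> Optional[bytes]:
        # Check this disk first, then fall back through the parent chain.
        disk = self
        while disk is not None:
            if block in disk.blocks:
                return disk.blocks[block]
            disk = disk.parent
        return None

    def write(self, block: int, data: bytes) -> None:
        # Copy-on-write: a clone's writes land only in its own block map.
        self.blocks[block] = data


def linked_clone(parent: VirtualDisk, name: str) -> VirtualDisk:
    """Create a clone that shares every unmodified block with its parent."""
    return VirtualDisk(name=name, parent=parent)


# Production data node holding two illustrative HDFS blocks.
prod = VirtualDisk("prod-datanode-1", blocks={0: b"hdfs-block-A", 1: b"hdfs-block-B"})
# Development data node as a linked clone; it overwrites one block.
dev_node = linked_clone(prod, "dev-datanode-1")
dev_node.write(1, b"test-only change")
```

After the write, the clone holds a single private block while still reading block 0 from the production disk, and the production disk is unaffected.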
Abstract:
Techniques for handling inheritance of disk state when forking virtual machines (VMs) are provided. In one embodiment, a computer system can receive a request to fork a child VM from a parent VM. In response, the computer system can take a disk snapshot of the parent VM. The snapshot produces a child disk for the child VM; the child disk is a delta disk that points to a parent disk of the parent VM, and the parent disk serves as the parent VM's current running point. The computer system can then determine whether the parent disk is a delta disk. If so, the computer system can copy the content of the parent disk to the child disk, traverse a disk hierarchy associated with the parent disk to identify a base disk above the parent disk in the hierarchy, and cause the child disk to point directly to the base disk.
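The disk re-pointing step can be sketched on a toy disk hierarchy. This is an illustrative model, not the described system's implementation: `Disk`, `fork_child_disk`, and `chain_depth` are hypothetical names, and disks are block maps. One assumption goes beyond the abstract: if several delta disks sit between the parent disk and the base disk, the sketch merges all of them into the child (the delta nearest the running point wins) so the child still reads the same data once it points directly at the base. The new delta the parent VM would receive after the snapshot is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Disk:
    """Toy disk in a snapshot hierarchy; parent=None marks a base disk."""
    name: str
    parent: Optional["Disk"] = None
    blocks: Dict[int, bytes] = field(default_factory=dict)

    @property
    def is_delta(self) -> bool:
        return self.parent is not None

    def read(self, block: int) -> Optional[bytes]:
        # Resolve a block by walking from this disk toward the base disk.
        disk = self
        while disk is not None:
            if block in disk.blocks:
                return disk.blocks[block]
            disk = disk.parent
        return None


def fork_child_disk(parent_disk: Disk, child_name: str) -> Disk:
    """Create the child VM's disk so its chain is at most one delta deep."""
    # Snapshot step: the child disk is a delta over the parent's running point.
    child = Disk(child_name, parent=parent_disk)
    if not parent_disk.is_delta:
        return child  # already points at a base disk
    # Copy delta content upward; setdefault keeps the nearest delta's version.
    disk = parent_disk
    while disk.is_delta:
        for block, data in disk.blocks.items():
            child.blocks.setdefault(block, data)
        disk = disk.parent
    # `disk` is now the base disk above the parent disk: point straight at it.
    child.parent = disk
    return child


def chain_depth(disk: Optional[Disk]) -> int:
    """Number of disks a read may have to visit."""
    depth = 0
    while disk is not None:
        depth += 1
        disk = disk.parent
    return depth


base = Disk("base", blocks={0: b"os", 1: b"lib-v1"})
delta_1 = Disk("delta-1", parent=base, blocks={1: b"lib-v2"})
running = Disk("delta-2", parent=delta_1, blocks={2: b"app"})
child = fork_child_disk(running, "child")
```

The child's reads match the parent's running point, but its chain is two disks deep instead of the parent's three, so repeated forks do not keep lengthening the hierarchy.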
Abstract:
Embodiments support instant forking of virtual machines (VMs) and state customization. Virtual device state and persistent storage of a child VM are defined based on the virtual device state and persistent storage of parent VMs. After forking, the state of the child VM is customized based on configuration data. Customizing the state includes configuring one or more identities of the child VM before the child VM completes bootup.
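The fork-then-customize ordering can be sketched as follows. This is a minimal model under stated assumptions: `VM`, `fork_vm`, and `customize_and_finish_boot` are hypothetical names, device state is reduced to a dictionary, and "identities" are assumed to be fields such as hostname, MAC address, and IP address. The point illustrated is that the child starts from the parent's state and storage, and identity changes from configuration data are accepted only before boot completes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VM:
    """Toy VM record; field names are hypothetical, not a hypervisor API."""
    name: str
    identity: Dict[str, str]      # e.g. hostname, MAC address, IP address
    device_state: Dict[str, str]  # stand-in for virtual device state
    disk: str                     # this VM's own disk
    disk_parent: Optional[str] = None  # disk this VM's storage is based on
    boot_complete: bool = False


def fork_vm(parent: VM, child_name: str) -> VM:
    """Define the child's device state and storage from the parent's."""
    return VM(
        name=child_name,
        identity=dict(parent.identity),          # inherited, then customized
        device_state=dict(parent.device_state),  # copied, not shared
        disk=f"{child_name}-delta",
        disk_parent=parent.disk,                 # child storage builds on parent's
    )


def customize_and_finish_boot(child: VM, config: Dict[str, str]) -> VM:
    """Apply identity overrides from config data, then mark boot complete."""
    if child.boot_complete:
        raise RuntimeError(f"{child.name}: identities must be set before boot completes")
    unknown = set(config) - set(child.identity)
    if unknown:
        raise KeyError(f"unknown identity fields: {sorted(unknown)}")
    child.identity.update(config)
    child.boot_complete = True
    return child


parent = VM(
    "web-parent",
    identity={"hostname": "web-parent", "mac": "00:50:56:00:00:01", "ip": "10.0.0.10"},
    device_state={"nic0": "connected"},
    disk="web-parent-disk",
)
child = fork_vm(parent, "web-1")
customize_and_finish_boot(child, {"hostname": "web-1", "mac": "00:50:56:00:00:02", "ip": "10.0.0.11"})
```

Rejecting customization after boot completes models the requirement that the child never runs with the parent's duplicated identities.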
Abstract:
Systems and techniques are described for using virtual machines to write parallel and distributed applications. One of the techniques includes receiving a job request, wherein the job request specifies a first job to be performed by a plurality of special purpose virtual machines, wherein the first job includes a plurality of tasks; selecting a parent special purpose virtual machine from a plurality of parent special purpose virtual machines to perform the first job; instantiating a plurality of child special purpose virtual machines from the selected parent special purpose virtual machine; partitioning the plurality of tasks among the plurality of child special purpose virtual machines by assigning one or more of the plurality of tasks to each of the child special purpose virtual machines; and performing the first job by causing each of the child special purpose virtual machines to execute the tasks assigned to the child special purpose virtual machine.
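The select, instantiate, partition, and execute steps can be sketched in a few lines. This is an illustrative model only: `ParentVM`, `ChildVM`, and `run_job` are hypothetical names, a "special purpose" VM is reduced to a job type plus a task function, and round-robin partitioning is an assumed policy (the abstract only requires that each child receive one or more tasks). Children run sequentially here, whereas real child VMs would run in parallel.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ParentVM:
    """A special purpose parent VM: a name, a job type, and a task function."""
    name: str
    job_type: str
    run_task: Callable[[int], int]


@dataclass
class ChildVM:
    name: str
    parent: ParentVM
    tasks: List[int] = field(default_factory=list)

    def execute(self) -> List[int]:
        # Each child runs only its assigned tasks, using the parent's code.
        return [self.parent.run_task(t) for t in self.tasks]


def run_job(job_type: str, tasks: List[int], parents: List[ParentVM],
            num_children: int) -> Dict[str, List[int]]:
    if num_children < 1:
        raise ValueError("need at least one child VM")
    # 1. Select a parent VM specialized for this job type.
    parent = next((p for p in parents if p.job_type == job_type), None)
    if parent is None:
        raise LookupError(f"no parent VM for job type {job_type!r}")
    # 2. Instantiate child VMs from the selected parent.
    children = [ChildVM(f"{parent.name}-child-{i}", parent) for i in range(num_children)]
    # 3. Partition tasks round-robin (assumed policy).
    for i, task in enumerate(tasks):
        children[i % num_children].tasks.append(task)
    # 4. Perform the job by having each child execute its share.
    return {child.name: child.execute() for child in children}


parents = [
    ParentVM("square-vm", "square", lambda x: x * x),
    ParentVM("negate-vm", "negate", lambda x: -x),
]
result = run_job("square", [1, 2, 3, 4, 5], parents, num_children=2)
# result == {"square-vm-child-0": [1, 9, 25], "square-vm-child-1": [4, 16]}
```

Round-robin keeps the task counts of any two children within one of each other; a weighted or size-aware policy could replace step 3 without changing the rest.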
Abstract:
A distributed computing application is described that provides a highly elastic and multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Data and compute nodes are separated into different virtual machines (VMs). Compute VMs are used to launch containers from different tenants. Compute VMs are organized in pools of hot spare VMs that are immediately available for launching a container and executing a task, and pools of cold spare VMs. Each compute VM may include a mounted network filesystem provided by a node manager to share intermediate outputs across VMs executing on the same host.
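The hot and cold spare pools can be modeled as two queues. This is a minimal sketch under assumptions not stated in the abstract: `ComputeVMPool` and its methods are hypothetical names, powering on a cold VM is represented only by a log entry, and a released VM is assumed to be reset and returned to the hot pool before it serves another tenant.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class ComputeVMPool:
    """Hypothetical pool of compute VMs for launching tenant containers."""
    hot: Deque[str] = field(default_factory=deque)        # running, ready now
    cold: Deque[str] = field(default_factory=deque)       # powered off
    busy: Dict[str, str] = field(default_factory=dict)    # VM name -> tenant
    powered_on: List[str] = field(default_factory=list)   # log of cold starts

    def launch_container(self, tenant: str) -> str:
        """Use a hot spare if one exists, otherwise start a cold spare."""
        if self.hot:
            vm = self.hot.popleft()
        elif self.cold:
            vm = self.cold.popleft()
            self.powered_on.append(vm)  # stand-in for the slower power-on path
        else:
            raise RuntimeError("no spare compute VMs available")
        self.busy[vm] = tenant
        return vm

    def release(self, vm: str) -> None:
        """Return a finished VM to the hot pool (assumes a reset between tenants)."""
        del self.busy[vm]
        self.hot.append(vm)


pool = ComputeVMPool(hot=deque(["vm-1"]), cold=deque(["vm-2"]))
first = pool.launch_container("tenant-a")   # served from the hot pool
second = pool.launch_container("tenant-b")  # hot pool empty, cold VM started
```

Keeping released VMs hot trades idle resources for launch latency; a real scheduler would also decide when to power hot spares down into the cold pool.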