Abstract:
A distributed computing application is described that provides a highly elastic and multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Data and compute nodes are separated into different virtual machines (VM). Compute VMs are used to launch containers from different tenants. Compute VMs are organized in pools of hot spare VMs that are immediately available for launching a container and executing a task, and pools of cold spare VMs. Each compute VM may include a mounted network filesystem provided by a node manager to share intermediate outputs across VMs executing on the same host.
Abstract:
A virtualized computing system for executing a distributed computing application, such as Hadoop, is discussed. The virtualized computing system stores data in a distributed filesystem, such as Hadoop Distributed File System, and processes data using a topology awareness that takes into account the virtualization layer of the virtualized computing system. The virtualized computing system employs locality-related policies, including replica placement policies, replica choosing policies, balancer policies, and task scheduling policies that take advantage of the awareness of the virtualization topology.
Abstract:
A rapid virtual machine (VM) cloning technique is provided that creates cloned VMs on hosts from multiple source VMs, rather than a single source VM that may otherwise be a bottleneck. The described technique selects particular hosts, disposed in particular racks, on which to create VM clones in a dynamic manner that reduces total deployment time for the plurality of VMs. A rapid VM reconfiguration technique is also provided that reduces the time spent reconfiguring the provisioned VMs for use in a distributed computing application.