Abstract:
A virtualized computing system for executing a distributed computing application, such as Hadoop, is described. The system stores data in a distributed filesystem, such as the Hadoop Distributed File System (HDFS), and processes data using topology awareness that accounts for the virtualization layer of the system. The system employs locality-related policies, including replica placement, replica choosing, balancer, and task scheduling policies, that take advantage of this awareness of the virtualization topology.
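As an illustration only, a virtualization-aware replica placement policy might be sketched as follows. The abstract does not specify the policy; this sketch assumes a three-level topology (rack, physical host, VM) and a hypothetical rule that no two replicas share a physical host, with unused racks preferred. The names `Node` and `choose_replicas` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    rack: str
    host: str  # physical host: the layer a virtualization-aware topology adds
    vm: str


def choose_replicas(writer, nodes, count=3):
    """Pick up to `count` nodes, starting with the writer's own node.

    Assumed rule (not taken from the abstract): never place two replicas
    on VMs that share a physical host, and prefer racks not yet used.
    """
    chosen = [writer]
    while len(chosen) < count:
        used_hosts = {(n.rack, n.host) for n in chosen}
        used_racks = {n.rack for n in chosen}
        candidates = [n for n in nodes if (n.rack, n.host) not in used_hosts]
        if not candidates:
            break  # fewer distinct hosts than requested replicas
        # Unused racks sort first; names break ties deterministically.
        candidates.sort(key=lambda n: (n.rack in used_racks, n.rack, n.host, n.vm))
        chosen.append(candidates[0])
    return chosen
```

Without the host level, a rack-only policy could treat two VMs on the same physical host as independent nodes and place two replicas on them, so a single host failure would lose both copies.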
Abstract:
A distributed computing application is described that provides a highly elastic, multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Data nodes and compute nodes are separated into different virtual machines (VMs). Compute VMs are used to launch containers from different tenants. Compute VMs are organized into pools of hot spare VMs, which are immediately available for launching a container and executing a task, and pools of cold spare VMs. Each compute VM may include a mounted network filesystem provided by a node manager to share intermediate outputs across VMs executing on the same host.
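A minimal sketch of the hot and cold spare pools could look like the following. It assumes, beyond what the abstract states, that a cold spare is a VM that must first be powered on or resumed, and that a released VM returns to the hot pool. The class `SparePools` and its methods are hypothetical.

```python
from collections import deque


class SparePools:
    """Hypothetical bookkeeping for hot and cold spare compute VMs."""

    def __init__(self, hot, cold):
        self.hot = deque(hot)    # ready to launch a container immediately
        self.cold = deque(cold)  # assumed to need power-on or resume first

    def acquire(self):
        """Return (vm, was_hot); fall back to a cold spare if no hot one exists."""
        if self.hot:
            return self.hot.popleft(), True
        if self.cold:
            return self.cold.popleft(), False
        raise RuntimeError("no spare compute VMs available")

    def release(self, vm):
        """Return a VM to the hot pool once its container has finished."""
        self.hot.append(vm)

    def replenish(self, target_hot):
        """Promote cold spares until the hot pool holds `target_hot` VMs."""
        promoted = []
        while len(self.hot) < target_hot and self.cold:
            vm = self.cold.popleft()
            self.hot.append(vm)
            promoted.append(vm)
        return promoted
```

Keeping a small hot pool trades idle resources for launch latency; the `replenish` step is one way a scheduler might restore that pool in the background.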
Abstract:
A method receives physical location information for racks in which application running environments are located. Each rack includes multiple host computing devices in a cluster of host computing devices. Application running environment-rack associations, each of which maps an application running environment to a rack, are generated for the cluster using the physical location information and are then provided to the cluster. The method next provides a data set for storage in the cluster of hosts, the data set being associated with a placement strategy. The cluster uses the placement strategy to store a data block of the data set for a first application running environment and to store a replica data block for a second application running environment at a location in the cluster that is chosen based on the first application running environment being associated with a first rack in the application running environment-rack associations.
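The association step and a rack-based placement choice can be sketched roughly as below. The abstract does not say whether the replica goes on the same rack or a different one, so the `same_rack` flag is an assumption standing in for the placement strategy; `build_associations` and `pick_replica_env` are illustrative names.

```python
def build_associations(host_racks, env_hosts):
    """Map each application running environment to the rack of its host."""
    return {env: host_racks[host] for env, host in env_hosts.items()}


def pick_replica_env(first_env, associations, same_rack=False):
    """Choose a second environment for the replica, based on the first one's rack.

    `same_rack` stands in for the placement strategy (an assumption):
    False selects an environment on another rack, True one on the same rack.
    Returns None when no environment satisfies the strategy.
    """
    first_rack = associations[first_env]
    for env in sorted(associations):
        if env == first_env:
            continue
        if (associations[env] == first_rack) == same_rack:
            return env
    return None
```

For example, with environments `env1` and `env2` on rack A and `env3` on rack B, an off-rack strategy for a block written by `env1` selects `env3`, while an on-rack strategy selects `env2`.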
Abstract:
A rapid virtual machine (VM) cloning technique is provided that creates cloned VMs on hosts from multiple source VMs, rather than a single source VM that may otherwise be a bottleneck. The described technique selects particular hosts, disposed in particular racks, on which to create VM clones in a dynamic manner that reduces total deployment time for the plurality of VMs. A rapid VM reconfiguration technique is also provided that reduces the time spent reconfiguring the provisioned VMs for use in a distributed computing application.
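One way to picture cloning from multiple source VMs is a round-based plan in which every finished clone becomes a source for the next round. This is a sketch under assumptions, not the patented selection method: it assumes each source makes one clone per round and prefers a target host in its own rack. The function `plan_clone_rounds` is hypothetical.

```python
def plan_clone_rounds(seed, targets):
    """Plan clone operations in rounds, reusing finished clones as sources.

    `seed` and each target are (rack, host) tuples. Returns a list of
    rounds; each round is a list of (source, target) pairs that are
    assumed to run in parallel.
    """
    sources = [seed]
    pending = list(targets)
    rounds = []
    while pending:
        this_round = []
        for src in list(sources):
            if not pending:
                break
            # Assumed preference: clone within the source's rack when possible.
            same_rack = [t for t in pending if t[0] == src[0]]
            target = same_rack[0] if same_rack else pending[0]
            pending.remove(target)
            this_round.append((src, target))
        # Clones created in this round can serve as sources in the next one.
        sources.extend(target for _, target in this_round)
        rounds.append(this_round)
    return rounds
```

Because the number of sources can double each round, the number of rounds grows roughly logarithmically with the number of target hosts, whereas cloning everything from a single source takes one round per target.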