Abstract:
A virtualized computing system for executing a distributed computing application, such as Hadoop, is described. The virtualized computing system stores data in a distributed filesystem, such as the Hadoop Distributed File System, and processes data using topology awareness that takes into account the virtualization layer of the virtualized computing system. The virtualized computing system employs locality-related policies, including replica placement policies, replica choosing policies, balancer policies, and task scheduling policies, that take advantage of this awareness of the virtualization topology.
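As an illustration of a replica choosing policy that is aware of the virtualization layer, the following minimal sketch picks the replica closest to a reader in a topology tree. The path layout `/rack/host/vm`, the hop-count distance, and the names `distance` and `choose_replica` are assumptions made for this example; they are not taken from the abstract or from the Hadoop API.

```python
def distance(a, b):
    """Count tree hops between two nodes given as topology paths.

    Paths are assumed to look like "/rack1/host1/vm1", so the host level
    represents the virtualization layer between the rack and the VM.
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    # Length of the shared prefix (common ancestors).
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Hops up from a to the common ancestor, then down to b.
    return (len(pa) - common) + (len(pb) - common)


def choose_replica(reader, replicas):
    """Return the replica location with the smallest distance to the reader."""
    return min(replicas, key=lambda r: distance(reader, r))


reader = "/rack1/host1/vm1"
replicas = ["/rack2/host3/vm5", "/rack1/host2/vm3", "/rack1/host1/vm2"]
nearest = choose_replica(reader, replicas)
```

In this sketch a replica held by another virtual machine on the same physical host is preferred over one elsewhere in the same rack, which is the kind of decision a topology without the virtualization layer could not make.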
Abstract:
A method receives physical location information for racks in which application running environments are located. Each rack includes multiple host computing devices in a cluster of host computing devices. Using the physical location information, the method generates application running environment-rack associations for the cluster, where each association maps an application running environment to a rack. The application running environment-rack associations are provided to the cluster. The method then provides a data set, associated with a placement strategy, for storage in the cluster of hosts. Using the placement strategy, the cluster stores a data block of the data set for a first application running environment and stores a replica data block for a second application running environment at a location in the cluster that is based on the first application running environment being associated with a first rack in the application running environment-rack associations.
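The steps of this method can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the input dictionaries, the function names `build_associations` and `place_replica`, and the choice of an off-rack placement strategy are all hypothetical. The abstract says only that the replica location depends on the first environment's rack.

```python
def build_associations(rack_hosts, env_hosts):
    """Map each application running environment to the rack of its host.

    rack_hosts: rack name -> list of host names (physical location info).
    env_hosts:  environment name -> host name it runs on.
    """
    host_to_rack = {
        host: rack for rack, hosts in rack_hosts.items() for host in hosts
    }
    return {env: host_to_rack[host] for env, host in env_hosts.items()}


def place_replica(first_env, associations):
    """Pick a second environment for the replica block.

    Assumed strategy: choose an environment on a different rack than
    first_env, so the replica survives a failure of the first rack.
    """
    first_rack = associations[first_env]
    candidates = sorted(
        env for env, rack in associations.items() if rack != first_rack
    )
    if not candidates:
        raise ValueError("no environment outside rack %r" % first_rack)
    return candidates[0]


rack_hosts = {"rack1": ["hostA", "hostB"], "rack2": ["hostC"]}
env_hosts = {"env1": "hostA", "env2": "hostB", "env3": "hostC"}
associations = build_associations(rack_hosts, env_hosts)
replica_env = place_replica("env1", associations)
```

Here `env1` and `env2` both map to `rack1`, so the replica for a block stored at `env1` goes to `env3` on `rack2`.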