Abstract:
Exemplary methods, apparatuses, and systems that can intelligently copy data to a plurality of datastores are described. In one embodiment, a distance value is determined for the path between each pair of datastores. Based on the distance values, a graph cluster analysis groups datastores that are in close proximity to one another into clusters. Also, a shortest path tree determines the most efficient paths available for copying data from a source datastore to one or more destination datastores. The source datastore is designated as the root of the shortest path tree, and the one or more destination datastores are designated as the vertices of the tree. Each child vertex of the source datastore is ordered in descending order according to the number of unique clusters to which descendants of that child vertex belong, and the data is then copied from the source datastore to the one or more destination datastores in that descending order.
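The ordering step above can be illustrated with a minimal sketch, assuming the datastores and their distance values form an undirected weighted graph. Dijkstra's algorithm builds the shortest path tree rooted at the source. The abstract does not specify which graph cluster analysis is used, so the hypothetical `threshold_clusters` stands in for it by joining datastores linked by an edge no longer than `max_distance`. All function names and the graph representation are illustrative assumptions, not the described implementation.

```python
import heapq
from typing import Dict, List, Set

# Hypothetical representation: datastore -> {neighbouring datastore: distance value}
Graph = Dict[str, Dict[str, float]]


def shortest_path_tree(graph: Graph, source: str) -> Dict[str, str]:
    """Dijkstra's algorithm; returns child -> parent links of the tree rooted at source."""
    dist = {source: 0.0}
    parent: Dict[str, str] = {}
    heap = [(0.0, source)]
    done: Set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w, v))
    return parent


def threshold_clusters(graph: Graph, max_distance: float) -> Dict[str, int]:
    """Stand-in for the graph cluster analysis (an assumption): union-find over
    every edge whose distance value is at most max_distance."""
    root = {n: n for n in graph}

    def find(x: str) -> str:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            if w <= max_distance:
                root[find(u)] = find(v)
    ids: Dict[str, int] = {}
    return {n: ids.setdefault(find(n), len(ids)) for n in graph}


def copy_order(graph: Graph, source: str, max_distance: float) -> List[str]:
    """Children of the source, sorted by how many unique clusters their descendants span."""
    parent = shortest_path_tree(graph, source)
    cluster = threshold_clusters(graph, max_distance)
    children: Dict[str, List[str]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    def descendant_clusters(v: str) -> Set[int]:
        found: Set[int] = set()
        for c in children.get(v, []):
            found.add(cluster[c])
            found |= descendant_clusters(c)
        return found

    return sorted(children.get(source, []),
                  key=lambda c: len(descendant_clusters(c)), reverse=True)
```

Visiting the child whose subtree spans the most clusters first means, under this sketch's assumptions, that data reaches distant clusters early, where nearby datastores can then relay it locally.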
Abstract:
Exemplary methods, apparatuses, and systems that can intelligently copy data to a plurality of datastores using performance monitoring are described. In one embodiment, a shortest path tree determines the most efficient paths available for copying data from a source datastore to one or more destination datastores. During the copying of the data between the source datastore and the one or more destination datastores, a performance value of each of the datastores involved in the copying process is compared to a threshold. In response to determining that the performance value of a given source or destination datastore involved in the copying exceeds the threshold, the copying of the data to the corresponding destination datastore is suspended. An updated shortest path tree is determined to locate a more efficient path for copying data to the suspended destination datastore. Copying to the suspended destination datastore is then resumed using the updated shortest path tree.
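A minimal sketch of the suspend-and-replan step, assuming each tree edge `parent -> child` is one copy operation and that a datastore's performance value is a single number (for example, latency) where higher is worse. The hypothetical `check_and_replan` suspends any copy whose source or destination exceeds the threshold, then recomputes a shortest path tree that does not relay through other overloaded datastores. The exclusion rule is an assumption; the abstract does not say how the updated tree is derived.

```python
import heapq
from typing import Dict, Optional, Set

# Hypothetical representation: datastore -> {neighbouring datastore: distance value}
Graph = Dict[str, Dict[str, float]]


def shortest_path_parents(graph: Graph, source: str, excluded: Set[str]) -> Dict[str, str]:
    """Dijkstra's algorithm that never routes into an excluded datastore."""
    dist = {source: 0.0}
    parent: Dict[str, str] = {}
    heap = [(0.0, source)]
    done: Set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            if v in excluded:
                continue
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w, v))
    return parent


def check_and_replan(graph: Graph, source: str, tree: Dict[str, str],
                     perf: Dict[str, float], threshold: float) -> Dict[str, Optional[str]]:
    """Return {suspended destination: new parent from the updated tree} (None if unreachable)."""
    overloaded = {ds for ds, value in perf.items() if value > threshold}
    # Suspend every copy whose source (parent) or destination (child) is over threshold.
    suspended = [child for child, par in tree.items()
                 if child in overloaded or par in overloaded]
    new_parents: Dict[str, Optional[str]] = {}
    for child in suspended:
        # Assumption: the suspended destination may still receive data, but other
        # overloaded datastores are not used as relays in the updated tree.
        updated = shortest_path_parents(graph, source, overloaded - {child})
        new_parents[child] = updated.get(child)
    return new_parents
```

In this sketch, resuming the copy simply means re-issuing it from the new parent returned for each suspended destination.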
Abstract:
A distributed computing application is described that provides a highly elastic and multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Data and compute nodes are separated into different virtual machines (VMs). Compute VMs are used to launch containers from different tenants. Compute VMs are organized in pools of hot spare VMs that are immediately available for launching a container and executing a task, and pools of cold spare VMs. Each compute VM may include a mounted network filesystem provided by a node manager to share intermediate outputs across VMs executing on the same host.
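The hot and cold spare pools can be illustrated with a minimal sketch, assuming a hot spare is a powered-on compute VM ready to launch a container immediately, and a cold spare must first be brought up (modelled here only as moving it between queues). The class `ComputeVMPools`, its `hot_target` parameter, and the policy of refilling the hot pool from the cold pool after each launch are all hypothetical choices for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict


@dataclass
class ComputeVMPools:
    """Hypothetical manager for hot and cold spare compute VMs."""
    hot: Deque[str] = field(default_factory=deque)   # ready to run a container now
    cold: Deque[str] = field(default_factory=deque)  # must be started before use
    hot_target: int = 2                              # assumed desired hot-pool size
    busy: Dict[str, str] = field(default_factory=dict)  # VM -> tenant using it

    def launch_container(self, tenant: str) -> str:
        if self.hot:
            vm = self.hot.popleft()   # fast path: hot spare
        elif self.cold:
            vm = self.cold.popleft()  # slow path: would need a power-on first
        else:
            raise RuntimeError("no spare compute VMs")
        self.busy[vm] = tenant
        self._replenish()
        return vm

    def release(self, vm: str) -> None:
        """Return a VM to the hot pool if it is short, otherwise to the cold pool."""
        del self.busy[vm]
        (self.hot if len(self.hot) < self.hot_target else self.cold).append(vm)

    def _replenish(self) -> None:
        # Warm up cold spares until the hot pool reaches its target size.
        while len(self.hot) < self.hot_target and self.cold:
            self.hot.append(self.cold.popleft())
```

Keeping a small hot pool trades some idle capacity for lower container start-up latency; the cold pool bounds how many VMs are consuming resources at any time.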
Abstract:
In a computer-implemented method for configuring flash cache for input/output operations to a storage device by a plurality of virtual machines, an input/output trace log for each of the plurality of virtual machines is accessed. Performance of each of the plurality of virtual machines under a plurality of configurations of the flash cache is simulated in real time. Using results from the simulation, a recommendation from among the plurality of configurations of the flash cache is generated for each of the plurality of virtual machines.
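A minimal sketch of trace-driven cache simulation, assuming each VM's trace log is a list of block addresses, the candidate configurations differ only in cache size (in blocks), and the cache uses LRU replacement. The abstract does not state the replacement policy or the recommendation rule; here the hypothetical `recommend` picks the smallest size that reaches a target hit ratio, falling back to the largest candidate. Unlike the real-time simulation described above, this replays each trace offline.

```python
from collections import OrderedDict
from typing import Dict, Iterable, List


def lru_hit_ratio(trace: List[int], cache_blocks: int) -> float:
    """Replay block addresses through an LRU cache holding cache_blocks entries."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace) if trace else 0.0


def recommend(traces: Dict[str, List[int]], sizes: Iterable[int],
              target: float) -> Dict[str, int]:
    """Per VM, the smallest simulated cache size meeting target, else the largest size."""
    candidates = sorted(sizes)
    return {
        vm: next((s for s in candidates if lru_hit_ratio(trace, s) >= target),
                 candidates[-1])
        for vm, trace in traces.items()
    }
```

For example, a VM that alternates between two blocks gets no hits from a one-block cache but a 4/6 hit ratio from a two-block cache, so with a 0.5 target it is recommended two blocks.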
Abstract:
An elastic filesystem for temporary data provides storage space for virtual machines (VMs) in a distributed computing system. The filesystem redirects accesses to virtual disks in VMs to a common pool file. The system provides performance and storage efficiency at least on par with local, direct-attached virtual disks, while providing a single pool of shared storage that is provisioned and managed independently of the VMs. The system provides storage isolation between VMs storing temporary data in that shared pool. Also, storage space for temporary data may be allocated on demand and reclaimed when no longer needed, thereby supporting a wide variety of temporary space requirements for different Hadoop jobs.
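The redirection, isolation, and on-demand allocation described above can be illustrated with a minimal in-memory sketch. It assumes fixed-size blocks, a per-VM mapping from virtual-disk block to pool block, allocation on first write, and reclamation of all of a VM's blocks when its temporary data is no longer needed. `TempPool`, `BLOCK_SIZE`, and the list standing in for the pool file are hypothetical.

```python
from typing import Dict, List, Optional

BLOCK_SIZE = 4096  # assumed block granularity


class TempPool:
    """Hypothetical model of a shared pool file backing VMs' temporary virtual disks."""

    def __init__(self) -> None:
        self.blocks: List[bytes] = []              # stands in for the common pool file
        self.free: List[int] = []                  # reclaimed pool block indices
        self.maps: Dict[str, Dict[int, int]] = {}  # VM -> {virtual block: pool block}

    def write(self, vm: str, vblock: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be whole blocks")
        mapping = self.maps.setdefault(vm, {})
        if vblock not in mapping:  # allocate pool space only on first write
            if self.free:
                mapping[vblock] = self.free.pop()
            else:
                self.blocks.append(b"")
                mapping[vblock] = len(self.blocks) - 1
        self.blocks[mapping[vblock]] = data

    def read(self, vm: str, vblock: int) -> bytes:
        # Each VM resolves only through its own mapping (storage isolation);
        # blocks it never wrote read back as zeros.
        pblock: Optional[int] = self.maps.get(vm, {}).get(vblock)
        return self.blocks[pblock] if pblock is not None else bytes(BLOCK_SIZE)

    def release(self, vm: str) -> int:
        """Reclaim every pool block held by vm; returns how many were freed."""
        mapping = self.maps.pop(vm, {})
        self.free.extend(mapping.values())
        return len(mapping)
```

Because allocation happens only on whole-block writes, a reclaimed block is always fully overwritten before another VM can read it, so stale data does not leak across VMs in this sketch.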
Abstract:
A distributed computing application is described that provides a highly elastic and multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Deployments of a distributed computing application, such as Hadoop, may be executed concurrently with a distributed database application, such as HBase, using a shared instance of a distributed filesystem, or in other cases, multiple instances of the distributed filesystem. Computing resources allocated to region server nodes executing as VMs may be isolated from compute VMs of the distributed computing application, as well as from data nodes executing as VMs of the distributed filesystem.