Abstract:
A distributed shared log storage system employs an adapter that translates APIs for a big data application to APIs of the distributed shared log storage system. Instances of the adapter are configured for different big data applications in accordance with their respective profiles, so that the big data applications can take on a variety of added characteristics that enhance the application and/or improve its performance. The added characteristics include global or local ordering of operations, replication of operations according to different replication models, atomicity of operations, and caching.
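The adapter idea in the abstract above can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the source: `SharedLog` is an in-memory stand-in for the distributed shared log, `Profile` carries a single assumed setting (caching), and `KeyValueAdapter` assumes the application exposes a put/get style API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLog:
    """Hypothetical in-memory stand-in for the distributed shared log."""
    entries: list = field(default_factory=list)

    def append(self, record):
        # The returned position doubles as a global order over all appends.
        self.entries.append(record)
        return len(self.entries) - 1

    def read(self, position):
        return self.entries[position]


@dataclass
class Profile:
    """Hypothetical per-application profile; only caching is modeled here."""
    caching: bool = False


class KeyValueAdapter:
    """Translates an assumed put/get API into log appends and reads."""

    def __init__(self, log, profile):
        self.log = log
        self.profile = profile
        self.index = {}   # key -> log position of the latest write
        self.cache = {}   # populated only when the profile enables caching

    def put(self, key, value):
        position = self.log.append(("put", key, value))
        self.index[key] = position
        if self.profile.caching:
            self.cache[key] = value
        return position

    def get(self, key):
        if self.profile.caching and key in self.cache:
            return self.cache[key]
        _, _, value = self.log.read(self.index[key])
        return value
```

Two adapter instances with different profiles can share one log: an instance built with `Profile(caching=True)` answers repeated reads from its cache, while one built with `Profile(caching=False)` always reads back from the log.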
Abstract:
A distributed shared log storage system employs an adapter that translates APIs for a big data application to APIs of the distributed shared log storage system. The adapter is configured for different big data applications in accordance with their respective profiles, so that storage performance using the distributed shared log storage system can be comparable to the storage performance of the profiled big data application. An over-utilized adapter instance is detected, and the workload assigned to it is either moved to a different adapter instance that can handle the workload or split among two or more adapter instances.
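The move-or-split decision described in the abstract above might be sketched as follows. This is an illustration under stated assumptions, not the claimed method: load is modeled as a single integer per adapter instance, `limits` is a hypothetical per-instance ceiling above which an instance counts as over-utilized, and the greedy spill order is an arbitrary choice.

```python
def rebalance(loads, limits):
    """Return a new load map after relieving over-utilized instances.

    loads  -- dict: instance name -> current load (hypothetical units)
    limits -- dict: instance name -> maximum load the instance should carry
    """
    loads = dict(loads)  # work on a copy; the caller's map is left untouched
    for name in sorted(loads):
        if loads[name] <= limits[name]:
            continue  # not over-utilized

        work = loads[name]
        spare = {n: limits[n] - loads[n] for n in sorted(loads) if n != name}

        # Option 1: move the whole workload to one instance that can take it.
        target = next((n for n, s in spare.items() if s >= work), None)
        if target is not None:
            loads[target] += work
            loads[name] = 0
            continue

        # Option 2: split. Keep what fits locally and spill the excess
        # greedily onto other instances that still have spare room.
        excess = work - limits[name]
        for n, s in spare.items():
            if excess <= 0:
                break
            moved = min(max(s, 0), excess)
            loads[n] += moved
            excess -= moved
        # Any excess that could not be placed stays on the original instance.
        loads[name] = limits[name] + excess
    return loads
```

For example, `rebalance({"a": 90, "b": 10}, {"a": 80, "b": 200})` moves all of `a`'s work to `b`, whereas with three instances each limited to 80 and loads of 120, 50 and 60, the excess of the first instance is split across the other two.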
Abstract:
A control module is introduced to communicate with an application workload scheduler of a distributed computing application, such as a JobTracker node of a Hadoop cluster, and with the virtualized computing environment underlying the application. The control module periodically queries for resource consumption data, such as CPU utilization, and uses the data to calculate how MapReduce task slots should be allocated on each task node of the Hadoop cluster. The control module passes the task slot allocation to the application workload scheduler, which honors the allocation by adjusting task assignments to task nodes accordingly. The task nodes may also activate and deactivate task slots according to the changed slot allocation. As a result, the distributed computing application is able to scale up and down as other workloads sharing the virtualized computing environment change.
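The slot calculation performed by the control module could look roughly like the sketch below. The abstract does not state a formula, so the proportional rule used here is an assumption: each task node receives a share of `max_slots` equal to the fraction of CPU not consumed by other workloads. The function name, parameters, and the percent-based input are all hypothetical.

```python
def allocate_slots(other_cpu_percent, max_slots, min_slots=0):
    """Compute a per-node task slot allocation from CPU utilization.

    other_cpu_percent -- dict: node name -> CPU percent (0-100) used by
                         other workloads sharing the host (assumed input)
    max_slots         -- slots a node may run when the host is otherwise idle
    min_slots         -- floor that keeps a node minimally active
    """
    allocation = {}
    for node, busy in other_cpu_percent.items():
        free = max(0, 100 - busy)            # CPU percent left for Hadoop
        slots = (max_slots * free) // 100    # proportional share, rounded down
        allocation[node] = max(min_slots, min(max_slots, slots))
    return allocation
```

A control loop would call this on each polling interval and hand the resulting map to the workload scheduler; with `max_slots=8`, a node whose host is idle keeps all 8 slots, while a node whose host is 50% busy with other workloads is reduced to 4.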