A learning-based workflow scheduling approach on volatile cloud resources
SAYYED ALI KIAIAN MOUSAVY
Master in Computer Science
Date: Sunday 31st May, 2020
Supervisor: Hamid Reza Faragardi
Examiner: Roberto Guanciele
School of Electrical Engineering and Computer Science
Workflows, which originated in the business world, give a systematic organization to otherwise chaotic, complex processes. They have therefore become popular in scientific computation, where complex, large-scale data analysis and scientific automation are required. In recent years, demand for reliable algorithms for workflow optimization problems, mainly scheduling and resource provisioning, has grown considerably. Various algorithms have been proposed to solve these problems. However, most of these scheduling and provisioning algorithms do not account for reliability and robustness, or the lack thereof. Moreover, those that do rely on assumptions and handcrafted heuristics with manually assigned parameters. In this thesis, a new workflow scheduling algorithm is proposed that learns the heuristics required to account for reliability and robustness in a volatile cloud environment, particularly on Amazon EC2 spot instances. The algorithm then uses the learned data to propose an efficient scheduling strategy that prioritizes reliability while also seeking to minimize execution time. Compared to the other tested algorithms, namely Heterogeneous Earliest Finish Time (HEFT) and ReplicateAll, the proposed algorithm mainly improves failure rate and reliability, while keeping the degradation in makespan relative to vanilla HEFT at an acceptable level, which makes it more sustainable in an unreliable environment. We found that the proposed algorithm performs 5% worse than the baseline HEFT in total execution time. However, it wastes 52% fewer resources than the baseline HEFT and uses 40% fewer resources than the ReplicateAll algorithm, as a result of its reduced failure rate in the unreliable environment.