Most scientific workflows are defined as Directed Acyclic Graphs (DAGs). Although DAGs are expressive enough to capture dependency relationships, current approaches are not aware of the storage characteristics in terms of performance and capacity. Providing information about temporal storage allocation in data-intensive applications helps to avoid performance issues. However, doing so requires evaluating several combinations of data file locations and application scheduling decisions. Simulation is one of the most popular evaluation methods in scientific workflow execution, used to develop new storage-aware scheduling techniques or improve existing ones and to test scalability and repeatability. This paper presents a multiworkflow storage-aware scheduling policy as an extension of WorkflowSim, which can be combined with other WorkflowSim scheduling policies and makes it possible to evaluate a wide range of storage and file allocation options. This paper also presents a proof of concept of a real-world implementation of a storage-aware scheduler to validate the accuracy of the WorkflowSim extension and the scalability of our scheduling technique. The evaluation on several environments shows promising results, with a makespan improvement of up to 69% on simulated large-scale clusters and an error of the WorkflowSim extension between 0.9% and 3% compared with the real infrastructure implementation.
COMPLEXIS 2017 - Proceedings of the 2nd International Conference on Complexity, Future Information Systems and Risk
Published - 22 Apr 2016
- Storage hierarchy