A data-aware multiworkflow scheduler for clusters on WorkflowSim

César Acevedo, Porfidio Hernández, Antonio Espinosa, Victor Méndez

Research output: Contribution to journal, Article, Research, peer-review

2 Citations (Scopus)

Abstract

Most scientific workflows are defined as Directed Acyclic Graphs (DAGs). Although DAGs are very expressive in reflecting dependency relationships, current approaches are not aware of the storage characteristics in terms of performance and capacity. Providing information about temporary storage allocation for data-intensive applications helps to avoid performance issues. Nevertheless, we need to evaluate several combinations of data file locations and application scheduling. Simulation is one of the most popular evaluation methods in scientific workflow execution for developing new storage-aware scheduling techniques or improving existing ones, and for testing scalability and repeatability. This paper presents a multiworkflow storage-aware scheduling policy as an extension of WorkflowSim, which can be combined with other WorkflowSim scheduling policies and used to evaluate a wide range of storage and file allocation options. This paper also presents a proof of concept of a real-world implementation of a storage-aware scheduler to validate the accuracy of the WorkflowSim extension and the scalability of our scheduling technique. The evaluation on several environments shows promising results, with makespan improvements of up to 69% on simulated large-scale clusters and an error of the WorkflowSim extension between 0.9% and 3% compared with the real infrastructure implementation.

Original language: American English
Pages (from-to): 95-102
Number of pages: 7
Journal: COMPLEXIS 2017 - Proceedings of the 2nd International Conference on Complexity, Future Information Systems and Risk
Publication status: Published - 22 Apr 2016

Keywords

  • Cluster
  • Data-aware
  • Multiworkflow
  • Simulation
  • Storage hierarchy
