A data-aware multiworkflow scheduler for clusters on WorkflowSim

César Acevedo, Porfidio Hernández, Antonio Espinosa, Victor Méndez

Research output: Contribution to a journal, Article, Research, peer-reviewed

2 Citations (Scopus)

Abstract

Most scientific workflows are defined as Directed Acyclic Graphs (DAGs). Although DAGs are very expressive in reflecting dependency relationships, current approaches are not aware of the characteristics of the storage in terms of performance and capacity. Providing information about temporary storage allocation in data-intensive applications helps to avoid performance issues. Nevertheless, several combinations of data file locations and application scheduling need to be evaluated. Simulation is one of the most popular evaluation methods in scientific workflow execution for developing new storage-aware scheduling techniques or improving existing ones, and for testing scalability and repeatability. This paper presents a multiworkflow storage-aware scheduling policy as an extension of WorkflowSim, which can be combined with other WorkflowSim scheduling policies and makes it possible to evaluate a wide range of storage and file allocation options. This paper also presents a proof of concept of a real-world implementation of a storage-aware scheduler to validate the accuracy of the WorkflowSim extension and the scalability of our scheduling technique. The evaluation on several environments shows promising results, with up to 69% makespan improvement on simulated large-scale clusters and an error of the WorkflowSim extension between 0.9% and 3% compared with the real infrastructure implementation.

Original language: American English
Pages (from-to): 95-102
Number of pages: 7
Publication: COMPLEXIS 2017 - Proceedings of the 2nd International Conference on Complexity, Future Information Systems and Risk
Status: Published - 22 Apr 2016
