TDP-Shell: A generic framework to improve interoperability between batch queue systems and monitoring tools

Vicente J. Ivars*, Miquel A. Senar, Elisa Heymann

*Corresponding author for this work

Research output: Book chapter › Chapter › Research › Peer-reviewed

Abstract

Nowadays, distributed applications, including MPI implementations, are executed on computer clusters managed by a batch queue system. Users rely on monitoring tools to detect run-time problems in their applications running in those environments. However, using monitoring tools on a cluster controlled by a batch queue system is challenging, because batch queue systems and monitoring tools do not coordinate the management of the resources they share when executing a distributed application. We call this problem a lack of interoperability, and to solve it we have developed a framework called TDP-Shell. This framework supports different batch queue systems, such as Condor and SGE, and different monitoring tools, such as Paradyn, Gdb and TotalView, without any changes to their source code. In this paper, we describe how our basic TDP-Shell design for sequential applications was redesigned to support the monitoring of MPI applications executed on a cluster controlled by a batch queue system.

Original language: American English
Publication title: Proceedings - 2011 IEEE International Conference on Cluster Computing, CLUSTER 2011
Pages: 522-526
Number of pages: 5
Publication status: Published - 16 Nov 2011

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244
