Improving performance on data-intensive applications using a load balancing methodology based on divisible load theory

Claudia Rosas, Anna Sikora, Josep Jorba, Andreu Moreno, Eduardo César

Research output: Contribution to journal, Article, Research, peer-review

5 Citations (Scopus)

Abstract

Data-intensive applications are those that explore, query, analyze, and, in general, process very large data sets. These applications can usually be implemented in parallel in a natural way but, in many cases, the implementations show severe performance problems, mainly due to load imbalance, inefficient use of available resources, and improper data partitioning policies. The problem becomes more complex when the conditions causing these issues change at run time. This paper proposes a methodology for dynamically improving the performance of certain data-intensive applications by adapting the size and number of data partitions, and the number of processing nodes, to the current application conditions in homogeneous clusters. To this end, the processing of each exploration is monitored and the gathered data are used to dynamically tune the performance of the application. The tuning parameters included in the methodology are: (i) the partition factor of the data set, (ii) the distribution of the data chunks, and (iii) the number of processing nodes to be used. The methodology assumes that a single execution includes multiple related explorations on the same partitioned data set, and that data chunks are ordered according to their processing times during the application execution so that the most time-consuming partitions are assigned first. The methodology has been validated using the well-known bioinformatics tool BLAST and through extensive experimentation using simulation. Reported results are encouraging in terms of reducing the total execution time of the application (by up to 40% in some cases). © 2012 Springer Science+Business Media, LLC.
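
The abstract's idea of ordering data chunks by processing time and assigning the most time-consuming ones first can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `schedule_longest_first`, the dictionary input format, and the assumption that per-chunk times from an earlier exploration are available are all hypothetical, and the greedy rule shown is the classic longest-processing-time-first heuristic on identical workers.

```python
import heapq


def schedule_longest_first(chunk_times, num_workers):
    """Greedy longest-first assignment of chunks to identical workers.

    chunk_times: dict mapping a chunk id to a processing time measured
    in an earlier exploration (hypothetical input format).
    Returns (assignment, makespan), where assignment maps each worker
    index to the list of chunk ids it processes, in order.
    """
    # Sort chunks so the most time-consuming ones are dispatched first.
    ordered = sorted(chunk_times.items(), key=lambda kv: kv[1], reverse=True)

    # Min-heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(free_at)

    assignment = {w: [] for w in range(num_workers)}
    for chunk_id, duration in ordered:
        # The next chunk goes to whichever worker becomes free earliest.
        ready, worker = heapq.heappop(free_at)
        assignment[worker].append(chunk_id)
        heapq.heappush(free_at, (ready + duration, worker))

    # Total execution time is the finish time of the last worker.
    makespan = max(ready for ready, _ in free_at)
    return assignment, makespan
```

Under these assumptions, calling `schedule_longest_first(measured_times, n)` for several values of `n` gives a rough way to compare how the number of processing nodes affects the estimated total execution time for a given partitioning.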
Original language: English
Pages (from-to): 94-118
Journal: International Journal of Parallel Programming
Volume: 42
Publication status: Published - 1 Feb 2014

Keywords

  • Data-intensive
  • Divisible Load Theory
  • Load balancing
  • Performance improvement
