Hybrid Message Pessimistic Logging. Improving current pessimistic message logging protocols

Hugo Meyer, Ronal Muresano, Marcela Castro-León, Dolores Rexachs, Emilio Luque

Research output: Contribution to journal; Article; Research; peer-review

11 Citations (Scopus)

Abstract

© 2017 Elsevier Inc. As HPC applications grow in scale, interruptions caused by hardware failures have become more frequent. The marked decrease in Mean Time Between Failures (MTBF) in current systems motivates research into suitable fault tolerance solutions. Message logging combined with uncoordinated checkpointing forms a scalable rollback-recovery solution. However, message logging techniques are usually responsible for most of the overhead during failure-free executions. With this in mind, this paper proposes Hybrid Message Pessimistic Logging (HMPL), which combines the fast recovery of pessimistic receiver-based message logging with the low failure-free overhead of pessimistic sender-based message logging. HMPL manages messages using a distributed controller and storage to avoid harming the system's scalability. Experiments show that HMPL can reduce overhead by 34% during failure-free executions and by 20% in faulty executions when compared with pessimistic receiver-based message logging.
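
The trade-off the abstract describes can be illustrated with a toy, in-memory sketch. This is not the HMPL implementation: the abstract does not specify the mechanism, so the "hybrid" path below (the sender keeps a temporary copy while the receiver's logging is deferred) is an assumption made purely for illustration, and every name (`Process`, `send_hybrid`, `flush_pending`, `messages_for_replay`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: int
    receiver: int
    seq: int
    payload: str


@dataclass
class Process:
    rank: int
    sender_buffer: list = field(default_factory=list)  # volatile copies of sent messages
    receiver_log: list = field(default_factory=list)   # stands in for stable storage
    delivered: list = field(default_factory=list)


def send_sender_based(src, dst, msg):
    """Sender-based: the copy stays in the sender's memory; delivery is immediate."""
    src.sender_buffer.append(msg)
    dst.delivered.append(msg)


def send_receiver_based(src, dst, msg):
    """Receiver-based: the receiver logs before delivery, so logging is on the critical path."""
    dst.receiver_log.append(msg)
    dst.delivered.append(msg)


def send_hybrid(src, dst, msg, pending):
    """Assumed hybrid: sender holds a temporary copy, delivery is immediate,
    and the receiver-side logging is queued to happen later."""
    src.sender_buffer.append(msg)
    dst.delivered.append(msg)
    pending.append((src, dst, msg))


def flush_pending(pending, limit=None):
    """Complete deferred receiver-side logging and release the sender's copies."""
    count = len(pending) if limit is None else min(limit, len(pending))
    for _ in range(count):
        src, dst, msg = pending.pop(0)
        dst.receiver_log.append(msg)
        src.sender_buffer.remove(msg)


def messages_for_replay(receiver, senders):
    """Gather the messages a restarted receiver must replay, ordered by sequence number.
    Logged messages come from the receiver's log; the rest from senders' temporary copies."""
    found = {m.seq: m for m in receiver.receiver_log}
    for s in senders:
        for m in s.sender_buffer:
            if m.receiver == receiver.rank:
                found.setdefault(m.seq, m)
    return [found[k] for k in sorted(found)]


if __name__ == "__main__":
    p0, p1 = Process(0), Process(1)
    pending = []
    for i in range(3):
        send_hybrid(p0, p1, Message(0, 1, i, f"m{i}"), pending)
    flush_pending(pending, limit=1)  # only one message reaches the log before a "failure"
    print([m.seq for m in messages_for_replay(p1, [p0])])
```

In this sketch, messages already in the receiver's log are recovered locally, which mirrors the fast-recovery property of receiver-based logging, while messages whose logging is still pending are recovered from the sender's copy, which mirrors the low failure-free cost of sender-based logging.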
Original language: English
Pages (from-to): 206-222
Journal: Journal of Parallel and Distributed Computing
Volume: 104
Publication status: Published - 1 Jun 2017

Keywords

  • Availability
  • Fault tolerance
  • MPI
  • Message logging
  • Performance
  • Scalability
