TY - JOUR
T1 - Finding, analysing and solving MPI communication bottlenecks in Earth System models
AU - Tintó Prims, Oriol
AU - Castrillo, Miguel
AU - Acosta, Mario C.
AU - Mula-Valls, Oriol
AU - Sanchez Lorente, Alicia
AU - Serradell, Kim
AU - Cortés, Ana
AU - Doblas-Reyes, Francisco J.
PY - 2019
Y1 - 2019/9/1
AB - It is a matter of consensus that the ability to use current and future high-performance computing systems efficiently is crucial for science; however, the performance currently achieved by most parallel scientific applications is far from what is desired. Although inter-process communication has already been a matter of study in many works, their recommendations are seldom taken into account in computational model development, at least in the case of Earth Science. This work presents a methodology that helps scientists working with computational models that use inter-process communication to deal with the difficulties they face when trying to understand their applications' behaviour. By following the series of steps presented here, both users and developers will learn how to identify performance issues by characterizing an application's scalability, identifying which parts perform poorly and understanding the role that inter-process communication plays. The Nucleus for European Modelling of the Ocean (NEMO), the state-of-the-art European global ocean circulation model, is used as an example of success. It is a community code widely used in Europe, to the extent that more than a hundred million core hours are spent every year on experiments involving NEMO. The analysis exercise shows how to answer the questions of where, why and what is degrading the model's scalability, and how this information can help developers find solutions that mitigate these issues. This document also demonstrates how performance analysis carried out on small-scale experiments, using limited resources, can lead to optimizations that impact bigger experiments running on thousands of cores, making it easier to face the exascale challenge.
KW - Earth System modelling
KW - MPI optimization
KW - Ocean modelling
KW - Performance analysis
KW - Performance optimization
UR - https://doi.org/10.1016/j.jocs.2018.04.015
DO - 10.1016/j.jocs.2018.04.015
M3 - Article
SN - 1877-7503
VL - 36
SP - 1
EP - 10
JO - Journal of Computational Science
JF - Journal of Computational Science
M1 - 100864
ER -