Analyzing the I/O Patterns of Deep Learning Applications

Edixon Párraga*, Betzabeth León, Román Bond, Diego Encinas, Aprigio Bezerra, Sandra Mendez, Dolores Rexachs, Emilio Luque

*Corresponding author for this work

Research output: Chapter in Book › Chapter › Research › peer-review

Abstract

A traditional HPC storage system is designed to manage an I/O workload dominated by bursts of write operations, mainly from applications that run simulations and checkpoint partial results. Today this context is more diverse because of the workload of artificial intelligence applications, such as machine learning (ML) and deep learning (DL). As ML/DL applications become more compute-intensive, they require the power of HPC systems. However, the HPC I/O system can be a bottleneck to scaling these kinds of applications, mainly in the training stage. In this paper, we present a methodology for analyzing the I/O patterns of deep learning applications that allows us to understand the DL applications’ I/O in HPC systems. We have applied our approach to serial and distributed DL codes by using the TensorFlow 2 and PyTorch frameworks for the MNIST and CIFAR-10 datasets.

Original language: English
Title of host publication: Cloud Computing, Big Data and Emerging Topics - 9th Conference, JCC-BDandET 2021, Proceedings
Editors: Marcelo Naiouf, Enzo Rucci, Franco Chichizola, Laura De Giusti
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3-16
Number of pages: 14
ISBN (Print): 9783030848248
Publication status: Published - 16 Aug 2021

Publication series

Name: Communications in Computer and Information Science
Volume: 1444 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Keywords

  • Deep learning
  • Distributed DL
  • I/O HPC
  • I/O Patterns
