TY - CHAP
T1 - Analyzing the Influence of File Formats on I/O Patterns in Deep Learning
AU - Leon Otero, Betzabeth del Carmen
AU - Parraga Pinzon, Edixon Alexander
AU - Mendez, Sandra Adriana
AU - Rexachs, Dolores
AU - Suppi, Remo
AU - Luque, Emilio
PY - 2025/3/26
Y1 - 2025/3/26
AB - Deep Learning applications have become an important solution for analyzing and making predictions with massive amounts of data in recent years. However, this type of application introduces significant input/output (I/O) loads on computer systems. Moreover, when executed on distributed systems or parallel distributed memory systems, they handle much information that must be read during training. This persistent and continuous access to files can overwhelm file systems and negatively impact application performance. A file format defines how information is stored, and the choice of a format depends on the use case. Therefore, it is important to analyze how the file format influences the training stage when loading and reading the dataset, as opening and reading many small files could affect application performance. Thus, this paper will analyze the I/O pattern of different file formats used in deep learning applications to characterize their behavior.
KW - Distributed Deep Learning
KW - I/O Analysis
KW - Parallel I/O
UR - https://portalrecerca.uab.cat/en/publications/e66622ac-a23c-44fd-8c5f-2bf2be8e0c80
UR - http://www.scopus.com/inward/record.url?scp=105002028449&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/1247bce1-5f3e-3383-a070-2af1b15d2974/
U2 - 10.1007/978-3-031-85638-9_10
DO - 10.1007/978-3-031-85638-9_10
M3 - Chapter
SN - 978-3-031-85638-9
SN - 978-3-031-85637-2
VL - 2256
T3 - Communications in Computer and Information Science
SP - 130
EP - 136
BT - Communications in Computer and Information Science. CSCE 2024.
ER -