TY - JOUR
T1 - Theoretical tuning of the autoencoder bottleneck layer dimension
T2 - A mutual information-based algorithm
AU - Boquet, Guillem
AU - Macias, Edwar
AU - Morell, Antoni
AU - Serrano, Javier
AU - Vicario, Jose Lopez
N1 - Funding Information:
This research is supported by the Catalan Government under Project 2017 SGR 1670 and the Spanish Government under Project TEC2017-84321-C4-4-R co-funded with European Union ERDF funds.
Publisher Copyright:
© 2021 European Signal Processing Conference, EUSIPCO. All rights reserved.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/1/24
Y1 - 2021/1/24
N2 - In the transportation field, the literature states that forecasting with an excessive number of features can be computationally inefficient and runs the risk of over-fitting. For this reason, several authors have proposed autoencoders (AE) as a way of learning fewer but useful features to improve road traffic forecasting. Notably, the adequacy of the AE bottleneck layer dimension has not been addressed, so there is no standard way to select the dimensionality automatically. We address the problem from an information theory perspective, because the reconstruction error is not a reliable indicator of the performance of the subsequent supervised learning algorithm. Hence, we propose an algorithm based on how the mutual information and entropy of the data evolve during training of the AE. We validate it against two real-world traffic datasets and discuss why the entropy of the codes is a reliable performance indicator. Compared with the trial-and-error methods prevalent in the literature, the advantage of our proposal is that a practitioner can efficiently find a bottleneck dimension that guarantees maximal data compression and reliable traffic forecasts.
AB - In the transportation field, the literature states that forecasting with an excessive number of features can be computationally inefficient and runs the risk of over-fitting. For this reason, several authors have proposed autoencoders (AE) as a way of learning fewer but useful features to improve road traffic forecasting. Notably, the adequacy of the AE bottleneck layer dimension has not been addressed, so there is no standard way to select the dimensionality automatically. We address the problem from an information theory perspective, because the reconstruction error is not a reliable indicator of the performance of the subsequent supervised learning algorithm. Hence, we propose an algorithm based on how the mutual information and entropy of the data evolve during training of the AE. We validate it against two real-world traffic datasets and discuss why the entropy of the codes is a reliable performance indicator. Compared with the trial-and-error methods prevalent in the literature, the advantage of our proposal is that a practitioner can efficiently find a bottleneck dimension that guarantees maximal data compression and reliable traffic forecasts.
KW - Autoencoder
KW - Entropy
KW - Intelligent transportation systems
KW - Mutual information
KW - Traffic forecasting
UR - http://www.scopus.com/inward/record.url?scp=85099316609&partnerID=8YFLogxK
U2 - 10.23919/eusipco47968.2020.9287226
DO - 10.23919/eusipco47968.2020.9287226
M3 - Article
AN - SCOPUS:85099316609
SN - 2219-5491
SP - 1512
EP - 1516
JO - European Signal Processing Conference
JF - European Signal Processing Conference
ER -