TY - JOUR
T1 - SID4VAM
T2 - A benchmark dataset with synthetic images for visual attention modeling
AU - Berga, David
AU - Vidal, Xose Ramon Fernandez
AU - Otazu, Xavier
AU - Pardo, Xose M.
N1 - Funding Information:
This work was funded by the MINECO (DPI2017-89867-C2-1-R, TIN2015-71130-REDT), AGAUR (2017-SGR-649), CERCA Programme / Generalitat de Catalunya, in part by Xunta de Galicia under Project ED431C2017/69, in part by the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016-2019, ED431G/08) and the European Regional Development Fund, and in part by Xunta de Galicia and the European Union (European Social Fund). We also acknowledge the generous GPU support from NVIDIA.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - A benchmark of saliency model performance on a synthetic image dataset is provided. Model performance is evaluated through saliency metrics, as well as the influence of model inspiration and consistency with human psychophysics. SID4VAM is composed of 230 synthetic images with known salient regions. Images were generated with 15 distinct types of low-level features (e.g. orientation, brightness, color, size...) combined with a target-distractor pop-out type of synthetic pattern. We used Free-Viewing and Visual Search task instructions and 7 feature contrasts for each feature category. Our study reveals that state-of-the-art Deep Learning saliency models do not perform well on synthetic pattern images; instead, models with Spectral/Fourier inspiration outperform the others in saliency metrics and are more consistent with human psychophysical experimentation. This study proposes a new way to evaluate saliency models in the forthcoming literature, accounting for synthetic images with uniquely low-level feature contexts, distinct from previous eye-tracking image datasets.
UR - http://www.scopus.com/inward/record.url?scp=85080952558&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2019.00888
DO - 10.1109/ICCV.2019.00888
M3 - Conference article
AN - SCOPUS:85080952558
SN - 1550-5499
SP - 8788
EP - 8797
JO - Proceedings of the IEEE International Conference on Computer Vision
JF - Proceedings of the IEEE International Conference on Computer Vision
ER -