TY - JOUR
T1 - CosmoHub: Interactive exploration and distribution of astronomical data on Hadoop
AU - Tallada, P.
AU - Carretero, J.
AU - Casals, J.
AU - Acosta-Silva, C.
AU - Serrano, S.
AU - Caubet, M.
AU - Castander, F. J.
AU - Cesar, E.
AU - Crocce, M.
AU - Delfino, M.
AU - Eriksen, M.
AU - Fosalba, P.
AU - Gaztañaga, E.
AU - Merino, G.
AU - Neissner, C.
AU - Tonello, N.
N1 - Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020/7/1
Y1 - 2020/7/1
AB - We present CosmoHub (https://cosmohub.pic.es), a web application based on Hadoop for the interactive exploration and distribution of massive cosmological datasets. Modern cosmology seeks to unveil the nature of both dark matter and dark energy by mapping the large-scale structure of the Universe through the analysis of massive amounts of astronomical data, whose volume has grown progressively over the past decades, and will keep growing, with the digitization and automation of experimental techniques. CosmoHub, hosted and developed at the Port d'Informació Científica (PIC), supports a worldwide community of scientists without requiring the end user to know any Structured Query Language (SQL). It serves data from several large international collaborations, such as the Euclid space mission, the Dark Energy Survey (DES), the Physics of the Accelerating Universe Survey (PAUS) and the MareNostrum Institut de Ciències de l'Espai (MICE) numerical simulations. While CosmoHub was originally developed as a web frontend to a PostgreSQL relational database, this work describes its current version, built on top of Apache Hive, which facilitates scalable reading, writing and management of huge datasets. As CosmoHub's datasets are seldom modified, Hive is a better fit. Over 60 TiB of cataloged information and 50 × 10^9 astronomical objects can be interactively explored using an integrated visualization tool that includes 1D histogram and 2D heatmap plots. In our current implementation, online exploration of datasets of 10^9 objects can be done on a timescale of tens of seconds. Users can also download customized subsets of data in standard formats, generated in a few minutes.
KW - ASDF
KW - Apache Hadoop
KW - Apache Hive
KW - Challenge lightcone simulation
KW - Data distribution
KW - Data exploration
KW - FITS
UR - http://www.scopus.com/inward/record.url?scp=85085545614&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/01f73831-c3c9-3f2c-9088-58a84b06366f/
U2 - 10.1016/j.ascom.2020.100391
DO - 10.1016/j.ascom.2020.100391
M3 - Article
SN - 2213-1337
VL - 32
JO - Astronomy and Computing
JF - Astronomy and Computing
M1 - 100391
ER -