In recent decades, physicists and astronomers have significantly transformed their methodology
for investigating the universe's content and evolution. Advanced computing techniques
have emerged as indispensable tools to manage the substantial data amassed by
contemporary automated telescopes and highly sensitive instruments. Extracting scientific
insights from the vast information pool necessitates interdisciplinary collaboration
among mechanical and electronic engineers, physicists, astronomers, computer scientists,
and software engineers.
This PhD thesis explores the interface of Computer Science and Cosmology within the
Port d'Informació Científica (PIC), a High Throughput Computing (HTC) data center.
The work encompasses two core domains: (comprehensive) data management and the
advancement of (complex) algorithms for cosmological simulations.
Data management usually relies on conventional tools such as relational databases.
This work takes a pioneering approach to them, exemplified by
their central role in the Physics of the Accelerating Universe Survey (PAUS). The design
of a comprehensive data management infrastructure within the tight constraints of PAUS
is the first contribution of this thesis.
Moreover, given the limitations of relational databases in handling extensive data and
evolving usage patterns, this study also delves into alternatives. The challenges in the
distribution of cosmological catalogs within the PAUS collaboration led to the adoption of
the Apache Hadoop ecosystem. This investigation culminated in the creation of CosmoHub,
an application leveraging Apache Hive (an unprecedented endeavor within astronomy and
cosmology) that promotes Open Science principles.
Concurrently, in the domain of algorithm development for cosmological simulations,
this thesis describes the effort in developing, optimizing and calibrating an algorithm for
the simulation of observed galaxy electromagnetic fluxes. This algorithm, integrated into
a much larger set of Python modules within a Spark-driven pipeline operating on a Hadoop
cluster, is crucial to the creation of the most extensive and comprehensive virtual galaxy
catalogs, serving the European Space Agency's Euclid project.
| Date of Award | 12 Apr 2024 |
|---|---|
| Original language | English |
| Supervisor | Nadia Tonello (Director), Jorge Carretero Palacios (Director) & Eduardo Cesar Galobardes (Director) |
Massive cosmological data generation and distribution
Tallada Crespi, P. (Author). 12 Apr 2024
Student thesis: Doctoral thesis