Anomaly detection for machine learning redshifts applied to SDSS galaxies

Ben Hoyle, Markus Michael Rau, Kerstin Paech, Christopher Bonnett, Stella Seitz, Jochen Weller

    Research output: Contribution to journal › Article › Research › peer-review

    18 Citations (Scopus)

    Abstract

    © 2015 The Authors. Published by Oxford University Press on behalf of the Royal Astronomical Society. We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million 'clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 'anomalous' galaxies with spectroscopic redshift measurements that are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and the preprocessed 'anomaly-removed' sample, and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured statistics of up to 80 per cent when training on the anomaly-removed sample as compared with training on the contaminated sample for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample.
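The contaminate-then-recover step described in the abstract can be illustrated with a small toy sketch. It is not the authors' code: the abstract does not say which implementation was used, and everything below (the two standardised features, the sample sizes, the function names `fit_gaussian`, `mahalanobis_sq` and `flag_anomalies`) is hypothetical. The sketch fits a Gaussian ellipse to the data, trims the objects with the largest Mahalanobis distance, and refits; this is a simplified standard-library stand-in for an elliptical-envelope detector such as scikit-learn's `sklearn.covariance.EllipticEnvelope`, which uses a robust covariance estimate instead of this plain trimming loop.

```python
import math
import random


def fit_gaussian(points):
    """Return the sample mean and 2x2 covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance of p from a 2D Gaussian (mean, cov)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Closed-form inverse of [[sxx, sxy], [sxy, syy]] applied to (dx, dy).
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def flag_anomalies(points, contamination, n_iter=3):
    """Flag the assumed fraction of points lying furthest outside the ellipse.

    Each pass refits the Gaussian to the points currently kept, so that
    outliers flagged in one pass no longer inflate the next covariance.
    """
    n_flag = round(contamination * len(points))
    n_keep = len(points) - n_flag
    keep = range(len(points))  # first pass: fit to the whole sample
    for _ in range(n_iter):
        mean, cov = fit_gaussian([points[i] for i in keep])
        dist = [mahalanobis_sq(p, mean, cov) for p in points]
        order = sorted(range(len(points)), key=lambda i: dist[i])
        keep = order[:n_keep]
    return set(order[n_keep:])


# Synthetic toy data (an assumption, not SDSS measurements).
random.seed(0)
n_clean, n_bad = 500, 25
clean = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(n_clean)]
# Contaminants are placed on a distant ring so that the toy case is unambiguous.
bad = [
    ((15 + k % 5) * math.cos(2 * math.pi * k / n_bad),
     (15 + k % 5) * math.sin(2 * math.pi * k / n_bad))
    for k in range(n_bad)
]
sample = clean + bad

flagged = flag_anomalies(sample, contamination=n_bad / len(sample))
true_bad = set(range(n_clean, n_clean + n_bad))
recovered = len(flagged & true_bad)
print(f"flagged {len(flagged)} objects; {recovered} of {n_bad} contaminants recovered")
```

In this toy setup the contaminants sit far from the clean cloud, so recovery is easy. In realistic data unreliable redshifts overlap with the clean population, and the assumed contamination fraction, which is passed in directly here, would itself need to be estimated.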
    Original language: English
    Pages (from-to): 4183-4194
    Journal: Monthly Notices of the Royal Astronomical Society
    Volume: 452
    Issue number: 4
    Publication status: Published - 1 Jan 2015

    Keywords

    • Catalogues
    • Galaxies: distances and redshifts
    • Surveys
