Prediction of topsoil organic carbon content with Sentinel-2 imagery and spectroscopic measurements under different conditions using an ensemble model approach with multiple pre-treatment combinations

James Kobina Mensah Biney*, Radim Vašát, Stephen Mackenzie Bell, Ndiye Michael Kebonye, Aleš Klement, Kingsley John, Luboš Borůvka

*Corresponding author for this work

Research output: Contribution to journal › Article › Research › peer-review


Estimating soil organic carbon (SOC) with visible near-infrared (Vis-NIR) spectroscopy has proven to be a rapid and reliable approach. However, when working across large geographical scales, remote sensing may be more suitable. Spectral data are often acquired under different measurement conditions, which can introduce artefacts that reduce SOC prediction accuracy. A common remedy has been to use multivariate calibration techniques in conjunction with one or more pre-treatment algorithms. However, comparative studies of these calibration techniques used individually have produced inconsistent results, and protocols for selecting the most appropriate pre-treatment algorithms rarely exist. This study therefore combines the predictions of different techniques into a single model using an ensemble learning approach. The main objective is to improve the accuracy of SOC prediction by comparing the individual calibration techniques against an ensemble model built from one statistical method, partial least squares regression (PLSR), and three machine learning (ML) algorithms: random forest (RF), support vector machine regression (SVMR), and Cubist. Several pre-treatment algorithms were also applied to the spectral data before prediction. Spectral data were collected from three agricultural fields with different soil types, under three spectral measurement conditions (field, wet and dry). Additionally, Sentinel-2 (S2) data were collected for one of these fields. Finally, to assess the effectiveness of the developed model on a regional-scale dataset, two options were tested: (1) merging data from all fields, and (2) merging data from fields measured under the same spectral measurement condition.
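The abstract does not name the pre-treatment algorithms that were combined. As an illustration only, the Python sketch below implements three common spectral pre-treatments (standard normal variate, a first-difference derivative, and a moving-average smoother standing in for filters such as Savitzky-Golay) and enumerates ordered combinations of them, which is one way to generate "multiple pre-treatment combinations". The function names, the `PRETREATMENTS` registry, the chain-length limit and the reflectance values are all hypothetical.

```python
import statistics
from itertools import permutations


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and scale by its own SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mean) / sd for x in spectrum]


def first_derivative(spectrum, step=1.0):
    """First-difference derivative; the result is one point shorter than the input."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


def moving_average(spectrum, window=3):
    """Centred moving-average smoother (a simple stand-in for Savitzky-Golay); edges are trimmed."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        statistics.fmean(spectrum[i - half:i + half + 1])
        for i in range(half, len(spectrum) - half)
    ]


# Hypothetical registry of pre-treatments; the study's actual set is not given in the abstract.
PRETREATMENTS = {"MA": moving_average, "SNV": snv, "D1": first_derivative}


def apply_chain(spectrum, chain):
    """Apply pre-treatments left to right, e.g. ("SNV", "D1")."""
    out = list(spectrum)
    for name in chain:
        out = PRETREATMENTS[name](out)
    return out


def all_chains(names, max_len=2):
    """Enumerate ordered combinations (no repeats) of 1..max_len pre-treatments."""
    chains = []
    for k in range(1, max_len + 1):
        chains.extend(permutations(names, k))
    return chains


if __name__ == "__main__":
    # Hypothetical reflectance values for one soil sample at 7 consecutive bands.
    reflectance = [0.21, 0.24, 0.26, 0.27, 0.31, 0.35, 0.36]
    for chain in all_chains(list(PRETREATMENTS), max_len=2):
        print(chain, [round(v, 3) for v in apply_chain(reflectance, chain)])
```

Run as a script, this prints the nine ordered chains of length one or two together with each transformed spectrum. In a workflow like the one described, each calibration technique could then be fitted on every chain and the best-performing chains retained.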
The models were evaluated using the root mean square error of prediction (RMSEP(CV)), the coefficient of determination (R²(CV)), the ratio of performance to interquartile range (RPIQ), the ratio of performance to deviation (RPD) and the bias (BIAS). Across the three agricultural fields, the ensemble model predicted SOC more accurately than any of the individual calibration techniques (R²(CV) = 0.92, RMSEP(CV) = 1.00 g/kg, RPD = 3.06, RPIQ = 3.74, BIAS = 0.067 g/kg). For the merged (regional) datasets, the ensemble approach predicted SOC more accurately with option 2 than with option 1. Finally, although the ensemble model improved SOC prediction accuracy with S2 data, the resulting predictions were still poor, and further research to identify the underlying cause is strongly recommended. Nonetheless, these results indicate that the ensemble model is advantageous because it improved the accuracy of SOC prediction and reduced the error margin.
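The five evaluation statistics have standard definitions, but conventions differ between papers (R² is sometimes the squared Pearson correlation, and BIAS is sometimes observed minus predicted). The Python sketch below assumes R² = 1 − SS_res/SS_tot, BIAS = mean(predicted − observed), RPD = SD(observed)/RMSEP and RPIQ = (Q3 − Q1)/RMSEP, with quartiles from the inclusive (linear-interpolation) method. The abstract does not say how the ensemble combined PLSR, RF, SVMR and Cubist, so the unweighted-mean combiner `average_ensemble` and the toy SOC values are hypothetical and are not data from the study.

```python
import math
import statistics


def rmsep(obs, pred):
    """Root mean square error of prediction."""
    return math.sqrt(statistics.fmean([(p - o) ** 2 for o, p in zip(obs, pred)]))


def r2(obs, pred):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def bias(obs, pred):
    """Mean of (predicted - observed); some papers use the opposite sign."""
    return statistics.fmean([p - o for o, p in zip(obs, pred)])


def rpd(obs, pred):
    """Ratio of performance to deviation: sample SD of observations / RMSEP."""
    return statistics.stdev(obs) / rmsep(obs, pred)


def rpiq(obs, pred):
    """Ratio of performance to interquartile range: (Q3 - Q1) of observations / RMSEP."""
    q1, _, q3 = statistics.quantiles(obs, n=4, method="inclusive")
    return (q3 - q1) / rmsep(obs, pred)


def evaluate(obs, pred):
    """Collect all five statistics for one set of predictions."""
    return {
        "R2": r2(obs, pred),
        "RMSEP": rmsep(obs, pred),
        "RPD": rpd(obs, pred),
        "RPIQ": rpiq(obs, pred),
        "BIAS": bias(obs, pred),
    }


def average_ensemble(predictions_by_model):
    """Hypothetical combiner: unweighted mean of each sample's predictions across models."""
    return [statistics.fmean(values) for values in zip(*predictions_by_model.values())]


if __name__ == "__main__":
    # Toy SOC values in g/kg, invented for illustration; not data from the study.
    observed = [10.0, 12.0, 14.0, 16.0, 18.0]
    predictions = {
        "PLSR": [11.0, 12.5, 13.0, 17.0, 17.5],
        "RF": [10.5, 11.0, 14.5, 15.0, 19.0],
        "SVMR": [9.5, 12.0, 15.0, 16.5, 17.0],
        "Cubist": [10.0, 13.0, 14.0, 15.5, 18.5],
    }
    predictions["Ensemble"] = average_ensemble(predictions)
    for name, pred in predictions.items():
        metrics = evaluate(observed, pred)
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Because the combiner is an assumption, a weighted or stacked (meta-learner) combination could replace `average_ensemble` without changing the metric functions.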

Original language: English
Article number: 105379
Pages (from-to): 105379
Journal: Soil and Tillage Research
Publication status: Published - 1 Jun 2022


  • Agricultural soil
  • Ensemble predictive model
  • Pre-treatment
  • Sentinel-2
  • Soil organic carbon
  • Spectroscopy (field-wet-dry)

