The aim of the paper is two-fold. First, we show that structure finding with the PC algorithm can be inherently unstable and requires further operational constraints in order to consistently obtain models that are faithful to the data. We propose a methodology to stabilise the structure finding process, minimising both false positive and false negative error rates. This is demonstrated with synthetic data. Second, to apply the proposed structure finding methodology to a data set comprising single-voxel Magnetic Resonance Spectra of normal brain and three classes of brain tumours, to elucidate the associations between brain tumour types and a range of observed metabolites that are known to be relevant for their characterisation. The data set is bootstrapped in order to maximise the robustness of feature selection for nominated target variables. Specifically, Conditional Independence maps (CI-maps) built from the data and their derived Bayesian networks have been used. A Directed Acyclic Graph (DAG) is built from CI-maps, being a major challenge the minimization of errors in the graph structure. This work presents empirical evidence on how to reduce false positive errors via the False Discovery Rate, and how to identify appropriate parameter settings to improve the False Negative Reduction. In addition, several node ordering policies are investigated that transform the graph into a DAG. The obtained results show that ordering nodes by strength of mutual information can recover a representative DAG in a reasonable time, although a more accurate graph can be recovered using a random order of samples at the expense of increasing the computation time.
|Original language||American English|
|Publication status||Published - Jul 2020|