Bayesian neural networks to predict aging and disease risk.

Student thesis: Doctoral thesis


Personalized medicine has received increasing attention in recent years. One driver of this expansion is the significant technological improvement of the last few decades, which has produced an explosion in the amount of available data. For instance, the amount of available DNA methylation data and single-nucleotide polymorphism (SNP) data has increased substantially. This dissertation focuses on the development of techniques for analyzing such data, applied to aging and to disease detection, specifically the identification of cancer and diabetes. It will be shown that a two-step approach is a viable option that generates accurate forecasts: in the first stage, the dimensionality of the data is reduced using algorithms such as Elastic Net; in the second stage, a robust forecasting technique such as Bayesian Neural Networks is applied. Other algorithms, such as Support Vector Machines and K-Nearest Neighbors, were also used for disease detection.
This dissertation is divided into three main sections. The first section covers biological clocks built from DNA methylation data, using the previously mentioned dimensionality reduction combined with Bayesian Neural Networks. The biological clock presented in this dissertation generates age forecasts that are more accurate than those of some well-known existing clocks, an improvement accomplished by using a non-linear algorithm.
The second section covers cancer identification using, as in the previous case, DNA methylation data, together with Support Vector Machines (SVM) and the K-Nearest Neighbors algorithm. It will be shown that for many types of cancer, such as lung, colon, cervical, or bladder cancer, DNA methylation data used in conjunction with SVM generates accurate forecasts.
The last section covers the study of diabetes, in this case using SNP data and Bayesian Neural Networks, which also yields accurate diabetes detection. Given the ever-increasing amount of available DNA methylation and SNP data, together with advances in data storage, there is a growing need for more suitable and sophisticated methods for analyzing such data. One of the base assumptions of this dissertation is that the relationships between DNA methylation and aging or cancer, and between SNPs and diabetes, do not necessarily follow a linear model, and hence that non-linear models such as Bayesian Neural Networks can generate more accurate results. It will be shown that this is the case, with the models generating fairly accurate outcomes.
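The first stage of the two-step approach described in the abstract, dimensionality reduction with Elastic Net, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the coordinate-descent solver, the synthetic data standing in for methylation values, and all names and parameter values (`elastic_net_cd`, `alpha`, `l1_ratio`, `n_sites`) are assumptions chosen for illustration. The sketch omits an intercept and assumes roughly centered features. The second stage (a Bayesian Neural Network trained on the retained features) is not shown.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the objective
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1
                        + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    No intercept is fitted (illustrative simplification)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no signal
            # Covariance of feature j with the residual that excludes j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(n)) / n
            new_w = (soft_threshold(rho, alpha * l1_ratio)
                     / (col_sq[j] + alpha * (1.0 - l1_ratio)))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def select_features(weights, tol=1e-8):
    """Indices of features whose coefficient is non-zero."""
    return [j for j, wj in enumerate(weights) if abs(wj) > tol]


# Synthetic stand-in for methylation data: only 3 of 30 sites are informative.
rng = random.Random(0)
n_samples, n_sites = 80, 30
X = [[rng.gauss(0.0, 1.0) for _ in range(n_sites)] for _ in range(n_samples)]
true_w = {0: 3.0, 5: -2.0, 12: 1.5}  # hypothetical informative sites
y = [sum(c * row[j] for j, c in true_w.items()) + rng.gauss(0.0, 0.1)
     for row in X]

weights = elastic_net_cd(X, y, alpha=0.2, l1_ratio=0.9)
selected = select_features(weights)
print("retained sites:", selected)
```

The retained columns of `X` would then be passed to the second-stage model. A larger `alpha` or `l1_ratio` discards more features, so in practice these values would be tuned, for example by cross-validation.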
Date of Award: 15 Nov 2019
Original language: English
Supervisor: Juan Ramon Gonzalez Ruiz (Director), Mauro Santos Maroño (Tutor), Mauro Santos Maroño (Director) & Mario Caceres Aguilar (Director)
