Abstract
This PhD thesis seeks to improve on the classification currently achieved in detecting Down syndrome in fetuses during the second trimester of pregnancy using non-invasive techniques. The dataset used for this detection task has two classes and is imbalanced: there is a large difference between the number of cases corresponding to fetuses not affected by Down syndrome and the number corresponding to fetuses that are affected.
To improve on the classification currently achieved, a new Soft Computing method has been developed, based on Fuzzy Logic and designed to work with imbalanced datasets. The method not only finds a good solution but also makes it possible to extract the acquired knowledge. It is called FLAGID (Fuzzy Logic And Genetic algorithms for Imbalanced Datasets) and is built on the idea that the solution should generalize as much as possible, avoiding the overfitting that most methods suffer from when working with an imbalanced dataset. To give the method the tools it needs to generalize, an algorithm called ReRecBF has been developed as part of FLAGID. This algorithm transforms the membership functions obtained from the data by an existing algorithm, DDA/RecBF. The transformation converts the membership functions generated from the minority-class cases into triangular functions, keeps the membership functions of the majority class as trapezoidal functions, and splits the membership functions that overlap. Finally, because new membership functions are generated, a genetic algorithm is used to find the rules that best fit the new functions.
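As an illustration of the two membership-function shapes mentioned above, the following minimal sketch evaluates a triangular and a trapezoidal function at a point. It is not the ReRecBF or DDA/RecBF algorithm, whose details are not given in this abstract. The function names and the breakpoint parameters `a`, `b`, `c`, `d` are assumptions chosen for the example.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b.

    Assumes a <= b <= c (hypothetical parameter names).
    """
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)      # falling edge


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on the plateau [b, c].

    Assumes a <= b <= c <= d (hypothetical parameter names).
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge
```

A triangular function reaches full membership at a single point, whereas a trapezoidal function keeps full membership over an interval. That difference is all the sketch is meant to show.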
The results obtained reduce the false-positive rate on the Down syndrome dataset to 4%, with a true-positive rate of 60%. This is the first time that a method has achieved a false-positive rate below 5% at that true-positive rate. In addition, the knowledge underlying the result has been extracted, and most of it agrees with existing knowledge in the field of medicine. It has also been verified that the method is useful for working with imbalanced datasets.
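For readers unfamiliar with the two rates quoted above, the sketch below shows how they are conventionally computed from the four cells of a binary confusion matrix. The counts in the usage note are invented for illustration only and are not taken from the thesis. The function name `rates` is likewise an assumption.

```python
def rates(tp, fn, fp, tn):
    """Return (true-positive rate, false-positive rate) from confusion-matrix counts.

    tp, fn: affected cases classified as positive / negative.
    fp, tn: unaffected cases classified as positive / negative.
    """
    tpr = tp / (tp + fn)  # share of affected cases that are detected
    fpr = fp / (fp + tn)  # share of unaffected cases flagged by mistake
    return tpr, fpr
```

With hypothetical counts such as `rates(tp=6, fn=4, fp=40, tn=960)`, the true-positive rate is 6/10 and the false-positive rate is 40/1000. Because each rate is computed within a single class, neither is distorted by the imbalance between the classes.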
Finally, the results of this work make new contributions to the field of medicine, such as the importance of the gestational age of the fetus in detecting positive cases, and the finding that the mother's weight carries more value as an indicator than its use simply for calibrating the two hormonal markers, AFP and hCG.
Date of Award | 23 Jan 2007 |
---|---|
Original language | Spanish |
Supervisor | Jordi Roig de Zárate (Director) & Marta Prim Sabria (Director) |