The functional characterization of genes involved in many complex traits (phenotypes) of plants, animals, or humans can be studied computationally using a variety of tools. We propose using prediction, in the machine learning sense, to search for the genetic basis of these traits. Predicting the exact value of a phenotype can be too difficult to yield a reliable model, whereas predicting an approximation, in the form of an interval of values, can be easier. We show that reliable and useful models can be obtained from this relaxed formulation. These predictors may be built as extensions of conventional classifiers or regressors. Although prediction performance is similar in both cases, we show that the classification approach makes it straightforward to obtain a principled and scalable method for selecting a reduced set of features in these genetic learning tasks. We conclude by comparing the results achieved on a real-world data set of barley plants with those obtained with state-of-the-art methods used in the biological literature. © Mary Ann Liebert, Inc.
- machine learning