This research aimed to assess the efficacy of non-parametric procedures for improving the classification of ejaculates in artificial insemination (AI) centers according to their fertility rank, as predicted from characteristics of the AI doses. A total of 753 ejaculates from 193 bucks were evaluated at three different times from 5 to 9 months of age for 21 seminal variables (related to ejaculate pH and volume, sperm concentration, viability, morphology and acrosome reaction traits, and dose characteristics) and for their corresponding fertility score after AI of crossbred females. Fertility rate was categorized into five classes of equal length. Linear Regression (LR), Ordinal Logistic Regression (OLR), Support Vector Regression (SVR), Support Vector Ordinal Regression (SVOR), and Non-deterministic Ordinal Regression (NDOR) were compared in terms of their predictive ability with two baseline algorithms, MEAN and MODE, which always predict the mean and the mode, respectively, of the classes observed in the data set. Predictive ability was measured in terms of the rate of erroneous classifications, the linear loss (average distance between the predicted and the observed classes), the number of predicted classes, and the F1 statistic (which allows procedures to be compared even when they predict different numbers of classes). The seminal traits with the greatest influence on fertility were established using stepwise regression and a non-deterministic classifier. MEAN, LR and SVR produced a higher percentage of misclassified cases than MODE (taken as the reference for this statistic), whereas this percentage was 6%, 13% and 39% smaller for SVOR, OLR and NDOR, respectively. However, NDOR predicted an average of 2.04 classes, instead of the single class predicted by the other procedures. All the procedures except MODE showed a similar linear loss, smaller than that of the reference (MEAN), with SVOR performing best. NDOR showed the highest value of the F1 statistic.
Values of linear loss and of the F1 statistic were far from their best values, indicating that the variation in fertility explained by this group of semen characteristics is possibly very low. Of the traits included in the full model, 11, 16, 15, 18 and 3 were kept after performing variable selection with the LR, OLR, SVR, SVOR and NDOR methods, respectively. For all methods, the reduced models showed an almost negligible decrease in predictive ability compared with the corresponding full models. © 2013 Elsevier B.V.
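The evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that fertility rate is a proportion in [0, 1], that classes are coded 1 to 5, and that the F1 statistic for a set-valued (non-deterministic) prediction takes the form 2·1[y ∈ h(x)] / (1 + |h(x)|), averaged over cases. This F1 form is an assumption, since the abstract does not give the formula, and all function names are hypothetical.

```python
from typing import Sequence, Set


def fertility_class(rate: float, n_classes: int = 5) -> int:
    """Map a fertility rate in [0, 1] to an ordinal class 1..n_classes
    using equal-width bins (assumed reading of 'classes of equal length')."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    # A rate of exactly 1.0 falls into the top class.
    return min(int(rate * n_classes), n_classes - 1) + 1


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Proportion of cases whose predicted class differs from the observed one."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def linear_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average absolute distance between predicted and observed classes."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def set_f1(y_true: Sequence[int], pred_sets: Sequence[Set[int]]) -> float:
    """Average F1 for set-valued predictions (assumed definition):
    precision = 1[y in h]/|h|, recall = 1[y in h], so F1 = 2*1[y in h]/(1+|h|)."""
    return sum(2.0 * (t in s) / (1 + len(s))
               for t, s in zip(y_true, pred_sets)) / len(y_true)


def mean_set_size(pred_sets: Sequence[Set[int]]) -> float:
    """Average number of classes predicted per case."""
    return sum(len(s) for s in pred_sets) / len(pred_sets)
```

Under this assumed definition, a single-class prediction scores 1 when it is correct and 0 otherwise, so F1 reduces to accuracy for deterministic procedures. A correct two-class prediction scores 2/3, which is how a procedure predicting several classes is penalized for its lower precision.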
- Non-parametric methods
- Seminal traits