Since the work of Little and Rubin (1987), no substantial advances have been made in the analysis of explanatory regression models for incomplete data that are missing not at random, mainly because of the difficulty of verifying the randomness of the unobserved data. In practice, nonrandom missing data are analyzed with techniques designed for datasets in which data are missing at random or completely at random, such as complete case analysis, mean imputation, regression imputation, maximum likelihood, or multiple imputation. However, the data conditions required to minimize the bias introduced by such an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations were carried out to determine which analysis strategy designed for random missing data performs best when applied to datasets with nonrandom missing data. The factors manipulated in the simulations are sample size, percentage of missing data, predictive power of the imputation model, and the existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation. Nevertheless, when the percentage of missing data is low, there is no interaction, and the predictive power of the imputation model is high (data structures that are frequent in research on child and adolescent psychopathology), acceptable results are obtained with the simpler regression imputation.
- Incomplete maximum likelihood estimation
- Monte Carlo simulation
- Multiple imputation
- Nonrandom missing data
- Regression analysis
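
The kind of Monte Carlo comparison summarized above can be sketched roughly as follows. This is only an illustrative sketch and not the study's actual design: the function names (`ols`, `simulate_bias`), the true coefficients, the missingness mechanism (a predictor is more likely to be missing when its own value is high), and the use of the correlation `rho` between predictors as a stand-in for the predictive power of the imputation model are all assumptions. Only complete case analysis, mean imputation, and deterministic regression imputation are covered; maximum likelihood, multiple imputation, and the interaction factor are left out for brevity.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients (intercept first) for y ~ [1, X]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate_bias(n=200, missing_rate=0.2, rho=0.6, n_reps=500, seed=0):
    """Average bias of the estimated slope of x1 under nonrandom missingness.

    All settings are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([0.0, 0.5, 0.5])  # assumed true intercept and slopes
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = {
        "complete_case": [],
        "mean_imputation": [],
        "regression_imputation": [],
    }
    for _ in range(n_reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = beta[0] + X @ beta[1:] + rng.normal(size=n)

        # Nonrandom missingness (assumed): x1 is dropped when a noisy copy
        # of x1 itself exceeds the (1 - missing_rate) quantile.
        score = X[:, 0] + rng.normal(scale=0.5, size=n)
        missing = score > np.quantile(score, 1.0 - missing_rate)
        obs = ~missing

        # Complete case analysis: fit only on rows where x1 is observed.
        estimates["complete_case"].append(ols(X[obs], y[obs])[1])

        # Mean imputation: replace missing x1 with the observed mean.
        X_mean = X.copy()
        X_mean[missing, 0] = X[obs, 0].mean()
        estimates["mean_imputation"].append(ols(X_mean, y)[1])

        # Regression imputation: predict x1 from x2 and y, with the
        # imputation model fitted on the complete cases.
        Z = np.column_stack([X[:, 1], y])
        gamma = ols(Z[obs], X[obs, 0])
        X_reg = X.copy()
        X_reg[missing, 0] = gamma[0] + Z[missing] @ gamma[1:]
        estimates["regression_imputation"].append(ols(X_reg, y)[1])

    # Bias = mean estimate over replications minus the true slope of x1.
    return {name: float(np.mean(vals) - beta[1]) for name, vals in estimates.items()}


if __name__ == "__main__":
    for rate in (0.1, 0.3, 0.5):
        print(rate, simulate_bias(missing_rate=rate))
```

Varying `n`, `missing_rate`, and `rho` over a grid and tabulating the returned biases would mimic, in a very reduced form, the factorial structure described in the abstract.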