Comparison of Random Imputation Methods Using R Programming: Case Study of Univariate Data
EJIRO STANLEY OMOKOH *
Department of Mathematics, Western Delta University, Oghara, Nigeria.
EJINKONYE IFEOMA O.
Department of Mathematics, Admiralty University of Nigeria, Ibusu, Delta State, Nigeria.
ADUGE AUGUSTINE
Department of Computer Science, Delta State College of Education, Mosogar, Delta State, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
Missing data is a pervasive problem in empirical research that can bias results and reduce statistical power. This manuscript compares random imputation techniques for univariate data with commonly used alternatives, all implemented in R. We review the theoretical foundations of random imputation approaches, implement several methods in R (random sampling, mean imputation, median imputation, hot-deck imputation, and predictive mean matching (PMM) via the mice package), and conduct a simulation study to evaluate their performance under three missingness mechanisms (missing completely at random, MCAR; missing at random, MAR; and missing not at random, MNAR), four missingness proportions (5%, 10%, 20%, and 40%), and a range of sample sizes. Performance metrics include bias, root mean squared error (RMSE), coverage of 95% confidence intervals, and distributional preservation. Practical guidance and reproducible R code snippets are provided to support practitioners, and research published between 2020 and 2024 is reviewed to ground the recommendations. Results indicate that random sampling imputation is simple and can preserve distributional shape better than mean imputation, but it suffers from increased variance. PMM and hot-deck approaches generally outperform single imputation in terms of bias and RMSE, particularly when data are not missing completely at random. We conclude with recommendations for method selection in univariate contexts and discuss avenues for future research.
Keywords: Monte Carlo simulation, algorithms, simulation design, implementation notes
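As a minimal illustration of the comparison summarized in the abstract, the sketch below applies random sampling imputation and mean imputation to a single simulated variable under MCAR missingness at a 20% rate. It is not the simulation code used in the study: the seed, the sample size, the normal data-generating distribution, and the helper names impute_random, impute_mean, and rmse are all assumptions introduced here for illustration, and only base R is used.

```r
# Minimal sketch: random sampling vs. mean imputation on one variable.
# All settings below (seed, n, distribution, 20% rate) are illustrative assumptions.
set.seed(123)

n <- 200
x_full <- rnorm(n, mean = 50, sd = 10)   # hypothetical complete data

# Impose MCAR missingness: every unit is equally likely to be missing
miss_idx <- sample(n, size = round(0.20 * n))
x_obs <- x_full
x_obs[miss_idx] <- NA

# Random sampling imputation: replace each NA with a draw (with replacement)
# from the observed values; indexing via sample.int avoids the sample() pitfall
# that occurs when only one observed value is available
impute_random <- function(x) {
  miss <- is.na(x)
  obs <- x[!miss]
  x[miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  x
}

# Mean imputation: replace every NA with the observed mean
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

x_rand <- impute_random(x_obs)
x_mean <- impute_mean(x_obs)

# RMSE of the imputed values against the true values they replaced
rmse <- function(est, truth) sqrt(mean((est - truth)^2))

results <- data.frame(
  method       = c("random sampling", "mean"),
  bias_of_mean = c(mean(x_rand) - mean(x_full), mean(x_mean) - mean(x_full)),
  rmse_imputed = c(rmse(x_rand[miss_idx], x_full[miss_idx]),
                   rmse(x_mean[miss_idx], x_full[miss_idx])),
  sd_after     = c(sd(x_rand), sd(x_mean))
)
print(results)
```

Mean imputation necessarily shrinks the standard deviation of the completed variable, because every imputed value sits exactly at the observed mean. Random sampling keeps the spread close to that of the observed data, but its individual imputed values are typically further from the truth, which is the variance cost noted in the abstract.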