Open Access Article

A Study on Missing Data Management

M. Mitra, R.K. Samanta

Section: Research Paper, Product Type: Journal Paper
Volume 5, Issue 2, pp. 30-33, Feb 2017

Online published on Mar 01, 2017

Copyright © M. Mitra, R.K. Samanta. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: M. Mitra and R.K. Samanta, “A Study on Missing Data Management,” International Journal of Computer Sciences and Engineering, vol. 5, no. 2, pp. 30-33, 2017.

MLA Style Citation: Mitra, M., and R.K. Samanta. "A Study on Missing Data Management." International Journal of Computer Sciences and Engineering 5.2 (2017): 30-33.

APA Style Citation: Mitra, M., & Samanta, R.K. (2017). A Study on Missing Data Management. International Journal of Computer Sciences and Engineering, 5(2), 30-33.

BibTex Style Citation:
@article{Mitra_2017,
author = {M. Mitra and R.K. Samanta},
title = {A Study on Missing Data Management},
journal = {International Journal of Computer Sciences and Engineering},
volume = {5},
number = {2},
month = {2},
year = {2017},
issn = {2347-2693},
pages = {30--33},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1173},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=1173
TI  - A Study on Missing Data Management
T2  - International Journal of Computer Sciences and Engineering
AU  - Mitra, M.
AU  - Samanta, R.K.
PY  - 2017
DA  - 2017/03/01
PB  - IJCSE, Indore, INDIA
SP  - 30
EP  - 33
IS  - 2
VL  - 5
SN  - 2347-2693
ER  - 

Abstract

Missing data, a persistent problem in most scientific research, must be handled carefully because data play a vital role in every analysis. Mishandled missing values can distort an analysis or bias its results, and valid, reliable models require good data preparation. Methodologists have proposed dozens of techniques to address the problem, and an appropriate method must be chosen for each study to achieve an efficient and valid analysis. In this study we discuss different methods for handling missing data and compare three imputation methods: Arithmetic Mean Imputation, Regression Imputation, and Multiple Imputation using the EMB (Expectation Maximization with Bootstrap) algorithm. The methods are applied to three data sets from the UCI repository under the Missing At Random (MAR) assumption and evaluated by Root Mean Square Error (RMSE).
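The comparison of mean and regression imputation by RMSE can be illustrated with a minimal sketch. It assumes a synthetic two-variable data set (x fully observed, y partially missing under MAR) rather than the UCI data sets used in the paper, and the helper names `mean_impute`, `regression_impute` and `rmse` are hypothetical. Each method is scored only on the deliberately removed cells, using RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²).

```python
import math
import random
import statistics


def mean_impute(values):
    """Replace each None with the arithmetic mean of the observed values."""
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    return [mu if v is None else v for v in values]


def regression_impute(x, y):
    """Fill missing y with the least-squares prediction a + b*x fitted on complete pairs."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = statistics.fmean(p[0] for p in pairs)
    my = statistics.fmean(p[1] for p in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


def rmse(true_vals, imputed_vals):
    """Root mean square error between true and imputed values."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n)


# Synthetic data (an assumption, not the paper's UCI data): y depends linearly on x.
rng = random.Random(42)
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_true = [2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

# MAR mechanism: the chance that y is missing depends only on the observed x.
y_obs = [None if rng.random() < (0.6 if xi > 0 else 0.1) else yi
         for xi, yi in zip(x, y_true)]
missing = [i for i, v in enumerate(y_obs) if v is None]

# Score each method only on the cells that were actually removed.
truth = [y_true[i] for i in missing]
y_mean = mean_impute(y_obs)
y_reg = regression_impute(x, y_obs)
rmse_mean = rmse(truth, [y_mean[i] for i in missing])
rmse_reg = rmse(truth, [y_reg[i] for i in missing])
print(f"missing cells: {len(missing)}")
print(f"RMSE mean imputation:       {rmse_mean:.3f}")
print(f"RMSE regression imputation: {rmse_reg:.3f}")
```

Because missingness here depends on x, mean imputation fills every gap with a value pulled toward the mostly x ≤ 0 observed cases, while regression imputation uses x to predict each gap. Both are single-value methods and understate the uncertainty of the filled-in values, which is what multiple imputation is meant to address.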

Key-Words / Index Term

UCI database, Missing At Random (MAR), Missing Completely At Random (MCAR), Missing Not At Random (MNAR), Multiple Imputation, Expectation Maximization with Bootstrap approach (EMB), Root Mean Square Error (RMSE)
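The EMB idea behind multiple imputation can be illustrated with a deliberately small sketch: draw m bootstrap samples of the incomplete data, run EM on each to estimate the mean vector and covariance matrix, then fill every missing value with a random draw from the conditional normal distribution of y given x under that bootstrap's estimates. It assumes a bivariate normal model in which only y has missing values; the functions `em_bivariate` and `emb_impute` and the synthetic data are hypothetical, and this is a simplified stand-in for full EMB implementations such as the R package Amelia II, which handle many variables and arbitrary missingness patterns.

```python
import math
import random
import statistics


def em_bivariate(x, y, iters=200, tol=1e-10):
    """EM estimates (mu_x, mu_y, s_xx, s_yy, s_xy) of a bivariate normal
    when x is fully observed and y uses None for missing values."""
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    ox = [a for a, _ in obs]
    oy = [b for _, b in obs]
    # Start from complete-case moments, which form a valid covariance matrix.
    mu_x, mu_y = statistics.fmean(ox), statistics.fmean(oy)
    s_xx = statistics.pvariance(ox, mu_x)
    s_yy = statistics.pvariance(oy, mu_y)
    s_xy = statistics.fmean([(a - mu_x) * (b - mu_y) for a, b in obs])
    n = len(x)
    for _ in range(iters):
        beta = s_xy / s_xx
        resid = max(s_yy - beta * s_xy, 0.0)  # Var(y | x), clamped against rounding
        # E-step: expected y and y^2 for each row under the current parameters.
        ey = [b if b is not None else mu_y + beta * (a - mu_x) for a, b in zip(x, y)]
        ey2 = [e * e + (resid if b is None else 0.0) for e, b in zip(ey, y)]
        # M-step: re-estimate the moments from the expected sufficient statistics.
        new_mu_x = sum(x) / n
        new_mu_y = sum(ey) / n
        new_s_xx = sum(a * a for a in x) / n - new_mu_x ** 2
        new_s_yy = sum(ey2) / n - new_mu_y ** 2
        new_s_xy = sum(a * e for a, e in zip(x, ey)) / n - new_mu_x * new_mu_y
        shift = abs(new_mu_y - mu_y) + abs(new_s_yy - s_yy) + abs(new_s_xy - s_xy)
        mu_x, mu_y, s_xx, s_yy, s_xy = new_mu_x, new_mu_y, new_s_xx, new_s_yy, new_s_xy
        if shift < tol:
            break
    return mu_x, mu_y, s_xx, s_yy, s_xy


def emb_impute(x, y, m=5, seed=0):
    """Return m completed copies of y: bootstrap -> EM -> draw from p(y | x)."""
    rng = random.Random(seed)
    n = len(x)
    completed = []
    while len(completed) < m:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        if sum(v is not None for v in by) < 3:
            continue  # too few observed y values in this resample to fit the model
        mu_x, mu_y, s_xx, s_yy, s_xy = em_bivariate(bx, by)
        beta = s_xy / s_xx
        sd = math.sqrt(max(s_yy - beta * s_xy, 0.0))
        # Fill the ORIGINAL missing cells using this bootstrap's parameters plus noise.
        completed.append([b if b is not None else rng.gauss(mu_y + beta * (a - mu_x), sd)
                          for a, b in zip(x, y)])
    return completed


# Synthetic MAR data (an assumption for illustration, not a UCI data set).
data_rng = random.Random(7)
x = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
y_true = [1.0 + 2.0 * a + data_rng.gauss(0.0, 0.5) for a in x]
y = [None if data_rng.random() < (0.6 if a > 0 else 0.1) else b for a, b in zip(x, y_true)]

imputations = emb_impute(x, y, m=5)
# Pool the m estimates of the mean of y (Rubin's rules, point estimate only).
estimates = [statistics.fmean(c) for c in imputations]
pooled = statistics.fmean(estimates)
between = statistics.variance(estimates)  # B, the between-imputation variance
print(f"pooled mean of y: {pooled:.3f}  (true sample mean: {statistics.fmean(y_true):.3f})")
print(f"between-imputation variance B: {between:.5f}")
```

Only the pooled point estimate is computed here; a complete analysis would also estimate the within-imputation variance W and combine the two as W + (1 + 1/m)B.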
