Open Access Article

Prevention of Empty Clusters and Incomplete Data Problems using Modified K-Means and Gaussian Mixture Model

Sanjib Saha

Section: Research Paper, Product Type: Journal Paper
Volume-11, Issue-01, Page no. 184-189, Nov-2023

Online published on Nov 30, 2023

Copyright © Sanjib Saha. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Sanjib Saha, “Prevention of Empty Clusters and Incomplete Data Problems using Modified K-Means and Gaussian Mixture Model,” International Journal of Computer Sciences and Engineering, Vol.11, Issue.01, pp.184-189, 2023.

MLA Style Citation: Sanjib Saha "Prevention of Empty Clusters and Incomplete Data Problems using Modified K-Means and Gaussian Mixture Model." International Journal of Computer Sciences and Engineering 11.01 (2023): 184-189.

APA Style Citation: Sanjib Saha, (2023). Prevention of Empty Clusters and Incomplete Data Problems using Modified K-Means and Gaussian Mixture Model. International Journal of Computer Sciences and Engineering, 11(01), 184-189.

BibTex Style Citation:
@article{Saha_2023,
author = {Sanjib Saha},
title = {Prevention of Empty Clusters and Incomplete Data Problems using Modified K-Means and Gaussian Mixture Model},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {11 2023},
volume = {11},
Issue = {01},
month = {11},
year = {2023},
issn = {2347-2693},
pages = {184-189},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=1431},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=1431
TI - Prevention of Empty Clusters and Incomplete Data Problems using Modified K-Means and Gaussian Mixture Model
T2 - International Journal of Computer Sciences and Engineering
AU - Sanjib Saha
PY - 2023
DA - 2023/11/30
PB - IJCSE, Indore, INDIA
SP - 184-189
IS - 01
VL - 11
SN - 2347-2693
ER -

           

Abstract

Cluster analysis, an unsupervised learning task, divides data into groups (clusters) of similar items that are meaningful and useful. Because it performs well on massive data sets, K-Means clustering is used in many areas of science and technology. Clustering algorithms may, however, face the problems of empty clusters and incomplete data. The empty cluster problem is caused by poor initialization of the cluster centers and can lead to significant performance degradation. In this paper, the K-Means clustering algorithm is revisited from a probabilistic viewpoint and reformulated using the similarity between K-Means and the finite Gaussian Mixture Model (GMM). The initial centroids, or current best estimates of the parameters, are calculated from all of the data, known and unknown. As a result, no two initial centroids are equal or very close to each other, and data are assigned to the appropriate clusters with reasonably placed centroids. The proposed modified K-Means, which uses a GMM fitted by the Expectation Maximization approach, efficiently eliminates the empty cluster and incomplete data problems.
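
The empty cluster problem and the K-Means/GMM similarity mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a generic textbook illustration, assuming a toy data set, equal mixing weights, and a fixed shared spherical variance, and the names `kmeans_assign` and `em_spherical_gmm` are hypothetical. With a poorly placed third center, the hard K-Means assignment leaves one cluster empty, while the soft EM responsibilities give every component a non-zero share of the data.

```python
import numpy as np

def kmeans_assign(X, centers):
    """Hard K-Means step: index of the nearest center for each point."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def em_spherical_gmm(X, centers, var=1.0, n_iter=50):
    """EM for a GMM with equal weights and a fixed shared spherical variance.

    Only the means are updated, which makes this the soft counterpart of
    K-Means (an assumption made here for illustration only).
    """
    centers = centers.astype(float).copy()
    for _ in range(n_iter):
        # E-step: responsibilities are a softmax over scaled squared distances.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        logits = -0.5 * d2 / var
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each mean is the responsibility-weighted average of the data.
        centers = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return centers, resp

# Toy data: two tight groups, but three centers, the third placed far away.
X = np.array([[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]], dtype=float)
init = np.array([[0, 0], [4, 4], [9, 9]], dtype=float)

# Hard assignment: no point is nearest to the third center, so it stays empty.
hard_counts = np.bincount(kmeans_assign(X, init), minlength=len(init))
print("hard counts:", hard_counts)  # hard counts: [3 3 0]

# Soft assignment: every component keeps a positive effective count.
centers, resp = em_spherical_gmm(X, init)
soft_counts = resp.sum(axis=0)
print("soft counts:", soft_counts)
```

Note that soft assignment alone does not make initialization irrelevant: a center placed extremely far from the data can still receive responsibilities that underflow to zero, which is one reason the choice of initial centroids matters.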

Keywords / Index Terms

Unsupervised Learning, Clustering Analysis, K-Means, Expectation Maximization, Gaussian Mixture Model
