Open Access Article

Implementation of K-Means Clustering in Big Data Environment

Ayush Gupta, Pratik Gite

Section: Research Paper, Product Type: Journal Paper
Volume-7, Issue-11, Page no. 38-44, Nov-2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i11.3844

Online published on Nov 30, 2019

Copyright © Ayush Gupta, Pratik Gite. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Ayush Gupta, Pratik Gite, “Implementation of K-Means Clustering in Big Data Environment,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.11, pp.38-44, 2019.

MLA Style Citation: Ayush Gupta, Pratik Gite "Implementation of K-Means Clustering in Big Data Environment." International Journal of Computer Sciences and Engineering 7.11 (2019): 38-44.

APA Style Citation: Ayush Gupta, Pratik Gite (2019). Implementation of K-Means Clustering in Big Data Environment. International Journal of Computer Sciences and Engineering, 7(11), 38-44.

BibTex Style Citation:
@article{Gupta_2019,
author = {Ayush Gupta and Pratik Gite},
title = {Implementation of K-Means Clustering in Big Data Environment},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {11 2019},
volume = {7},
number = {11},
month = {11},
year = {2019},
issn = {2347-2693},
pages = {38--44},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4941},
doi = {10.26438/ijcse/v7i11.3844},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i11.3844
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4941
TI  - Implementation of K-Means Clustering in Big Data Environment
T2  - International Journal of Computer Sciences and Engineering
AU  - Gupta, Ayush
AU  - Gite, Pratik
PY  - 2019
DA  - 2019/11/30
PB  - IJCSE, Indore, INDIA
SP  - 38
EP  - 44
IS  - 11
VL  - 7
SN  - 2347-2693
ER  -

Abstract

In recent years, the volume of digital data has grown rapidly. Handling and processing such bulky data is complex and requires human attention. Moreover, existing techniques and methods are not well suited to computation of this complexity, so big data analytics plays an essential role. In this work, the unsupervised learning technique k-means clustering is first implemented and its performance is measured. To enhance the performance of the system, a modified k-means clustering algorithm is then proposed that improves the centroid selection technique and uses the RBF kernel. A comparative performance analysis of the two versions shows that the modified k-means clustering is efficient and has a lower algorithm run time. It is therefore a promising approach for analytics, and its future extension is also presented in this work.

Key-Words / Index Terms

Big Data, Big Data Analytics, Unsupervised Learning, Clustering Algorithm, Improvements
