Implementation of K-Means Clustering in Big Data Environment

Ayush Gupta, Pratik Gite

Section: Research Paper, Product Type: Journal Paper
Volume-7, Issue-11, Page no. 38-44, Nov-2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i11.3844

Online published on Nov 30, 2019

Copyright © Ayush Gupta, Pratik Gite. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Ayush Gupta, Pratik Gite, “Implementation of K-Means Clustering in Big Data Environment,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.11, pp.38-44, 2019.

MLA Style Citation: Ayush Gupta, Pratik Gite. "Implementation of K-Means Clustering in Big Data Environment." International Journal of Computer Sciences and Engineering 7.11 (2019): 38-44.

APA Style Citation: Ayush Gupta, Pratik Gite (2019). Implementation of K-Means Clustering in Big Data Environment. International Journal of Computer Sciences and Engineering, 7(11), 38-44.

BibTex Style Citation:
@article{Gupta_2019,
author = {Ayush Gupta and Pratik Gite},
title = {Implementation of K-Means Clustering in Big Data Environment},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {11 2019},
volume = {7},
number = {11},
month = {11},
year = {2019},
issn = {2347-2693},
pages = {38--44},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4941},
doi = {10.26438/ijcse/v7i11.3844},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i11.3844
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4941
TI  - Implementation of K-Means Clustering in Big Data Environment
T2  - International Journal of Computer Sciences and Engineering
AU  - Gupta, Ayush
AU  - Gite, Pratik
PY  - 2019
DA  - 2019/11/30
PB  - IJCSE, Indore, INDIA
SP  - 38
EP  - 44
IS  - 11
VL  - 7
SN  - 2347-2693
ER  - 

Abstract

In recent years, the volume of digital data has grown rapidly. Handling and processing such bulky data is complex and demands human attention, and existing techniques and methods are not well suited to computation of this complexity. Big data analytics plays an essential role in addressing it. In this work, the unsupervised learning technique k-means clustering is first implemented and its performance is measured. To enhance the performance of the system, a modified k-means clustering algorithm is then proposed that improves the centroid selection technique and uses the RBF kernel. A comparative performance analysis of both versions of k-means clustering demonstrates that the modified version is efficient and has a low algorithm run time. It is therefore a promising approach for analytics, and its future extension is also presented in this work.
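
As a rough illustration of the idea summarized above, the following is a minimal sketch of k-means run under an RBF kernel with a non-random seeding step. It is not the authors' implementation: the abstract does not specify their centroid selection technique, so farthest-first seeding is used here as a stand-in, and the names `rbf`, `farthest_first_seeds`, `kernel_kmeans`, and the parameter `gamma` are assumptions made for this example only.

```python
import math

def rbf(a, b, gamma=1.0):
    """RBF kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first_seeds(K, k):
    """Stand-in seeding (assumption): start at point 0, then repeatedly add
    the point whose kernel-induced distance to its nearest seed is largest."""
    seeds = [0]
    while len(seeds) < k:
        def gap(i):
            # Squared feature-space distance between points i and s.
            return min(K[i][i] - 2 * K[i][s] + K[s][s] for s in seeds)
        seeds.append(max(range(len(K)), key=gap))
    return seeds

def kernel_kmeans(points, k, gamma=1.0, max_iter=50):
    """Cluster points with k-means in the RBF feature space; returns labels."""
    n = len(points)
    K = [[rbf(p, q, gamma) for q in points] for p in points]
    seeds = farthest_first_seeds(K, k)
    # Initial assignment: each point joins its nearest seed.
    labels = [
        min(range(k), key=lambda c: K[i][i] - 2 * K[i][seeds[c]] + K[seeds[c]][seeds[c]])
        for i in range(n)
    ]
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # Mean pairwise kernel value inside each cluster (centroid self-similarity).
        within = [
            sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]

        def dist(i, c):
            # Squared distance from point i to the implicit centroid of cluster c.
            m = members[c]
            if not m:
                return float("inf")
            return K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]

        new_labels = [min(range(k), key=lambda c: dist(i, c)) for i in range(n)]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
    return labels

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kernel_kmeans(data, k=2, gamma=0.1))
```

This sketch builds the full n × n kernel matrix in memory, so it only suits small inputs; a big data deployment would need a distributed or approximate formulation, which the abstract does not detail.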

Keywords / Index Terms

Big Data, Big Data Analytics, Unsupervised Learning, Clustering Algorithm, Improvements
