Open Access Article

A Hybrid Data Clustering Technique in Big Data using Machine Learning

K. Sharma, P. Rehan

Section: Research Paper, Product Type: Journal Paper
Volume 8, Issue 1, Page no. 40-47, Jan 2020

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v8i1.4047

Online published on Jan 31, 2020

Copyright © K. Sharma, P. Rehan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: K. Sharma, P. Rehan, “A Hybrid Data Clustering Technique in Big Data using Machine Learning,” International Journal of Computer Sciences and Engineering, Vol.8, Issue.1, pp.40-47, 2020.

MLA Style Citation: K. Sharma, P. Rehan. "A Hybrid Data Clustering Technique in Big Data using Machine Learning." International Journal of Computer Sciences and Engineering 8.1 (2020): 40-47.

APA Style Citation: K. Sharma, P. Rehan (2020). A Hybrid Data Clustering Technique in Big Data using Machine Learning. International Journal of Computer Sciences and Engineering, 8(1), 40-47.

BibTex Style Citation:
@article{Sharma_2020,
author = {K. Sharma and P. Rehan},
title = {A Hybrid Data Clustering Technique in Big Data using Machine Learning},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {1 2020},
volume = {8},
number = {1},
month = {1},
year = {2020},
issn = {2347-2693},
pages = {40--47},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4993},
doi = {10.26438/ijcse/v8i1.4047},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v8i1.4047
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4993
TI  - A Hybrid Data Clustering Technique in Big Data using Machine Learning
T2  - International Journal of Computer Sciences and Engineering
AU  - Sharma, K.
AU  - Rehan, P.
PY  - 2020
DA  - 2020/01/31
PB  - IJCSE, Indore, INDIA
SP  - 40
EP  - 47
IS  - 1
VL  - 8
SN  - 2347-2693
ER  - 

Abstract

Big Data refers to very large collections of data, such as banking data, social media data, and repository data. These domains depend on day-to-day retrieval and processing of relevant data. Clustering is one of the major tasks performed on such data to reduce time delay and make information retrieval more efficient. In this work, we use cosine and soft cosine similarity indices to count the total connections between documents. We then combine the cosine and soft cosine measures into a hybrid similarity algorithm that takes in the threshold policy of K-means and the correlation linkage property of linkage clustering to form new clusters. The proposed model is cross-validated using a Support Vector Machine followed by K-Medoid to improve clustering accuracy. This work also discusses different clustering and classification techniques. Its main focus is optimizing clustering performance on Big Data so that rich information can be retrieved at the lowest cost.
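
The two similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary, the document vectors, and the term-similarity matrix `S` below are hypothetical values chosen for illustration. Soft cosine is computed with the standard definition, in which a term-to-term similarity matrix weights the products of the vector components, so that it reduces to ordinary cosine when that matrix is the identity.

```python
import math


def cosine(a, b):
    """Ordinary cosine similarity between two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention: an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def _bilinear(a, b, s):
    """Sum over all term pairs (i, j) of s[i][j] * a[i] * b[j]."""
    return sum(
        s[i][j] * a[i] * b[j]
        for i in range(len(a))
        for j in range(len(b))
    )


def soft_cosine(a, b, s):
    """Soft cosine similarity, where s[i][j] is the similarity of terms i and j."""
    denom = math.sqrt(_bilinear(a, a, s)) * math.sqrt(_bilinear(b, b, s))
    if denom == 0:
        return 0.0
    return _bilinear(a, b, s) / denom


# Hypothetical toy vocabulary: ["loan", "credit", "tweet"]
doc_a = [1, 0, 0]  # mentions only "loan"
doc_b = [0, 1, 0]  # mentions only "credit"

# Assumed term-similarity matrix: "loan" and "credit" are treated as related.
S = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

plain = cosine(doc_a, doc_b)
soft = soft_cosine(doc_a, doc_b, S)
soft_identity = soft_cosine(doc_a, doc_b, IDENTITY)
```

In this toy case the two documents share no terms, so ordinary cosine treats them as unrelated, while soft cosine gives them a nonzero score because the assumed matrix marks their terms as related. How the paper combines the two scores into its hybrid measure is not specified in the abstract, so that step is not sketched here.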

Keywords / Index Terms

Data mining, Big data, Clustering, Classification, Support Vector Machine
