Open Access Article

An Empirical Comparison and Effect of Clustering Massive Data on Association Rule Mining

Sanjib Saha

Section: Research Paper, Product Type: Journal Paper
Volume-11, Issue-01, Page no. 141-148, Nov-2023

Online published on Nov 30, 2023

Copyright © Sanjib Saha. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Sanjib Saha, “An Empirical Comparison and Effect of Clustering Massive Data on Association Rule Mining,” International Journal of Computer Sciences and Engineering, Vol.11, Issue.01, pp.141-148, 2023.

MLA Style Citation: Sanjib Saha "An Empirical Comparison and Effect of Clustering Massive Data on Association Rule Mining." International Journal of Computer Sciences and Engineering 11.01 (2023): 141-148.

APA Style Citation: Sanjib Saha, (2023). An Empirical Comparison and Effect of Clustering Massive Data on Association Rule Mining. International Journal of Computer Sciences and Engineering, 11(01), 141-148.

BibTex Style Citation:
@article{Saha_2023,
author = {Sanjib Saha},
title = {An Empirical Comparison and Effect of Clustering Massive Data on Association Rule Mining},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {11 2023},
volume = {11},
Issue = {01},
month = {11},
year = {2023},
issn = {2347-2693},
pages = {141-148},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=1425},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=1425
TI - An Empirical Comparison and Effect of Clustering Massive Data on Association Rule Mining
T2 - International Journal of Computer Sciences and Engineering
AU - Sanjib Saha
PY - 2023
DA - 2023/11/30
PB - IJCSE, Indore, INDIA
SP - 141-148
IS - 01
VL - 11
SN - 2347-2693
ER -

           

Abstract

This paper explores techniques for association rule mining (ARM) and clustering in unsupervised learning and data mining. Much work has already been done on the Apriori algorithm for ARM, but very little on other algorithms such as Predictive Apriori, Tertius and Filtered Associator. The main problem in ARM is that a large dataset must be scanned repeatedly. Clustering the data beforehand reduces the amount of data scanned in each pass, because each cluster is mined separately, and thus lowers the overall time requirement. The ARM algorithms are executed on two datasets, Breast Cancer and Zoo. Performance can be improved by applying filters and clustering techniques before ARM. The proposed model is as follows: (i) load the data source; (ii) apply filters (numeric to nominal, and replace missing values); (iii) apply additional filters (attribute selection, merge two values, or remove folds) or an evaluation method (training set maker); (iv) apply clustering methods (K-Means, Farthest First, Expectation Maximization, Hierarchical and Make Density Based); (v) apply ARM methods (Apriori, Predictive Apriori, Tertius and Filtered Associator); (vi) view the result. The ARM algorithms are evaluated and compared with each other on accuracy, lift value and execution time. The best rules found by each ARM algorithm differ, however. The paper discusses the effect of clustering on ARM and claims that clustering the data before applying ARM gives better results.
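
As a rough illustration of the cluster-then-mine idea in steps (iv) and (v), the following Python sketch computes support, confidence and lift for a rule and mines simple one-item rules separately inside each cluster. It is not the paper's WEKA workflow: the function names, the thresholds, the toy attribute=value records and the cluster labels are all assumptions made for this example, and the labels are taken as given rather than produced by K-Means or any other clusterer.

```python
from collections import defaultdict
from itertools import combinations


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # records containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # records containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # records containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Enumerate one-item rules x -> y that meet both thresholds (hypothetical helper)."""
    items = sorted(set().union(*transactions)) if transactions else []
    rules = []
    for x, y in combinations(items, 2):
        for lhs, rhs in ((x, y), (y, x)):
            s, conf, lift = rule_metrics(transactions, [lhs], [rhs])
            if s >= min_support and conf >= min_confidence:
                rules.append((lhs, rhs, s, conf, lift))
    return rules


def mine_per_cluster(transactions, labels, **thresholds):
    """Group records by cluster label, then mine each group on its own."""
    clusters = defaultdict(list)
    for t, label in zip(transactions, labels):
        clusters[label].append(t)
    return {label: mine_pair_rules(ts, **thresholds) for label, ts in clusters.items()}


# Toy records in attribute=value form (invented for illustration, not taken from the paper).
transactions = [
    {"milk=yes", "hair=yes"},
    {"milk=yes", "hair=yes"},
    {"milk=no", "feathers=yes"},
    {"milk=no", "feathers=yes"},
]
labels = [0, 0, 1, 1]  # assumed output of some clustering step

whole = rule_metrics(transactions, ["hair=yes"], ["milk=yes"])
per_cluster = mine_per_cluster(transactions, labels)
```

In this sketch each cluster is scanned independently, so every pass touches only that cluster's records. Metrics are then relative to the cluster rather than to the whole dataset, which is one reason the rules and lift values obtained with and without clustering can differ.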

Keywords / Index Terms

Unsupervised Learning, Data Mining, Association Rule Mining, Apriori, Clustering, K-Means
