Clustering Incomplete Mixed Numerical and Categorical Datasets using Modified Squeezer Algorithm

M.V.Jagannatha Reddy, B.Kavitha

Open Access Article

Section: Research Paper, Product Type: Journal Paper
Volume 4, Issue 5, Page no. 36-41, May 2016

Online published on May 31, 2016

Copyright © M.V.Jagannatha Reddy, B.Kavitha. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: M.V.Jagannatha Reddy, B.Kavitha, “Clustering Incomplete Mixed Numerical and Categorical Datasets using Modified Squeezer Algorithm,” International Journal of Computer Sciences and Engineering, Vol.4, Issue.5, pp.36-41, 2016.

MLA Style Citation: M.V.Jagannatha Reddy, B.Kavitha. "Clustering Incomplete Mixed Numerical and Categorical Datasets using Modified Squeezer Algorithm." International Journal of Computer Sciences and Engineering 4.5 (2016): 36-41.

APA Style Citation: M.V.Jagannatha Reddy, B.Kavitha (2016). Clustering Incomplete Mixed Numerical and Categorical Datasets using Modified Squeezer Algorithm. International Journal of Computer Sciences and Engineering, 4(5), 36-41.

BibTeX Style Citation:
@article{Reddy_2016,
author = {M.V.Jagannatha Reddy and B.Kavitha},
title = {Clustering Incomplete Mixed Numerical and Categorical Datasets using Modified Squeezer Algorithm},
journal = {International Journal of Computer Sciences and Engineering},
volume = {4},
number = {5},
month = {5},
year = {2016},
issn = {2347-2693},
pages = {36--41},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=900},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=900
TI  - Clustering Incomplete Mixed Numerical and Categorical Datasets using Modified Squeezer Algorithm
T2  - International Journal of Computer Sciences and Engineering
AU  - M.V.Jagannatha Reddy
AU  - B.Kavitha
PY  - 2016
DA  - 2016/05/31
PB  - IJCSE, Indore, INDIA
SP  - 36
EP  - 41
IS  - 5
VL  - 4
SN  - 2347-2693
ER  -

Abstract

Clustering incomplete datasets with mixed numerical and categorical attributes is a challenging task. Traditional algorithms such as the k-prototypes algorithm can cluster mixed datasets, but they are limited to complete datasets. To handle incomplete datasets, we use a modified Squeezer algorithm that incorporates a new dissimilarity measure for incomplete data with mixed numerical and categorical attribute values. The modified Squeezer algorithm clusters incomplete data without imputing the missing values and without initializing any clusters at the beginning. The algorithm is compared with the traditional k-prototypes algorithm on benchmark datasets. The experimental results show that the modified Squeezer algorithm achieves better accuracy than the traditional algorithm and also overcomes the limitation of choosing initial clusters.
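The abstract does not spell out the authors' dissimilarity measure, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the paper's method. It keeps the single-pass structure of the original Squeezer algorithm (each record is read once and either joins the closest existing cluster or starts a new one, so no initial clusters are needed) but swaps Squeezer's support-based similarity for a hypothetical mixed-attribute dissimilarity. Numeric attributes are assumed to be scaled to [0, 1]; a numeric value contributes its absolute distance to the cluster mean, and a categorical value contributes 1 minus its relative frequency in the cluster. Missing values (`None`) are skipped and the score is averaged over attributes observed on both sides, so nothing is imputed. The names `ClusterSummary`, `squeezer_mixed` and the `threshold` parameter are illustrative only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record layout: (numeric part, categorical part).
# Numeric values are assumed pre-scaled to [0, 1]; None marks a missing value.
NumPart = Sequence[Optional[float]]
CatPart = Sequence[Optional[str]]


class ClusterSummary:
    """Running per-cluster statistics, updated incrementally; raw records are not stored."""

    def __init__(self, n_num: int, n_cat: int) -> None:
        self.num_sum = [0.0] * n_num  # sum of observed values per numeric attribute
        self.num_cnt = [0] * n_num    # number of observed values per numeric attribute
        self.cat_freq: List[Dict[str, int]] = [{} for _ in range(n_cat)]  # value counts
        self.members: List[int] = []  # indices of records assigned to this cluster

    def add(self, idx: int, num: NumPart, cat: CatPart) -> None:
        """Add a record, updating only the attributes it actually observes."""
        self.members.append(idx)
        for j, x in enumerate(num):
            if x is not None:
                self.num_sum[j] += x
                self.num_cnt[j] += 1
        for j, v in enumerate(cat):
            if v is not None:
                self.cat_freq[j][v] = self.cat_freq[j].get(v, 0) + 1

    def dissimilarity(self, num: NumPart, cat: CatPart) -> float:
        """Mean per-attribute dissimilarity over attributes observed on both sides."""
        total, compared = 0.0, 0
        for j, x in enumerate(num):
            if x is not None and self.num_cnt[j] > 0:
                # Numeric: absolute distance to the cluster mean of the observed values.
                total += abs(x - self.num_sum[j] / self.num_cnt[j])
                compared += 1
        for j, v in enumerate(cat):
            observed = sum(self.cat_freq[j].values())
            if v is not None and observed > 0:
                # Categorical: 1 minus the relative frequency of this value in the cluster.
                total += 1.0 - self.cat_freq[j].get(v, 0) / observed
                compared += 1
        # No attribute comparable on both sides: treat as maximally dissimilar.
        return total / compared if compared else 1.0


def squeezer_mixed(records: Sequence[Tuple[NumPart, CatPart]], threshold: float) -> List[List[int]]:
    """Single pass: join the nearest cluster if it is within `threshold`, else open a new one."""
    clusters: List[ClusterSummary] = []
    for idx, (num, cat) in enumerate(records):
        best: Optional[ClusterSummary] = None
        best_d = float("inf")
        for c in clusters:
            d = c.dissimilarity(num, cat)
            if d < best_d:
                best, best_d = c, d
        if best is None or best_d > threshold:
            best = ClusterSummary(len(num), len(cat))
            clusters.append(best)
        best.add(idx, num, cat)
    return [c.members for c in clusters]


if __name__ == "__main__":
    data = [
        ((0.10, 0.20), ("red", "small")),
        ((0.12, None), ("red", None)),   # missing numeric and categorical values
        ((0.90, 0.80), ("blue", "large")),
        ((None, 0.85), ("blue", "large")),
    ]
    print(squeezer_mixed(data, threshold=0.3))  # prints [[0, 1], [2, 3]]
```

Because the sketch makes a single pass, its result can depend on the order in which records arrive, and the threshold governs how many clusters are created: a lower threshold tends to yield more, tighter clusters.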

Keywords / Index Terms

mixed dataset, k-prototype, modified squeezer algorithm, dissimilarity measure
