Open Access   Article

Data Transformation Technique for Preserving Privacy in Data

Uma Shankar Rao Erothi¹, Sireesha Rodda²

¹ Department of CSE, RAGHU Institute of Technology, Visakhapatnam, India.
² Department of CSE, GITAM Institute of Technology, GITAM Deemed to be University, Visakhapatnam, India.

Section: Research Paper, Product Type: Journal Paper
Volume 6, Issue 5, Page no. 42-50, May 2018


Published online on May 31, 2018

Copyright © Uma Shankar Rao Erothi, Sireesha Rodda. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



IEEE Style Citation: Uma Shankar Rao Erothi, Sireesha Rodda, “Data Transformation Technique for Preserving Privacy in Data”, International Journal of Computer Sciences and Engineering, Vol.6, Issue.5, pp.42-50, 2018.




Increasing digitization has raised growing concerns about preserving the privacy of sensitive data. The ubiquity of sensitive information in data sources such as financial transactions, commercial transactions, medical records, and network communication has driven the development of various privacy-preserving techniques. In this paper, a novel data transformation technique is proposed for efficient privacy preservation. To protect the data, numeric attributes are transformed to the range [-1, 1], while characters and strings are transformed to binary strings. Data analysis over the transformed dataset provides the same result as over the original dataset. The technique is evaluated by comparing results on the datasets before and after transformation. Experiments on five standard datasets indicate high data utility for the proposed technique. The technique is also evaluated on the standard network intrusion dataset NSL-KDD to study its effectiveness in the intrusion detection domain, and the results are analyzed. Privacy measures are evaluated to ascertain the degree of privacy offered by the proposed technique.
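
The abstract does not state the exact transformation formulas, so the following is only a minimal sketch of the two kinds of mapping it describes. It assumes min-max scaling for the numeric range [-1, 1] and an 8-bit UTF-8 encoding for strings; both choices, and the names `scale_to_unit_interval` and `string_to_binary`, are illustrative assumptions rather than the authors' method.

```python
from typing import List, Sequence


def scale_to_unit_interval(values: Sequence[float]) -> List[float]:
    """Min-max scale values into [-1, 1] (assumed formula, not taken from the paper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def string_to_binary(text: str) -> str:
    """Encode a string as the 8-bit binary form of its UTF-8 bytes (assumed encoding)."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


# Hypothetical attribute values, chosen only for illustration.
ages = [20.0, 30.0, 40.0]
scaled_ages = scale_to_unit_interval(ages)
protocol_bits = string_to_binary("tcp")
```

Under these assumptions the numeric mapping preserves the ordering of values, and each distinct string maps to a distinct bit string. That is one plausible reason an analysis could behave similarly before and after transformation, although the paper's own formulas may differ.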

Keywords / Index Terms

Privacy Preservation, PPDM, Data Transformation, Network Intrusion Detection, Data Mining

