Open Access Article

Architecture for Automated Data Quality Checking in Big Data Migration Process

V. Rathika

Section: Research Paper, Product Type: Journal Paper
Volume-07, Issue-04, Page no. 36-39, Feb-2019

Online published on Feb 28, 2019

Copyright © V. Rathika. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: V. Rathika, “Architecture for Automated Data Quality Checking in Big Data Migration Process,” International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.36-39, 2019.

MLA Style Citation: V. Rathika. "Architecture for Automated Data Quality Checking in Big Data Migration Process." International Journal of Computer Sciences and Engineering 07.04 (2019): 36-39.

APA Style Citation: V. Rathika (2019). Architecture for Automated Data Quality Checking in Big Data Migration Process. International Journal of Computer Sciences and Engineering, 07(04), 36-39.

BibTex Style Citation:
@article{Rathika_2019,
author = {V. Rathika},
title = {Architecture for Automated Data Quality Checking in Big Data Migration Process},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {2 2019},
volume = {07},
number = {04},
month = {2},
year = {2019},
issn = {2347-2693},
pages = {36--39},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=717},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=717
TI  - Architecture for Automated Data Quality Checking in Big Data Migration Process
T2  - International Journal of Computer Sciences and Engineering
AU  - V. Rathika
PY  - 2019
DA  - 2019/02/28
PB  - IJCSE, Indore, INDIA
SP  - 36
EP  - 39
IS  - 04
VL  - 07
SN  - 2347-2693
ER  - 

           

Abstract

Data gathered from different sources often suffer from serious quality issues. The volume of information held in digital libraries keeps increasing, and most systems can be affected by replicas, that is, duplicate records. Data cleaning is an important process that removes replicas through de-duplication. It consists of parsing, data transformation, duplicate elimination, and statistical methods, and it covers detecting and removing errors, filling in omitted values, and smoothing noisy data to improve data quality. Clearing repeated documents is one of its most challenging stages. De-duplication is a key function when integrating data from various sources: it is the process of identifying all records in a data set that refer to the same real-world entity. This paper introduces a methodology for automated data quality checking with a de-duplication algorithm.
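
As a rough illustration of the cleaning steps named in the abstract (transformation followed by duplicate elimination), the following Python sketch normalizes records and drops near-duplicates. It is not the algorithm proposed in the paper: the field names, the `threshold` value, and the use of `difflib` string similarity are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Transformation step: lowercase, trim, and collapse whitespace."""
    return {
        field: " ".join(str(value or "").lower().split())
        for field, value in record.items()
    }


def similarity(a, b):
    """Average string similarity over the fields both records share (0.0 to 1.0)."""
    fields = [f for f in a if f in b]
    if not fields:
        return 0.0
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)


def deduplicate(records, threshold=0.9):
    """Keep the first record of each group of near-duplicates.

    threshold is a hypothetical cut-off, not a value from the paper.
    Pairwise comparison is O(n^2), so this only suits small inputs.
    """
    kept = []        # original records that survive
    kept_norm = []   # their normalized forms, used for comparison
    for record in records:
        norm = normalize(record)
        if any(similarity(norm, seen) >= threshold for seen in kept_norm):
            continue  # treated as a replica of an earlier record
        kept.append(record)
        kept_norm.append(norm)
    return kept


# Hypothetical sample data: the first two records describe the same entity.
records = [
    {"title": "Data Cleaning Survey", "author": "A. Kumar"},
    {"title": "  data cleaning  survey ", "author": "a. kumar"},
    {"title": "Near Duplicate Detection", "author": "B. Rao"},
]
unique = deduplicate(records)
```

At big-data scale the pairwise comparison would normally be replaced by a blocking or indexing step so that only records within the same block are compared; the sketch omits this for brevity.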

Keywords / Index Terms

Data Quality, Data Cleansing, De-Duplication
