
PABR Algorithm for Improving The Data Archival Performance of aHDFS

M. Mounica, A. Ananda Rao, P. Radhika Raju

Section: Research Paper, Product Type: Journal Paper
Volume-6, Issue-7, Page no. 37-42, Jul-2018


Online published on Jul 31, 2018

Copyright © M. Mounica, A. Ananda Rao, P. Radhika Raju. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



IEEE Style Citation: M. Mounica, A. Ananda Rao, P. Radhika Raju, “PABR Algorithm for Improving The Data Archival Performance of aHDFS”, International Journal of Computer Sciences and Engineering, Vol.6, Issue.7, pp.37-42, 2018.

MLA Style Citation: M. Mounica, A. Ananda Rao, P. Radhika Raju. "PABR Algorithm for Improving The Data Archival Performance of aHDFS." International Journal of Computer Sciences and Engineering 6.7 (2018): 37-42.

APA Style Citation: M. Mounica, A. Ananda Rao, P. Radhika Raju (2018). PABR Algorithm for Improving The Data Archival Performance of aHDFS. International Journal of Computer Sciences and Engineering, 6(7), 37-42.



Hadoop Distributed File System (HDFS) is a highly fault-tolerant distributed file system associated with the Hadoop framework. HDFS can handle the large volumes of data known as big data, and it also supports data archival. Data archiving is the process of identifying inactive data and moving it to a separate storage location. Cloud-based storage makes archiving cost-effective, while Hadoop clusters provide the required computational power. However, from the data owner's point of view, protecting archived data is the main concern. Erasure Coding (EC) addresses this by storing redundant parity information from which lost data can be reconstructed. Recently, aHDFS was developed to provide dedicated data archival features based on EC. Its limitation is that it incurs a similar computational cost for data of different sizes. To overcome this problem, we propose a new methodology. A prototype application was built as a proof of concept. Empirical results show that the proposed methodology improves computational efficiency when providing data archival services.
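The abstract rests on two ideas: archiving data that has become inactive, and erasure coding so that archived data survives the loss of a block. The Python sketch below illustrates both under simplifying assumptions. It is not the PABR algorithm or the aHDFS implementation. The 90-day inactivity threshold, the `FileInfo` record, and every function name are hypothetical. The code also uses a single XOR parity block, which tolerates only one lost block, rather than stronger codes such as the Reed-Solomon policies that HDFS erasure coding supports.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold: files not accessed for this many days count as "inactive".
INACTIVE_AFTER_DAYS = 90


@dataclass
class FileInfo:
    """Hypothetical file record: name, age since last access, and contents."""
    name: str
    days_since_access: int
    data: bytes


def select_inactive(files: List[FileInfo], threshold: int = INACTIVE_AFTER_DAYS) -> List[FileInfo]:
    """Return the files whose last access is at least `threshold` days old."""
    return [f for f in files if f.days_since_access >= threshold]


def split_blocks(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length blocks, zero-padding the tail."""
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\x00")
    return [padded[i * block_len:(i + 1) * block_len] for i in range(k)]


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def encode(data: bytes, k: int) -> List[bytes]:
    """Single-parity erasure code: k data blocks followed by one XOR parity block."""
    blocks = split_blocks(data, k)
    return blocks + [xor_blocks(blocks)]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (marked None) from the survivors.

    The XOR of all k + 1 blocks is zero, so a missing block equals the
    XOR of the remaining ones.
    """
    missing = [i for i, b in enumerate(stripe) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    repaired = list(stripe)
    if missing:
        repaired[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return repaired


if __name__ == "__main__":
    files = [
        FileInfo("logs-2017.csv", 200, b"old web server logs"),
        FileInfo("today.csv", 1, b"fresh data"),
    ]
    k = 4
    for f in select_inactive(files):
        stripe = encode(f.data, k)
        damaged = list(stripe)
        damaged[2] = None  # simulate losing one data block
        repaired = recover(damaged)
        assert repaired == stripe
        # Trim padding using the original length rather than stripping zeros.
        print(f.name, b"".join(repaired[:k])[:len(f.data)])
```

Trimming by the original length, instead of stripping trailing zero bytes, keeps files that legitimately end in zeros intact. Encoding cost here grows linearly with the amount of archived data, which is the kind of cost the proposed methodology aims to reduce.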

Keywords / Index Terms

Hadoop, HDFS, Data archival system, Erasure codes

