
An Efficient Approach to Optimize the Performance of Massive Small Files in Hadoop MapReduce Framework

Guru Prasad M.S.1, Nagesh H.R.2, Swathi Prabhu3

  1. Dept. of Computer Science and Engineering, SDMIT-VTU-Belagavi, Ujire, India.
  2. Dept. of Computer Science and Engineering, MITE-VTU-Belagavi, Moodbidri, India.
  3. Dept. of Computer Science and Engineering, SDMIT-VTU-Belagavi, Ujire, India.

Correspondence should be addressed to: guru0927@gmail.com.

Section: Research Paper, Product Type: Journal Paper
Volume-5, Issue-6, Page no. 112-120, Jun-2017

Online published on Jun 30, 2017

Copyright © Guru Prasad M.S., Nagesh H.R., Swathi Prabhu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

IEEE Style Citation: Guru Prasad M.S., Nagesh H.R., Swathi Prabhu, “An Efficient Approach to Optimize the Performance of Massive Small Files in Hadoop MapReduce Framework,” International Journal of Computer Sciences and Engineering, Vol.5, Issue.6, pp.112-120, 2017.

MLA Style Citation: Guru Prasad M.S., Nagesh H.R., Swathi Prabhu. "An Efficient Approach to Optimize the Performance of Massive Small Files in Hadoop MapReduce Framework." International Journal of Computer Sciences and Engineering 5.6 (2017): 112-120.

APA Style Citation: Guru Prasad M.S., Nagesh H.R., Swathi Prabhu (2017). An Efficient Approach to Optimize the Performance of Massive Small Files in Hadoop MapReduce Framework. International Journal of Computer Sciences and Engineering, 5(6), 112-120.

BibTex Style Citation:
@article{M.S._2017,
author = {Guru Prasad M.S. and Nagesh H.R. and Swathi Prabhu},
title = {An Efficient Approach to Optimize the Performance of Massive Small Files in Hadoop MapReduce Framework},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {6 2017},
volume = {5},
number = {6},
month = {6},
year = {2017},
issn = {2347-2693},
pages = {112--120},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1311},
publisher = {IJCSE, Indore, INDIA}
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=1311
TI  - An Efficient Approach to Optimize the Performance of Massive Small Files in Hadoop MapReduce Framework
T2  - International Journal of Computer Sciences and Engineering
AU  - Guru Prasad M.S.
AU  - Nagesh H.R.
AU  - Swathi Prabhu
PY  - 2017
DA  - 2017/06/30
PB  - IJCSE, Indore, INDIA
SP  - 112
EP  - 120
IS  - 6
VL  - 5
SN  - 2347-2693
ER  - 

Abstract

Hadoop, the most popular open source distributed computing framework, was designed by Doug Cutting and his team and uses thousands of nodes to process and analyze huge amounts of data, known as Big Data. The core components of Hadoop are HDFS (Hadoop Distributed File System) and MapReduce. The framework is popular and powerful for storing, managing, and processing Big Data applications. Its drawback is that it suffers from stability and performance issues when storing, managing, and processing the data of small-file applications. The existing approaches to the small files problem are Hadoop archives and SequenceFile. However, these approaches do not give optimized performance for small files on Hadoop. To improve the performance of storing, managing, and processing small files on Hadoop, we propose an approach for the Hadoop MapReduce framework to handle small-file applications. Experimental results show that the proposed framework optimizes the performance of Hadoop in handling massive numbers of small files compared to existing approaches.
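
The general idea behind container-based approaches such as SequenceFile, packing many small files into one larger file of key-value records, can be illustrated with a minimal Python sketch. This is not Hadoop's actual SequenceFile format and not the approach proposed in this paper; the record layout and the function names `pack_small_files` and `unpack_small_files` are hypothetical and chosen only for illustration.

```python
import io
import struct


def pack_small_files(files):
    """Pack a {name: bytes} mapping into one blob of (key, value) records.

    Each record is: 4-byte key length, 4-byte value length (big-endian),
    then the UTF-8 file name, then the file contents.
    This layout is an assumption for illustration, not Hadoop's format.
    """
    out = io.BytesIO()
    for name, data in files.items():
        key = name.encode("utf-8")
        out.write(struct.pack(">II", len(key), len(data)))
        out.write(key)
        out.write(data)
    return out.getvalue()


def unpack_small_files(blob):
    """Read the records back into a {name: bytes} mapping."""
    files = {}
    offset = 0
    while offset < len(blob):
        key_len, val_len = struct.unpack_from(">II", blob, offset)
        offset += 8  # skip the two 4-byte length fields
        key = blob[offset:offset + key_len].decode("utf-8")
        offset += key_len
        files[key] = blob[offset:offset + val_len]
        offset += val_len
    return files


# 1000 tiny "files" become a single container object.
small_files = {f"log_{i}.txt": f"record {i}\n".encode("utf-8") for i in range(1000)}
container = pack_small_files(small_files)
restored = unpack_small_files(container)
```

In this sketch the storage layer would track one large object instead of 1000 small ones, while each original file remains recoverable by its name.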

Keywords / Index Terms

Hadoop, Hadoop Distributed File System (HDFS), MapReduce, Hadoop Archives, SequenceFile, Small Files
