Open Access Article

Optimization of Map Reduce Using Maximum Cost Performance Strategy

A. Saran Kumar, V. Vanitha Devi

Section: Research Paper, Product Type: Journal Paper
Volume-4, Issue-6, Page no. 78-87, Jun-2016

Online published on Jul 01, 2016

Copyright © A. Saran Kumar, V. Vanitha Devi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: A. Saran Kumar, V. Vanitha Devi, “Optimization of Map Reduce Using Maximum Cost Performance Strategy,” International Journal of Computer Sciences and Engineering, Vol.4, Issue.6, pp.78-87, 2016.

MLA Style Citation: A. Saran Kumar, V. Vanitha Devi. "Optimization of Map Reduce Using Maximum Cost Performance Strategy." International Journal of Computer Sciences and Engineering 4.6 (2016): 78-87.

APA Style Citation: A. Saran Kumar, V. Vanitha Devi (2016). Optimization of Map Reduce Using Maximum Cost Performance Strategy. International Journal of Computer Sciences and Engineering, 4(6), 78-87.

BibTex Style Citation:
@article{Kumar_2016,
author = {A. Saran Kumar and V. Vanitha Devi},
title = {Optimization of Map Reduce Using Maximum Cost Performance Strategy},
journal = {International Journal of Computer Sciences and Engineering},
volume = {4},
number = {6},
month = {6},
year = {2016},
issn = {2347-2693},
pages = {78--87},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=971},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=971
TI  - Optimization of Map Reduce Using Maximum Cost Performance Strategy
T2  - International Journal of Computer Sciences and Engineering
AU  - A. Saran Kumar
AU  - V. Vanitha Devi
PY  - 2016
DA  - 2016/07/01
PB  - IJCSE, Indore, INDIA
SP  - 78
EP  - 87
IS  - 6
VL  - 4
SN  - 2347-2693
ER  - 

Abstract

Big data is a buzzword used to describe a volume of structured and unstructured data so large that it is difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity. Big data has the potential to help companies improve operations and make faster, more intelligent decisions. Parallel computing is a frequently used method for large-scale data processing. Many computing tasks involve heavy mathematical calculations or the analysis of large amounts of data, and these operations can take a long time to complete on a single computer. MapReduce is one of the most commonly used parallel computing frameworks, and task execution time and throughput are two important measures of its performance. Speculative execution launches backup copies of slowly running tasks (stragglers) on alternate machines. Multiple speculative execution strategies have been proposed, but they have two pitfalls: (i) they use the average progress rate to identify slow tasks, although in reality the progress rate can be unstable and misleading; (ii) they do not consider whether a backup task can finish earlier than the original when choosing backup worker nodes. This project aims to significantly improve the effectiveness of speculative execution. To identify the appropriate tasks accurately and promptly, it employs the following methods: (i) using both the progress rate and the processing bandwidth within a phase to select slow tasks; (ii) using an exponentially weighted moving average (EWMA) to predict processing speed and calculate a task's remaining time; (iii) deciding which task to back up, based on the load of the cluster, using a cost-benefit model.
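The three methods listed in the abstract (slow-task selection from progress rate plus bandwidth, EWMA speed prediction, and a load-aware cost-benefit test) can be illustrated with a short Python sketch. It is not the authors' implementation: the sample format, the smoothing factor `ALPHA`, the `factor` threshold in `select_slow_tasks`, and the benefit-versus-cost rule in `should_backup` are hypothetical choices made for illustration. Only the EWMA update uses the standard form Z(t) = α·Y(t) + (1 − α)·Z(t − 1).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical EWMA smoothing factor; the paper's value is not stated here.
ALPHA = 0.3


@dataclass
class TaskStats:
    """Progress of one running task, assumed (hypothetically) to start at t = 0."""
    total_bytes: float            # input size of the task (assumed known)
    processed_bytes: float = 0.0  # bytes processed so far
    speed_ewma: float = 0.0       # smoothed processing bandwidth (bytes/s)
    last_t: float = 0.0           # time of the latest progress sample
    samples: int = 0              # number of speed samples folded in

    def update(self, t: float, processed_bytes: float) -> None:
        """Fold a new (time, bytes processed) sample into the EWMA."""
        dt = t - self.last_t
        if dt > 0:
            instant = (processed_bytes - self.processed_bytes) / dt
            if self.samples == 0:
                self.speed_ewma = instant  # seed with the first observation
            else:
                # Z(t) = alpha * Y(t) + (1 - alpha) * Z(t - 1)
                self.speed_ewma = ALPHA * instant + (1 - ALPHA) * self.speed_ewma
            self.samples += 1
        self.last_t = t
        self.processed_bytes = processed_bytes

    @property
    def progress(self) -> float:
        return self.processed_bytes / self.total_bytes

    @property
    def progress_rate(self) -> float:
        """Fraction of the task completed per second since it started."""
        return self.progress / self.last_t if self.last_t > 0 else 0.0

    def remaining_time(self) -> float:
        """Remaining data divided by the EWMA-predicted speed."""
        if self.speed_ewma <= 0:
            return float("inf")
        return (self.total_bytes - self.processed_bytes) / self.speed_ewma


def select_slow_tasks(tasks: List[TaskStats], factor: float = 0.5) -> List[int]:
    """Indices of tasks whose progress rate AND smoothed bandwidth are both
    below `factor` times the phase average (hypothetical threshold rule)."""
    avg_rate = sum(t.progress_rate for t in tasks) / len(tasks)
    avg_bw = sum(t.speed_ewma for t in tasks) / len(tasks)
    return [i for i, t in enumerate(tasks)
            if t.progress_rate < factor * avg_rate
            and t.speed_ewma < factor * avg_bw]


def should_backup(task: TaskStats, backup_speed: float, cluster_load: float) -> bool:
    """Simplified cost-benefit test (hypothetical weighting, not the paper's model).

    Benefit: time saved if a backup, restarting from scratch on a node that
    processes `backup_speed` bytes/s, beats the original's predicted finish.
    Cost: the backup's slot time, weighted by cluster_load in [0, 1] so that
    speculation becomes more conservative on a busy cluster.
    """
    t_backup = task.total_bytes / backup_speed
    benefit = task.remaining_time() - t_backup
    cost = cluster_load * t_backup
    return benefit > cost
```

For example, a 100-byte task sampled at (1 s, 10 bytes) and (2 s, 15 bytes) gets a smoothed speed of 0.3·5 + 0.7·10 = 8.5 bytes/s and a predicted remaining time of 85 / 8.5 = 10 s. With a backup node at 50 bytes/s and a load of 0.5, the backup would need only 2 s, so `should_backup` returns `True`; smoothing the speed rather than using the raw average progress rate is what keeps a single slow interval from dominating the estimate.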

Keywords / Index Terms

MapReduce, Cost Performance Strategy, Big Data, Stragglers, Speculation
