Open Access Article

Statistical Predictability in Big Data Analytics with Data Partitioning

K. Saritha, Sajimon Abraham

Section: Research Paper, Product Type: Journal Paper
Volume 06, Issue 06, pp. 80-85, Jul 2018

CrossRef DOI: https://doi.org/10.26438/ijcse/v6si6.8085

Published online on Jul 31, 2018

Copyright © K. Saritha, Sajimon Abraham. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: K. Saritha and Sajimon Abraham, “Statistical Predictability in Big Data Analytics with Data Partitioning,” International Journal of Computer Sciences and Engineering, Vol. 06, Issue 06, pp. 80-85, 2018.

MLA Style Citation: Saritha, K., and Sajimon Abraham. "Statistical Predictability in Big Data Analytics with Data Partitioning." International Journal of Computer Sciences and Engineering 06.06 (2018): 80-85.

APA Style Citation: Saritha, K., & Abraham, S. (2018). Statistical Predictability in Big Data Analytics with Data Partitioning. International Journal of Computer Sciences and Engineering, 06(06), 80-85.

BibTeX Style Citation:
@article{Saritha_2018,
author = {K. Saritha and Sajimon Abraham},
title = {Statistical Predictability in Big Data Analytics with Data Partitioning},
journal = {International Journal of Computer Sciences and Engineering},
volume = {06},
number = {06},
month = {7},
year = {2018},
issn = {2347-2693},
pages = {80--85},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=449},
doi = {10.26438/ijcse/v6i6.8085},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i6.8085
UR  - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=449
TI  - Statistical Predictability in Big Data Analytics with Data Partitioning
T2  - International Journal of Computer Sciences and Engineering
AU  - Saritha, K.
AU  - Abraham, Sajimon
PY  - 2018
DA  - 2018/07/31
PB  - IJCSE, Indore, INDIA
SP  - 80
EP  - 85
IS  - 06
VL  - 06
SN  - 2347-2693
ER  - 


Abstract

Big Data refers to volumes of data so large that they cannot easily be handled by commonly available tools. Big Data analytics offers competitive opportunities when designing business plans in Business Analytics. Because its results are used to make intelligent business decisions, they must be accurate and timely. For the analysis we use Multiple Linear Regression (MLR), a statistical method and a type of supervised machine learning algorithm. The performance of an MLR model with one quantitative dependent attribute and four independent attributes is evaluated by splitting the whole data set with the Cross-Validation technique. Cross-validation checks the accuracy of a model built on training data against separate test data, which helps control problems such as overfitting. Here we use the Hold-Out Cross-Validation method with serial and random partitioning. A data set from the UCI Machine Learning Repository is evaluated through simulation to check performance. The model generated from the training data is validated with the test data, and the evaluation shows that the result obtained generalizes. The proposed MLR model can therefore be applied to new data sets to obtain accurate results. We found that, when measured by accuracy, random partitioning is the better method.
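The hold-out procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 70/30 train/test ratio, the use of R² as the accuracy measure, the synthetic data standing in for the UCI data set, and the helper names (`serial_split`, `random_split`, `fit_mlr`, `holdout_score`) are all hypothetical. Serial partitioning cuts the rows in their stored order; random partitioning shuffles them before cutting.

```python
import numpy as np


def serial_split(n_rows, train_fraction=0.7):
    """Serial hold-out: the first rows train, the remaining rows test.
    The 0.7 default is an assumed ratio, not taken from the paper."""
    n_train = int(round(n_rows * train_fraction))
    idx = np.arange(n_rows)
    return idx[:n_train], idx[n_train:]


def random_split(n_rows, train_fraction=0.7, seed=0):
    """Random hold-out: shuffle row indices, then cut at the same ratio."""
    n_train = int(round(n_rows * train_fraction))
    idx = np.random.default_rng(seed).permutation(n_rows)
    return idx[:n_train], idx[n_train:]


def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk.
    Returns [b0, b1, ..., bk]."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed accuracy measure)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def holdout_score(X, y, split):
    """Fit on the training rows, score on the held-out test rows."""
    train, test = split
    coef = fit_mlr(X[train], y[train])
    return r_squared(y[test], predict(coef, X[test]))


if __name__ == "__main__":
    # Hypothetical synthetic stand-in for the UCI data:
    # four independent attributes and one quantitative dependent attribute.
    rng = np.random.default_rng(42)
    n = 500
    X = rng.normal(size=(n, 4))
    y = 2.0 + X @ np.array([1.5, -0.8, 0.5, 2.0]) + rng.normal(scale=1.0, size=n)

    print("serial hold-out R^2:", holdout_score(X, y, serial_split(n)))
    print("random hold-out R^2:", holdout_score(X, y, random_split(n)))
```

Serial partitioning keeps the stored order, so if rows were recorded over time the test set covers a later period than the training set, while random partitioning mixes both periods into each set. Which split scores higher depends on the data; the sketch only shows how the two partitions are built and evaluated.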

Key-Words / Index Term

Big Data Analytics, Multiple Linear Regression, Predictive Analytics, Validation Methods
