Open Access   Article

Relevance Based Feature Selection Algorithm For Efficient Preprocessing of Textual Data Using HMM

R. Merlin Packiam, V. Sinthu Janita Prakash

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 1, Page no. 15-21, Jan 2019


Published online on Jan 31, 2019

Copyright © R. Merlin Packiam, V. Sinthu Janita Prakash. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



IEEE Style Citation: R. Merlin Packiam, V. Sinthu Janita Prakash, “Relevance Based Feature Selection Algorithm For Efficient Preprocessing of Textual Data Using HMM”, International Journal of Computer Sciences and Engineering, Vol.7, Issue.1, pp.15-21, 2019.

MLA Style Citation: R. Merlin Packiam, V. Sinthu Janita Prakash "Relevance Based Feature Selection Algorithm For Efficient Preprocessing of Textual Data Using HMM." International Journal of Computer Sciences and Engineering 7.1 (2019): 15-21.

APA Style Citation: R. Merlin Packiam, V. Sinthu Janita Prakash, (2019). Relevance Based Feature Selection Algorithm For Efficient Preprocessing of Textual Data Using HMM. International Journal of Computer Sciences and Engineering, 7(1), 15-21.



With the rapid growth of the Internet, social media is expanding steadily and plays a major role in most of our lives. Social networking sites such as Twitter, Google+ and Facebook give people a platform to present themselves. Twitter is an efficient micro-blogging tool that has become popular throughout the world. There is now an ongoing trend of posting every thought and emotion on these sites. As a result, emotion analysis has gained popularity as a way to analyze people's thoughts, opinions, feelings and sentiments. Handling such a huge amount of unstructured data, however, is a tedious task. Feature selection is the process of reducing the collected features to a relevant subset, and it is often used to combat the curse of dimensionality. This paper proposes a relevance-based feature selection method for efficient analytics on Twitter data. After features are selected from the tweets, Support Vector Machine (SVM) based classification is applied to analyze the data using a Hidden Markov Model (HMM). The proposed method was evaluated experimentally on a publicly available Twitter data set using precision, recall, F-measure and accuracy. Compared with the results reported in existing research, the proposed method performs better.
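As a rough illustration of the two ideas named in the abstract (keeping only a relevant subset of features, then scoring a classifier with precision, recall, F-measure and accuracy), the following minimal sketch may help. It is not the authors' algorithm: the relevance score used here (the difference in per-class document frequency), the function names `select_relevant_terms` and `evaluate`, and the toy tweets are all assumptions made for illustration, and the SVM and HMM stages are omitted.

```python
from collections import Counter


def select_relevant_terms(tweets, labels, k):
    """Keep the k terms with the highest stand-in relevance score.

    Assumed score (NOT the paper's measure): the absolute difference between
    the fraction of positive tweets and the fraction of negative tweets
    that contain the term.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    for text, y in zip(tweets, labels):
        terms = set(text.lower().split())  # count each term once per tweet
        (pos_df if y == 1 else neg_df).update(terms)

    def score(term):
        return abs(pos_df[term] / max(n_pos, 1) - neg_df[term] / max(n_neg, 1))

    vocab = set(pos_df) | set(neg_df)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-score(t), t))
    return ranked[:k]


def evaluate(y_true, y_pred):
    """Compute precision, recall, F-measure and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
tweets = ["love this phone", "great day love it",
          "hate this traffic", "awful day hate it"]
labels = [1, 1, 0, 0]

selected = select_relevant_terms(tweets, labels, k=2)
metrics = evaluate([1, 1, 0, 0], [1, 0, 0, 0])
```

On this toy data, terms that appear in only one class score highest, while terms shared equally by both classes (such as "this" or "day") score zero and are dropped. In a fuller pipeline the selected terms would define the feature vectors passed to the classifier.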

Keywords / Index Terms

Twitter, Big Data, Feature Selection, HMM

