Open Access Article

Distinction between Text and Non-Text Using Ensemble Classifier

Pradipta Karmakar, Chowdhury Md. Mizan, Sayak Dasgupta, Saptaparna Das

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 5, Pages 52-56, May 2019

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v7i5.5256

Online published on May 31, 2019

Copyright © Pradipta Karmakar, Chowdhury Md. Mizan, Sayak Dasgupta, Saptaparna Das. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: Pradipta Karmakar, Chowdhury Md. Mizan, Sayak Dasgupta, Saptaparna Das, “Distinction between Text and Non-Text Using Ensemble Classifier,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.5, pp.52-56, 2019.

MLA Style Citation: Pradipta Karmakar, Chowdhury Md. Mizan, Sayak Dasgupta, Saptaparna Das. "Distinction between Text and Non-Text Using Ensemble Classifier." International Journal of Computer Sciences and Engineering 7.5 (2019): 52-56.

APA Style Citation: Pradipta Karmakar, Chowdhury Md. Mizan, Sayak Dasgupta, Saptaparna Das (2019). Distinction between Text and Non-Text Using Ensemble Classifier. International Journal of Computer Sciences and Engineering, 7(5), 52-56.

BibTex Style Citation:
@article{Karmakar_2019,
author = {Pradipta Karmakar and Chowdhury Md. Mizan and Sayak Dasgupta and Saptaparna Das},
title = {Distinction between Text and Non-Text Using Ensemble Classifier},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {5 2019},
volume = {7},
number = {5},
month = {5},
year = {2019},
issn = {2347-2693},
pages = {52-56},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4197},
doi = {10.26438/ijcse/v7i5.5256},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i5.5256
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4197
TI  - Distinction between Text and Non-Text Using Ensemble Classifier
T2  - International Journal of Computer Sciences and Engineering
AU  - Pradipta Karmakar
AU  - Chowdhury Md. Mizan
AU  - Sayak Dasgupta
AU  - Saptaparna Das
PY  - 2019
DA  - 2019/05/31
PB  - IJCSE, Indore, INDIA
SP  - 52
EP  - 56
IS  - 5
VL  - 7
SN  - 2347-2693
ER  -


Abstract

Distinguishing text images from non-text images is a major challenge in computer vision, because it determines how efficiently text can be extracted from an image. Many text-extraction algorithms are available, but such an algorithm is more efficient when it is known beforehand whether the input is a text image or a non-text image. Text extraction is especially difficult for old manuscripts; there, an algorithm that separates text from non-text makes it easier to detect any text in the manuscript and extract it. In our approach, we have built a system that accepts any image as input. The input image is preprocessed and converted into a binary image. A distance transform is then applied, and the distances between the various points in the image are calculated. Duplicate values among the calculated points are merged into one, and the resulting values are sorted in ascending order. The total area of the binary image is then calculated, along with the area corresponding to each distance transform point. Dividing the total area of the binary image by the area value of each distance transform point gives the feature values. The range of feature values is then divided into small intervals and passed to the classifier. For our experiments, we chose an ensemble classifier. The accuracy of the classifier in distinguishing text from non-text images is then evaluated. The method is simple and accurate, and it also helps in extracting text from images. Experiments were carried out on a dataset of simple text and non-text images, and the efficiency of the proposed method is demonstrated.
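
The feature-extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a fixed binarization threshold of 128 with dark pixels as foreground, a city-block (two-pass) distance transform, the "total area" taken as the number of foreground pixels, the "area" of a distance value d taken as the number of foreground pixels whose distance is at least d, and equal-width intervals for the final binning. All function names are hypothetical. The resulting histogram is the kind of fixed-length vector that could then be passed to an ensemble classifier.

```python
from collections import Counter

def binarize(gray, threshold=128):
    """Assumed rule: pixels darker than the threshold become foreground (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def distance_transform(binary):
    """City-block distance from each foreground pixel to the nearest
    background pixel, computed with a forward and a backward pass.
    If the image has no background pixel, distances stay at h + w."""
    h, w = len(binary), len(binary[0])
    inf = h + w
    dist = [[inf if binary[y][x] else 0 for x in range(w)] for y in range(h)]
    for y in range(h):                      # forward pass: top-left neighbours
        for x in range(w):
            if dist[y][x]:
                if y > 0:
                    dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
                if x > 0:
                    dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):          # backward pass: bottom-right neighbours
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                if y < h - 1:
                    dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
                if x < w - 1:
                    dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist

def area_ratio_features(binary):
    """Total foreground area divided by the area at each distinct distance value."""
    dist = distance_transform(binary)
    values = [d for row in dist for d in row if d > 0]
    total_area = len(values)
    if total_area == 0:
        return []
    counts = Counter(values)
    remaining = total_area                  # pixels with distance >= current value
    features = []
    for d in sorted(counts):                # duplicates merged, ascending order
        features.append(total_area / remaining)
        remaining -= counts[d]
    return features

def bin_features(features, n_bins=4):
    """Count feature values falling into n_bins equal-width intervals."""
    hist = [0] * n_bins
    if not features:
        return hist
    lo, hi = min(features), max(features)
    width = (hi - lo) / n_bins or 1.0       # avoid zero width when all values match
    for f in features:
        hist[min(int((f - lo) / width), n_bins - 1)] += 1
    return hist

# Toy 5x5 grayscale image: a dark 3x3 block on a white background.
gray = [[0 if 1 <= y <= 3 and 1 <= x <= 3 else 255 for x in range(5)] for y in range(5)]
features = area_ratio_features(binarize(gray))
histogram = bin_features(features)
```

For the toy image, the eight outer pixels of the block lie at distance 1 and the centre pixel at distance 2, so the sketch yields two feature values, 9/9 and 9/1.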

Keywords / Index Terms

distinction between text and non-text, bar chart, classifier, ensemble classifier
