Open Access Article

A Deep Learning Model for Image Caption Generation

P. Aishwarya Naidu1, Satvik Vats2, Gehna Anand3, Nalina V.4

Section:Research Paper, Product Type: Journal Paper
Volume-8 , Issue-6 , Page no. 10-17, Jun-2020

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v8i6.1017

Online published on Jun 30, 2020

Copyright © P. Aishwarya Naidu, Satvik Vats, Gehna Anand, Nalina V. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: P. Aishwarya Naidu, Satvik Vats, Gehna Anand, Nalina V., “A Deep Learning Model for Image Caption Generation,” International Journal of Computer Sciences and Engineering, Vol.8, Issue.6, pp.10-17, 2020.

MLA Style Citation: P. Aishwarya Naidu, Satvik Vats, Gehna Anand, Nalina V. "A Deep Learning Model for Image Caption Generation." International Journal of Computer Sciences and Engineering 8.6 (2020): 10-17.

APA Style Citation: P. Aishwarya Naidu, Satvik Vats, Gehna Anand, Nalina V., (2020). A Deep Learning Model for Image Caption Generation. International Journal of Computer Sciences and Engineering, 8(6), 10-17.

BibTex Style Citation:
@article{Naidu_2020,
author = {P. Aishwarya Naidu and Satvik Vats and Gehna Anand and Nalina V.},
title = {A Deep Learning Model for Image Caption Generation},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {June 2020},
volume = {8},
issue = {6},
month = {6},
year = {2020},
issn = {2347-2693},
pages = {10-17},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=5138},
doi = {10.26438/ijcse/v8i6.1017},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - 10.26438/ijcse/v8i6.1017
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=5138
TI - A Deep Learning Model for Image Caption Generation
T2 - International Journal of Computer Sciences and Engineering
AU - P. Aishwarya Naidu
AU - Satvik Vats
AU - Gehna Anand
AU - Nalina V.
PY - 2020
DA - 2020/06/30
PB - IJCSE, Indore, INDIA
SP - 10
EP - 17
IS - 6
VL - 8
SN - 2347-2693
ER -


Abstract

Computer vision has been an area of interest for engineers and scientists working in artificial intelligence since the late 1960s, because giving machines and robots the power to perceive the objects and activities around them, as the human visual system does, is essential. The ability to analyse two-dimensional images and extract features from them can be used to build a wide range of applications, and the involvement of deep learning has bolstered the field of computer vision even further. The abundance of images in today's digital world, and the amount of information they contain, make them valuable and research-worthy data. A deep learning-based image caption generator combines natural language processing and computer vision: the machine extracts features from an image and then describes those features in a natural language, thus explaining the contents of the image in a human-readable format. Such a model has applications ranging from social causes, such as aiding the visually impaired, to enhancing users' search experience over the web. This paper analyses state-of-the-art work in the fields of image processing, computer vision and deep learning, and presents a deep learning model that generates captions describing the images given as input to the system.
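The pipeline described above — a visual encoder that turns an image into a feature vector, followed by a recurrent decoder that emits the caption one word at a time — can be sketched as follows. This is an illustrative toy, not the authors' model: the vocabulary, the stand-in encoder and the hard-coded "RNN step" are all hypothetical placeholders for a trained CNN and LSTM.

```python
import numpy as np

# Hypothetical toy vocabulary; a real model would use thousands of words.
VOCAB = ["<start>", "a", "dog", "on", "grass", "<end>"]

def encode_image(image):
    # Stand-in for a CNN encoder: per-channel mean as a crude feature vector.
    return image.mean(axis=(0, 1))

def decode_caption(features, step_fn, max_len=10):
    # Greedy decoding: feed the previous word (and the image features) to the
    # recurrent step, take the most probable next word, stop at "<end>".
    caption, word = [], "<start>"
    for _ in range(max_len):
        probs = step_fn(features, word)
        word = VOCAB[int(np.argmax(probs))]
        if word == "<end>":
            break
        caption.append(word)
    return caption

def toy_step(features, prev_word):
    # Deterministic toy "RNN step" with hard-coded transitions, so the
    # decoding loop can be demonstrated without trained weights.
    nxt = {"<start>": "a", "a": "dog", "dog": "on",
           "on": "grass", "grass": "<end>"}[prev_word]
    probs = np.zeros(len(VOCAB))
    probs[VOCAB.index(nxt)] = 1.0
    return probs

features = encode_image(np.random.rand(8, 8, 3))
print(" ".join(decode_caption(features, toy_step)))  # a dog on grass
```

In a trained system, `encode_image` would be a convolutional network and `toy_step` a recurrent cell whose output distribution over the vocabulary is learned from image-caption pairs; the greedy loop itself is the standard inference procedure.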

Key-Words / Index Term

Image caption, Recurrent Neural Networks, Feature Extraction, Image Description

References

[1] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention", In International Conference on Machine Learning, pp. 2048-2057, 2015.
[2] M. Tanti, A. Gatt, K.P. Camilleri, "What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?", arXiv preprint arXiv:1708.02043, 2017.
[3] R. Bernardi, R. Cakici, D. Elliott, A. Erdem, E. Erdem, N. Ikizler-Cinbis, F. Keller, A. Muscat, B. Plank, "Automatic description generation from images: A survey of models, datasets, and evaluation measures", Journal of Artificial Intelligence Research, Vol. 55, pp. 409-442, 2016.
[4] O. Vinyals, A. Toshev, S. Bengio, D. Erhan, "Show and tell: A neural image caption generator", In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156-3164, 2015.
[5] P. Kuznetsova, V. Ordonez, A.C. Berg, T.L. Berg, Y. Choi, "Collective generation of natural image descriptions", In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 359-368, 2012.
[6] S. Li, G. Kulkarni, T. Berg, A. Berg, Y. Choi, "Composing simple image descriptions using web-scale n-grams", In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, Association for Computational Linguistics, pp. 220-228, 2011.
[7] R. Kiros, R. Salakhutdinov, R. Zemel, "Multimodal neural language models", In International Conference on Machine Learning, pp. 595-603, 2014.
[8] Y. Yang, C. Teo, H. Daumé III, Y. Aloimonos, "Corpus-guided sentence generation of natural images", In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 444-454, 2011.
[9] W. Zaremba, I. Sutskever, O. Vinyals, "Recurrent neural network regularization", arXiv preprint arXiv:1409.2329, 2014.
[10] K. Barnard, P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, M. Jordan, "Matching words and pictures", Journal of Machine Learning Research, Vol. 3 (Feb), pp. 1107-1135, 2003.
[11] B. Yao, X. Yang, L. Lin, M. Lee, S. Zhu, "I2T: Image parsing to text description", In Proceedings of the IEEE, pp. 1485-1508, 2010.
[12] N. Kumar, D. Vigneswari, A. Mohan, K. Laxman, J. Yuvaraj, "Detection and Recognition of Objects in Image Caption Generator System: A Deep Learning Approach", In 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), pp. 107-109, 2019.
[13] S. Shabir, S. Arafat, "An image conveys a message: A brief survey on image description generation", In 2018 1st International Conference on Power, Energy and Smart Grid (ICPESG), pp. 1-6, 2018.
[14] J. Li, Y. Wong, Q. Zhao, M. Kankanhalli, "Video Storytelling: Textual Summaries for Events", IEEE Transactions on Multimedia, Vol. 22, Issue 2, pp. 554-565, 2019.
[15] X. Li, S. Jiang, "Know more say less: Image captioning based on scene graphs", IEEE Transactions on Multimedia, Vol. 21, Issue 8, pp. 2117-2130, 2019.
[16] S. Kavitha, A. Senthil Kumar, "Long Short-Term Memory Recurrent Neural Network Architectures", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 5, Issue 3, pp. 390-394, May-June 2019.
[17] Anitha Nithya R, Saran A, Vinoth R, "Adaptive Resource Allocation and Provisioning in MultiService Cloud Environments", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 5, Issue 2, pp. 382-387, March-April 2019.