Open Access Article

A Survey of Text-to-Image Generative Adversarial Networks

Siddhivinayak Kulkarni1, Amol Dhondse2, Anurag Katakkar3, Nitish Bannur4, Trupti Deshpande5

Section: Survey Paper, Product Type: Journal Paper
Volume-07, Issue-07, Page no. 54-61, Mar-2019

Online published on Mar 30, 2019

Copyright © Siddhivinayak Kulkarni, Amol Dhondse, Anurag Katakkar, Nitish Bannur, Trupti Deshpande. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: Siddhivinayak Kulkarni, Amol Dhondse, Anurag Katakkar, Nitish Bannur, Trupti Deshpande, “A Survey of Text-to-Image Generative Adversarial Networks,” International Journal of Computer Sciences and Engineering, Vol.07, Issue.07, pp.54-61, 2019.

MLA Style Citation: Siddhivinayak Kulkarni, Amol Dhondse, Anurag Katakkar, Nitish Bannur, Trupti Deshpande "A Survey of Text-to-Image Generative Adversarial Networks." International Journal of Computer Sciences and Engineering 07.07 (2019): 54-61.

APA Style Citation: Siddhivinayak Kulkarni, Amol Dhondse, Anurag Katakkar, Nitish Bannur, Trupti Deshpande (2019). A Survey of Text-to-Image Generative Adversarial Networks. International Journal of Computer Sciences and Engineering, 07(07), 54-61.

BibTeX Style Citation:
@article{Kulkarni_2019,
  author = {Siddhivinayak Kulkarni and Amol Dhondse and Anurag Katakkar and Nitish Bannur and Trupti Deshpande},
  title = {A Survey of Text-to-Image Generative Adversarial Networks},
  journal = {International Journal of Computer Sciences and Engineering},
  volume = {07},
  number = {07},
  month = mar,
  year = {2019},
  issn = {2347-2693},
  pages = {54--61},
  url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=903},
  publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=903
TI  - A Survey of Text-to-Image Generative Adversarial Networks
T2  - International Journal of Computer Sciences and Engineering
AU  - Kulkarni, Siddhivinayak
AU  - Dhondse, Amol
AU  - Katakkar, Anurag
AU  - Bannur, Nitish
AU  - Deshpande, Trupti
PY  - 2019
DA  - 2019/03/30
PB  - IJCSE, Indore, INDIA
SP  - 54
EP  - 61
IS  - 07
VL  - 07
SN  - 2347-2693
ER  -


Abstract

In recent years, generative models have gained a lot of attention in the deep learning community. In particular, Generative Adversarial Networks (GANs), proposed by Ian Goodfellow et al. in 2014, and their variants have emerged as a powerful method that performs significantly better than other generative models such as Restricted Boltzmann Machines or Variational Auto-Encoders. In this paper, we focus on a specific type of GAN, the Text-to-Image GAN, and review some of the most seminal work conducted in this area. We provide a high-level description of the architectural components of these models and review their performance on various datasets. Further, we discuss how these architectures are suited for the particular use case of text-to-face synthesis, i.e., generating images of human faces from text descriptions.
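
The conditioning mechanism shared by the surveyed models can be made concrete with a short sketch. In the standard GAN of Goodfellow et al. [2], a generator G and a discriminator D play the minimax game min_G max_D E_x[log D(x)] + E_z[log(1 - D(G(z)))]; text-to-image GANs such as GAN-CLS [7] additionally feed a text embedding to both networks. The PyTorch code below is a minimal, hypothetical illustration of that idea, not an implementation from any surveyed paper: the fully-connected layers, dimensions, and training step are simplifications (real models such as StackGAN [19] use convolutional stacks and matching-aware discriminators), and the text encoder is assumed to be pre-trained.

import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps (noise, text embedding) to a flattened 64x64 RGB image."""
    def __init__(self, z_dim=100, text_dim=256, img_dim=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + text_dim, 512),
            nn.ReLU(),
            nn.Linear(512, img_dim),
            nn.Tanh(),  # outputs in [-1, 1], matching normalized images
        )

    def forward(self, z, text_emb):
        # Conditioning: concatenate noise with the sentence embedding.
        return self.net(torch.cat([z, text_emb], dim=1))

class Discriminator(nn.Module):
    """Scores an (image, text embedding) pair as real or fake."""
    def __init__(self, text_dim=256, img_dim=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + text_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1),  # real/fake logit
        )

    def forward(self, img, text_emb):
        return self.net(torch.cat([img, text_emb], dim=1))

# One alternating adversarial step (Adam [10], as in most surveyed work).
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_img = torch.randn(16, 64 * 64 * 3)  # stand-in for a real image batch
text_emb = torch.randn(16, 256)          # stand-in for text-encoder output
z = torch.randn(16, 100)

# Discriminator step: real pairs are labeled 1, generated pairs 0.
fake_img = G(z, text_emb)
d_loss = (bce(D(real_img, text_emb), torch.ones(16, 1))
          + bce(D(fake_img.detach(), text_emb), torch.zeros(16, 1)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: fool the discriminator on the same text condition.
g_loss = bce(D(fake_img, text_emb), torch.ones(16, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()

Conditioning both players on the same embedding is what ties the generated image to the description; the surveyed architectures differ mainly in how that embedding is produced and injected into the two networks.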

Key-Words / Index Term

Generative Adversarial Networks, Text-to-Image GANs, Deep Learning

References

[1] Goodfellow, Ian, Yoshua Bengio and Aaron Courville. “Deep Learning.” MIT Press (2016).
[2] Goodfellow, Ian J. et al. “Generative Adversarial Nets.” NIPS (2014).
[3] Bodnar, Cristian. “Text to Image Synthesis Using Generative Adversarial Networks.” CoRR abs/1805.00676 (2018): n. pag.
[4] Gregor, Karol et al. “DRAW: A Recurrent Neural Network for Image Generation.” ICML (2015).
[5] Kingma, Diederik P. and Max Welling. “Auto-Encoding Variational Bayes.” CoRR abs/1312.6114 (2013): n. pag.
[6] Mansimov, Elman et al. “Generating Images from Captions with Attention.” CoRR abs/1511.02793 (2015): n. pag.
[7] Reed, Scott E. et al. “Generative Adversarial Text to Image Synthesis.” ICML (2016).
[8] Mirza, Mehdi and Simon Osindero. “Conditional Generative Adversarial Nets.” CoRR abs/1411.1784 (2014): n. pag.
[9] Reed, Scott E. et al. “Learning Deep Representations of Fine-Grained Visual Descriptions.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016): 49-58.
[10] Kingma, Diederik P. and Jimmy Ba. “Adam: A Method for Stochastic Optimization.” CoRR abs/1412.6980 (2014): n. pag.
[11] Bengio, Yoshua et al. “Better Mixing via Deep Representations.” ICML (2013).
[12] Reed, Scott E. et al. “Learning to Disentangle Factors of Variation with Manifold Interaction.” ICML (2014).
[13] Radford, Alec et al. “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.” CoRR abs/1511.06434 (2015): n. pag.
[14] Reed, Scott E. et al. “Learning What and Where to Draw.” NIPS (2016).
[15] Wah, Catherine et al. “The Caltech-UCSD Birds-200-2011 Dataset.” (2011).
[16] Andriluka, Mykhaylo et al. “2D Human Pose Estimation: New Benchmark and State of the Art Analysis.” CVPR (2014).
[17] Dash, Ayushman et al. “TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network.” CoRR abs/1703.06412 (2017): n. pag.
[18] Dong, Hao et al. “I2T2I: Learning text to image synthesis with textual data augmentation.” 2017 IEEE International Conference on Image Processing (ICIP) (2017): 2015-2019.
[19] Zhang, Han et al. “StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks.” 2017 IEEE International Conference on Computer Vision (ICCV) (2017): 5908-5916.
[20] Xu, Tao et al. “AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks.” CoRR abs/1711.10485 (2017): n. pag.
[21] Sharma, Shikhar et al. “ChatPainter: Improving Text to Image Generation using Dialogue.” CoRR abs/1802.08216 (2018): n. pag.
[22] Zhang, Han et al. “StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks.” IEEE transactions on pattern analysis and machine intelligence (2018): n. pag.
[23] Gong, Fuzhou and Zigeng Xia. “Generate the corresponding Image from Text Description using Modified GAN-CLS Algorithm.” CoRR abs/1806.11302 (2018): n. pag.
[24] Wu, X., K. Xu and P. Hall. “A Survey of Image Synthesis and Editing with Generative Adversarial Networks.” Tsinghua Science and Technology, vol. 22, no. 6, pp. 660-674, December 2017. doi: 10.23919/TST.2017.8195348.
[25] Huang, He et al. “An Introduction to Image Synthesis with Generative Adversarial Nets.” CoRR abs/1803.04469 (2018): n. pag.
[26] Goodfellow, Ian J.. “NIPS 2016 Tutorial: Generative Adversarial Networks.” CoRR abs/1701.00160 (2016): n. pag.
[27] Ledig, Christian et al. “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017): 105-114.
[28] Sønderby, Casper Kaae et al. “Amortised MAP Inference for Image Super-resolution.” CoRR abs/1610.04490 (2016): n. pag.
[29] Wang, Yifan et al. “A Fully Progressive Approach to Single-Image Super-Resolution.” CoRR abs/1804.02900 (2018): n. pag.
[30] Gadelha, Matheus et al. “3D Shape Induction from 2D Views of Multiple Objects.” 2017 International Conference on 3D Vision (3DV) (2017): 402-411.
[31] Nilsback, Maria-Elena and Andrew Zisserman. “Automated Flower Classification over a Large Number of Classes.” 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing (2008): 722-729.
[32] Yu, Fisher et al. “LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop.” CoRR abs/1506.03365 (2015): n. pag.
[33] Lin, Tsung-Yi et al. “Microsoft COCO: Common Objects in Context.” ECCV (2014).
[34] Deng, Jia et al. “ImageNet: A large-scale hierarchical image database.” 2009 IEEE Conference on Computer Vision and Pattern Recognition (2009): 248-255.
[35] Cho, Kyunghyun et al. “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches.” SSST@EMNLP (2014).
[36] Schuster, Mike and Kuldip K. Paliwal. “Bidirectional recurrent neural networks.” IEEE Trans. Signal Processing 45 (1997): 2673-2681.
[37] Szegedy, Christian et al. “Rethinking the Inception Architecture for Computer Vision.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016): 2818-2826.
[38] Cohn, Gabe. “AI Art at Christie's Sells for $432,500.” The New York Times, October 25, 2018.
[39] Kiros, Ryan et al. “Skip-Thought Vectors.” NIPS (2015).
[40] Graves, Alex and Jürgen Schmidhuber. “Framewise phoneme classification with bidirectional LSTM and other neural network architectures.” Neural networks: the official journal of the International Neural Network Society 18 5-6 (2005): 602-10.
[41] Hochreiter, Sepp and Jürgen Schmidhuber. “Long Short-Term Memory.” Neural Computation 9 (1997): 1735-1780.
[42] Paszke, Adam et al. “Automatic differentiation in PyTorch.” (2017).
[43] Das, Abhishek et al. “Visual Dialog.” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017): 1080-1089.