Expanding the Canvas of Creativity: Exploring Text-Driven Image Generation with Generative Adversarial Network Variations

Yong, Justin Zhi Hern (2024) Expanding the Canvas of Creativity: Exploring Text-Driven Image Generation with Generative Adversarial Network Variations. Final Year Project (Bachelor), Tunku Abdul Rahman University of Management and Technology.

Full text: RDS_Justin Yong Zhi Hern_Fulltext.pdf (PDF, 2MB; restricted to registered users only)

Abstract

This research evaluates the text-to-image (T2I) generation performance of three variants of the generative adversarial network (GAN): Vanilla GAN, Conditional GAN (CGAN) and Deep Convolutional GAN (DCGAN). The main motivation for the study is the lack of exploration and comparison of these GAN variants in the specific domain of T2I generation. The study is important because such technologies can be applied in many fields to drastically simplify image-gathering processes. The Oxford-102 flower dataset, together with its image descriptions and class information, is used for training. The overall architectures of the GAN variants are kept similar in order to better understand how the fundamental differences between the variants affect their T2I generation performance. Common training techniques and model parameters are examined and adopted, with specific adaptations so that training remains feasible within hardware constraints. Because the study uses evaluation metrics common in similar works, such as the Inception Score (IS) and Fréchet Inception Distance (FID), its results can serve as a reference point for future work in the same domain. Qualitative evaluation was also used, highlighting certain parallels with, and limitations relative to, quantitative methods. The study found that, when all variants were kept as similar as possible, Vanilla GAN performed best, followed by CGAN and then DCGAN. However, hardware constraints severely limited the flexibility to configure optimal training and testing parameters. Future work may explore techniques for reducing the computational resources required to train GANs, investigate the importance and impact of noise and word-embedding sizes on the generated images, and incorporate natural language processing techniques.
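The abstract reports results using the Inception Score (IS), where a higher score indicates generated images that a classifier labels both confidently and across many different classes. As an illustration only, the sketch below computes IS from a matrix of per-image class probabilities, assuming those probabilities have already been produced by a pretrained Inception-v3 classifier (not shown). The function name `inception_score` and the toy inputs are hypothetical and not taken from the project. The sketch scores a single split, whereas common practice averages the score over several splits (often 10).

```python
import math
from typing import Sequence


def inception_score(probs: Sequence[Sequence[float]]) -> float:
    """Inception Score over a single split (illustrative sketch).

    Assumption: probs[i][k] is p(y = k | x_i), the class distribution that a
    pretrained classifier (normally Inception-v3) assigns to generated image i.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all generated images.
    marginal = [sum(row[k] for row in probs) / n for k in range(num_classes)]
    # Mean KL divergence KL(p(y|x) || p(y)). Terms with p(y|x) = 0 contribute 0,
    # and whenever p(y|x) > 0 the marginal for that class is also > 0.
    mean_kl = 0.0
    for row in probs:
        kl = sum(p * (math.log(p) - math.log(marginal[k]))
                 for k, p in enumerate(row) if p > 0)
        mean_kl += kl / n
    # IS = exp(E_x[KL(p(y|x) || p(y))]).
    return math.exp(mean_kl)


# Toy inputs (hypothetical): confident and diverse vs. collapsed predictions.
diverse = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]
print(inception_score(diverse))    # close to 2.0, the maximum for 2 classes
print(inception_score(collapsed))  # 1.0, the minimum
```

The score ranges from 1 (every image receives the same prediction, as in mode collapse) up to the number of classes (each image is confidently assigned and the classes are used evenly), which is why IS rewards both image quality and diversity.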

Item Type:	Final Year Project
Subjects:	Technology > Technology (General); Science > Computer Science > Data mining. Big data
Faculties:	Faculty of Computing and Information Technology > Bachelor of Computer Science (Honours) in Data Science
Depositing User:	Library Staff
Date Deposited:	03 Sep 2024 06:49
Last Modified:	03 Sep 2024 06:49
URI:	https://eprints.tarc.edu.my/id/eprint/29995