TensorFlow implementation of "Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks" by Han Zhang, et al.

Vishal-V/StackGAN

StackGAN

Text to Photo-Realistic Image Synthesis


Dependencies

tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0
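
To confirm that an environment matches the pins above, a small check along the following lines may help. This is only a sketch and not part of the repository: `PINS`, `parse_pins` and `check_pins` are hypothetical names introduced here, and the fallback import assumes the `importlib_metadata` backport is installed on interpreters older than Python 3.8.

```python
"""Compare installed package versions against the pins listed above."""
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is available on older Pythons.
    from importlib_metadata import PackageNotFoundError, version

# The pins are copied from the Dependencies list.
PINS = """
tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, wanted = line.partition("==")
        pins[name.strip()] = wanted.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_pins(parse_pins(PINS)).items():
        print("{}: wanted {}, found {}".format(name, wanted, installed))
```

Running the script prints one line per package whose installed version differs from the pin, and prints nothing when everything matches.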

Downloads

  • To install all the dependencies, run
pip install -r requirements.txt
  • To download the CUB 200 dataset, run the data_download.py script
python data_download.py
  • Download the Char-RNN-CNN embeddings from this link: download link, and unzip the archive in place.
unzip birds.zip

Training

  • The model.py file contains the bare minimum code needed to run the Stage 1 and Stage 2 architectures. It automatically saves the weights once the specified (or default) number of epochs has completed. Note that the weights are stored in the same directory as model.py.
python model.py

Architecture

  • Stage 1
    • Text Encoder Network
      • Maps a text description to a 1024-dimensional text embedding
      • Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link]
    • Conditioning Augmentation Network
      • Adds randomness to the network
      • Produces more image-text pairs
    • Generator Network
    • Discriminator Network
    • Embedding Compressor Network
    • Outputs a 64x64 image

  • Stage 2
    • Text Encoder Network
    • Conditioning Augmentation Network
    • Generator Network
    • Discriminator Network
    • Embedding Compressor Network
    • Outputs a 256x256 image
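
The Conditioning Augmentation and Embedding Compressor steps listed above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the code in model.py: the function names are hypothetical, the weights are random stand-ins for trained layers, and the sizes `COND_DIM = 128`, `NOISE_DIM = 100` and the 4x4 grid are values recalled from the StackGAN paper that this repository may or may not use. Only the 1024-dimensional text embedding comes from the list above.

```python
import numpy as np

EMBED_DIM = 1024  # text embedding size stated in the Architecture list
COND_DIM = 128    # assumption: conditioning vector size from the StackGAN paper
NOISE_DIM = 100   # assumption: noise vector size from the StackGAN paper


def conditioning_augmentation(embedding, weights, bias, rng):
    """Sample c = mu + sigma * eps with eps ~ N(0, I).

    embedding: (batch, EMBED_DIM), weights: (EMBED_DIM, 2 * COND_DIM),
    bias: (2 * COND_DIM,). The linear layer predicts mu and log(sigma).
    """
    stats = embedding @ weights + bias
    mu, log_sigma = stats[:, :COND_DIM], stats[:, COND_DIM:]
    eps = rng.standard_normal(mu.shape)  # the source of added randomness
    return mu + np.exp(log_sigma) * eps, mu, log_sigma


def kl_to_standard_normal(mu, log_sigma):
    """Batch mean of KL(N(mu, sigma^2) || N(0, I)), used as a regularizer."""
    per_sample = 0.5 * np.sum(
        mu ** 2 + np.exp(2.0 * log_sigma) - 2.0 * log_sigma - 1.0, axis=1
    )
    return per_sample.mean()


def compress_and_tile(embedding, weights, bias, grid=4):
    """Project the embedding to COND_DIM and replicate it over a grid x grid map."""
    compressed = embedding @ weights + bias  # (batch, COND_DIM)
    return np.tile(compressed[:, None, None, :], (1, grid, grid, 1))


# Usage with random stand-in weights (RandomState keeps this NumPy 1.16 compatible).
rng = np.random.RandomState(0)
text_embedding = rng.standard_normal((2, EMBED_DIM))
ca_w = 0.01 * rng.standard_normal((EMBED_DIM, 2 * COND_DIM))
ca_b = np.zeros(2 * COND_DIM)
c, mu, log_sigma = conditioning_augmentation(text_embedding, ca_w, ca_b, rng)

# The generator input concatenates the conditioning vector with noise.
z = rng.standard_normal((2, NOISE_DIM))
generator_input = np.concatenate([c, z], axis=1)

# The discriminator side receives the compressed, spatially replicated embedding.
comp_w = 0.01 * rng.standard_normal((EMBED_DIM, COND_DIM))
tiled = compress_and_tile(text_embedding, comp_w, np.zeros(COND_DIM))
print(generator_input.shape, tiled.shape)
```

Because each call draws a fresh `eps`, the same text embedding yields a different conditioning vector every time, which is how the step produces more image-text pairs from a fixed set of descriptions.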

Reference Papers

  1. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Arxiv Link]
  2. Improved Techniques for Training GANs [Arxiv Link]
  3. Generative Adversarial Text to Image Synthesis [Arxiv Link]
  4. Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link]

Note

This is the code I submitted to TensorFlow for Google Summer of Code. Hence the attributions and the license are for "TensorFlow Authors" and not "Vishal V". This code is under the MIT License.
