Those who can’t do, teach.
Those who can’t discriminate, generate.
For my final project for my Neural Network Self Study (NNOSS), I implemented ConvGAN, which takes any set of images and attempts to generate more images similar to the ones in the set. Shout-out to Olin alums Alec Radford and Luke Metz, who authored the paper describing this architecture here. Code can be found here.
I trained the GAN on the CelebA dataset (celebrity faces) and on the Forest Path and Sea Cliff categories of the MIT Places database. Examples are shown below (I did not fully train the Sea Cliff model, so those samples are pretty noisy).
Nerd Alert: If you understand the long-winded title, read on. If you thought “Hey, I recognize some of those ML buzzwords!”, check out links on: Deep, Convolutional, Generative, Adversarial, Networks (actually, just check out the links at the bottom of the post). If you thought “oh boy, well I’ll figure out what he was talking about” or “I probably can’t say that 3 times fast”, then this may not be the section for you.
Architecture: I used a Generator and a Discriminator, each with 5 hidden layers of 128 5×5 filters with stride 2 (except the last layer). The latent space was 100-dimensional and the generated images were around 150×150.
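For the curious, the generator half of this setup could be sketched in PyTorch roughly as follows. This is an illustrative reconstruction from the description above, not my actual code: the initial linear projection, the 5×5 starting feature map, and the exact output size (160×160, i.e. "around 150×150") are assumptions, and the batch-norm/activation choices follow my observations later in this post.

```python
import torch
import torch.nn as nn

LATENT_DIM = 100  # latent space size described above


def up_block(in_ch, out_ch):
    # One 5x5, stride-2 transposed-conv layer that doubles the spatial
    # size, with batch norm and Leaky ReLU (per the observations below).
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=5, stride=2,
                           padding=2, output_padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
    )


class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Assumed: project z to a small 128-channel 5x5 seed map.
        self.project = nn.Linear(LATENT_DIM, 128 * 5 * 5)
        self.up = nn.Sequential(
            up_block(128, 128),  # 5x5   -> 10x10
            up_block(128, 128),  # 10x10 -> 20x20
            up_block(128, 128),  # 20x20 -> 40x40
            up_block(128, 128),  # 40x40 -> 80x80
            # Last layer: no batch norm, tanh output in [-1, 1].
            nn.ConvTranspose2d(128, 3, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            nn.Tanh(),           # 80x80 -> 160x160 RGB
        )

    def forward(self, z):
        x = self.project(z).view(-1, 128, 5, 5)
        return self.up(x)


g = Generator()
imgs = g(torch.randn(2, LATENT_DIM))
print(imgs.shape)  # torch.Size([2, 3, 160, 160])
```

The discriminator is essentially this stack mirrored: strided 5×5 convolutions downsampling to a single real/fake logit.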
Training: I used the Adam optimizer with a learning rate of 1e-4 and a batch size of 32 to start, increasing to 64 later in training.
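The training loop itself is the standard alternating GAN update. A minimal sketch with the hyperparameters above (the tiny linear stand-in models and random "real" batches are placeholders so the loop runs end to end; my actual loop may differ in details):

```python
import torch
import torch.nn as nn

# Stand-in models so the loop is self-contained; in practice these are
# the convolutional Generator/Discriminator described above.
G = nn.Linear(100, 8)               # latent vector -> "image"
D = nn.Linear(8, 1)                 # "image" -> real/fake logit

# Adam with lr=1e-4 for both networks, as described above.
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

batch_size = 32  # bumped to 64 later in training

for step in range(3):
    real = torch.randn(batch_size, 8)   # placeholder for a real batch
    z = torch.randn(batch_size, 100)

    # Discriminator update: push real -> 1, generated -> 0.
    d_opt.zero_grad()
    d_loss = (bce(D(real), torch.ones(batch_size, 1)) +
              bce(D(G(z).detach()), torch.zeros(batch_size, 1)))
    d_loss.backward()
    d_opt.step()

    # Generator update: make the discriminator label fakes as real.
    g_opt.zero_grad()
    g_loss = bce(D(G(z)), torch.ones(batch_size, 1))
    g_loss.backward()
    g_opt.step()
```

Note the `.detach()` in the discriminator step, which keeps that update from backpropagating into the generator.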
I arrived at this architecture and training process through light experimentation and trial and error; it differed slightly between training sets but was mostly constant.
N = 1 Observations:
- Batch size is important. When training, anything below 32 would not get very far for me, and even at 32 the model would bounce around a poor minimum. I needed to bump the batch size up to 64 to reach a good minimum. I assume this is due to the dimensionality of the images (~128×128) and the variance of the data; the batch size could probably be lowered if either were smaller. This has implications for training, as some computers will not have enough memory to run at these batch sizes efficiently.
- I found that batch normalization after each layer in the Generator was essential for successful training.
- Pay a lot of attention to which activation functions you use. Leaky ReLU was somewhat helpful for avoiding dead nodes. I also expected sigmoid to be ideal for the final output of the generator, but I found tanh gave better training results.
- If your data is labeled, append your labels in multiple places. For a typical example, if you were trying to generate MNIST digits and wanted to specify which number to generate, you (like me) might think you can append the label onto your latent vector and be fine. Although this might work eventually, in any deep model the path of gradients between the discriminator output and the latent space is very long. It is much easier if you append your class label (with appropriate dimensions) at each layer. That way there are multiple, shorter paths between the output and input for gradients to flow.
- For most other aspects there is some wiggle room. For this particular data set, I found the number of layers, depth of layers, and learning rate mattered less than usual compared to the observations above.
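To make the label-appending observation concrete, here is a sketch of a conditional MNIST-sized generator where the one-hot label is tiled to the current spatial size and concatenated onto the channels before every layer. The layer widths and 28×28 output are illustrative assumptions, not my actual configuration:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10  # e.g. MNIST digits


class ConditionalGenerator(nn.Module):
    """Label conditioning at every layer: each conv sees the class as
    extra input channels, giving gradients short paths from the output
    back to the label instead of one long path through the whole net."""

    def __init__(self):
        super().__init__()
        self.project = nn.Linear(100 + NUM_CLASSES, 64 * 7 * 7)
        self.up1 = nn.ConvTranspose2d(64 + NUM_CLASSES, 64, kernel_size=5,
                                      stride=2, padding=2, output_padding=1)
        self.up2 = nn.ConvTranspose2d(64 + NUM_CLASSES, 1, kernel_size=5,
                                      stride=2, padding=2, output_padding=1)

    @staticmethod
    def append_label(x, y):
        # Tile the one-hot label to H x W and concat as extra channels.
        maps = y[:, :, None, None].expand(-1, -1, x.shape[2], x.shape[3])
        return torch.cat([x, maps], dim=1)

    def forward(self, z, y):
        # Label appears in the latent input AND before every conv layer.
        x = self.project(torch.cat([z, y], dim=1)).view(-1, 64, 7, 7)
        x = torch.relu(self.up1(self.append_label(x, y)))      # 7x7 -> 14x14
        return torch.tanh(self.up2(self.append_label(x, y)))   # 14x14 -> 28x28


g = ConditionalGenerator()
z = torch.randn(4, 100)
y = torch.eye(NUM_CLASSES)[torch.tensor([0, 1, 2, 3])]  # one-hot labels
print(g(z, y).shape)  # torch.Size([4, 1, 28, 28])
```

The same `append_label` trick applies on the discriminator side, so both networks get short gradient paths to the class information.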
End Nerd Alert
Finally, as always, if you have any questions about the implementation or anything else, feel free to reach out through the links below.