5+n tricks for GANs
The idea behind GANs is to train an implicit generative model that produces samples resembling those drawn from the true data distribution, by training a generator to fool a discriminator that tries to distinguish real samples from generated ones. The generator $G$ and the discriminator $D$ are trained jointly by playing the following min-max game:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]$$
However, solving this min-max game by gradient descent is not always well behaved. To see this, consider a fixed generator with sample distribution $p_g$; the min-max game then reduces to the following optimisation problem for the discriminator:

$$\max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{x \sim p_g(x)}\big[\log\big(1 - D(x)\big)\big]$$

It can be shown that the optimal discriminator is

$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},$$

and the derivative of our discriminator with respect to its input is

$$\nabla_x D^*(x) = \frac{p_g(x)\,\nabla_x p_{\text{data}}(x) - p_{\text{data}}(x)\,\nabla_x p_g(x)}{\big(p_{\text{data}}(x) + p_g(x)\big)^2}.$$
If $p_{\text{data}}$ and $p_g$ do not share the same support, this derivative becomes unbounded and thus ill-defined, which means we cannot use gradient descent to find the equilibrium point of the min-max game. Unfortunately, for datasets that lie on a lower-dimensional manifold, such as natural images, this is usually the case: the data manifold has measure zero under a generator distribution $p_g$ that places non-zero density everywhere, so the two distributions effectively have disjoint support. This tutorial therefore outlines some basic techniques that address this and other issues surrounding GAN training.
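To make the discussion concrete, here is a minimal sketch of one alternating training step under this objective, assuming PyTorch models `G` and `D` (with `D` returning logits), an optimiser for each, and the commonly used non-saturating variant of the generator loss; all names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, latent_dim=128):
    """One alternating update of the standard GAN objective."""
    b = real.size(0)
    ones = torch.ones(b, 1, device=real.device)
    zeros = torch.zeros(b, 1, device=real.device)

    # Discriminator step: maximise log D(x) + log(1 - D(G(z))).
    z = torch.randn(b, latent_dim, device=real.device)
    fake = G(z).detach()
    d_loss = F.binary_cross_entropy_with_logits(D(real), ones) + \
             F.binary_cross_entropy_with_logits(D(fake), zeros)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step (non-saturating): maximise log D(G(z)).
    z = torch.randn(b, latent_dim, device=real.device)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```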
1. Instance noise to address the problem of disjoint support
One simple approach to addressing the disjoint support problem is to add instance noise to both the real and fake samples before they are passed to the discriminator; the added noise spreads both distributions out so that their supports overlap. The noise level can then be reduced or annealed towards zero over the course of training to improve performance.
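A minimal sketch of instance noise, assuming a PyTorch discriminator that accepts image batches; the noise standard deviation is annealed linearly from `sigma_start` to zero over `total_steps` (all names are illustrative):

```python
import torch

def instance_noise_sigma(step, total_steps, sigma_start=0.1):
    """Linearly anneal the instance-noise level from sigma_start down to 0."""
    return sigma_start * max(0.0, 1.0 - step / total_steps)

def noisy_d_inputs(real, fake, step, total_steps):
    """Add the same level of Gaussian noise to real and generated samples."""
    sigma = instance_noise_sigma(step, total_steps)
    real_noisy = real + sigma * torch.randn_like(real)
    fake_noisy = fake + sigma * torch.randn_like(fake)
    return real_noisy, fake_noisy
```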
2. Use Spectral Normalisation
Another approach to addressing the ill-defined discriminator gradients is to impose regularity conditions, such as Lipschitz continuity, on the discriminator. One way of doing this is spectral normalisation, which normalises each weight matrix of the network by an estimate of its largest singular value, computed via power iteration, so that the discriminator becomes (approximately) 1-Lipschitz.
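In PyTorch this is available as the built-in `torch.nn.utils.spectral_norm` wrapper; the sketch below applies it to every layer of a small discriminator (the architecture itself is illustrative):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNDiscriminator(nn.Module):
    """A small discriminator with spectral normalisation applied to every layer."""
    def __init__(self, in_channels=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(in_channels, width, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(width, width * 2, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            spectral_norm(nn.Linear(width * 2, 1)),  # single real/fake logit
        )

    def forward(self, x):
        return self.net(x)
```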
3. Use refinement techniques to incorporate discriminator information post-training
One simple approach to improving the quality of samples from a GAN is the truncation trick: at sampling time, latents are drawn from a truncated normal distribution rather than the full prior, avoiding the tails. This reduces the chance of sampling from low-density regions that tend to produce poorer samples, but it also reduces the diversity of the samples obtained.
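A minimal sketch of the truncation trick, implemented by resampling out-of-range latent entries (the threshold is illustrative; smaller thresholds trade diversity for fidelity):

```python
import torch

def truncated_latents(batch_size, latent_dim, threshold=0.7, device="cpu"):
    """Sample z ~ N(0, I) and resample any entry with |z| > threshold."""
    z = torch.randn(batch_size, latent_dim, device=device)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()), device=device)
        mask = z.abs() > threshold
    return z

# samples = G(truncated_latents(64, 128))
```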
A more principled approach is to use the discriminator for rejection sampling over generated samples, but this can be prohibitively expensive. A better approach is discriminator driven latent sampling (DDLS) or discriminator gradient flow (DGflow), which use Langevin dynamics in the latent space, guided by the discriminator, to move samples towards regions the discriminator considers realistic.
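A sketch of DDLS-style Langevin sampling in latent space, assuming a standard normal prior and a discriminator `D` that outputs an unnormalised logit per sample, so that the latent energy is roughly $E(z) = \tfrac{1}{2}\lVert z\rVert^2 - D(G(z))$; the step size and number of steps are illustrative:

```python
import torch

def ddls_langevin(G, D, z, n_steps=50, step_size=1e-2):
    """Refine latents with Langevin dynamics on E(z) = ||z||^2 / 2 - D(G(z))."""
    z = z.clone().detach()
    for _ in range(n_steps):
        z.requires_grad_(True)
        energy = 0.5 * (z ** 2).sum(dim=1) - D(G(z)).squeeze(1)
        grad = torch.autograd.grad(energy.sum(), z)[0]
        with torch.no_grad():
            # Langevin update: gradient step on the energy plus Gaussian noise.
            z = z - 0.5 * step_size * grad \
                  + (step_size ** 0.5) * torch.randn_like(z)
    return z.detach()

# z0 = torch.randn(64, 128)
# refined_samples = G(ddls_langevin(G, D, z0))
```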
4. Consider using latent optimisation to incorporate discriminator information during training
Another way of incorporating the discriminator during training is to optimise the latents in the direction of the discriminator's gradient rather than using raw samples from the prior. This can be done by first sampling a random $z \sim p(z)$, computing the gradient $\Delta z = \frac{\partial D(G(z))}{\partial z}$, and taking one gradient step in that direction, $z' = z + \alpha \Delta z$, with samples then generated as $G(z')$.
This results in better samples and more stable GAN training as it adds a symplectic gradient adjustment into the optimisation procedure, reducing the negative effects of cycling.
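A sketch of a single latent-optimisation step of this kind (the step size `alpha` is illustrative; the refined latents would then be used in the usual generator and discriminator losses):

```python
import torch

def optimise_latents(G, D, z, alpha=0.9):
    """Take one gradient ascent step on D(G(z)) with respect to z."""
    z = z.clone().detach().requires_grad_(True)
    score = D(G(z)).sum()                       # sum of discriminator logits
    delta_z = torch.autograd.grad(score, z)[0]  # dD(G(z))/dz for each sample
    with torch.no_grad():
        z_opt = z + alpha * delta_z
    return z_opt.detach()

# z = torch.randn(64, 128)
# fake = G(optimise_latents(G, D, z))
```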
5. Top-k sample selection to incorporate discriminator information during training
The discriminator can also be used during training by computing the generator loss only on the top-k generated samples in each batch, as ranked by the discriminator. The reason is that the bottom-ranked samples tend to produce misleading updates that point away from the data manifold.
The value of k is annealed over the course of training: it starts at the full batch size and is gradually decayed as the GAN improves and the discriminator's ranking becomes more informative, so that only the most realistic samples contribute to the generator update.
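A sketch of a top-k generator loss under these assumptions, using the non-saturating loss on only the k highest-scoring generated samples (the annealing schedule for k is illustrative):

```python
import torch
import torch.nn.functional as F

def top_k_generator_loss(G, D, batch_size, latent_dim, k, device="cpu"):
    """Non-saturating generator loss computed only on the top-k fakes by D score."""
    z = torch.randn(batch_size, latent_dim, device=device)
    logits = D(G(z)).squeeze(1)             # discriminator scores for the fakes
    topk_logits, _ = torch.topk(logits, k)  # keep the k most realistic samples
    # Push only the selected samples towards the "real" label.
    return F.binary_cross_entropy_with_logits(topk_logits, torch.ones_like(topk_logits))

def anneal_k(step, total_steps, batch_size, min_frac=0.5):
    """Decay k from the full batch size down to min_frac * batch_size."""
    frac = 1.0 - (1.0 - min_frac) * min(1.0, step / total_steps)
    return max(1, int(frac * batch_size))
```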