A generative adversarial network (GAN) is a machine learning (ML) model in which two neural networks compete with each other, each becoming more accurate in its predictions as a result. GANs are typically trained in an unsupervised manner and use a zero-sum game framework for learning, in which one network's gain is the other network's loss.
The two neural networks that make up a GAN are called the generator and the discriminator. In image applications, the generator is typically a deconvolutional neural network and the discriminator is a convolutional neural network. The purpose of the generator is to artificially fabricate outputs that could easily be mistaken for real data. The purpose of the discriminator is to identify which of the inputs it receives have been artificially created.
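As a rough illustration of the two roles, the sketch below stands in for each network with a one-line function. The affine generator, the logistic discriminator, and the parameter names a, b, w, and c are assumptions made for illustration only; real GANs use multi-layer networks.

```python
import math
import random


def generator(z, a, b):
    """Map a random noise value z to a synthetic sample (toy affine stand-in)."""
    return a * z + b


def discriminator(x, w, c):
    """Return the estimated probability that sample x is real (toy logistic stand-in)."""
    return 1.0 / (1.0 + math.exp(-(w * x + c)))


random.seed(0)
z = random.gauss(0.0, 1.0)                  # random noise is the generator's only input
fake = generator(z, a=1.0, b=0.0)           # a fabricated sample
p_real = discriminator(fake, w=0.5, c=0.0)  # a score strictly between 0 and 1
print(f"fake sample: {fake:.3f}, discriminator score: {p_real:.3f}")
```

The generator never sees real data directly; it only learns through the discriminator's scores.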
In effect, GANs create part of their own training data: the generator's outputs become training examples for the discriminator. As the feedback loop between the adversarial networks continues, the generator starts to produce higher-quality results and the discriminator gets better at flagging data that has been artificially created.
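One way to picture the zero-sum framing is with the commonly used GAN value function, V = log D(x) + log(1 - D(G(z))), which the discriminator tries to maximize and the generator tries to minimize. The sketch below evaluates it for a single real and a single fake sample; the function names value and payoffs are hypothetical, and the scores passed in are made-up numbers.

```python
import math


def value(d_on_real, d_on_fake):
    """V = log D(x) + log(1 - D(G(z))) for one real and one fake sample."""
    return math.log(d_on_real) + math.log(1.0 - d_on_fake)


def payoffs(d_on_real, d_on_fake):
    """Return (discriminator payoff, generator payoff); they always sum to zero."""
    v = value(d_on_real, d_on_fake)
    return v, -v


# Discriminator is confident and correct: scores real as 0.9, fake as 0.1.
sharp = payoffs(0.9, 0.1)
# Discriminator cannot tell the difference: scores everything as 0.5.
fooled = payoffs(0.5, 0.5)
print("sharp discriminator:", sharp)
print("fooled discriminator:", fooled)
```

The discriminator's payoff is higher in the first case and the generator's payoff is higher in the second, which is the tension that drives both networks to improve.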
How GANs work
The first step in establishing a GAN is to identify the desired end result and collect an initial training data set based on those parameters. The generator is then fed random input (noise) and produces candidate outputs until it reaches basic accuracy.
After this, the generated images are fed into the discriminator along with real examples from the original data set. The discriminator evaluates each input and returns a probability between 0 and 1 to represent the authenticity of each image (1 corresponds to real and 0 to fake). These scores are used to update both networks, and the cycle repeats until the generated output reaches the desired quality.
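The loop described above can be sketched on a toy one-dimensional problem. This is a minimal illustration under stated assumptions, not a production training recipe: the "real" data is assumed to come from a normal distribution with mean 4, both networks are single-parameter-pair stand-ins with hand-derived gradients, the scores drive gradient updates automatically, and the generator uses the common "maximize log D(G(z))" variant of its objective. The names train, sigmoid, a, b, w, and c are hypothetical.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = rng.gauss(4.0, 1.0)   # one real sample (assumed target distribution)
        z = rng.gauss(0.0, 1.0)      # random noise fed to the generator
        fake = a * z + b             # one generated sample

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: gradient ascent on log D(fake), using the updated discriminator.
        d_fake = sigmoid(w * fake + c)
        grad_fake = (1.0 - d_fake) * w   # derivative of log D(fake) w.r.t. the fake sample
        a += lr * grad_fake * z
        b += lr * grad_fake
    return a, b, w, c


a, b, w, c = train()
print(f"generator: a={a:.3f}, b={b:.3f}; discriminator: w={w:.3f}, c={c:.3f}")
```

Alternating one discriminator update with one generator update is a common convention rather than a requirement, and simple setups like this one can oscillate instead of settling, so the printed parameters should be read as a snapshot rather than a converged result.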
Popular use cases for GANs
GANs are becoming a popular ML model for online retail because they can interpret and recreate visual content with increasing accuracy. Use cases include:
- Filling in images from an outline.
- Generating a realistic image from text.
- Producing photorealistic renderings of product prototypes.
- Converting black-and-white images to color.
In video production, GANs can be used to:
- Model patterns of human behavior and movement within a framework.
- Predict subsequent video frames.
- Create deepfakes.