In recent years we have heard a lot about A.I. taking our jobs, or at least driving us out of our current ones. While the debate is still open, I think most of us would agree that some professions are more endangered than others: grocery store cashiers, for example, are in the high-risk group due to increasing automation investments in this area. However, if you are an engineer, or do any other complex problem-solving task for a living, you should feel relatively safe for the foreseeable future. Until recently, artists were also classified in the low-risk group.

In this article, I will tell you how that may have recently changed. Oh, and if I have planted a seed of doubt in you, you can check just how much at risk your current job is.

Enter: Nvidia's GAN

In a nutshell, a GAN (Generative Adversarial Network) is an innovative neural network design that allows, among other things, for intelligent content generation. What does this mean? They say a picture is worth a thousand words. I think it may be worth even more in this case:

What you see above is the long-awaited solution for that pesky photobomber who ruined your holiday photo. It can also remove your ex from your old photos, if that's what you need! All you need to do is use a brush to erase something from the picture, and the neural network will reprocess the image, trying to fill in the blank area, usually with something resembling its surroundings. If you'd like to give this a try, you can check the app published by Nvidia here.

Paint me like one of your French girls

Another example fixes an even bigger global problem: lack of drawing skills! I know I am affected; I couldn't draw a realistic tree if my life depended on it. It appears that now the only thing I need to do is sketch a couple of lines and push my creation through the GAN:


Above you can see a simple drawing turned into a photorealistic masterpiece by our new favourite neural network. Why don't you give it a try at the interactive test portal published by Nvidia? Still doubting GANs' artistic potential? What if I told you that a GAN created a portrait that sold for 432,000 dollars at the famous British auction house Christie's? We can only hope it will spend the money wisely.

The Fight of Two Wolves

The next three paragraphs are slightly more technical, so if you are only here for a brief overview you can skip ahead to the last two paragraphs, where I talk about potential use cases.

Before I go into more detail, I think it's worth describing the GAN principle with a less technical example. The A in GAN stands for Adversarial, which means we have two adversary parties: counterfeiters and the police. The counterfeiters are trying to come up with the best possible methods of making fake money, and the police are constantly working on improving their counterfeit detection methods. The two are in a closed feedback loop, constantly learning from each other. If the police improve their detection methods, the counterfeiters will be forced to up their game as well. This competition is exactly what makes GANs so efficient, but I'll talk more about it in the next paragraph.

Generative Adversarial Networks are not new. They are part of a group of networks called generative networks. In contrast to the more classical discriminative models, which predict labels given the input data, generative networks do the opposite. Nvidia's GAN requires labels as its input; that's why, when you draw something, you have to first select a category for it (grass/bridge/clouds etc.). The GAN then takes these labels and passes them to one of its two core components, the Generator.

The Generator is a neural network that tries to create content that can pass for the real thing. What is considered the real thing? Well, that's what the second core component is for: the Discriminator. The Discriminator (also a neural network) evaluates the Generator's work and determines whether it matches its definition of reality, which depends on the data the networks have been trained on. As I am sure you have already guessed, the Generator is the counterfeiter from the example above, and the Discriminator plays the role of the police.
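To make the two-player game concrete, here is a minimal numpy sketch of the two losses involved. The scores and the `bce` helper are illustrative assumptions, not the real networks: the Discriminator is penalised for misclassifying real and fake samples, while the Generator is rewarded when its fakes get labelled as real.

```python
import numpy as np

def bce(p, y):
    # Binary cross-entropy between predicted probabilities p and labels y
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

# Hypothetical discriminator scores: probability that a sample is real
d_real = np.array([0.9, 0.8])   # scores on real images
d_fake = np.array([0.2, 0.1])   # scores on generated images

# Police: penalised for calling fakes real or reals fake
d_loss = 0.5 * (bce(d_real, np.ones(2)) + bce(d_fake, np.zeros(2)))

# Counterfeiter: rewarded when the discriminator labels its fakes as real
g_loss = bce(d_fake, np.ones(2))

# Here the discriminator is winning (low d_loss), so the generator's
# loss is high, pushing it to produce more convincing fakes
print(d_loss, g_loss)
```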

Nvidia's network is, as you can imagine, not the first of its kind. There have been networks capable of generating photorealistic images, and even music, in the past, but they had some drawbacks.

Some of the earlier networks required huge databases of images to support them, as they were pretty much stitching images together rather than generating them. Others did not produce very good results due to architectural flaws. By that I mean the often-used normalisation layer, which is helpful in some ways (https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c) but was also responsible for washing/blurring away the semantic information (the labels describing the shapes and lines of the input drawing) provided as input to the network. This means that the deeper into the network you go, the less semantic information there is, as each normalisation layer blurs it a little more.
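A tiny numpy sketch shows this washing-away effect in its most extreme form (the flat maps and the `instance_norm` helper are illustrative assumptions): normalising a uniform semantic map collapses it to zeros, so the label it encoded is simply gone.

```python
import numpy as np

# Two one-channel "semantic maps": a flat grass map (label 1) and a flat sky map (label 2)
grass = np.full((4, 4), 1.0)
sky   = np.full((4, 4), 2.0)

def instance_norm(x, eps=1e-5):
    # Normalize a single feature map to zero mean, unit variance
    return (x - x.mean()) / np.sqrt(x.var() + eps)

# After normalization both maps collapse to (near) all zeros:
# the label information that distinguished them has been washed away
print(instance_norm(grass))
print(instance_norm(sky))
```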

Scotty, I need more power!

Nvidia's research group has introduced some key improvements in their GAN architecture, allowing them to outperform the competition by quite a lot. The key improvement is called SPADE, which stands for SPatially-Adaptive (DE)normalization.

In the previous paragraph, I mentioned that one of the key drawbacks of previous GAN architectures was the loss of semantic information (labels/descriptors) due to the nature of classical normalization layers. In case you are not familiar with normalization blocks, they are specialized neural network layers, often used in image processing (convolutional networks), that speed up learning and reduce overfitting in training. You can read more on normalization here.

Unfortunately, we cannot hope to create an efficient image-processing network without normalization blocks. In most scenarios they do a lot of good, but in a GAN's case they are more of a necessary evil. So, instead of removing normalization, Nvidia's researchers proposed improved normalization blocks: SPADEs. Their inner workings are explained in detail in the research paper, but in short, a SPADE works like a regular (blurry) normalization layer, except it also takes an extra input, which in our case is the semantic information.

Again, the semantic data is the map explaining which shape and colour represent which category of data; it is also our network's input (even if we do not feed it to the input layer, but more on this in a bit). SPADE layers apply the regular normalization logic and then de-normalize the data a bit by partially re-applying the external data to it. This way we get the benefit of normalization without losing the important semantic information, allowing the network to make better "decisions" in training.
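Here is a rough numpy sketch of that normalize-then-denormalize idea. The element-wise `gamma`/`beta` functions are stand-in assumptions; in the real SPADE they are produced by small convolutional layers applied to the semantic map.

```python
import numpy as np

rng = np.random.default_rng(0)

# A feature map (H x W) and the semantic map the image was generated from
features = rng.normal(5.0, 3.0, size=(8, 8))      # arbitrary activations
seg_row = np.where(np.arange(8) < 4, 1.0, 2.0)    # left half grass, right half sky
seg_map = np.tile(seg_row, (8, 1))

# 1. Regular normalization: zero mean, unit variance (semantic info lost here)
norm = (features - features.mean()) / features.std()

# 2. "De-normalization": a scale (gamma) and shift (beta) computed FROM the
#    semantic map re-inject the label information, pixel by pixel
gamma = 1.0 + 0.1 * seg_map
beta = 0.5 * seg_map
out = gamma * norm + beta

# The output now differs between the two semantic regions again
print(out[:, :4].mean(), out[:, 4:].mean())
```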

Above you can see our SPADE layers used to create the residual blocks that are the key components of our network. Residual blocks are a common technique of composing a couple of convolutional layers together, with the addition of taking the first layer's input data and merging it with the last layer's output. This skip-connection technique allows for deeper networks: even if the layers inside the residual block blur out the data they are processing, the original data will be partially re-applied at the output of the block. You can read more about ResBlocks here.
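The principle can be sketched in a few lines of numpy (the `lossy_layers` stand-in is an illustrative assumption, deliberately destroying all information to show the worst case):

```python
import numpy as np

def lossy_layers(x):
    # Stand-in for a couple of conv layers that "blur away" their input:
    # here they simply zero everything out (worst case, for illustration)
    return np.zeros_like(x)

def res_block(x):
    # Residual block: the input is added back onto the layers' output,
    # so even a fully destructive inner path cannot erase the signal
    return x + lossy_layers(x)

x = np.array([1.0, 2.0, 3.0])
print(res_block(x))  # the input survives the block unchanged
```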

So, our network is pretty much just a couple of residual blocks composed of standard convolutional layers preceded by our improved normalization layers, SPADEs. What's interesting is that we do not feed the input to the first layer, as you normally see in other networks. Instead, we use random vectors as input; they serve the role of a "seed" for our generator. The actual input data, the image's semantic information, is fed to our SPADE layers so it can continuously influence the data passing through the network.
You can see the complete code of the SPADE network on the NVlabs GitHub here.

I want to go deeper!

Before you go and create new state-of-the-art GANs that will challenge Nvidia's researchers, it's best to start with something simple. Below you can see a GAN designed in Keras for simplicity. Some key caveats you may find useful in any modifications you want to apply:

  • You will get better results if you pretrain your Discriminator; it's best that it does not start completely clueless (you can verify this experimentally).
  • Since both networks learn from each other, it is important to experiment with learning rates so that one network doesn't overpower the other; this would be the GAN equivalent of classical overfitting.
  • Hyperparameter tuning is difficult due to, among other reasons, the time it takes to train GANs, so make sure you read up and learn from others' mistakes.

# Imports needed to run the snippet (standalone Keras; use the
# tensorflow.keras equivalents if that is what you have installed)
import numpy as np
from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, BatchNormalization, LeakyReLU
from keras.models import Sequential, Model
from keras.optimizers import Adam


class GAN():
    def __init__(self):
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.latent_dim = 100
        optimizer = Adam(0.0002, 0.5)

        # Build and compile the discriminator
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss='binary_crossentropy', optimizer=optimizer,
                                   metrics=['accuracy'])

        # Build the generator
        self.generator = self.build_generator()

        # The generator takes noise as input and generates imgs
        z = Input(shape=(self.latent_dim,))
        img = self.generator(z)

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The discriminator takes generated images as input and determines validity
        validity = self.discriminator(img)

        # The combined model  (stacked generator and discriminator)
        # Trains the generator to fool the discriminator
        self.combined = Model(z, validity)
        self.combined.compile(loss='binary_crossentropy', optimizer=optimizer)

    def build_generator(self):
        model = Sequential()
        model.add(Dense(256, input_dim=self.latent_dim))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(1024))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(np.prod(self.img_shape), activation='tanh'))
        model.add(Reshape(self.img_shape))
        model.summary()

        noise = Input(shape=(self.latent_dim,))
        img = model(noise)
        return Model(noise, img)

    def build_discriminator(self):
        model = Sequential()
        model.add(Flatten(input_shape=self.img_shape))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(256))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(1, activation='sigmoid'))
        model.summary()
        img = Input(shape=self.img_shape)
        validity = model(img)
        return Model(img, validity)

    def train(self, epochs, batch_size=128, sample_interval=50):
        # Load the dataset
        (X_train, _), (_, _) = mnist.load_data()

        # Rescale -1 to 1
        X_train = X_train / 127.5 - 1.
        X_train = np.expand_dims(X_train, axis=3)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))
        for epoch in range(epochs):
            # ---------------------
            #  Train Discriminator
            # ---------------------
            # Select a random batch of images
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs = X_train[idx]
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))

            # Generate a batch of new images
            gen_imgs = self.generator.predict(noise)

            # Train the discriminator
            d_loss_real = self.discriminator.train_on_batch(imgs, valid)
            d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))

            # Train the generator (to have the discriminator label samples as valid)
            g_loss = self.combined.train_on_batch(noise, valid)

            # Plot the progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
                  (epoch, d_loss[0], 100 * d_loss[1], g_loss))

            # If at save interval => save generated image samples

            if epoch % sample_interval == 0:
                self.sample_images(epoch)

    def sample_images(self, epoch):
        # Save a grid of generated digits so training progress can be inspected
        import matplotlib.pyplot as plt
        r, c = 5, 5
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))
        gen_imgs = self.generator.predict(noise)
        gen_imgs = 0.5 * gen_imgs + 0.5  # rescale from [-1, 1] back to [0, 1]
        fig, axs = plt.subplots(r, c)
        for i in range(r):
            for j in range(c):
                axs[i, j].imshow(gen_imgs[i * c + j, :, :, 0], cmap='gray')
                axs[i, j].axis('off')
        fig.savefig("gan_%d.png" % epoch)
        plt.close(fig)


if __name__ == '__main__':
    gan = GAN()
    gan.train(epochs=30000, batch_size=32, sample_interval=200)

That’s very pretty but how can we use it?

Like a GAN's, human imagination has no boundaries, so it is up to you what you will use these powerful tools for. However, to jump-start your imagination, below are some already implemented use cases:

  • Generating alternate photos of missing people
  • Improving face detection
  • Identifying physical anomalies on the skin surface (health care)
  • Generating test data – could be used to train other networks
  • Generating improved or completely new drug formulas (it doesn’t have to operate on pixels)

Since GANs are based on the idea of two networks trying to outsmart each other, we can use them to improve on a lot of existing algorithms. In 2016, the Google Brain team challenged two networks to invent better cryptography, trying to outsmart one another at encrypting and decrypting information.

What’s Next?

Can you think of a clever way of getting two competing networks to improve on an existing idea? I hope you do.

Below are some interesting materials that I encourage you to check out if you are interested in designing or using GANs in your projects:

I hope you found all this interesting. Keep on exploring and as always feel free to share your thoughts and ideas in the comments below.