Style Transfer – Styling Images with Convolutional Neural Networks

Creating Beautiful Image Effects

In today’s article, we are going to create remarkable style transfer effects. In order to do so, we will have to get a deeper understanding of how Convolutional Neural Networks and their layers work. By the end of this article, you will be able to create a style transfer application that applies a new style to an image while still preserving its original content.

Boston skyline mixed with Van Gogh’s ‘The Starry Night’

Style Transfer

Before we go to our Style Transfer application, let’s clarify what we are striving to achieve.

Let’s define a style transfer as a process of modifying the style of an image while still preserving its content.

Given an input image and a style image, we can compute an output image with the original content but a new style. This technique was outlined in Leon A. Gatys’ paper, A Neural Algorithm of Artistic Style, which is a great publication, and you should definitely check it out.

Input Image + Style Image -> Output Image (Styled Input)

How does it work?

  1. We take the input and style images and resize them to equal shapes.
  2. We load a pre-trained Convolutional Neural Network (VGG16).
  3. Knowing that we can distinguish layers that are responsible for the style (basic shapes, colors etc.) and the ones responsible for the content (image-specific features), we can separate the layers to independently work on the content and style.
  4. Then we set our task as an optimization problem where we are going to minimize:
  • content loss (distance between the input and output images – we strive to preserve the content)
  • style loss (distance between the style and output images – we strive to apply a new style)
  • total variation loss (regularization – spatial smoothness to denoise the output image)

  5. Finally, we set our gradients and optimize with the L-BFGS algorithm.

The above high-level overview may seem confusing, so let’s go straight to the code!

“What I cannot create, I do not understand.” – Richard Feynman

Implementation

You can find the full codebase of the Style Transfer project as a Kaggle kernel or as usual on GitHub.

Input

Let’s start by defining our input image, which is San Francisco’s skyline.
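The actual code lives in the Kaggle kernel, but a minimal loading sketch could look as follows (the file name input.jpg and the 512×512 target size are assumptions, using Keras’ image utilities):

```python
# Load and resize the content (input) image.
# "input.jpg" and the 512x512 target size are illustrative assumptions.
from keras.preprocessing.image import load_img, img_to_array

IMAGE_HEIGHT, IMAGE_WIDTH = 512, 512

input_image = load_img("input.jpg", target_size=(IMAGE_HEIGHT, IMAGE_WIDTH))
input_array = img_to_array(input_image)  # numpy array of shape (512, 512, 3)
```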

Style

Then let’s define a style image, which is Tytus Brzozowski’s vision of Warsaw, Poland.
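The style image can be loaded the same way (again, style.jpg is an assumed file name):

```python
# Load and resize the style image to the same shape as the input image.
style_image = load_img("style.jpg", target_size=(IMAGE_HEIGHT, IMAGE_WIDTH))
style_array = img_to_array(style_image)
```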

Data preparation

The next step is to perform reshaping and mean normalization on both images.
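A sketch of this step, assuming we reuse the preprocessing that ships with Keras’ VGG16 (it flips RGB to BGR and subtracts the ImageNet channel means), with a batch dimension added on top:

```python
import numpy as np
from keras.applications.vgg16 import preprocess_input

# Add a batch dimension and apply VGG16's mean normalization to both images.
input_tensor = preprocess_input(np.expand_dims(input_array, axis=0))
style_tensor = preprocess_input(np.expand_dims(style_array, axis=0))
```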

CNN Model

Afterward, with our image arrays ready to go, we can proceed to our CNN model.

I recommend checking my previous article about CNN basics, where I explain in depth how Convolutional Neural Networks work

Image Classifier – Cats🐱 vs Dogs🐶

and another one that covers Transfer Learning, which we are also going to leverage in this project.

Histopathologic Cancer Detector – Machine Learning in Medicine

In this project, we are going to use a pre-trained VGG16 model which looks as follows.

VGG16 Architecture (source: https://medium.com/@franky07724_57962/using-keras-pre-trained-models-for-feature-extraction-in-image-clustering-a142c6cdf5b1)

Keep in mind that we are not going to use the fully connected (blue) and softmax (yellow) layers. They act as a classifier, which we don’t need here. We are going to use only the feature extractors, i.e. the convolutional (black) and max pooling (red) layers.

Let’s take a look at how specific features look at selected VGG16 layers trained on the ImageNet dataset.

VGG16 Features (source: https://stackoverflow.com/questions/48220598/convolution-layer-in-cnn/48280069)

We are not going to visualize every CNN layer here, but according to Johnson et al., for the content layer we should select

block2_conv2

and for the style layers

[block1_conv2, block2_conv2, block3_conv3, block4_conv3, block5_conv3]

While this combination has proven to work, I recommend playing with it and experimenting with different layers.
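A sketch of how the feature extractor can be wired up with Keras (assuming a TensorFlow 1.x-style backend and channels-last data format; names such as combination_image are illustrative, not taken from the original kernel). The input, style, and output (combination) images are stacked into one batch so a single VGG16 forward pass yields features for all three:

```python
from keras import backend as K
from keras.applications.vgg16 import VGG16

# The combination image is the variable we will optimize.
content_image = K.constant(input_tensor)
style_reference = K.constant(style_tensor)
combination_image = K.placeholder((1, IMAGE_HEIGHT, IMAGE_WIDTH, 3))

# One batch of three images, fed to VGG16 without the classifier head
# (include_top=False drops the fully connected and softmax layers).
model_input = K.concatenate([content_image, style_reference, combination_image], axis=0)
model = VGG16(input_tensor=model_input, weights="imagenet", include_top=False)

# Map layer names to output tensors so we can pick the content and style layers.
outputs_dict = {layer.name: layer.output for layer in model.layers}

CONTENT_LAYER = "block2_conv2"
STYLE_LAYERS = ["block1_conv2", "block2_conv2",
                "block3_conv3", "block4_conv3", "block5_conv3"]
```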

Content Loss

With our CNN model defined, let’s specify a content loss function. In order to preserve the original content, we are going to minimize the distance between the input image and the output image, computed on the features of the content layer.
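A minimal version of such a content loss, following the sketches above, is just the sum of squared differences between the two feature maps:

```python
# Content loss: squared error between the content features of the input image
# and the combination (output) image at the chosen content layer.
def content_loss(content_features, combination_features):
    return K.sum(K.square(combination_features - content_features))
```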

Style Loss

Similarly to the content loss, the style loss is also defined as a distance between two images. However, in order to apply a new style, it is computed between the style image and the output image.
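One common way to implement this, used here as a sketch, compares Gram matrices (the correlations between feature channels) of the style and output images at each style layer:

```python
# The Gram matrix captures which feature channels tend to activate together,
# which summarizes style (textures, colors) independently of spatial layout.
def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    return K.dot(features, K.transpose(features))

# Style loss: squared distance between the Gram matrices of the style image
# and the combination image, normalized by image size and channel count.
def style_loss(style_features, combination_features):
    S = gram_matrix(style_features)
    C = gram_matrix(combination_features)
    channels = 3
    size = IMAGE_HEIGHT * IMAGE_WIDTH
    return K.sum(K.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))
```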

Total Variation Loss

Lastly, we are going to define a total variation loss which is going to act as a spatial smoother, regularizing the output image and reducing noise.
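A sketch of a standard total variation loss, which penalizes differences between neighboring pixels of the output image:

```python
# Total variation loss: the sum of (slightly smoothed) squared differences
# between horizontally and vertically adjacent pixels of the combination image.
def total_variation_loss(x):
    a = K.square(x[:, :IMAGE_HEIGHT - 1, :IMAGE_WIDTH - 1, :] -
                 x[:, 1:, :IMAGE_WIDTH - 1, :])
    b = K.square(x[:, :IMAGE_HEIGHT - 1, :IMAGE_WIDTH - 1, :] -
                 x[:, :IMAGE_HEIGHT - 1, 1:, :])
    return K.sum(K.pow(a + b, 1.25))
```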

Optimization – Loss and Gradients

Afterward, having our content loss, style loss, and total variation loss set, we can define our style transfer process as an optimization problem where we are going to minimize our global loss (which is a combination of content, style and total variation losses).

Instead of covering the underlying math here (but I still recommend you to check it in Leon A. Gatys’ paper, A Neural Algorithm of Artistic Style), think about it this way:

In each iteration, we are going to create an output image so that the distance (difference) between output and input/style on corresponding feature layers is minimized.

Gradient Descent Visualization (source: https://blog.paperspace.com/intro-to-optimization-in-deep-learning-gradient-descent/)
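Continuing the sketches above, the global loss is a weighted sum of the three components, and the gradients are taken with respect to the combination image. The weights below are illustrative defaults, not necessarily the ones used in the original kernel:

```python
# Weights of the individual loss components (assumed values, tune as you like).
CONTENT_WEIGHT = 0.025
STYLE_WEIGHT = 1.0
TOTAL_VARIATION_WEIGHT = 1.0

loss = K.variable(0.0)

# Batch index 0 is the input image, 1 the style image, 2 the combination image.
layer_features = outputs_dict[CONTENT_LAYER]
loss = loss + CONTENT_WEIGHT * content_loss(layer_features[0], layer_features[2])

for layer_name in STYLE_LAYERS:
    layer_features = outputs_dict[layer_name]
    loss = loss + (STYLE_WEIGHT / len(STYLE_LAYERS)) * \
        style_loss(layer_features[1], layer_features[2])

loss = loss + TOTAL_VARIATION_WEIGHT * total_variation_loss(combination_image)

# Gradients of the global loss with respect to the image we are optimizing,
# wrapped in a backend function so SciPy can call it with numpy arrays.
grads = K.gradients(loss, combination_image)
fetch_loss_and_grads = K.function([combination_image], [loss] + grads)
```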

Results

Finally, let’s optimize with the L-BFGS algorithm and visualize the results.
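A sketch of the optimization loop with SciPy’s L-BFGS-B implementation; the Evaluator caches the gradients so the loss and gradient callbacks share one pass through the network:

```python
from scipy.optimize import fmin_l_bfgs_b

class Evaluator:
    def loss(self, x):
        x = x.reshape((1, IMAGE_HEIGHT, IMAGE_WIDTH, 3))
        outs = fetch_loss_and_grads([x])
        self.grad_values = outs[1].flatten().astype("float64")
        return outs[0]

    def grads(self, x):
        return np.copy(self.grad_values)

evaluator = Evaluator()
x = input_tensor.flatten()  # start the optimization from the content image

for i in range(10):
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x,
                                     fprime=evaluator.grads, maxfun=20)
    print("Iteration %d, loss: %.2f" % (i + 1, min_val))
```

To display the result, reshape x back to (512, 512, 3) and invert the preprocessing (add the ImageNet channel means back and flip BGR to RGB).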

Outputs after 1, 2, and 5 iterations
Output after 10 iterations

Let’s see how our input, style and output images look combined.

Pretty impressive, huh?

We can clearly see that while the original content of the input image (San Francisco’s skyline) was preserved, we successfully applied a new style (Tytus Brzozowski’s Warsaw) to the output image.

Other examples

What’s next?

We’ve shown that we can leverage CNNs and their layers as feature extractors to create remarkable style transfer effects. I encourage you to play with the hyperparameters and the layer configuration to achieve even better effects. Don’t hesitate to share your results!

Don’t forget to check the project’s GitHub page.

gsurma/style_transfer

Questions? Comments? Feel free to leave your feedback in the comments section or contact me directly at https://gsurma.github.io.


