Neural Style Transfer

Neural style transfer is a deep-learning computer-vision technique that transfers the artistic style of a source image onto a target image while preserving the target image's content, producing new artworks. The technique was pioneered by Leon A. Gatys et al. in the paper A Neural Algorithm of Artistic Style.

The key finding of the paper is that the representations of content and style in a Convolutional Neural Network are well separable: we can manipulate the two representations independently to produce new, perceptually meaningful images. This rests on the fact that CNNs extract features hierarchically. The initial layers extract simple features such as edges and corners (i.e., detailed pixel-level information), while the deeper layers extract more complex features such as shapes and larger regions of the image. Accordingly, the style of an image can be extracted from the earlier layers, while its content can be extracted from the deeper layers. In this context, style essentially means the textures, colors, and visual patterns in the image at various spatial scales, and content is the higher-level macrostructure of the image.
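As a quick illustration of this hierarchy, here is a minimal sketch (not part of this repository's code) that inspects the activations of a shallow and a deep layer of torchvision's pretrained VGG-19 using forward hooks; the layer indices and the random input are placeholders chosen for the example:

```python
import torch
from torchvision import models

# VGG-19 pretrained on ImageNet; keep only the convolutional feature extractor.
# (Newer torchvision versions prefer the `weights=` argument over `pretrained=`.)
vgg = models.vgg19(pretrained=True).features.eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Example layer choices (assumptions, not the repository's exact layers):
# index 0 is conv1_1 (edges, colors), index 28 is conv5_1 (larger-scale structure).
vgg[0].register_forward_hook(save_activation("shallow"))
vgg[28].register_forward_hook(save_activation("deep"))

with torch.no_grad():
    vgg(torch.randn(1, 3, 224, 224))  # random tensor standing in for a preprocessed image

print(activations["shallow"].shape)  # torch.Size([1, 64, 224, 224])
print(activations["deep"].shape)     # torch.Size([1, 512, 14, 14])
```

The shallow activations keep the full spatial resolution with few channels, while the deep activations are spatially coarse but channel-rich, which is exactly the behaviour the separation of style and content relies on.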

The key idea behind all deep-learning algorithms is to define a loss function that specifies what you want to achieve and then minimize that loss. In style transfer, the goal is to conserve the content of the original image while adopting the style of the reference image, and we do this by combining two loss functions: the content loss and the style loss.
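Formally, the paper combines the two terms into a single objective, where p, a, and x denote the content image, the style image, and the generated image:

$$\mathcal{L}_{total}(\vec{p},\vec{a},\vec{x}) = \alpha\,\mathcal{L}_{content}(\vec{p},\vec{x}) + \beta\,\mathcal{L}_{style}(\vec{a},\vec{x})$$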

In the equation above, α and β are the weighting factors for content and style reconstruction, respectively.

The content loss is the L2 norm between the activations of a deep layer of a pre-trained convnet computed over the target image and the activations of the same layer computed over the generated image, as given below. Here p and x are the original image and the generated image, P^l and F^l their respective feature representations in layer l, and F^l_ij is the activation of the i-th filter at position j in layer l.
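From the paper:

$$\mathcal{L}_{content}(\vec{p},\vec{x},l) = \frac{1}{2}\sum_{i,j}\left(F^{l}_{ij} - P^{l}_{ij}\right)^{2}$$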

The content loss uses only a single deep layer, but the style loss as defined by Gatys uses multiple layers of the convnet. It is based on the Gram matrix of a layer's activations, i.e. the inner product of the feature maps of that layer. This inner product can be understood as a map of the correlations between the layer's features. By including the feature correlations of multiple layers, we obtain a stationary, multi-scale representation of the input image that captures its texture information but not the global arrangement. The style loss is then computed as the L2 norm between the Gram matrices of the original image and the Gram matrices of the generated image. In the equations below, a and x are the original image and the generated image, and A^l and G^l their respective style representations in layer l. The contribution of layer l to the total loss is denoted E_l and the total style loss L_style. N_l is the number of feature maps, each of size M_l, in layer l, and w_l are the weighting factors for the contribution of each layer to the total loss.
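The corresponding definitions from the paper:

$$G^{l}_{ij} = \sum_{k} F^{l}_{ik}\,F^{l}_{jk}$$

$$E_{l} = \frac{1}{4\,N_{l}^{2}\,M_{l}^{2}}\sum_{i,j}\left(G^{l}_{ij} - A^{l}_{ij}\right)^{2}$$

$$\mathcal{L}_{style}(\vec{a},\vec{x}) = \sum_{l=0}^{L} w_{l}\,E_{l}$$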

The original paper used VGG-19 pretrained on ImageNet; here I implement the same approach using PyTorch.
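As an overview of how the pieces fit together, here is a minimal PyTorch sketch (not the exact code in this repository): it assumes preprocessed image tensors, uses torchvision's pretrained VGG-19, the layer choices from the paper (conv4_2 for content, conv1_1 through conv5_1 for style), and plain Adam on the pixels instead of the L-BFGS optimiser used by Gatys. The loss weights are example values only.

```python
import torch
import torch.nn.functional as F
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained VGG-19 feature extractor with frozen weights.
# (Newer torchvision versions prefer the `weights=` argument over `pretrained=`.)
vgg = models.vgg19(pretrained=True).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# Indices in torchvision's vgg19().features for the layers used in the paper:
# conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 for style and conv4_2 for content.
STYLE_LAYERS = [0, 5, 10, 19, 28]
CONTENT_LAYER = 21

def get_features(x):
    """Run x through VGG-19 and collect the activations of the chosen layers."""
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS or i == CONTENT_LAYER:
            feats[i] = x
    return feats

def gram_matrix(feat):
    """Inner product of the vectorised feature maps of one layer
    (normalised by the layer size, a common variant of the paper's scaling)."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

# content_img and style_img are assumed to be preprocessed (1, 3, H, W) tensors;
# random tensors stand in for them in this sketch.
content_img = torch.rand(1, 3, 256, 256, device=device)
style_img = torch.rand(1, 3, 256, 256, device=device)

content_feats = get_features(content_img)
style_grams = {i: gram_matrix(f)
               for i, f in get_features(style_img).items() if i in STYLE_LAYERS}

# Start the generated image from the content image and optimise its pixels.
generated = content_img.clone().requires_grad_(True)
optimizer = torch.optim.Adam([generated], lr=0.01)
alpha, beta = 1.0, 1e6  # content / style weights (example values)

for step in range(500):
    optimizer.zero_grad()
    feats = get_features(generated)
    content_loss = F.mse_loss(feats[CONTENT_LAYER], content_feats[CONTENT_LAYER])
    style_loss = sum(F.mse_loss(gram_matrix(feats[i]), style_grams[i])
                     for i in STYLE_LAYERS)
    loss = alpha * content_loss + beta * style_loss
    loss.backward()
    optimizer.step()
```

In practice the images are normalised with the ImageNet mean and standard deviation before being fed to VGG-19, and the generated image is clamped back to a valid pixel range before being saved.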

Credits
