
A summary of style transfer papers

This post is a summary of neural style transfer and related papers.

The representative article is the pioneering work of Gatys et al., "A Neural Algorithm of Artistic Style". It is a slow, optimization-based neural method for style transfer.

The first key point: in a typical deep learning problem, the network's weights are learned from training-sample inputs. In this paper, the weights are already trained and fixed, and they are instead used to find an input that meets the output requirements.

Input: an image initialized with Gaussian noise.

After many iterations, the input converges to an image with the desired style and content. So this method learns pixel values, not weights.
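A minimal PyTorch sketch of this idea: the optimizer is handed the image tensor itself, not any network parameter. The plain MSE target here is only a stand-in for the content and style losses described below, so the sketch runs on its own.

    import torch
    import torch.nn.functional as F

    content_target = torch.rand(1, 3, 256, 256)                  # stand-in target
    generated = torch.randn(1, 3, 256, 256, requires_grad=True)  # Gaussian-noise init
    optimizer = torch.optim.Adam([generated], lr=0.05)           # optimizes the image

    for step in range(200):
        optimizer.zero_grad()
        # The real method uses the weighted content + style losses described
        # below; a plain MSE to a target stands in here.
        loss = F.mse_loss(generated, content_target)
        loss.backward()   # gradients flow into the pixel values themselves
        optimizer.step()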

The second key point: new loss functions are introduced.

Content loss: the element-wise squared error (MSE) between the VGG feature responses of the generated image and the content image.

Style loss: calculated from Gram matrices of the feature maps. The final expression is again an MSE, between the Gram matrices of the generated and style images.
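A minimal PyTorch sketch of the Gram-based style loss (the normalization convention for the Gram matrix varies between implementations):

    import torch
    import torch.nn.functional as F

    def gram_matrix(features: torch.Tensor) -> torch.Tensor:
        # features: (batch, channels, height, width) activations from one VGG layer
        b, c, h, w = features.shape
        flat = features.view(b, c, h * w)
        # channel-by-channel inner products of the flattened feature maps
        return flat @ flat.transpose(1, 2) / (c * h * w)

    def style_loss(gen_feats: torch.Tensor, style_feats: torch.Tensor) -> torch.Tensor:
        # MSE between the Gram matrices of generated and style feature maps;
        # the content loss above is simply F.mse_loss(gen_feats, content_feats)
        return F.mse_loss(gram_matrix(gen_feats), gram_matrix(style_feats))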

The second family of methods, based on training a feed-forward model, gives fast style transfer.

The representative work is "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" (Johnson et al.).

The first key point: the paper introduces a combined architecture made of two networks. The first half is called the image transformation network, and the second half is called the loss network. The image transformation network's weights are updated during training; the loss network's weights are not updated, since it is a pre-trained VGG network used to extract high-level features. In practice, the content image is fed through the image transformation network to produce a stylized output, and the loss network is then used to compute the errors that are minimized to reach the desired effect.
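A rough PyTorch sketch of the two-part setup, assuming a deliberately tiny stand-in for the transformation network (the real one uses strided convolutions and residual blocks); torchvision's pre-trained VGG-16 plays the loss network:

    import torch
    import torch.nn as nn
    import torchvision.models as models

    # Toy stand-in for the image transformation network.
    transform_net = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1),
    )

    # Loss network: a pre-trained VGG used only to extract features.
    loss_net = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
    for p in loss_net.parameters():
        p.requires_grad_(False)      # frozen: these weights are never updated

    optimizer = torch.optim.Adam(transform_net.parameters(), lr=1e-3)

    x = torch.rand(4, 3, 256, 256)   # a batch of content images
    y_hat = transform_net(x)         # stylized output
    # The perceptual losses compare loss_net's activations on y_hat against
    # those on x (content) and against the style image's Gram matrices; only
    # transform_net receives gradient updates.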

The second key point: new loss functions are proposed.

Feature reconstruction loss: instead of a direct per-pixel loss between images, features extracted by VGG are used as the measure of content loss. The motivation is that a one-to-one error between pixels is inaccurate in many cases. For example, if two pictures differ only by a one-pixel shift, the per-pixel loss is very large, yet to the human eye the two pictures are barely different. It is therefore meaningful to use the high-level features extracted by VGG as the content loss. This approach also has a disadvantage: what is learned can look artificial rather than fully real.
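A hedged sketch of this loss, assuming VGG-16 features cut at relu2_2 (i.e. features[:9]); the paper evaluates specific relu layers, so the cut point here is just one common choice:

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    vgg_slice = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:9].eval()
    for p in vgg_slice.parameters():
        p.requires_grad_(False)

    def feature_reconstruction_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Compare activations, not raw pixels: a one-pixel shift barely moves
        # these features, while a per-pixel MSE would blow up.
        return F.mse_loss(vgg_slice(y_hat), vgg_slice(y))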

Style reconstruction loss: as in Gatys' work, the Gram matrix is used as the style feature.

(1) DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks.

The inputs in this paper are mobile phone photos (from an iPhone, a BlackBerry, etc.), and the outputs are DSLR-quality photos.

The first key point: a GAN is used.

The second key point: new loss terms are added.

Color loss: before the color loss is computed, both images are Gaussian-blurred. The blur removes high-frequency information, which makes colors easier to compare and gives the color loss a high tolerance for small misalignments. As a result, the network can learn a color distribution similar to that of the target picture.
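A sketch of this term with guessed blur parameters (the paper fixes its own kernel size and sigma):

    import torch
    import torch.nn.functional as F

    def gaussian_kernel(size: int = 21, sigma: float = 3.0) -> torch.Tensor:
        # 2-D Gaussian kernel normalized to sum to 1 (size/sigma are assumptions,
        # not the paper's exact blur parameters)
        coords = torch.arange(size, dtype=torch.float32) - size // 2
        g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
        kernel = torch.outer(g, g)
        return kernel / kernel.sum()

    def color_loss(generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Blur both images so high-frequency detail is discarded before comparing.
        k = gaussian_kernel()[None, None].repeat(3, 1, 1, 1)  # one kernel per RGB channel
        blur = lambda img: F.conv2d(img, k, padding=k.shape[-1] // 2, groups=3)
        return F.mse_loss(blur(generated), blur(target))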

Texture loss: a GAN discriminator's ability to distinguish generated from real images is used as the measure of texture error.
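A minimal sketch of the generator-side adversarial term, using a toy discriminator (the paper's discriminator is deeper and judges grayscale images, so it measures texture rather than color):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Deliberately small stand-in discriminator.
    discriminator = nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 1, 4, stride=2, padding=1),
    )

    def texture_loss(generated: torch.Tensor) -> torch.Tensor:
        # Generator side of the adversarial game: push the discriminator's
        # verdict on generated images toward "real". The discriminator itself
        # is trained separately on real DSLR photos vs. generated ones.
        logits = discriminator(generated)
        return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))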

Content loss: the Euclidean distance between high-dimensional features extracted by VGG is used as the content loss. This differs from the Gram-matrix losses used above.

Total variation loss: its purpose is to obtain smoother output.
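A common formulation, sketched below (some papers use squared rather than absolute differences):

    import torch

    def tv_loss(img: torch.Tensor) -> torch.Tensor:
        # img: (batch, channels, H, W). Penalizes differences between
        # neighboring pixels; minimizing it suppresses noise in the output.
        dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
        dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
        return dh + dw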

(2) Deep Photo Style Transfer.

Previous style transfer papers all pair a photo with an artwork, which makes the generated picture look like a painting; in this paper, both the content image and the style image are photographs.

The first key point: the loss function is modified.

Content loss: a feature-based loss is adopted, the same as Gatys'.

Photorealism regularization: a constraint, built on the matting Laplacian, that the transfer be locally affine in color space, which keeps the output photorealistic.
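Assuming the matting Laplacian L has already been built from the input photo (that construction, from Levin et al.'s closed-form matting, is omitted; in practice L is sparse), the penalty itself is a simple quadratic form:

    import torch

    def photorealism_regularizer(output: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
        # output: (3, H, W) stylized image; L: (H*W, H*W) matting Laplacian of
        # the input photo. The penalty sums V_c^T L V_c over color channels and
        # is small when each channel is close to a locally affine function of
        # the input colors.
        total = output.new_zeros(())
        for c in range(output.shape[0]):
            v = output[c].reshape(-1)     # vectorized channel of length H*W
            total = total + v @ (L @ v)
        return total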

Augmented style loss based on semantic segmentation: segmentation masks are fed into the network as additional channels of the input image, ensuring that style is transferred only between corresponding regions, i.e. only the content we are interested in is processed.
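A simplified sketch of the per-class Gram comparison this implies (the paper's mask-area normalization is omitted here):

    import torch
    import torch.nn.functional as F

    def masked_gram(feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feats: (C, H, W) feature maps; mask: (H, W) soft segmentation channel
        # downsampled to the feature resolution.
        masked = (feats * mask).flatten(1)            # (C, H*W)
        return masked @ masked.T / masked.shape[1]

    def augmented_style_loss(gen_feats, style_feats, gen_masks, style_masks):
        # One Gram comparison per semantic class, so sky is matched to sky,
        # buildings to buildings, and so on.
        loss = 0.0
        for k in range(gen_masks.shape[0]):
            loss = loss + F.mse_loss(masked_gram(gen_feats, gen_masks[k]),
                                     masked_gram(style_feats, style_masks[k]))
        return loss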