Here we are implementing an image captioning network. The network will consist of an encoder and a decoder. The encoder is a convolutional neural network, and the decoder is a recurrent neural network. Producing reasonable textual description of an image is a hard task, however with the use of a CNN and a RNN we can start to generate somewhat plausible descriptions.
Tested and run with python 3.6 and PyTorch 1.0 on ubuntu 18.04