To generate a grammatically correct sentence that accurately describes the scene of an image, enabling anyone to visualize the image mentally. Rather than simply detecting objects, the network aims to establish relationships among the entities in the image.
Image features are extracted using a pretrained InceptionV3 model. The captioning model is trained on the Flickr8k dataset.
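As a minimal sketch of this feature-extraction step (TensorFlow/Keras assumed; the exact preprocessing used in this repo may differ):

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

# Pretrained InceptionV3 without the classification head: the globally
# pooled 2048-dim output serves as the image feature vector.
extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_path):
    # InceptionV3 expects 299x299 RGB inputs
    img = image.load_img(img_path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)  # shape: (1, 2048)
```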
Navigate to the 'Flickr' directory in the command prompt and run:
python run.py
The parent folder of this repository should contain the trained caption_model weights.
The Image Captioning Model is deployed as a REST API: both the web app and our Flutter application make API calls to the server by sending an image, and the server responds with a caption (a sketch of such an endpoint is shown below).
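A minimal sketch of what such an endpoint could look like (Flask is assumed; the route, form-field name, and the `generate_caption` helper are hypothetical, not taken from `run.py`):

```python
import tempfile
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/caption", methods=["POST"])
def caption():
    # The client uploads an image file; the server extracts features,
    # runs the caption model, and returns the caption as JSON.
    img_file = request.files["image"]  # assumed form-field name
    with tempfile.NamedTemporaryFile(suffix=".jpg") as tmp:
        img_file.save(tmp.name)
        features = extract_features(tmp.name)  # see the sketch above
        text = generate_caption(features)      # hypothetical caption-model call
    return jsonify({"caption": text})

if __name__ == "__main__":
    app.run(port=5000)
```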
The web app displays the bidirectional and unidirectional approaches side by side, along with a table showing the accuracy of each predicted word.
Sample web-app results (images omitted): Football Players, Snowy Scene, Running Dog, Jumping Dog, 2 Running Dogs.
The caption model is deployed as a REST API locally on a laptop, and the Flutter application fetches captions from it; an illustrative client call is sketched below.
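For illustration, the call the Flutter application makes can be reproduced from Python with `requests` (the URL and field name follow the hypothetical endpoint sketched above):

```python
import requests

# Send an image to the local captioning API and print the returned caption
with open("example.jpg", "rb") as f:
    resp = requests.post("http://localhost:5000/caption", files={"image": f})
print(resp.json()["caption"])
```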
Sample Flutter-app results (images omitted): Dog on Beach, 3 Dogs, Dog Jumping over Hurdle, Basketball Boy.
The BLEU metric has been used to evaluate the test images; a higher BLEU score (closer to 1) corresponds to a more accurate description.
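As a sketch of how one predicted caption can be scored against reference captions with NLTK's BLEU implementation (the repo's exact evaluation code may differ):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["a", "dog", "runs", "through", "the", "grass"]]   # tokenized ground truth(s)
candidate = ["a", "dog", "is", "running", "in", "the", "grass"]  # tokenized model output

# Smoothing avoids zero scores when a higher-order n-gram has no match
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")  # closer to 1 means closer to the reference
```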
Captions are currently generated on a laptop CPU, which results in higher processing time; deploying the model on the cloud could improve performance.