
## Motivation for Image Captioning

The goal is to generate a grammatically correct sentence that accurately describes the scene of an image, enabling any individual to visualize the image mentally. Rather than simply detecting objects, the network aims to establish relationships among the entities in the image.

## Block Diagram

*(Block diagram image.)*

## Image Captioning using Uni-Directional and Bi-Directional LSTM

Image features are extracted using a pretrained InceptionV3 model. The captioning model is trained on the Flickr8k dataset.
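As a rough illustration, a pretrained InceptionV3 encoder (ImageNet weights, classification head removed) can turn each image into a single feature vector. This sketch uses the standard Keras API and default settings, not necessarily this repository's exact code:

```python
# Feature extraction with a pretrained InceptionV3 (sketch, not the repo's exact code).
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

# Drop the classification head; global average pooling yields a 2048-d vector.
encoder = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_path):
    img = image.load_img(img_path, target_size=(299, 299))  # InceptionV3 input size
    x = image.img_to_array(img)
    x = preprocess_input(np.expand_dims(x, axis=0))  # scale pixels to [-1, 1]
    return encoder.predict(x)[0]  # shape: (2048,)
```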

To start the server, navigate to the `Flickr` directory in the command prompt and run `python run.py`.

The parent folder of this repository should contain the trained `caption_model` weights.

The image captioning model is deployed as a REST API: both the web app and our Flutter application make API calls to the server by sending an image, and the server responds with a caption.
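For illustration, a client call might look like the following; the endpoint path, port, and response schema here are assumptions, not the repository's documented API:

```python
# Hypothetical client request to the caption server (endpoint and schema assumed).
import requests

with open("test.jpg", "rb") as f:
    resp = requests.post("http://localhost:5000/caption", files={"image": f})
resp.raise_for_status()
print(resp.json()["caption"])  # assumed JSON response field
```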

The web app displays the bidirectional and unidirectional approaches side by side, along with a table of the accuracy of each predicted word.
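One plausible way to produce such a per-word table is to record the softmax probability of each word as it is emitted during greedy decoding. This is a hypothetical sketch in which `model`, `tokenizer`, the `startseq`/`endseq` tokens, and `max_len` are assumed names:

```python
# Greedy decoding that records each emitted word's softmax probability (sketch).
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def caption_with_scores(model, tokenizer, photo_features, max_len):
    words, scores = ["startseq"], []
    for _ in range(max_len):
        seq = tokenizer.texts_to_sequences([" ".join(words)])[0]
        seq = pad_sequences([seq], maxlen=max_len)
        dist = model.predict([photo_features[np.newaxis], seq], verbose=0)[0]
        idx = int(np.argmax(dist))            # most likely next word
        word = tokenizer.index_word.get(idx)
        if word is None or word == "endseq":
            break
        words.append(word)
        scores.append(float(dist[idx]))       # probability shown in the table
    return list(zip(words[1:], scores))
```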

## Model Architecture (Uni-Directional and Bi-Directional)

*(Architecture diagram images.)*
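A minimal sketch of a merge-style decoder of this kind is shown below; the layer sizes, vocabulary size, and maximum caption length are assumptions, and swapping `Bidirectional(LSTM(256))` for a plain `LSTM(256)` gives the uni-directional variant:

```python
# Merge-style caption decoder with a bidirectional LSTM (illustrative; sizes assumed).
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, Bidirectional, add)
from tensorflow.keras.models import Model

vocab_size, max_len = 8000, 34  # assumed Flickr8k-scale values

# Image branch: project the 2048-d InceptionV3 features to 256 dims.
img_in = Input(shape=(2048,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Text branch: embed the partial caption and encode it with a Bi-LSTM.
seq_in = Input(shape=(max_len,))
seq = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
seq_vec = Bidirectional(LSTM(256))(Dropout(0.5)(seq))  # use LSTM(256) for the uni-directional model
seq_vec = Dense(256, activation="relu")(seq_vec)       # 512 -> 256 to match the image branch

# Merge the two branches and predict the next word.
merged = Dense(256, activation="relu")(add([img_vec, seq_vec]))
out = Dense(vocab_size, activation="softmax")(merged)

model = Model([img_in, seq_in], out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```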

## Test Results (Flask Web-App)

Captions were generated for the following test images (web-app screenshots):

- Football players
- Snowy Scene
- Running Dog
- Jumping Dog
- 2 Running Dogs

## Test Results (Flutter App)

The caption model is deployed as a REST API locally on a laptop, and the Flutter application fetches captions from it.

Captions were generated for the following test images (app screenshots):

- Dog on Beach
- 3 Dogs
- Dog Jumping over Hurdle
- Basketball Boy

## Evaluation Metric

The BLEU metric is used to evaluate the test images. A higher BLEU score (closer to 1) corresponds to a more accurate description.
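As a concrete example, sentence-level BLEU can be computed with NLTK; the reference and candidate captions below are made up, and smoothing is applied because short captions often have zero higher-order n-gram overlaps:

```python
# Sentence-level BLEU between a reference and a generated caption (example data).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "dog", "runs", "across", "the", "grass"]]     # tokenized reference(s)
candidate = ["a", "dog", "is", "running", "on", "the", "grass"]  # generated caption

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")  # closer to 1 means closer to the reference
```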

Captions are currently generated on a laptop CPU, which results in longer processing times; deploying the model on the cloud can improve performance.