Caption It! - Harnessing the Power of CNN/ResNet Models for Image Description

Introduction

"Caption It!" is a deep learning project focusing on the automation of image captioning. Utilizing the Flickr 8k Dataset and pre-trained CNN/ResNet models, this project compares two approaches: CNN+LSTM and ResNet+GRU, evaluating their performance using BLEU scores.

Agenda

  1. Problem Statement
  2. Technical Approach
  3. Dataset Analysis
  4. Exploratory Data Analysis
  5. Deep Learning Approaches: VGG16+LSTM and ResNet50+GRU
  6. Performance Evaluation
  7. Conclusion

Problem Statement

Our goal was to develop a model that automatically generates captions for images using advanced deep learning techniques.

Technical Approach

  • Import libraries and modules
  • Load and preprocess dataset
  • Perform feature extraction and caption tokenization
  • Model building, training, and evaluation (the feature-extraction and tokenization steps are sketched below)
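
The notebooks implement these steps in Keras. The following is a minimal sketch of the feature-extraction and tokenization steps, not the repo's exact code; the sample caption and any paths are illustrative.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Model

# Drop VGG16's classification head; the 4096-d "fc2" activations
# serve as the image feature vector.
base = VGG16(weights="imagenet")
encoder = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

def extract_features(image_path):
    """Return a 4096-d VGG16 feature vector for one image."""
    img = load_img(image_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
    return encoder.predict(x, verbose=0)[0]

# Tokenize captions; startseq/endseq mark sequence boundaries.
captions = ["startseq a dog runs on the beach endseq"]  # illustrative
tokenizer = Tokenizer()
tokenizer.fit_on_texts(captions)
vocab_size = len(tokenizer.word_index) + 1
```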

Dataset Source

Dataset Analysis

The Flickr 8k Dataset, comprising 8000 images each with 5 captions, was used. This dataset offers a diverse range of images and high-quality captions, ideal for training image captioning models. Below are some visuals from the dataset!

Caption Length Distribution

Top 50 Words in the Dataset

Model Training Visualization: Sample Image Captions

Deep Learning Approaches

VGG16 & LSTM

  • VGG16: A pre-trained CNN for image classification.
  • LSTM: A recurrent neural network well suited to capturing temporal dependencies (see the sketch below).
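
A minimal sketch of how such a model can be wired in Keras, using the common "merge" architecture for Flickr 8k captioning; the 256-unit layer sizes and the vocabulary and length values are illustrative assumptions, not the repo's exact configuration.

```python
from tensorflow.keras.layers import Input, Dense, LSTM, Embedding, Dropout, add
from tensorflow.keras.models import Model

vocab_size, max_length = 8000, 35  # illustrative values

# Image branch: project the 4096-d VGG16 feature down to 256 dims.
img_in = Input(shape=(4096,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Text branch: embed the partial caption and run it through an LSTM.
txt_in = Input(shape=(max_length,))
txt_vec = LSTM(256)(Dropout(0.5)(Embedding(vocab_size, 256, mask_zero=True)(txt_in)))

# Merge both branches and predict the next word in the caption.
merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
out = Dense(vocab_size, activation="softmax")(merged)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```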

Output of trained VGG16 & LSTM model


ResNet50 & GRU

  • ResNet50: A deep residual network for image recognition.
  • GRU: A gated recurrent unit, efficient at capturing temporal relationships in sequence modeling (see the sketch below).
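
The second model follows the same pattern with the encoder and decoder swapped; a sketch under the same illustrative assumptions (ResNet50 with global average pooling yields a 2048-d feature per image):

```python
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.layers import Input, Dense, GRU, Embedding, Dropout, add
from tensorflow.keras.models import Model

vocab_size, max_length = 8000, 35  # illustrative values

# ResNet50 with average pooling produces 2048-d image features,
# precomputed per image as in the VGG16 sketch above.
feature_extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

img_in = Input(shape=(2048,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

txt_in = Input(shape=(max_length,))
txt_vec = GRU(256)(Dropout(0.5)(Embedding(vocab_size, 256, mask_zero=True)(txt_in)))

merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
out = Dense(vocab_size, activation="softmax")(merged)
model = Model(inputs=[img_in, txt_in], outputs=out)
```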

Output of trained ResNet50 & GRU model


Performance Evaluation

  • Model performance was evaluated using BLEU scores.

BLEU Scores Comparison


  • The VGG16+LSTM model exhibited higher BLEU scores, indicating its effectiveness in generating more accurate captions.
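
BLEU can be computed with NLTK's corpus_bleu, scoring each generated caption against the image's five reference captions; a minimal sketch with illustrative token lists:

```python
from nltk.translate.bleu_score import corpus_bleu

# Each hypothesis is scored against all reference captions for its image.
references = [[["a", "dog", "runs", "on", "the", "beach"],
               ["a", "dog", "is", "running", "along", "the", "shore"]]]  # illustrative
candidates = [["a", "dog", "runs", "on", "the", "sand"]]

bleu1 = corpus_bleu(references, candidates, weights=(1.0, 0, 0, 0))
bleu2 = corpus_bleu(references, candidates, weights=(0.5, 0.5, 0, 0))
print(f"BLEU-1: {bleu1:.3f}, BLEU-2: {bleu2:.3f}")
```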

Conclusion

The project shows how the choice of pre-trained image feature extractor and sequence decoder affects the quality of generated captions, and it points to avenues for further improvement in automated image captioning.

Dependencies

This project requires the following libraries:

  • TensorFlow
  • Keras
  • NumPy
  • Pandas
  • Matplotlib
  • Pillow (PIL)
  • NLTK

Install these using pip or conda as shown in the provided Python notebooks.
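
For example, with pip (TensorFlow 2.x bundles Keras, and the Pillow package provides the PIL module):

```
pip install tensorflow numpy pandas matplotlib pillow nltk
```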

Instructions to Run the Project

To run this project, follow these steps:

  1. Clone the repository to your local machine.
  2. Ensure you have Jupyter Notebook installed.
  3. Open and run the eda.ipynb notebook for exploratory data analysis.
  4. Proceed with vgg16_lstm.ipynb for the VGG16+LSTM model training and evaluation.
  5. Finally, execute resnet_gru.ipynb for the ResNet50+GRU model training and evaluation.
  6. Compare the BLEU scores as outputted by the notebooks to evaluate the models.
  7. Refer to the included PPT for an overview of the project flow.
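
For example, to clone the repository and launch the notebooks:

```
git clone https://github.com/tanzealist/AutoImageCaption-CNNvsResNet.git
cd AutoImageCaption-CNNvsResNet
jupyter notebook
```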

Please refer to each notebook for detailed instructions on the steps involved in the respective processes.
