This repository implements an image caption generator that combines a Convolutional Neural Network (CNN) for image feature extraction with a Long Short-Term Memory (LSTM) network for generating descriptive captions.
- Caption 1: startseq baby girl in an orange dress gets wet as she stands next to water sprinkler endseq
- Caption 2: startseq blonde toddler wearing an orange dress is wet and standing beside sprinkler in yard endseq
- Caption 3: startseq child in dress is looking at sprinkler endseq
- Caption 4: startseq little girl in an orange dress is running through the sprinkler in the yard endseq
- Caption 5: startseq "on wet grass little blond girl in orange dress plays in sprinkler." endseq
- Generated Caption: startseq two girls in orange clothes are playing with sprinkler endseq
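The reference captions above are shown after cleaning, wrapped in `startseq` and `endseq` markers. The following is a minimal sketch of one way such cleaning could work, assuming lowercasing, removal of non-letter characters, and dropping of one-character tokens. The function name `clean_caption` and the raw input string are hypothetical, and the notebook's actual steps may differ (Caption 5 above, for instance, keeps its quotes and period).

```python
import re


def clean_caption(caption: str) -> str:
    """Lowercase, strip non-letters, drop one-character tokens, add markers."""
    text = caption.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Keep only tokens longer than one character (drops stray "a", "s", ...).
    words = [w for w in text.split() if len(w) > 1]
    # The markers tell the decoder where a caption starts and ends.
    return "startseq " + " ".join(words) + " endseq"


# Hypothetical raw caption, used only for illustration.
print(clean_caption("A child in a dress is looking at a sprinkler ."))
```

Under these assumptions the hypothetical input reduces to the same text as Caption 3.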
- Data Loading and Preprocessing: Loads the Flickr8k dataset, which pairs each image with five reference captions.
- Feature Extraction: Employs a pre-trained VGG16 model to extract features from images.
- Caption Preprocessing: Cleans the captions and wraps each one in `startseq` and `endseq` markers before training.
- Tokenization: Implements a tokenizer to handle text data and create a vocabulary.
- Model Architecture: Combines image features and text features in a single model for caption generation.
- Training: Trains the model with a data generator so that large datasets can be processed efficiently.
- Evaluation: Uses BLEU scores to evaluate the quality of generated captions.
- Caption Generation: Generates captions for new images using the trained model.
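The tokenization and training steps listed above can be illustrated with a small sketch. It assumes the common next-word setup for caption models: each caption is expanded into several (prefix, next word) pairs, and each prefix is left-padded to a fixed length. The helpers `build_vocab` and `make_training_pairs` are hypothetical plain-Python stand-ins, not the notebook's code. In the full model each pair would also carry the image's feature vector, which is omitted here.

```python
def build_vocab(captions):
    """Map each word to a positive integer id; 0 is reserved for padding."""
    vocab = {}
    for caption in captions:
        for word in caption.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def make_training_pairs(caption, vocab, max_length):
    """Expand one caption into (padded prefix, next-word id) pairs."""
    ids = [vocab[w] for w in caption.split()]
    pairs = []
    for i in range(1, len(ids)):
        prefix = ids[:i]
        # Left-pad with zeros so every prefix has the same length.
        padded = [0] * (max_length - len(prefix)) + prefix
        pairs.append((padded, ids[i]))
    return pairs


captions = ["startseq child in dress is looking at sprinkler endseq"]
vocab = build_vocab(captions)
max_length = max(len(c.split()) for c in captions)
pairs = make_training_pairs(captions[0], vocab, max_length)

# Show the first few training pairs for the single example caption.
for prefix, target in pairs[:3]:
    print(prefix, "->", target)
```

A data generator would yield batches of such pairs instead of building them all in memory at once, which is what makes training on a large caption set practical.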
- Data Setup: Ensure the Flickr8k dataset is available and properly formatted.
- Model Training: Execute the provided Jupyter notebook to train the image caption generator model.
- Evaluation: Evaluate the model's performance using BLEU scores on a test set.
- Caption Generation: Use the trained model to generate captions for new images.
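As a rough illustration of the caption generation step, the sketch below uses greedy decoding: start from `startseq`, repeatedly append the most probable next word, and stop at `endseq` or after `max_length` words. Greedy decoding is an assumption here; the notebook may use a different strategy. The `predict_next` callback and the toy model are hypothetical stand-ins for the trained network.

```python
def generate_caption(predict_next, image_features, max_length):
    """Greedy decoding: append the most likely next word until endseq."""
    words = ["startseq"]
    for _ in range(max_length):
        # predict_next returns a {word: probability} dict for the next word.
        probs = predict_next(image_features, words)
        next_word = max(probs, key=probs.get)
        words.append(next_word)
        if next_word == "endseq":
            break
    return " ".join(words)


def toy_predict_next(image_features, words):
    """Hypothetical stand-in for a trained model: follows a fixed script."""
    script = ["startseq", "child", "in", "dress", "endseq"]
    target = script[len(words)] if len(words) < len(script) else "endseq"
    return {w: (0.9 if w == target else 0.025) for w in set(script)}


print(generate_caption(toy_predict_next, image_features=None, max_length=10))
```

With a real model, `predict_next` would encode the current word list, feed it to the network together with the image features, and map the output distribution back to words.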
- Python 3.x
- PyTorch
- Keras
- NumPy
- NLTK
[1] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. "Show and Tell: A Neural Image Caption Generator." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
This project was developed as a fifth-semester design project at IIIT Vadodara - International Campus Diu. Feel free to explore, contribute, and adapt the code for your own image captioning projects!