Image Captioning with Neural Networks is a deep learning project that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to generate captions for images automatically. The implementation uses a pre-trained ResNet-18 model to extract image features and an LSTM network to generate a textual description from those features.
- Utilizes a pre-trained ResNet-18 model for efficient image feature extraction.
- Employs an LSTM network for generating descriptive captions based on image features.
- Supports training with and without fine-tuning of the ResNet model.
- Includes functionality for both training and testing the model with a custom dataset.
- Visualizes training loss and sample predictions to assess model performance.
- Clone the repository from GitHub.
- Navigate to the project directory.
- Install the required dependencies listed in the `requirements.txt` file.
The model is trained and tested on the Flickr8k dataset, which comprises roughly 8,000 images, each paired with five different captions. For this project, the dataset is pre-processed to match the input format the model expects.
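
The exact pre-processing steps are not spelled out here. As an illustration only, caption pre-processing for an LSTM decoder often amounts to building a vocabulary and converting each caption into a fixed-length list of token ids. The helper names (`tokenize`, `build_vocab`, `encode`), the special tokens, and the `min_freq` and `max_len` parameters below are assumptions, not the project's API.

```python
from collections import Counter

# Hypothetical special tokens; their order fixes ids 0 to 3.
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]


def tokenize(caption):
    """Lowercase a caption and split it on whitespace."""
    return caption.lower().strip().split()


def build_vocab(captions, min_freq=1):
    """Map every word seen at least `min_freq` times to an integer id."""
    counts = Counter(tok for cap in captions for tok in tokenize(cap))
    words = sorted(w for w, n in counts.items() if n >= min_freq)
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + words)}


def encode(caption, vocab, max_len=20):
    """Convert a caption to ids: <start> words... <end>, padded to max_len."""
    unk = vocab["<unk>"]
    body = [vocab.get(tok, unk) for tok in tokenize(caption)][: max_len - 2]
    ids = [vocab["<start>"]] + body + [vocab["<end>"]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```

For example, `encode("a dog runs", build_vocab(all_captions))` would yield a list of 20 ids that can be stacked into a batch tensor, where `all_captions` stands for whatever list of training captions is available.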
To train the model, run the training script. It starts the training process and saves the model weights periodically.
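
A training loop with periodic checkpointing might be sketched as follows. It assumes a `model(images, captions)` callable that returns logits of shape `(batch, seq_len, vocab_size)` aligned with `captions`, and a loader yielding `(images, captions)` batches. The function name `train`, the `save_every` interval, and the checkpoint layout are hypothetical and may differ from what the training script does.

```python
import torch
import torch.nn as nn


def train(model, loader, pad_idx=0, epochs=10, lr=1e-3,
          save_every=1, checkpoint_path="checkpoint.pt", device="cpu"):
    """Train a captioning model and return the mean loss of each epoch."""
    model.to(device)
    # Padding positions do not contribute to the loss.
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)
    # Only optimize trainable parameters, so a frozen ResNet is skipped.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)

    history = []
    for epoch in range(1, epochs + 1):
        model.train()
        total = 0.0
        for images, captions in loader:
            images, captions = images.to(device), captions.to(device)
            logits = model(images, captions)
            # Flatten to (batch * seq_len, vocab) vs. (batch * seq_len,).
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             captions.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        history.append(total / len(loader))

        # Save the weights every `save_every` epochs.
        if epoch % save_every == 0:
            torch.save({"epoch": epoch, "model_state": model.state_dict()},
                       checkpoint_path)
    return history
```

The returned `history` list holds one mean loss value per epoch, which is the kind of data a training-loss plot would be drawn from.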
After training, run the testing script to generate captions for the images in the test dataset.
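
Caption generation at test time is commonly done with greedy decoding: the image feature is fed to the LSTM first, and the most likely token at each step becomes the next input. The sketch below assumes that scheme, a batch-first LSTM, and an `<end>` token id. The function names and arguments are illustrative, and the testing script may use a different strategy.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def greedy_decode(feature, embed, lstm, fc, end_idx, max_len=20):
    """Generate token ids for one image feature of shape (1, embed_size)."""
    inputs = feature.unsqueeze(1)  # (1, 1, embed_size)
    state = None                   # the LSTM starts from a zero state
    tokens = []
    for _ in range(max_len):
        hidden, state = lstm(inputs, state)             # (1, 1, hidden_size)
        next_id = fc(hidden.squeeze(1)).argmax(dim=-1)  # (1,)
        tokens.append(next_id.item())
        if next_id.item() == end_idx:
            break
        # The predicted token is embedded and fed back as the next input.
        inputs = embed(next_id).unsqueeze(1)            # (1, 1, embed_size)
    return tokens


def ids_to_caption(tokens, idx_to_word,
                   specials=("<pad>", "<start>", "<end>")):
    """Turn token ids into a readable string, skipping special tokens."""
    words = [idx_to_word[t] for t in tokens]
    return " ".join(w for w in words if w not in specials)
```

Here `embed`, `lstm`, and `fc` stand for the decoder's embedding layer, LSTM, and output projection, and `idx_to_word` is the inverse of whatever vocabulary mapping was used during training.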
Performance is assessed qualitatively by comparing the captions predicted for the test images against the ground-truth captions.
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to the creators of the Flickr8k dataset for providing the resources necessary for training and testing the model.
- Thanks to the PyTorch documentation for its comprehensive guides and tutorials.
@misc{MJImageCaptioning2023,
  author = {Mohammad Javad (MJ) Ahmadi},
  title = {Image Captioning},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/MJAHMADEE/Image_Captioning}}
}
For more information, please refer to the official repository.