This project is a Context-Based Image Captioning website that leverages computer vision and natural language processing techniques to generate context-aware captions for images. The captions are enriched by user-provided contextual inputs, making them more meaningful and specific.
- Upload images and provide contextual text inputs.
- Real-time caption generation using advanced deep learning models.
Four models were evaluated for feature extraction: Xception, ResNet, VGG16, and EfficientNet. The Xception model performed the best, achieving the highest BLEU score (~60%).
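BLEU, the metric used to rank the four models above, scores a generated caption by its n-gram overlap with reference captions. The sketch below implements BLEU-1 (clipped unigram precision with a brevity penalty) in pure Python; the example captions are hypothetical and the project's actual evaluation pipeline is not shown here.

```python
import math
from collections import Counter

def bleu1(candidate: str, references: list[str]) -> float:
    """BLEU-1: clipped unigram precision times a brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    cand_counts = Counter(cand)
    # Clip each candidate word's count by its maximum count in any reference
    max_ref = Counter()
    for ref in refs:
        for w, c in Counter(ref).items():
            max_ref[w] = max(max_ref[w], c)
    clipped = sum(min(c, max_ref[w]) for w, c in cand_counts.items())
    precision = clipped / len(cand)
    # Brevity penalty: compare against the reference length closest to the candidate's
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / len(cand))
    return bp * precision

# Hypothetical example: one overlapping word ("runs") is missing from the reference
score = bleu1("a dog runs in the park", ["a dog is running in the park"])
```

Full BLEU averages modified precisions over 1- to 4-grams; BLEU-1 is shown here only to make the clipping and brevity-penalty mechanics concrete.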
Users can include additional contextual information to enrich the captions generated for their uploaded images.
The website consists of the following components:
- Brief introduction to the project.
- Option to upload an image.
- A form to upload an image and provide additional contextual text.
- A "Generate Caption" button that returns a caption based on the image and context.
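The upload-and-generate flow above could be backed by a small request handler; the sketch below is a minimal illustration under assumed names, where `captioner` is a hypothetical stand-in for the Xception encoder + decoder pipeline (not shown in this README).

```python
def handle_generate_caption(image_bytes: bytes, context: str, captioner) -> dict:
    """Validate the upload, run the captioning model, and fold in user context.

    `captioner` is any callable mapping raw image bytes to a caption string
    (hypothetical stand-in for the actual model pipeline).
    """
    if not image_bytes:
        return {"error": "no image uploaded"}
    base = captioner(image_bytes)
    context = context.strip()
    # Append the user-provided context so the final caption stays context-aware
    caption = f"{base} ({context})" if context else base
    return {"caption": caption}
```

A real deployment would wrap this in a web framework route and replace the string concatenation with a model that conditions the decoder on the context text directly.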
Comparison of the four models used in the project, highlighting their performance metrics and visual examples.
- Further optimization of the model to improve BLEU scores.
- Expand the system to support multiple languages for caption generation.
- Explore additional datasets for more diverse captioning capabilities.