The Image Caption Generator project aims to develop software that generates descriptive captions for images using neural network language models. It combines two techniques: a computer-vision model to understand the content of the image, and a natural-language model to turn that understanding into words in the right order.

The basic workflow is that features are extracted from the images using a pre-trained VGG16 model and then fed to an LSTM model, together with the captions, for training. The trained model is then able to generate a caption for any image it is given.
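A minimal sketch of the feature-extraction step, assuming the ImageNet weights bundled with Keras and using the second-to-last (fc2) layer of VGG16 as the 4096-dimensional feature vector:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Drop the final classification layer; the fc2 output serves as the image feature.
base = VGG16(weights="imagenet")
extractor = Model(inputs=base.input, outputs=base.layers[-2].output)

def extract_features(image_path):
    """Return a (4096,) feature vector for a single image."""
    img = load_img(image_path, target_size=(224, 224))  # VGG16 expects 224x224 RGB
    arr = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
    return extractor.predict(arr, verbose=0)[0]
```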
This web application performs the following functions:
- Image Captioning: recognising the different objects in an image and composing a meaningful sentence that describes the image for visually impaired users.
- Text to Speech Conversion: converting the generated sentence into audio so that visually impaired users can listen to it.
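The README does not name a text-to-speech library; gTTS is one common choice for this step. A minimal sketch, assuming gTTS is installed:

```python
from gtts import gTTS

def caption_to_audio(caption, out_path="caption.mp3"):
    """Convert a generated caption string into an MP3 file."""
    gTTS(text=caption, lang="en").save(out_path)
    return out_path
```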
The dataset used here is Flickr8k, which consists of 8,091 images with five captions per image. If you have a powerful system with more than 16 GB of RAM and a graphics card with more than 4 GB of memory, you can try Flickr30k instead, which has around 30,000 captioned images.
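Flickr8k distributes its captions in a single token file, with one `image_name#index<TAB>caption` line per caption. A sketch of parsing it into a dictionary, assuming the standard `Flickr8k.token.txt` layout:

```python
from collections import defaultdict

def load_captions(token_file="Flickr8k.token.txt"):
    """Map each image filename to its list of five captions."""
    captions = defaultdict(list)
    with open(token_file, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            image_id, caption = line.strip().split("\t", 1)
            image_name = image_id.split("#")[0]  # drop the '#0'..'#4' caption index
            captions[image_name].append(caption)
    return captions
```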
Image captioning has various applications such as:
- Recommendations in editing applications
- Usage in virtual assistants
- Image indexing
- For visually impaired persons
- Social media
- Several other natural language processing applications
At present, images are annotated with human intervention, which becomes a nearly impossible task for huge commercial databases. In this project, the image is given as input to a deep convolutional neural network (CNN) encoder that extracts the features and nuances of the image into a "thought vector"; a recurrent neural network (RNN) decoder then translates those features into a sequential, meaningful description of the image.
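A sketch of one common way to wire up such an encoder-decoder in Keras (a merge-style model); `vocab_size` and `max_length` are hypothetical values that in practice come from the tokenized captions:

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000  # hypothetical; size of the caption vocabulary
max_length = 34    # hypothetical; length of the longest training caption

# Encoder branch: project the pre-extracted 4096-dim VGG16 feature vector.
image_input = Input(shape=(4096,))
fe = Dense(256, activation="relu")(Dropout(0.5)(image_input))

# Decoder branch: embed the partial caption and run it through an LSTM.
seq_input = Input(shape=(max_length,))
se = Embedding(vocab_size, 256, mask_zero=True)(seq_input)
se = LSTM(256)(Dropout(0.5)(se))

# Merge both branches and predict the next word of the caption.
hidden = Dense(256, activation="relu")(add([fe, se]))
output = Dense(vocab_size, activation="softmax")(hidden)

model = Model(inputs=[image_input, seq_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

At inference time the model is called repeatedly, feeding back each predicted word, until an end-of-sequence token or `max_length` is reached.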
This project can act as a vision for visually impaired people, as it can identify nearby objects through the camera and give output in audio form. The application can provide a highly interactive platform for specially-abled people.
The tech stack for this project is:
- Python
- TensorFlow
- Keras
- Flask
To install and run this project locally, follow these steps:
- Clone this repository
- Navigate to the project directory
- Install dependencies by running `pip install -r requirements.txt`
- Run the application by running `python app.py`
- Open a web browser and navigate to `http://localhost:5000`
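For orientation, a minimal sketch of what the Flask entry point might look like; the route, the form field name, and the `extract_features`/`generate_caption` helpers are assumptions for illustration, not the repository's actual code:

```python
import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/caption", methods=["POST"])  # route name is an assumption
def caption_image():
    upload = request.files["image"]  # form field name 'image' is an assumption
    os.makedirs("uploads", exist_ok=True)
    path = os.path.join("uploads", upload.filename)
    upload.save(path)
    features = extract_features(path)      # hypothetical helper (feature-extraction step)
    return {"caption": generate_caption(features)}  # hypothetical decoding helper

if __name__ == "__main__":
    app.run(port=5000)
```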
The authors would like to thank the creators of TensorFlow, Keras, and Flask for their contributions to the open-source community.



