Image Captioning with Transformers: Transforming Visual Content into Audio for the Visually Impaired

Sample Output

Sample.Output.mp4

Project Overview

The "Image Captioning with Transformers" project began as a team assignment for our Modern Analytics course, combining Convolutional Neural Networks (CNNs) and transformers for image captioning: CNNs extracted image features, and transformers generated the captions. Over the break, I continued refining the project on my own to improve the accuracy and overall performance of the captioning system.
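To illustrate how a transformer decoder turns image features into a caption one token at a time, here is a minimal sketch of greedy decoding in plain Python. The decoding strategy is an assumption, since this README does not say which one the project uses. `toy_scores` is a hypothetical stand-in for the real ResNet-plus-transformer model, and the `<bos>`/`<eos>` token names are illustrative only.

```python
def greedy_decode(next_token_scores, image_features, bos="<bos>", eos="<eos>", max_len=20):
    """Build a caption by repeatedly picking the highest-scoring next token."""
    tokens = [bos]
    for _ in range(max_len):
        # The model scores every vocabulary word given the image and the tokens so far.
        scores = next_token_scores(image_features, tokens)
        best = max(scores, key=scores.get)
        if best == eos:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start-of-sequence marker


def toy_scores(image_features, tokens):
    """Hypothetical stand-in for the trained model: replays a fixed caption."""
    script = ["a", "dog", "on", "grass", "<eos>"]
    step = len(tokens) - 1  # number of words generated so far
    target = script[min(step, len(script) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in script}


caption = " ".join(greedy_decode(toy_scores, image_features=None))
print(caption)
```

In a real system, `next_token_scores` would run the decoder over the CNN features and the partial caption; alternatives such as beam search trade extra computation for potentially better captions.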

Technologies and Tools Used

Programming Language: Python
Deep Learning Framework: PyTorch
Convolutional Neural Network (CNN): ResNet

Objectives

The primary objective of this project is to support the independence of visually impaired individuals by transforming visual content into audio descriptions. The project is an initial step toward the broader goal of making a meaningful difference in the lives of the visually impaired.

Learning Outcomes

Positional Embedding: Understanding and implementation of positional embedding techniques in the context of image captioning.
Natural Language Processing (NLP): Exploration and application of NLP techniques for generating human-like captions from visual content.
Transformers and Autoencoders: Proficiency in working with transformers and insight into their application in conjunction with autoencoders.
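As a rough illustration of the positional embedding item above, the sketch below builds the fixed sinusoidal encodings introduced with the original Transformer architecture. This is an assumption: the project may instead use learned positional embeddings, and the function names here are hypothetical. Plain Python lists are used to keep the example dependency-free; in PyTorch the same table would normally be a tensor.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of fixed sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # Each (sin, cos) pair of dimensions shares one frequency.
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


def add_positions(token_embeddings):
    """Add position information to a list of token embedding vectors."""
    seq_len = len(token_embeddings)
    d_model = len(token_embeddings[0])
    pe = sinusoidal_positional_encoding(seq_len, d_model)
    return [[t + p for t, p in zip(tok, pos)]
            for tok, pos in zip(token_embeddings, pe)]
```

Because a transformer processes all caption tokens in parallel, adding these vectors to the token embeddings is what lets it distinguish word order.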

Acknowledgements

This project grew out of coursework in Modern Analytics, and its development was guided by a commitment to improving accessibility and inclusivity for individuals with visual impairments.

Future Enhancements

Possible future enhancements include refining the model, expanding the dataset to improve generalization, and exploring additional technologies to improve the user experience.

License

This project is licensed under the MIT License.
