- Image Captioning with DarkNetLM and DarkNetVG2
- Data Lake for Image Captioning model
- Mobile Application
This project focuses on image captioning using two primary models: DarkNetLM and DarkNetVG2. Both leverage the CSP DarkNet53 architecture, the backbone of YOLOv8, for feature extraction from images, but they differ in how they generate captions.
- Architecture: DarkNetLM employs the CSP DarkNet53 architecture within YOLOv8 for image feature extraction.
- Components:
- BERT tokenizer: tokenizes input text, allowing the model to understand and process natural language effectively.
- LSTM: Long Short-Term Memory networks are used to generate sequential captions based on the extracted image features.
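The BERT-tokenizer + LSTM pipeline above can be sketched as a greedy decoding loop: image features seed the decoder state, and tokens are emitted one at a time until the tokenizer's end-of-sequence marker. This is an illustrative sketch only; the vocabulary, `lstm_step`, and `generate_caption` names are stand-ins, not the project's actual API.

```python
# Toy vocabulary standing in for the BERT WordPiece vocabulary.
VOCAB = ["[CLS]", "a", "dog", "runs", "[SEP]"]

def lstm_step(state, token_id):
    """Stand-in for one LSTM step: returns (next_state, next_token_id).
    A real model would compute logits over the full vocabulary here."""
    next_id = min(token_id + 1, VOCAB.index("[SEP]"))
    return state, next_id

def generate_caption(image_features, max_len=10):
    state = image_features           # image features initialize the LSTM state
    token_id = VOCAB.index("[CLS]")  # decoding starts from the [CLS] token
    words = []
    for _ in range(max_len):
        state, token_id = lstm_step(state, token_id)
        if VOCAB[token_id] == "[SEP]":  # stop at end-of-sequence
            break
        words.append(VOCAB[token_id])
    return " ".join(words)

print(generate_caption(image_features=[0.1, 0.2]))  # → "a dog runs"
```

A real decoder would sample or take the argmax over LSTM logits at each step; the fixed transition here only illustrates the loop's control flow.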
- Architecture: Like DarkNetLM, DarkNetVG2 uses the CSP DarkNet53 architecture in YOLOv8 for image feature extraction.
- Components:
- GPT-2: This model employs the GPT-2 architecture for generating captions, leveraging its powerful language generation capabilities to produce coherent and contextually relevant descriptions.
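One common way to condition GPT-2 on an image, sketched below with tiny toy dimensions, is to project the extracted image features into GPT-2's embedding space and prepend them as a "prefix" before the text-token embeddings. The function names, the projection, and `EMBED_DIM` are illustrative assumptions, not the project's confirmed design.

```python
EMBED_DIM = 4  # GPT-2's real hidden size is 768; kept tiny for illustration

def project_features(image_features, out_dim=EMBED_DIM):
    """Stand-in for a learned linear projection (a real model uses a trained layer)."""
    scale = sum(image_features) / len(image_features)
    return [[scale * (i + 1) for i in range(out_dim)]]  # one prefix embedding

def build_gpt2_input(image_features, token_embeddings):
    """Concatenate the projected image prefix with token embeddings along the sequence axis."""
    prefix = project_features(image_features)
    return prefix + token_embeddings

token_embeds = [[0.0] * EMBED_DIM]  # embedding of a single start token
inputs = build_gpt2_input([0.5, 1.5], token_embeds)
print(len(inputs))  # sequence length: 1 prefix embedding + 1 token embedding → 2
```

GPT-2 then attends over the image prefix while generating each caption token, which is what lets the language model's output stay grounded in the image.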
To set up the project, clone the repository and install the required dependencies:
git clone https://github.com/yourusername/Image-Captioning.git
cd Image-Captioning
pip install -r requirements.txt
To run the models, execute the test notebooks (note that `.ipynb` files cannot be run with `python` directly; use Jupyter, e.g. `jupyter nbconvert`):
jupyter nbconvert --to notebook --execute Test_model/test-model-bert-lstm-8ep.ipynb
jupyter nbconvert --to notebook --execute Test_model/test-model-gpt2-8ep.ipynb
The models are evaluated based on their ability to generate accurate and descriptive captions for a variety of images. Performance metrics and sample outputs will be provided in the results section of the documentation.
Contributions are welcome! Please submit a pull request or open an issue for any suggestions or improvements.
This project is licensed under the MIT License. See the LICENSE file for more details.
- YOLOv8 for its powerful object detection capabilities.
- The creators of the CSP DarkNet53 architecture.
- The developers of BERT and GPT-2 for their contributions to natural language processing.
This system implements a Data Lake architecture to support an AI model capable of generating image captions. The architecture is designed to efficiently ingest, process, and centrally store large volumes of image and text data.
You can view more details in this repo, along with demo videos.
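The ingest step of such a Data Lake can be sketched as follows. The directory layout, date partitioning, and function name are assumptions for illustration, not the system's actual design: each image/caption pair is written into a date-partitioned "raw" zone so downstream processing jobs can pick up new data incrementally.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

def ingest_pair(lake_root, image_bytes, caption, item_id):
    """Store one image and its caption metadata under a date partition of the lake."""
    partition = Path(lake_root) / "raw" / date.today().isoformat()
    partition.mkdir(parents=True, exist_ok=True)
    (partition / f"{item_id}.jpg").write_bytes(image_bytes)   # raw image bytes
    meta_path = partition / f"{item_id}.json"
    meta_path.write_text(json.dumps({"caption": caption}))    # sidecar metadata
    return meta_path

lake = tempfile.mkdtemp()  # stand-in for the real lake's storage root
meta = ingest_pair(lake, b"\xff\xd8fake-jpeg", "a dog runs", "img-0001")
print(meta.exists())  # → True
```

A production lake would typically back this with object storage (e.g. S3 or HDFS) rather than a local filesystem, but the partition-by-date layout is the same idea.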
This mobile application uses the camera to capture images and sends them to a microservice for image captioning. Powered by a machine learning model, the app generates descriptive captions for each photo, providing visually impaired users with a detailed understanding of their surroundings through auditory feedback.
View more details at this repo, along with a demo video.
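The app-to-microservice exchange described above can be sketched as a simple HTTP round trip. The endpoint URL and the JSON response shape are assumptions for illustration: the captured photo is POSTed as raw JPEG bytes, and the service replies with a caption that the app then reads aloud via text-to-speech.

```python
import json
import urllib.request

CAPTION_ENDPOINT = "http://localhost:8000/caption"  # hypothetical service URL

def build_caption_request(image_bytes):
    """Build the HTTP request carrying the captured image."""
    return urllib.request.Request(
        CAPTION_ENDPOINT,
        data=image_bytes,
        headers={"Content-Type": "image/jpeg"},
        method="POST",
    )

def parse_caption_response(body):
    """Extract the caption text from the service's JSON reply."""
    return json.loads(body)["caption"]

req = build_caption_request(b"\xff\xd8fake-jpeg")
print(req.get_method(), req.full_url)  # → POST http://localhost:8000/caption
print(parse_caption_response('{"caption": "a dog runs in a park"}'))
```

Sending the request would be `urllib.request.urlopen(req)` once the captioning microservice is running; the sketch stops short of that so it stays self-contained.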