Image Captioning with DarkNetLM and DarkNetVG2

Table of Contents

  • Overview
  • Models
  • Installation
  • Usage
  • Results
  • Contributing
  • License
  • Acknowledgments
  • Data Lake for Image Captioning model
  • Mobile Application

Overview

This project focuses on image captioning with two primary models: DarkNetLM and DarkNetVG2. Both models use the CSP DarkNet53 architecture, the backbone of YOLOv8, to extract image features, but they differ in how they generate captions.

Models

DarkNetLM

  • Architecture: DarkNetLM employs the CSP DarkNet53 architecture within YOLOv8 for image feature extraction.
  • Components:
    • BERT tokenizer: Tokenizes the input text, allowing the model to process natural language effectively.
    • LSTM: A Long Short-Term Memory network generates the caption token by token from the extracted image features.
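
The README does not document the exact wiring between these components. The following is a minimal, hypothetical sketch of such a setup: a pooled DarkNet53 feature vector (FEAT_DIM and the other sizes are assumptions, not the project's real dimensions) is prepended to the caption embeddings and decoded by an LSTM over the BERT vocabulary.

import torch
import torch.nn as nn
from transformers import BertTokenizer

# Hypothetical dimensions; the repository does not document the real ones.
FEAT_DIM, EMBED_DIM, HIDDEN_DIM = 1024, 256, 512

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

class CaptionLSTM(nn.Module):
    """LSTM decoder that turns a pooled image feature into caption logits."""
    def __init__(self, vocab_size):
        super().__init__()
        self.img_proj = nn.Linear(FEAT_DIM, EMBED_DIM)  # image feature -> token embedding space
        self.embed = nn.Embedding(vocab_size, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, vocab_size)

    def forward(self, image_feat, token_ids):
        img_tok = self.img_proj(image_feat).unsqueeze(1)  # (B, 1, E): image acts as the first "token"
        tok_emb = self.embed(token_ids)                   # (B, T, E)
        seq = torch.cat([img_tok, tok_emb], dim=1)        # (B, T+1, E)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                           # logits over the BERT vocabulary

model = CaptionLSTM(vocab_size=tokenizer.vocab_size)
feats = torch.randn(2, FEAT_DIM)  # stand-in for pooled CSP DarkNet53 features
batch = tokenizer(["a dog on the grass", "a red car"], padding=True, return_tensors="pt")
logits = model(feats, batch.input_ids)  # (2, T+1, vocab_size)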

DarkNetVG2

  • Architecture: Similar to DarkNetLM, DarkNetVG2 also uses the CSP DarkNet53 architecture in YOLOv8 for image feature extraction.
  • Components:
    • GPT-2: This model employs the GPT-2 architecture for generating captions, leveraging its powerful language generation capabilities to produce coherent and contextually relevant descriptions.
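
Again, the exact integration is not documented in this README. A plausible minimal sketch, assuming the image feature is projected into GPT-2's embedding space and prepended as a visual prefix (a common prefix-conditioning setup, not confirmed by the source), could look like this:

import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

FEAT_DIM = 1024  # hypothetical size of the pooled DarkNet53 feature

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
proj = nn.Linear(FEAT_DIM, gpt2.config.n_embd)  # image feature -> GPT-2 embedding space

image_feat = torch.randn(1, FEAT_DIM)   # stand-in for the DarkNet53 output
prefix = proj(image_feat).unsqueeze(1)  # (1, 1, n_embd) visual prefix

# Greedy decoding conditioned on the visual prefix.
generated = []
inputs_embeds = prefix
with torch.no_grad():
    for _ in range(20):
        out = gpt2(inputs_embeds=inputs_embeds)
        next_id = out.logits[:, -1, :].argmax(dim=-1)  # most likely next token
        if next_id.item() == tokenizer.eos_token_id:
            break
        generated.append(next_id.item())
        next_emb = gpt2.transformer.wte(next_id).unsqueeze(1)  # embed the chosen token
        inputs_embeds = torch.cat([inputs_embeds, next_emb], dim=1)

print(tokenizer.decode(generated))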

Installation

To set up the project, clone the repository and install the required dependencies:

git clone https://github.com/Narius2030/IMCP-Support-Blinders.git
cd IMCP-Support-Blinders
pip install -r requirements.txt

Usage

The test models ship as Jupyter notebooks, so open them with Jupyter rather than running them directly with python:

DarkNetLM

jupyter notebook Test_model/test-model-bert-lstm-8ep.ipynb

DarkNetVG2

jupyter notebook Test_model/test-model-gpt2-8ep.ipynb

(This section will continue to be updated.)
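
Alternatively, the notebooks can be executed headlessly with nbconvert (a standard Jupyter tool, not something specific to this project):

jupyter nbconvert --to notebook --execute Test_model/test-model-bert-lstm-8ep.ipynb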

Results

The models are evaluated based on their ability to generate accurate and descriptive captions for a variety of images. Performance metrics and sample outputs will be provided in the results section of the documentation.
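
The README does not name the metrics used. As an illustration only, caption quality is commonly scored with BLEU; a minimal sketch using NLTK (not the project's actual evaluation code) looks like this:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "dog", "runs", "on", "the", "grass"]]  # ground-truth caption(s)
candidate = ["a", "dog", "is", "running", "on", "grass"]  # model output
smooth = SmoothingFunction().method1                      # avoid zero scores on short captions
score = sentence_bleu(reference, candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")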

Contributing

Contributions are welcome! Please submit a pull request or open an issue for any suggestions or improvements.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgments

  • YOLOv8 for its powerful object detection capabilities.
  • The creators of the CSP DarkNet53 architecture.
  • The developers of BERT and GPT-2 for their contributions to natural language processing.

Data Lake for Image Captioning model

This system implements a Data Lake architecture that supports an AI model capable of generating image captions. The architecture is designed to efficiently ingest, process, and centrally store large volumes of image and text data.

You can view more details in this repo, and you can also view the demo videos.


Mobile Application

This mobile application uses the camera to capture images and sends them to a microservice for image captioning. Powered by a machine learning model, the app generates descriptive captions for each photo, providing visually impaired users with a detailed understanding of their surroundings through auditory feedback.
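
The microservice's API is not documented in this README; the endpoint URL and field names below are placeholders. A hypothetical client call from the app's backend might look like:

import requests

# Hypothetical endpoint and payload format; the real microservice API
# is documented in the linked repository.
URL = "http://localhost:8000/caption"

with open("photo.jpg", "rb") as f:
    resp = requests.post(URL, files={"image": ("photo.jpg", f, "image/jpeg")})

resp.raise_for_status()
print(resp.json().get("caption"))  # descriptive caption to be read aloud via TTS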

View more details in this repo, and you can also view the demo video.

