Named Entity Recognition + Image Classification

Overview

This project is a full machine learning pipeline that combines Natural Language Processing (NLP) and Computer Vision (CV) to verify whether a given text description correctly identifies the animal in an image. The system extracts animal names from a sentence using a Named Entity Recognition (NER) model and classifies animals in images using a deep learning model. Finally, it compares the results and determines whether the statement is true or false.

Dataset

I built the dataset from scratch by scraping images from Pinterest and Google Images and manually filtering out noise and irrelevant content. The dataset consists of 10 animal classes, with approximately 900 images per class.
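
A quick way to confirm the per-class balance after manual filtering is to count the image files in each class folder. The sketch below is only an illustration: it assumes a hypothetical layout of one sub-folder per class (`<root>/<class_name>/<file>`) and a fixed set of image extensions, neither of which is stated in this README.

```python
from collections import Counter
from pathlib import Path

# Assumption: images are stored as <root>/<class_name>/<file>; the README does
# not state the actual layout or the file extensions in use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Return a Counter mapping each class folder name to its image count."""
    counts = Counter()
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_class("dataset")` (the folder name is hypothetical) should return roughly 900 for each of the 10 classes if the layout assumption holds.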

Models

For NER, I fine-tuned the spaCy en_core_web_sm model, adding a custom entity type ANIMALS and training it on a curated dataset of animal-related sentences. For image classification, I used a VGG16 model as a backbone, adding two dense layers (1024 & 512 neurons) and a final layer with 10 neurons (softmax activation).
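
The classifier head described above can be sketched in Keras roughly as follows. Only the backbone (VGG16) and the layer sizes (1024, 512, and a 10-way softmax) come from this README. The input size, the `Flatten` layer, the ReLU activations, the frozen backbone, and the names `build_classifier`, `HEAD_UNITS`, `NUM_CLASSES`, and `INPUT_SHAPE` are assumptions made for illustration, so the actual `train_classifier.py` may differ.

```python
# Layer sizes are taken from the description above; everything else is assumed.
HEAD_UNITS = (1024, 512)
NUM_CLASSES = 10
INPUT_SHAPE = (224, 224, 3)  # assumed: VGG16's standard ImageNet input size


def build_classifier(weights="imagenet", freeze_backbone=True):
    """Build a VGG16 backbone followed by the dense head from this README."""
    # Imported here so the constants above can be inspected without TensorFlow.
    import tensorflow as tf

    backbone = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=INPUT_SHAPE
    )
    backbone.trainable = not freeze_backbone  # assumed training strategy

    # Assumed: feature maps are flattened before the dense head.
    layers = [backbone, tf.keras.layers.Flatten()]
    for units in HEAD_UNITS:
        # Assumed: ReLU activations in the hidden dense layers.
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
    layers.append(tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"))
    return tf.keras.Sequential(layers)
```

Passing `weights=None` builds the same architecture without downloading the ImageNet weights, which is handy for a quick shape check with `model.summary()`.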

Project Structure

The core logic is structured as follows:

models/
- ner/: Contains train_ner.py, infer_ner.py, and the trained model custom_ner_model/.
- classifier/: Contains train_classifier.py, infer_classifier.py, and the best model best_classifier_model.keras.
pipeline.py: The main script that ties everything together.
web-scraper.py: The script used to collect images for training.

Running the Pipeline

To check if a statement matches the image, run the following command:

python pipeline.py 'I see a tiger in this picture.' 'eval_dataset/tiger_0.jpeg'

Example Output:

Text: I see a tiger in this picture.
Extracted animals: tiger
Image classification: tiger
✅ The statement is TRUE!

If the predicted class does not match the extracted entity, the output will indicate FALSE.
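
The comparison step can be illustrated with a minimal sketch. It assumes the NER stage yields a list of animal names and the classifier yields a single class label, and that matching is case-insensitive. The function names and the wording of the FALSE message are hypothetical, since this README only shows the TRUE case.

```python
def verify_statement(extracted_animals, predicted_class):
    """Return True if the predicted class is among the extracted animal names."""
    # Assumption: comparison is case-insensitive and ignores surrounding spaces.
    normalized = {name.strip().lower() for name in extracted_animals}
    return predicted_class.strip().lower() in normalized


def format_verdict(text, extracted_animals, predicted_class):
    """Render a report in the style of the example output above."""
    is_true = verify_statement(extracted_animals, predicted_class)
    return "\n".join([
        f"Text: {text}",
        f"Extracted animals: {', '.join(extracted_animals)}",
        f"Image classification: {predicted_class}",
        # The FALSE wording below is hypothetical; only the TRUE line is documented.
        "✅ The statement is TRUE!" if is_true else "❌ The statement is FALSE!",
    ])
```

Note that this sketch treats a sentence with no extracted animal as FALSE and does not handle plurals such as "tigers"; the real pipeline.py may make different choices.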

Additional Notes

The NER model is a compact spaCy pipeline, not a large language model (LLM). Note that en_core_web_sm uses a CNN-based tok2vec encoder rather than a transformer.
The classifier is based on a fine-tuned VGG16 model.
The pipeline is fully automated and expects two inputs: text and image path.
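
To make the custom ANIMALS entity more concrete, the sketch below shows one way training sentences could be annotated with character offsets, the format spaCy accepts for entity annotations (in spaCy v3 such a dictionary can be passed to `Example.from_dict`). The helper `annotate` and the whole-word matching rule are assumptions for illustration, not the contents of `train_ner.py`.

```python
import re

LABEL = "ANIMALS"  # custom entity type named in this README


def annotate(sentence, animal_names):
    """Return (sentence, {"entities": [(start, end, LABEL), ...]})."""
    entities = []
    for name in animal_names:
        # Whole-word, case-insensitive match so "cat" does not match "category".
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            entities.append((match.start(), match.end(), LABEL))
    return sentence, {"entities": sorted(entities)}
```

For example, `annotate("I see a tiger in this picture.", ["tiger", "cat"])` returns the sentence together with `{"entities": [(8, 13, "ANIMALS")]}`.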

This project showcases the integration of NLP and CV to solve a multi-modal problem. It can be extended with more sophisticated models and additional robustness improvements.

MaxPodloobniy/ImageClassification-NER
