Author: Tin Trung Nguyen
This project builds a deep learning pipeline to classify animal images into 10 categories using convolutional neural networks (CNNs) and transfer learning.
The goal is to explore the full machine learning workflow, including:
- Data exploration and preprocessing
- Model development (baseline CNN and pretrained models)
- Training and evaluation
- Model interpretation using Grad-CAM
The dataset consists of more than 26,000 images across 10 animal classes:
- dog
- cat
- horse
- spider
- butterfly
- chicken
- sheep
- cow
- squirrel
- elephant
Images are collected from real-world sources and include variations, which makes the dataset suitable for testing model robustness.
Dataset link on Kaggle: Animals-10
The project follows a structured machine learning workflow:
- Organize dataset structure
- Verify class labels and image counts
- Analyze class distribution
- Inspect image sizes
- Detect corrupted or small images
- Visualize sample images
- Train / validation / test split
- Image resizing (224 × 224)
- Data augmentation (flip, rotation)
- Normalization using ImageNet statistics
- PyTorch Dataset and DataLoader creation
- Baseline CNN trained from scratch
- Transfer learning using pretrained ResNet
- Training and validation loops
- Performance tracking
- Generating predictions on unseen test data
- Computing a confusion matrix to visualize class-wise performance
- Producing a classification report (precision, recall, F1-score)
- Identifying and visualizing misclassified examples
- Extracting feature maps from the final convolutional layer
- Computing gradients of the predicted class
- Generating heatmaps highlighting important regions
- Overlaying heatmaps on original images for interpretation
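The four Grad-CAM steps at the end of the list above can be sketched with NumPy alone. This is a minimal illustration, not the project's implementation: the function names `grad_cam`, `upsample_nearest`, and `overlay` are hypothetical, and the sketch assumes the feature maps and gradients of the final convolutional layer have already been captured (for example with hooks on the model) as arrays of shape `(C, H, W)`.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Combine (C, H, W) feature maps and gradients into an (H, W) heatmap in [0, 1]."""
    # One weight per channel: the gradient averaged over all spatial positions.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over channels, then ReLU to keep
    # only regions that push the predicted class score up.
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(heatmap, scale):
    """Enlarge a heatmap by an integer factor using nearest-neighbour repetition."""
    return np.kron(heatmap, np.ones((scale, scale)))


def overlay(image, heatmap, alpha=0.4):
    """Blend an (H, W) heatmap into the red channel of an (H, W, 3) image in [0, 1]."""
    tint = np.zeros_like(image, dtype=float)
    tint[..., 0] = heatmap
    return (1.0 - alpha) * image + alpha * tint
```

With a 224 × 224 input, a ResNet's final convolutional feature map is typically 7 × 7, so a scale factor of 32 would bring the heatmap back to image size. A smoother interpolation (for instance with OpenCV) may look better in practice; nearest-neighbour is used here only to keep the sketch self-contained.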
The baseline CNN achieved moderate performance, reaching approximately 70% validation accuracy. Although the model learned meaningful features, its performance plateaued, which is consistent with the limited capacity of a simple architecture.
In contrast, the ResNet model demonstrated significantly stronger performance, achieving approximately 95% validation accuracy. The model converged rapidly within the first few epochs, highlighting the effectiveness of transfer learning for image classification tasks.
Most classes are classified accurately, including:
- dog
- spider
- chicken
- horse
The errors that do occur are expected, given the similarities in shape, texture, and visual context between some animal classes.
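Class-wise results like these are read from a confusion matrix. The sketch below shows what the confusion matrix and the per-class precision, recall, and F1-score compute, using NumPy only. The function names are hypothetical and the labels in the usage note are made-up toy values, not project results; in the project itself, scikit-learn's `confusion_matrix` and `classification_report` would be the natural tools.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def per_class_metrics(cm):
    """Return (precision, recall, f1) arrays, using 0 where a value is undefined."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1


def most_confused(cm):
    """Return (true_class, predicted_class, count) for the largest off-diagonal cell."""
    off = cm.copy()
    np.fill_diagonal(off, 0)
    i, j = np.unravel_index(off.argmax(), off.shape)
    return int(i), int(j), int(off[i, j])
```

For example, `most_confused(confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], 2))` reports that class 0 was mistaken for class 1 once, which is the kind of pairwise confusion the analysis here refers to.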
Analysis of misclassified images reveals that errors are often caused by:
- Cluttered or complex backgrounds
- Low image quality or lighting conditions
- Small or partially visible objects
- Visual similarity between animal classes
This suggests that the model occasionally relies on contextual or background cues in addition to object features.
Grad-CAM visualizations show that the model generally focuses on relevant regions of the image when making predictions.
In some cases, attention is partially directed toward background regions, indicating that contextual information may influence predictions.
- torch
- torchvision
- numpy
- pandas
- matplotlib
- seaborn
- scikit-learn
- pillow
- opencv-python
- jupyter
animals-classification/
├── archive/
│ └── raw-img/
│
├── notebooks/
│ ├── data/
│ │ ├── train/
│ │ ├── val/
│ │ └── test/
│ │
│ ├── models/
│ │ ├── simple_cnn.pth
│ │ └── resnet.pth
│ │
│ ├── dataset_setup.ipynb
│ ├── exploratory_data_analysis.ipynb
│ ├── data_preprocessing.ipynb
│ ├── model_training.ipynb
│ ├── model_evaluation.ipynb
│ └── model_explainability.ipynb
│
├── utils/
│ └── dataset.py
│
└── README.md
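The `data/train`, `data/val`, and `data/test` folders in the tree above imply a per-class split of the raw images. A minimal sketch of a stratified split using only the standard library follows. The split fractions, the seed, and the function name `stratified_split` are assumptions, since the README does not state how the split was made.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Split (path, label) pairs so every class keeps the same proportions.

    The fractions and seed are illustrative defaults, not the project's values.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for label, paths in sorted(by_class.items()):
        paths = sorted(paths)
        rng.shuffle(paths)
        n_val = int(len(paths) * val_frac)
        n_test = int(len(paths) * test_frac)
        splits["val"] += [(p, label) for p in paths[:n_val]]
        splits["test"] += [(p, label) for p in paths[n_val:n_val + n_test]]
        splits["train"] += [(p, label) for p in paths[n_val + n_test:]]
    return splits
```

Splitting within each class, rather than over the whole file list, keeps the class distribution similar across the three folders, which matters if the classes are imbalanced.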