DS-4002-Project-3

Section 1: Software and Platform

This project was developed in Python 3.9 using the packages listed below. We worked on Windows and macOS and ran the code in Jupyter Notebook and Google Colab with a Python 3.9 kernel.

Required imports and libraries:

import tensorflow as tf
from google.colab import drive
import os
from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import GlobalAveragePooling2D, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import json
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt

Section 2: Map of Documentation

DS-4002-Project-3/
│
├── README.md                      ← Main orientation + reproduction guide
├── LICENSE.md                     ← MIT License
│
├── DATA/
│   ├── raw/                       ← Raw FruitVision dataset (NOT stored on GitHub)
│   │   └── Original Image/
│   │        └── Fruits Original/
│   │             ├── Apple/
│   │             │    ├── Fresh/
│   │             │    ├── Rotten/
│   │             │    └── Formalin-mixed/       (ignored)
│   │             ├── Banana/
│   │             ├── Grape/
│   │             ├── Mango/
│   │             └── Orange/
│   │
│   ├── dataset_splits/            ← Output of Script 01
│   │   ├── train/
│   │   │    ├── fresh/
│   │   │    └── rotten/
│   │   ├── val/
│   │   │    ├── fresh/
│   │   │    └── rotten/
│   │   └── test/
│   │        ├── fresh/
│   │        └── rotten/
│   │
│   └── metadata_README.md         ← Data dictionary, provenance, ethics, summary, plots
│
├── SCRIPTS/
│   ├── 01_train_val_test_split.py      ← Local script to generate splits from raw data
│   └── 02_train_resnet50_colab.ipynb   ← Colab notebook for training & evaluation
│
├── OUTPUT/
│   ├── best_model.h5                   ← Best Keras model saved during training
│   ├── training_history.json           ← Loss/accuracy per epoch
│   ├── classification_report.json      ← Precision/recall/F1 scores
│   └── confusion_matrix.png            ← Confusion matrix figure

Section 3: Instructions for Reproducing

3.0 Assumptions

You are starting from the repository root (the same folder that contains README.md).
Python ≥ 3.9 is installed.
You have access to Google Colab for training the model.
Commands are shown for macOS/Linux.
- On Windows PowerShell, replace source .venv/bin/activate with:
  - . .venv/Scripts/Activate.ps1

Project-specific assumptions (update these as needed):

Raw data files are stored in DATA/raw/
Split data will be written to DATA/dataset_splits/
Model outputs (metrics, figures, and artifacts) will be written to OUTPUT/
All scripts used below are stored in the SCRIPTS/ directory

3.1 Set up the Environment

Open a terminal and navigate to the project folder (the repo root):
```
cd DS-4002-Project-3/
```
(Recommended) Create and activate a virtual environment
```
python -m venv .venv
source .venv/bin/activate
```

Windows option:

python -m venv .venv
. .venv/Scripts/Activate.ps1

Install all required dependencies:

pip install -r requirements.txt

3.2 Obtain and Place the Data

Download the raw FruitVision dataset (see Section 2: Map of Documentation; provenance is documented in DATA/metadata_README.md).
Save the raw files into: DATA/raw/
Verify that the paths in the scripts under SCRIPTS/ match the actual file names in DATA/raw/. Update them if necessary.
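
One way to carry out this check before running any script is to count the images found in each fruit's Fresh/ and Rotten/ folders. The sketch below assumes the layout shown in Section 2 (DATA/raw/Original Image/Fruits Original/<Fruit>/<Condition>/). The function name `count_raw_images` and the set of image extensions are illustrative and are not part of the repository.

```python
from pathlib import Path

# Assumed location of the raw images, taken from the tree in Section 2.
RAW_ROOT = Path("DATA/raw/Original Image/Fruits Original")
# Assumed image extensions; adjust to match the downloaded files.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
# Formalin-mixed is ignored by this project, so it is not counted.
CONDITIONS = ("Fresh", "Rotten")


def count_raw_images(raw_root=RAW_ROOT):
    """Return {(fruit, condition): image_count} for every fruit folder found."""
    raw_root = Path(raw_root)
    counts = {}
    if not raw_root.is_dir():
        return counts
    for fruit_dir in sorted(p for p in raw_root.iterdir() if p.is_dir()):
        for condition in CONDITIONS:
            cond_dir = fruit_dir / condition
            if cond_dir.is_dir():
                n = sum(1 for f in cond_dir.iterdir()
                        if f.is_file() and f.suffix.lower() in IMAGE_EXTS)
            else:
                n = 0  # a missing folder shows up as zero images
            counts[(fruit_dir.name, condition)] = n
    return counts


if __name__ == "__main__":
    counts = count_raw_images()
    if not counts:
        print(f"No fruit folders found under {RAW_ROOT}")
    for (fruit, condition), n in counts.items():
        print(f"{fruit:<8} {condition:<7} {n}")
```

A zero next to any fruit/condition pair usually means a folder name differs from what the scripts expect.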

3.3 Preprocessing the Raw Data

From the repo root, run:

python SCRIPTS/01_train_val_test_split.py

which produces:

DATA/dataset_splits/train/
DATA/dataset_splits/val/
DATA/dataset_splits/test/

Each folder contains:

fresh/
rotten/
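
The contents of 01_train_val_test_split.py are not reproduced in this README, so the following is only a minimal sketch of how such a split could be produced with the standard library. The 70/15/15 ratios, the fixed seed, the `prefix` argument, and the helper names `split_files` and `copy_split` are assumptions for illustration, not values taken from the script.

```python
import random
import shutil
from pathlib import Path


def split_files(files, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle files reproducibly and cut them into train/val/test lists.

    The ratios and seed are illustrative defaults, not the project's values.
    """
    files = sorted(files)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * ratios[0])
    n_val = int(len(files) * ratios[1])
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],   # the remainder goes to test
    }


def copy_split(src_dir, out_root, label, prefix="", **kwargs):
    """Copy every file in src_dir into out_root/<split>/<label>/.

    `prefix` is prepended to each file name so that images from different
    fruits with the same name do not overwrite each other.
    """
    files = [p for p in Path(src_dir).iterdir() if p.is_file()]
    for split, members in split_files(files, **kwargs).items():
        dest = Path(out_root) / split / label
        dest.mkdir(parents=True, exist_ok=True)
        for f in members:
            shutil.copy2(f, dest / f"{prefix}{f.name}")
```

Under these assumptions, a call such as `copy_split("DATA/raw/Original Image/Fruits Original/Apple/Fresh", "DATA/dataset_splits", "fresh", prefix="apple_")` would populate the fresh/ folder of each split with the apple images.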

3.4 Upload Dataset Splits to Google Drive

To train on Google Colab, upload the entire folder:

DATA/dataset_splits/

to

My Drive/Colab Notebooks/DS4002-Project3/dataset_splits/

Also create (or allow the notebook to create):

My Drive/Colab Notebooks/DS4002-Project3/OUTPUT/

3.5 Train the Model in Google Colab

Upload the notebook to Google Colab:

SCRIPTS/02_train_resnet50_colab.ipynb

Set the runtime to GPU: Runtime → Change runtime type → GPU

Run all cells. The notebook will:
- Mount Google Drive
- Load the dataset from dataset_splits
- Build ResNet-50 (ImageNet pretrained)
- Train the model with early stopping
- Evaluate on the test set
- Save outputs into the Drive OUTPUT/ folder
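
The notebook imports EarlyStopping and ModelCheckpoint from Keras for the training step. As a conceptual illustration only (this is neither the Keras implementation nor the notebook's code), the rule behind early stopping can be sketched in plain Python: remember the epoch with the lowest validation loss and stop once the loss has not improved for a set number of epochs. The patience value and the loss numbers below are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) as 0-based indices into val_losses.

    Training stops at the first epoch where the validation loss has failed
    to improve for `patience` consecutive epochs, or at the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch       # a checkpoint would be saved here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improvement stalls after epoch 2.
losses = [0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.39]
stop, best = early_stopping_epoch(losses, patience=3)
print(stop, best)  # prints: 5 2
```

In this example training halts at epoch 5, so the later improvement at epoch 6 is never seen; the saved checkpoint corresponds to epoch 2. A larger patience trades longer training for a lower chance of stopping too early.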

3.6 Move Outputs Back into This Repository

After training completes, download the contents of:

My Drive/Colab Notebooks/DS4002-Project3/OUTPUT/

and place the files into

OUTPUT/

You should have

OUTPUT/
    best_model.h5
    training_history.json
    classification_report.json
    confusion_matrix.png
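
A quick way to confirm that the download is complete is to check for the four expected files and open the two JSON files. The sketch below relies only on the file names listed above. The internal structure of the JSON files is not documented in this README, so the code prints their top-level keys rather than assuming any particular metric names. The helper names `check_outputs` and `top_level_keys` are illustrative.

```python
import json
from pathlib import Path

# File names taken from the OUTPUT/ listing above.
EXPECTED = [
    "best_model.h5",
    "training_history.json",
    "classification_report.json",
    "confusion_matrix.png",
]


def check_outputs(output_dir="OUTPUT"):
    """Return the expected file names that are missing from output_dir."""
    output_dir = Path(output_dir)
    return [name for name in EXPECTED if not (output_dir / name).is_file()]


def top_level_keys(json_path):
    """Load a JSON file and return its top-level keys (empty if not an object)."""
    with open(json_path, encoding="utf-8") as fh:
        data = json.load(fh)
    return sorted(data) if isinstance(data, dict) else []


if __name__ == "__main__":
    missing = check_outputs()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected outputs are present.")
        for name in ("training_history.json", "classification_report.json"):
            print(name, "->", top_level_keys(Path("OUTPUT") / name))
```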
