This project implements a complete machine learning pipeline to classify iris flower species using the classic Iris dataset. The workflow is built and tested on Amazon SageMaker Studio Lab, simulating an end-to-end ML project lifecycle from preprocessing to evaluation.
- Objective: Predict the iris species (Setosa, Versicolor, Virginica) based on sepal and petal dimensions.
- Algorithm Used: Random Forest Classifier
- Platform: SageMaker Studio Lab (CPU instance)
- Dataset: Iris dataset (150 samples, 4 features, 3 classes)
- Python 3
- Scikit-learn
- Pandas
- NumPy
- Matplotlib / Seaborn (optional for visualization)
- SageMaker Studio Lab (environment)
- Import Dataset
- Data Exploration & Visualization
- Preprocessing & Feature Selection
- Train-Test Split
- Model Training (Random Forest)
- Prediction & Evaluation
  - Accuracy
  - Confusion Matrix
  - Classification Report
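
The workflow steps above can be sketched in a few lines of Scikit-learn. This is a minimal illustration, not the notebook's actual code: loading the data through `load_iris`, the 80/20 stratified split, `n_estimators=100`, and `random_state=42` are all assumptions chosen for the example. Exploration, visualization, and feature selection are left out because the README does not say how they are done.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Import the dataset: a DataFrame of 4 features and a Series of class labels.
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Train-test split (assumed 80/20, stratified so each species is represented).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Model training with a Random Forest (hyperparameters are assumptions).
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Prediction and evaluation with the three metrics listed above.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=iris.target_names)

print(f"Accuracy: {accuracy:.2f}")
print(cm)
print(report)
```

The exact accuracy depends on the split and the random seed, so a run of this sketch may not match the figure reported for the notebook.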
- Model Used: Random Forest Classifier
- Accuracy Achieved: 90%
- Training Time: Under 1 second (due to small dataset)
- iris_classifier_sagemaker.ipynb
- README.md
- Open SageMaker Studio Lab
- Upload iris_classifier.ipynb
- Choose CPU runtime
- Run each cell to execute the pipeline
- End-to-end pipeline construction using Scikit-learn
- Using Random Forest for multi-class classification
- Model evaluation techniques and interpretation
- Hands-on with SageMaker Studio Lab (AWS-hosted Jupyter environment)
- Integrate with AWS SageMaker SDK for deployment
- Compare with other models like SVM, XGBoost, KNN
- Add interactive dashboard using Gradio or Streamlit
- Perform hyperparameter tuning for improved accuracy
- Expand to more complex, real-world datasets
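
Two of the items above, model comparison and hyperparameter tuning, could be started along the following lines. This is only a sketch under assumptions: 5-fold cross-validation, the small parameter grid, and scaling the features for SVM and KNN are illustrative choices, not part of the current notebook. XGBoost is omitted because it needs a separate package.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate models; SVM and KNN are distance-based, so features are scaled first.
candidates = {
    "random_forest": RandomForestClassifier(random_state=42),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Mean 5-fold cross-validated accuracy per model (fold count is an assumption).
cv_scores = {
    name: cross_val_score(estimator, X, y, cv=5).mean()
    for name, estimator in candidates.items()
}
for name, score in cv_scores.items():
    print(f"{name}: {score:.3f}")

# Hyperparameter tuning for the Random Forest over a small, illustrative grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 3, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, f"{search.best_score_:.3f}")
```

Cross-validation is used here instead of a single train-test split because, with only 150 samples, one split can give a noisy accuracy estimate.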