- Project Overview
- Algorithm Used (KNN)
- Dataset
- Project Structure
- Setup and Installation
- How to Run the Streamlit App
- Results and Visualization
- Conclusion
This project demonstrates a classic machine learning classification task: identifying the species of an Iris flower from its physical measurements. It uses the K-Nearest Neighbors (KNN) algorithm to train the classifier and packages the result as an interactive web application built with Streamlit.
- Data Analysis: Exploratory Data Analysis (EDA) of the Iris dataset.
- Model Training: KNN implementation using scikit-learn.
- Interactive App: A Streamlit interface for real-time classification input.
The core of this project is the K-Nearest Neighbors (KNN) algorithm.
- How it Works: KNN is a non-parametric, lazy learning algorithm. It classifies a new data point by majority vote among its $K$ nearest neighbors. To find them, the distance (typically Euclidean) is computed between the new point and every stored training point.
- Hyperparameter: The value of $K$ (the number of neighbors) was chosen to be [Insert your K value, e.g., 5] after initial testing showed optimal performance.
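The voting procedure described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the project's code (which uses scikit-learn); the function name `knn_predict` and the toy measurements are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")

    # "Lazy" learning: there is no training step, so every prediction
    # measures the Euclidean distance from the query to each stored point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]

    # Keep the k closest points and count how often each class appears.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy (petal length, petal width) pairs in cm, made up for illustration.
points = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

prediction = knn_predict(points, labels, query=(1.6, 0.4), k=3)
print(prediction)  # setosa
```

Because every prediction scans all stored points, the cost grows with the size of the training set. That is negligible for a dataset as small as Iris.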
This project uses the famous Iris flower dataset, which is often called the "Hello World" of Machine Learning.
| Feature | Description | Unit |
|---|---|---|
| `sepal_length` | Length of the sepal | cm |
| `sepal_width` | Width of the sepal | cm |
| `petal_length` | Length of the petal | cm |
| `petal_width` | Width of the petal | cm |
| `species` | The target class (Setosa, Versicolor, or Virginica) | N/A |

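To make the column layout concrete, here is a small sketch that splits records into a feature matrix and a target list. It assumes the data is available as CSV with the column names from the table; the project itself may load the dataset differently (for example from scikit-learn's bundled copy). `load_rows` and the three sample rows are illustrative only.

```python
import csv
import io

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Three illustrative rows, one per species, in the column layout of the table.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Setosa
7.0,3.2,4.7,1.4,Versicolor
6.3,3.3,6.0,2.5,Virginica
"""


def load_rows(csv_file):
    """Split each record into a list of four floats (cm) and a species label."""
    X, y = [], []
    for row in csv.DictReader(csv_file):
        X.append([float(row[name]) for name in FEATURES])
        y.append(row["species"])
    return X, y


X, y = load_rows(io.StringIO(SAMPLE_CSV))
print(X[0], y[0])  # [5.1, 3.5, 1.4, 0.2] Setosa
```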
The repository is organized as follows:

    iris-classification-knn/
    ├── .gitignore
    ├── README.md
    ├── requirements.txt      # Lists all necessary Python libraries
    ├── iris_classifier.py    # Main ML code: loads data, trains KNN, saves model
    └── streamlit_app.py      # Streamlit code for the interactive web interface

To run this project locally, follow these steps:
- Clone the repository:

      git clone https://github.com/YourUsername/iris-classification-knn.git
      cd iris-classification-knn

- Create and activate a virtual environment (recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

- Install the required libraries:

      pip install -r requirements.txt

- Run the ML training script. This trains the KNN model and creates the necessary artifacts (such as a serialized model file):

      python iris_classifier.py

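The contents of `iris_classifier.py` are not reproduced in this README, so the following is only a sketch of what such a training script might look like. It assumes scikit-learn's bundled copy of the dataset, a stratified 80/20 train/test split, K = 5 (the example value mentioned earlier), and `pickle` for serialization. The function names, defaults, and output filename are hypothetical.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_knn(n_neighbors=5, test_size=0.2, random_state=42):
    """Train a KNN classifier on Iris and return (model, test accuracy)."""
    iris = load_iris()
    X = iris.data
    # Map the integer targets to species names so predictions are readable.
    y = iris.target_names[iris.target]

    # Hold out a stratified test set so every species is represented in it.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


def save_model(model, path):
    """Serialize the fitted model so another script can load it later."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

A script built on this sketch would call `model, accuracy = train_knn()` and then `save_model(model, "knn_model.pkl")`, where the filename is an assumption rather than the project's actual artifact name.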
The project includes an interactive Streamlit application for demonstration.
- Ensure you have completed the Setup steps above.
- Run the Streamlit app:

      streamlit run streamlit_app.py

- The app will automatically open in your web browser at a local address (usually http://localhost:8501).

Figure 1: Streamlit App Interface showing sliders for input features and the predicted species.
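`streamlit_app.py` itself is not shown in this README. A minimal sketch of an app matching the description above, with one slider per feature and the predicted species displayed underneath, might look like the following. It assumes a pickled model at the hypothetical path `knn_model.pkl` whose `predict` method returns species names. The slider ranges and defaults are illustrative values chosen to cover typical Iris measurements.

```python
import pickle

FEATURES = {
    # label: (min, max, default) slider bounds in cm, chosen for illustration
    "Sepal length (cm)": (4.0, 8.0, 5.8),
    "Sepal width (cm)": (2.0, 4.5, 3.0),
    "Petal length (cm)": (1.0, 7.0, 4.0),
    "Petal width (cm)": (0.1, 2.5, 1.2),
}


def predict_species(model, measurements):
    """Return the predicted label for one flower (four measurements in cm)."""
    return model.predict([list(measurements)])[0]


def main(model_path="knn_model.pkl"):
    # Imported here so predict_species can be reused without Streamlit.
    import streamlit as st

    st.title("Iris Species Classifier")

    with open(model_path, "rb") as f:
        model = pickle.load(f)

    # One slider per feature; Streamlit reruns the script on every change.
    measurements = [
        st.slider(label, min_value=low, max_value=high, value=default, step=0.1)
        for label, (low, high, default) in FEATURES.items()
    ]

    st.success(f"Predicted species: {predict_species(model, measurements)}")


# In an actual app file, finish with a call to main().
```

Keeping the prediction in a separate helper means it can be checked with any object that exposes a `predict` method, without starting the web app.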
The K-Nearest Neighbors (KNN) classifier achieved the following performance metrics on the test set:
- Accuracy: [Insert your calculated Accuracy Score]%
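For reference, accuracy is simply the share of test samples whose predicted species matches the true one. The sketch below shows that calculation on a handful of hypothetical labels; the numbers are made up for illustration and are not results from this project.

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Hypothetical labels for illustration only, not results from this project.
y_true = ["setosa", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica"]
score = accuracy_percent(y_true, y_pred)  # 3 of 4 correct -> 75.0
```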
A key step in classification is visualizing the data to understand class separability. The scatter plot below illustrates how the three species cluster based on the petal features.
Figure 2: Scatter plot of petal length vs. petal width. Setosa forms a clearly separate cluster; Versicolor and Virginica are adjacent and overlap slightly, so most misclassifications are expected between those two species.
The KNN algorithm classified the Iris species effectively, and the Streamlit app provides a simple, intuitive way to interact with the trained model. Together they make a complete and accessible machine learning project for demonstration and learning.