This is an interactive web application built using Streamlit that predicts the species of penguins based on user-provided features. The app uses a Random Forest Classifier machine learning model for prediction and provides insights into the dataset with visualizations.
- Penguin Species Prediction:
  - Predicts the species of penguins (“Adelie”, “Chinstrap”, or “Gentoo”).
  - Uses features like bill length, bill depth, flipper length, body mass, island, and gender.
- Data Visualization:
  - Provides a scatter plot to visualize the relationship between penguin features such as bill length and body mass.
- Interactive Input:
  - Allows users to input penguin features through sliders and dropdowns for real-time predictions.
- Probability Display:
  - Shows the probability of each penguin species based on the input features.
The app uses the Palmer Archipelago (Antarctica) penguin dataset:
- Source: Palmer Penguins Dataset
- Features:
  - `island`: Island where the penguin was found.
  - `bill_length_mm`: Length of the penguin’s bill (in mm).
  - `bill_depth_mm`: Depth of the penguin’s bill (in mm).
  - `flipper_length_mm`: Length of the penguin’s flipper (in mm).
  - `body_mass_g`: Weight of the penguin (in grams).
  - `sex`: Gender of the penguin.
- Target: `species` (Adelie, Chinstrap, Gentoo).
The app uses the Random Forest Classifier:
- A robust, ensemble-based machine learning algorithm.
- Combines predictions from multiple decision trees for high accuracy.
- Encodes categorical variables like `island` and `sex` using one-hot encoding.
- Maps target labels (“Adelie”, “Chinstrap”, “Gentoo”) to numerical values for training.
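The encoding described above can be sketched as follows. This is a minimal illustration, not the app's actual code: the sample rows are hand-made placeholders, the `SPECIES_TO_CODE` mapping is an assumed numbering, and `pandas.get_dummies` is one common way to do one-hot encoding.

```python
import pandas as pd

# Hand-made rows using the README's column names (illustrative values, not real records).
penguins = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo", "Adelie"],
    "island": ["Torgersen", "Dream", "Biscoe", "Dream"],
    "bill_length_mm": [39.1, 46.5, 50.0, 37.2],
    "bill_depth_mm": [18.7, 17.9, 15.2, 18.1],
    "flipper_length_mm": [181.0, 192.0, 220.0, 178.0],
    "body_mass_g": [3750.0, 3500.0, 5550.0, 3900.0],
    "sex": ["male", "female", "male", "female"],
})

# Assumed label numbering; any consistent mapping would work.
SPECIES_TO_CODE = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}

# One-hot encode the categorical columns; numeric columns pass through unchanged.
X = pd.get_dummies(
    penguins.drop(columns="species"),
    columns=["island", "sex"],
    dtype=float,
)

# Map the species names to integer codes for training.
y = penguins["species"].map(SPECIES_TO_CODE)

print(X.columns.tolist())
print(y.tolist())
```

Each category becomes its own 0/1 column (for example `island_Dream`), so the model never treats island names as ordered values.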
Follow these steps to run the app locally:
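- Make sure Python 3.8 or higher is installed.
- Install the required Python libraries: `pip install streamlit pandas numpy scikit-learn`
- Clone the repository: `git clone <repository-url>`
- Change into the project folder: `cd <repository-folder>`
- Start the app: `streamlit run app.py`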
- Load the Data: Loads the penguin dataset from a public URL.
- Explore the Data: Displays raw data and visualizations.
- Get User Inputs: Accepts user inputs via sliders and dropdown menus for features like bill length, bill depth, flipper length, etc.
- Prepare Data: Encodes categorical data and scales numerical features as needed.
- Train the Model: Trains a Random Forest Classifier on the dataset.
- Make Predictions:
- Predicts the penguin species based on user inputs.
- Displays probabilities for each species.
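The prepare, train, and predict steps above might look roughly like this. It is a sketch under stated assumptions rather than the contents of `app.py`: the training rows are hand-made placeholders standing in for the downloaded dataset, the label mapping and hyperparameters are assumed, and the `reindex` call is one way (not necessarily the app's way) to align the user's row with the training columns.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

CATEGORICAL = ["island", "sex"]
SPECIES_TO_CODE = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}  # assumed numbering
CODE_TO_SPECIES = {code: name for name, code in SPECIES_TO_CODE.items()}

# Placeholder training rows (illustrative values, not the real dataset).
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "island": ["Torgersen", "Dream", "Dream", "Dream", "Biscoe", "Biscoe"],
    "bill_length_mm": [39.1, 37.2, 46.5, 50.5, 50.0, 46.1],
    "bill_depth_mm": [18.7, 18.1, 17.9, 19.6, 15.2, 13.2],
    "flipper_length_mm": [181.0, 178.0, 192.0, 201.0, 220.0, 211.0],
    "body_mass_g": [3750.0, 3900.0, 3500.0, 4050.0, 5550.0, 4500.0],
    "sex": ["male", "female", "female", "male", "male", "female"],
})

# Prepare data: one-hot encode categoricals, map labels to integer codes.
X = pd.get_dummies(train.drop(columns="species"), columns=CATEGORICAL, dtype=float)
y = train["species"].map(SPECIES_TO_CODE)

# Train the model (hyperparameters here are assumptions).
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# A single row of user input, as it might come from the sidebar widgets.
user_input = pd.DataFrame([{
    "island": "Dream",
    "bill_length_mm": 45.0,
    "bill_depth_mm": 18.0,
    "flipper_length_mm": 200.0,
    "body_mass_g": 4000.0,
    "sex": "male",
}])

# Encode the row, then add any dummy columns it lacks so it matches training.
user_X = pd.get_dummies(user_input, columns=CATEGORICAL, dtype=float)
user_X = user_X.reindex(columns=X.columns, fill_value=0.0)

# Predict the species and the per-species probabilities.
predicted_species = CODE_TO_SPECIES[int(model.predict(user_X)[0])]
probabilities = pd.DataFrame(
    model.predict_proba(user_X),
    columns=[CODE_TO_SPECIES[int(code)] for code in model.classes_],
)

print(predicted_species)
print(probabilities)
```

Aligning the input row to `X.columns` matters because a single row only ever produces one `island_*` and one `sex_*` column, while the model expects all of them.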
graph TD
A[Load Dataset] --> B[Explore Dataset]
B --> C[Visualize Data]
C --> D[Input User Features]
D --> E[Prepare Data for Model]
E --> F[Train Random Forest Classifier]
F --> G[Predict Species]
G --> H[Display Results]
- Sidebar:
  - Input penguin features (island, bill length, bill depth, etc.)
  - Real-time predictions based on the inputs.
- Main Page:
  - Raw data table.
  - Scatter plot visualization of bill length vs. body mass.
  - Probabilities of each species and the predicted species.
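As a rough sketch of how the sidebar inputs could be collected with Streamlit: the function names `build_input_row` and `render_sidebar` are hypothetical, and the slider ranges and defaults are assumptions chosen for illustration, not values taken from `app.py`.

```python
import pandas as pd

ISLANDS = ("Biscoe", "Dream", "Torgersen")
SEXES = ("male", "female")


def build_input_row(island, bill_length_mm, bill_depth_mm,
                    flipper_length_mm, body_mass_g, sex):
    """Pack widget values into a one-row DataFrame with the dataset's column names."""
    return pd.DataFrame([{
        "island": island,
        "bill_length_mm": bill_length_mm,
        "bill_depth_mm": bill_depth_mm,
        "flipper_length_mm": flipper_length_mm,
        "body_mass_g": body_mass_g,
        "sex": sex,
    }])


def render_sidebar():
    """Draw the sidebar widgets and return the current choices as a one-row DataFrame."""
    # Imported here so build_input_row stays usable without Streamlit installed.
    import streamlit as st

    with st.sidebar:
        st.header("Input features")
        island = st.selectbox("Island", ISLANDS)
        # Slider arguments are (label, min, max, default); ranges are assumed.
        bill_length_mm = st.slider("Bill length (mm)", 30.0, 60.0, 45.0)
        bill_depth_mm = st.slider("Bill depth (mm)", 13.0, 22.0, 18.0)
        flipper_length_mm = st.slider("Flipper length (mm)", 170.0, 235.0, 200.0)
        body_mass_g = st.slider("Body mass (g)", 2700.0, 6300.0, 4000.0)
        sex = st.selectbox("Gender", SEXES)

    return build_input_row(island, bill_length_mm, bill_depth_mm,
                           flipper_length_mm, body_mass_g, sex)
```

Inside a Streamlit script, `input_df = render_sidebar()` would return a fresh row every time a widget changes, because Streamlit reruns the script on each interaction; that rerun is what makes the predictions update in real time.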
- Open the app by running the command `streamlit run app.py`.
- Input the following example values:
  - Island: Dream
  - Bill length: 45 mm
  - Bill depth: 18 mm
  - Flipper length: 200 mm
  - Body mass: 4000 g
  - Gender: Male
- The app predicts the penguin species (e.g., Gentoo) and displays probabilities for all species.
- Frontend: Streamlit for interactive UI.
- Backend: Python for processing data and machine learning.
- Machine Learning: Scikit-learn’s Random Forest Classifier.
- Data Visualization: Streamlit’s charting tools.
- Add additional visualizations (e.g., histograms for each feature).
- Allow users to upload their datasets for predictions.
- Include more advanced machine learning algorithms for better accuracy.
- Missing Libraries:
  - Ensure all required libraries are installed.
  - Run: `pip install -r requirements.txt`.
- Dataset Issues:
  - Ensure you have internet access, as the dataset is fetched from a URL.
- Dataset Source: Palmer Penguins Dataset
- Developed by: [Sandip Kushwaha]
This project is licensed under the MIT License.