This project demonstrates how to use Support Vector Regression (SVR) to predict restaurant total bills using the well-known Tips dataset. It includes preprocessing (label + one-hot encoding), model training, evaluation, and hyperparameter tuning using GridSearchCV.
The Tips dataset from Seaborn includes information about meals in a restaurant and the corresponding tips. It contains both numerical and categorical variables.
Features:
- `total_bill`: Total cost of the meal (target)
- `tip`: Tip amount
- `sex`: Gender of the customer (Male/Female)
- `smoker`: Whether the customer smokes (Yes/No)
- `day`: Day of the week
- `time`: Time of day (Lunch/Dinner)
- `size`: Number of people in the party
- pandas
- numpy
- seaborn
- scikit-learn (SVR, GridSearchCV, LabelEncoder, OneHotEncoder, ColumnTransformer)
- matplotlib or seaborn (for optional visualizations)
- Label Encoding: Applied to the binary categorical features (`sex`, `smoker`, `time`)
- One-Hot Encoding: Applied to `day` using `ColumnTransformer` with `drop='first'` to avoid multicollinearity
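
The two encoding steps above might be sketched as follows. This is a minimal illustration, not necessarily the project's exact code: the helper name `encode_features` and the constant `BINARY_COLS` are hypothetical, and `sparse_threshold=0` is an added assumption so the result is always a dense array. The encoders are fitted on the training rows only and then applied to the test rows.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Binary categorical columns named in the README (hypothetical constant name).
BINARY_COLS = ["sex", "smoker", "time"]


def encode_features(X_train: pd.DataFrame, X_test: pd.DataFrame):
    """Label-encode the binary columns and one-hot encode `day`.

    Hypothetical helper: every encoder is fitted on the training
    frame only, then reused unchanged on the test frame.
    """
    X_train = X_train.copy()
    X_test = X_test.copy()

    # Label encoding: each two-valued column becomes 0/1.
    for col in BINARY_COLS:
        encoder = LabelEncoder()
        X_train[col] = encoder.fit_transform(X_train[col].astype(str))
        X_test[col] = encoder.transform(X_test[col].astype(str))

    # One-hot encoding for `day`; drop='first' removes one redundant column.
    transformer = ColumnTransformer(
        transformers=[("day", OneHotEncoder(drop="first"), ["day"])],
        remainder="passthrough",  # keep tip, sex, smoker, time, size as they are
        sparse_threshold=0,       # assumption: always return a dense array
    )
    X_train_enc = transformer.fit_transform(X_train)
    X_test_enc = transformer.transform(X_test)
    return X_train_enc, X_test_enc, transformer
```

With four distinct days in the training data, `day` contributes three indicator columns, so the encoded matrix has eight columns in total.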
- `X = ['tip', 'sex', 'smoker', 'day', 'time', 'size']`
- `y = total_bill`
- The train/test split is performed before encoding to avoid data leakage
- `test_size=0.2`, `random_state=42`
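
A minimal sketch of this split, assuming the dataset is already loaded into a pandas DataFrame (for example with `seaborn.load_dataset("tips")`). The function name `split_tips` and the constants `FEATURES` and `TARGET` are hypothetical; the column names and split parameters are the ones listed above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

FEATURES = ["tip", "sex", "smoker", "day", "time", "size"]
TARGET = "total_bill"


def split_tips(df: pd.DataFrame):
    """Split the raw (not yet encoded) frame into train and test parts.

    Hypothetical helper. Splitting first means any encoder fitted later
    only ever sees the training rows.
    """
    X = df[FEATURES]
    y = df[TARGET]
    # Returns X_train, X_test, y_train, y_test in that order.
    return train_test_split(X, y, test_size=0.2, random_state=42)
```

Called as `X_train, X_test, y_train, y_test = split_tips(df)`, this holds out 20% of the rows for evaluation.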
- Model: `SVR()`
- Fitted on encoded training data
- Evaluated on test set
Initial SVR Results:
- R-squared Score: 0.5502
- Mean Absolute Error (MAE): 4.41
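
The two metrics above can be computed along these lines. This is a sketch under assumptions: the helper name `fit_and_score_svr` is hypothetical, it expects already-encoded numeric feature matrices, and it is not guaranteed to reproduce the exact figures reported here.

```python
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.svm import SVR


def fit_and_score_svr(X_train, y_train, X_test, y_test):
    """Fit a default SVR and report R-squared and MAE on the test set.

    Hypothetical helper; inputs are assumed to be numeric (encoded).
    """
    model = SVR()  # scikit-learn defaults, as in the README's `SVR()`
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    r2 = r2_score(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    return model, r2, mae
```

R-squared measures the share of variance in `total_bill` that the model explains, while MAE is the average absolute error in the same units as the bill.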
Using GridSearchCV
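
A tuning run with `GridSearchCV` might look like the sketch below. The parameter grid is purely illustrative, since the grid actually searched is not listed here; `PARAM_GRID`, `tune_svr`, the 5-fold setting, and the `r2` scoring choice are all assumptions.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Illustrative grid only; the values searched in the project are not stated.
PARAM_GRID = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "epsilon": [0.1, 0.5],
}


def tune_svr(X_train, y_train, param_grid=PARAM_GRID, cv=5):
    """Cross-validated grid search over SVR hyperparameters.

    Hypothetical helper; the search only uses the training data, so the
    test set stays untouched until the final evaluation.
    """
    search = GridSearchCV(SVR(), param_grid, cv=cv, scoring="r2")
    search.fit(X_train, y_train)
    return search
```

After fitting, `search.best_params_` holds the best combination found and `search.best_estimator_` is an SVR refitted on the full training set with those settings.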
This project showcases the application of SVR on real-world-like data, along with:
- Proper feature engineering (label + one-hot encoding)
- Avoiding data leakage
- Hyperparameter tuning for optimization

While performance could still be improved with more complex models or further feature engineering, this provides a strong foundation for regression modeling with scikit-learn.
Mai3Prabhu