Project Overview: This project applies machine learning to optimize hospital revenue through data-driven insights and predictive modeling. It begins with data wrangling, cleaning, and exploratory analysis to identify statistical relationships among the key factors that influence revenue in a synthetic medical-group dataset.
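The wrangling and cleaning rules are not spelled out in this overview. The following is a minimal pandas sketch of that kind of pass followed by a first correlation check. It assumes a hypothetical schema with `location`, `visits`, `avg_charge`, and `revenue` columns; the project's actual columns and cleaning rules are not documented here. Median imputation and 1.5 * IQR clipping are illustrative choices, not necessarily the ones the project used.

```python
import numpy as np
import pandas as pd

# Hypothetical raw extract: messy location labels, a non-numeric charge,
# a missing value, an extreme outlier, a duplicate row, and a missing target.
raw = pd.DataFrame({
    "location": [" North", "south ", "North", "South", "north", "south ", "North"],
    "visits": [120, 95, None, 110, 5000, 95, 100],
    "avg_charge": ["250", "310", "275", "bad", "260", "310", "300"],
    "revenue": [30000, 29450, 27500, 33000, 31000, 29450, None],
})


def clean_revenue_data(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, coerce types, impute predictors, clip outliers, normalize labels."""
    out = df.drop_duplicates().copy()
    # Non-numeric entries such as "bad" become NaN instead of raising.
    for col in ["visits", "avg_charge", "revenue"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Rows without the target cannot be used for supervised learning.
    out = out.dropna(subset=["revenue"])
    for col in ["visits", "avg_charge"]:
        # Median imputation for missing predictor values.
        out[col] = out[col].fillna(out[col].median())
        # Clip to the 1.5 * IQR fences to limit the influence of extreme values.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    # Normalize free-text labels, e.g. " North" -> "north".
    out["location"] = out["location"].str.strip().str.lower()
    return out


cleaned = clean_revenue_data(raw)
# Pairwise Pearson correlations as a first look at which factors track revenue.
corr = cleaned[["visits", "avg_charge", "revenue"]].corr()
print(corr.round(2))
```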
Methodology and Approach: The project focuses on Decision Tree and Linear Regression models, built primarily with scikit-learn and other Python ML libraries. Through feature selection, model tuning, and cross-validation, I developed predictive models that forecast revenue within a 10% margin of error.
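This README does not say which hyperparameters were tuned or how the folds were set up. A minimal scikit-learn sketch that combines feature selection, tuning, and cross-validation in one search might look like the following. The synthetic data, the `SelectKBest`/`f_regression` selector, and the parameter grid are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in: 300 rows, 6 numeric features; only the first 3 drive revenue.
X = rng.normal(size=(300, 6))
y = 50_000 + 8_000 * X[:, 0] + 5_000 * X[:, 1] - 3_000 * X[:, 2] + rng.normal(0, 1_000, 300)

pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_regression)),  # keep the k strongest features
    ("tree", DecisionTreeRegressor(random_state=0)),
])
param_grid = {
    "select__k": [2, 3, 6],
    "tree__max_depth": [3, 5, 8],
    "tree__min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
)
search.fit(X, y)
print(search.best_params_)
print(f"Cross-validated RMSE: {-search.best_score_:,.0f}")
```

Placing the selector inside the `Pipeline` means feature selection is refit on each training fold, so the cross-validated RMSE is not inflated by choosing features on the full dataset.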
Key steps include:
- Data Wrangling & Cleaning: Addressed inconsistencies, outliers, and missing values to ensure reliable input for modeling.
- Exploratory Data Analysis (EDA): Analyzed correlations, distributions, and trends within the data.
- Model Development: Fine-tuned a Decision Tree model for categorical insights and a Linear Regression model for continuous revenue forecasting.
- Model Validation: Used performance metrics (R-squared, RMSE) to assess and refine model accuracy.
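As a hedged illustration of the development and validation steps, the sketch below fits both model types on synthetic data, reports R-squared and RMSE on a held-out split, and prints the tree's split rules as a readable summary. The feature names (`visits`, `avg_charge`, `payer_mix`), the data-generating formula, and the `within_10pct` metric (the share of predictions within 10% of actual revenue, one possible reading of the 10% figure in the methodology) are hypothetical, not taken from the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(42)
feature_names = ["visits", "avg_charge", "payer_mix"]  # hypothetical features
n = 400
X = np.column_stack([
    rng.uniform(50, 200, n),   # visits per period
    rng.uniform(150, 400, n),  # average charge per visit
    rng.uniform(0.2, 0.8, n),  # share of commercially insured patients
])
# Synthetic, always-positive revenue target with Gaussian noise.
y = 120 * X[:, 0] + 40 * X[:, 1] + 10_000 * X[:, 2] + rng.normal(0, 1_000, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)


def evaluate(model, X_tr, y_tr, X_te, y_te):
    """Fit on the training split and score on the held-out split."""
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # sqrt of MSE gives RMSE across scikit-learn versions.
    rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
    within_10 = float(np.mean(np.abs(pred - y_te) <= 0.10 * np.abs(y_te)))
    return {"r2": r2_score(y_te, pred), "rmse": rmse, "within_10pct": within_10}


tree = DecisionTreeRegressor(max_depth=4, random_state=0)
results = {
    "linear_regression": evaluate(LinearRegression(), X_train, y_train, X_test, y_test),
    "decision_tree": evaluate(tree, X_train, y_train, X_test, y_test),
}
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})

# Human-readable split rules from the fitted tree.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```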
Future Directions:
- Refine Regression Model: Pursue further feature engineering and hyperparameter tuning to improve prediction accuracy.
- Explore Advanced Models: Investigate Random Forest and XGBoost for a more robust predictive system.
- Data Standardization: Recommend standardizing data collection methods across healthcare locations to improve consistency and model reliability.
- Web Application: Propose a web-based platform for real-time data integration so the models can be retrained on current data, improving their performance.
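For the Random Forest direction, a low-cost first step could be to compare candidate models under identical cross-validation folds. The sketch below does that on synthetic data containing an interaction and a nonlinear term, which a single linear model cannot capture. XGBoost is left out so the example needs only scikit-learn, though `xgboost.XGBRegressor` follows the scikit-learn estimator interface and could likely be added to the `models` dictionary. The data and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
n = 300
X = rng.uniform(0, 1, size=(n, 4))
# Synthetic target: an interaction (x0 * x1) plus a nonlinear term in x2; x3 is noise.
y = 40_000 + 20_000 * X[:, 0] * X[:, 1] + 5_000 * np.sin(6 * X[:, 2]) + rng.normal(0, 1_000, n)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
# The same shuffled folds for every model keep the comparison fair.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    cv_rmse[name] = -scores.mean()
    print(f"{name:>18}: RMSE {cv_rmse[name]:,.0f} (+/- {scores.std():,.0f})")
```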