An advanced AI-powered healthcare analytics platform for predicting chronic disease risks and providing personalized medical recommendations.
- Advanced Risk Prediction: Utilizes an XGBoost model with multi-condition analysis
- Interactive Dashboard: Real-time analytics and population health insights
- AI-Powered Recommendations: Personalized health guidance using Llama 3.2 model
- Secure Data Management: MySQL integration for robust data storage
- Dynamic Visualizations: Interactive charts and metrics using Plotly
- Modern UI: Responsive design with Streamlit components
- Streamlit (v1.40.2)
- Plotly for interactive visualizations
- Custom CSS styling
- Python 3.8+
- MySQL database (CDRPredictor)
- Advanced logging system
- XGBoost for risk prediction
- Together AI (Llama-3.2-3B-Instruct-Turbo) for recommendations
- Scikit-learn for data preprocessing
- Pandas (v2.2.3)
- NumPy (v2.1.3)
- Joblib (v1.4.2)
- Real-time Analytics: Monitor patient risk levels and trends
- Population Health Metrics: Track key health indicators
- Condition Distribution: Analyze prevalence of chronic conditions
- Advanced Filtering: Customize views by demographics and conditions
- Interactive Charts: Dynamic visualization of health trends
- Python 3.8+
- MySQL Server with the CDRPredictor database
- pip package manager
- Clone the repository:

  ```bash
  git clone https://github.com/ajitonelsonn/chronic_disease_predictor.git
  cd chronic_disease_predictor
  ```

- Set up a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # Linux/Mac
  # or
  venv\Scripts\activate     # Windows
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment: create `.streamlit/secrets.toml`:

  ```toml
  [api_keys]
  togetherapi = "your_together_api_key"

  [database]
  db_host = "your_db_host"
  db_username = "your_db_username"
  db_password = "your_db_password"
  db_name = "your_db_name"
  db_port = "your_db_port"
  ```
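Once `secrets.toml` is in place, the app reads these values through Streamlit's `st.secrets`, which exposes the parsed TOML sections as nested mappings. A minimal sketch of the access pattern (the plain dict below stands in for `st.secrets` so it can be shown outside a running Streamlit app):

```python
# Stand-in for st.secrets, which exposes the parsed secrets.toml
# with the same section/key layout as the file above.
secrets = {
    "api_keys": {"togetherapi": "your_together_api_key"},
    "database": {
        "db_host": "your_db_host",
        "db_username": "your_db_username",
        "db_password": "your_db_password",
        "db_name": "your_db_name",
        "db_port": "your_db_port",
    },
}

# In the app this would be st.secrets["api_keys"]["togetherapi"], etc.
api_key = secrets["api_keys"]["togetherapi"]
db_host = secrets["database"]["db_host"]
```

Keeping credentials in `secrets.toml` (which is gitignored by convention) keeps them out of the repository while letting the same code run locally and on Streamlit Cloud.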
```
chronic_disease_predictor/
├── .streamlit/
│   ├── config.toml
│   └── secrets.toml
├── components.py
├── pages/
│   └── dashboard.py
├── database/
│   └── schema.sql
├── model/
│   ├── best_chronic_disease_model.joblib
│   ├── feature_scaler.joblib
│   └── label_encoder.joblib
├── utils.py
├── styles.py
├── database.py
├── model_utils.py
├── recommend.py
├── streamlit_app.py
└── requirements.txt
```
Detailed documentation of our model development process can be found in our Jupyter Notebook (Create Model).
- Processed 450,000 patient records with 37.5M entries
- Integrated data from multiple sources:
- Member demographics
- Enrollment history
- Service records
- Provider information
We evaluated three different models:
| Model | Accuracy | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- |
| Random Forest | 74.67% | 70.41% | 52.86% | 33762.05 |
| XGBoost | 81.76% | 57.59% | 59.66% | 33762.05 |
| LightGBM | 77.72% | 73.18% | 56.16% | 33762.05 |
We selected XGBoost as our final model due to:
- Highest accuracy (81.76%)
- Better handling of complex feature relationships
- Efficient prediction time
- Good balance of performance metrics
Key features used in the model:
```python
features = [
    'MEM_GENDER_ENCODED',
    'MEM_RACE_ENCODED',
    'MEM_ETHNICITY_ENCODED',
    'MEM_AGE_NUMERIC',
    'DIAGNOSTIC_CONDITION_CATEGORY_DESC_ENCODED',
    # Disease flags
    'HAS_HYPERTENSION',
    'HAS_DIABETES',
    'HAS_RENAL_FAILURE',
    # ... and more
]
```
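As an illustration, a raw patient record has to be arranged into this exact feature order before it reaches the scaler and model. The sketch below shows that mapping with a reduced feature set; the `GENDER_CODES` values and the `to_feature_vector` helper are hypothetical stand-ins for the fitted `label_encoder.joblib` / `feature_scaler.joblib` artifacts, not project code:

```python
# Hypothetical encoder mapping; in the app this comes from label_encoder.joblib.
GENDER_CODES = {"F": 0, "M": 1}

# Reduced feature order for illustration (the real model uses more columns).
FEATURES = [
    "MEM_GENDER_ENCODED",
    "MEM_AGE_NUMERIC",
    "HAS_HYPERTENSION",
    "HAS_DIABETES",
    "HAS_RENAL_FAILURE",
]

def to_feature_vector(patient: dict) -> list:
    """Arrange a raw patient record into the model's feature order."""
    encoded = {
        "MEM_GENDER_ENCODED": GENDER_CODES[patient["gender"]],
        "MEM_AGE_NUMERIC": patient["age"],
        # Disease flags default to 0 when absent from the record.
        "HAS_HYPERTENSION": int(patient.get("hypertension", False)),
        "HAS_DIABETES": int(patient.get("diabetes", False)),
        "HAS_RENAL_FAILURE": int(patient.get("renal_failure", False)),
    }
    return [encoded[name] for name in FEATURES]

vector = to_feature_vector({"gender": "M", "age": 58, "diabetes": True})
# vector == [1, 58, 0, 1, 0]
```

Keeping the feature order in one list makes the dashboard, the batch pipeline, and the saved model agree on column positions.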
- Data Preparation
  - Feature encoding
  - Handling missing values
  - Data normalization
- Model Development
  - Cross-validation
  - Hyperparameter tuning
  - Performance evaluation
- Model Optimization
  - Feature importance analysis
  - Model compression
  - Inference optimization
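The cross-validation step above boils down to splitting the record indices into k folds, holding one fold out per round, and averaging the per-fold score. In the actual pipeline this is handled by scikit-learn; `k_fold_indices` below is a stdlib-only sketch of the idea, not project code:

```python
def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Spread any remainder across the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

# Each record lands in exactly one validation fold.
folds = list(k_fold_indices(10, k=5))
# [v for _, v in folds] == [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```

Averaging a metric such as F1 across the k validation folds gives a more stable estimate than a single train/test split, which matters when comparing candidates as close as those in the table above.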
For detailed implementation and analysis, check our model development notebook.
```mermaid
graph TD
    A[User Input] --> B[Data Processing]
    B --> C[Risk Assessment]
    C --> D[Database Storage]
    D --> E[Dashboard Analytics]
    E --> F[Visualization]
    C --> G[LLM Analysis]
    G --> H[Medical Recommendations]

    subgraph "Backend Processing"
        B
        C
        D
    end

    subgraph "Frontend Display"
        E
        F
        H
    end
```
- Secure database connections
- API key management
- Error logging and monitoring
- Data validation and sanitization
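Data validation can be as simple as rejecting out-of-range or unexpected values before a record ever reaches the database or the model. A minimal sketch of that gatekeeping step (the field names and bounds here are illustrative, not the project's actual schema):

```python
def validate_patient_input(record: dict) -> list:
    """Return a list of validation errors; an empty list means the record is acceptable."""
    errors = []
    age = record.get("age")
    if not isinstance(age, (int, float)) or not 0 <= age <= 120:
        errors.append("age must be a number between 0 and 120")
    if record.get("gender") not in {"M", "F"}:
        errors.append("gender must be 'M' or 'F'")
    return errors

ok = validate_patient_input({"age": 58, "gender": "M"})    # []
bad = validate_patient_input({"age": -3, "gender": "X"})   # two errors
```

Collecting all errors (rather than raising on the first) lets the UI report every problem in one pass, and parameterized queries on the database side handle sanitization against SQL injection.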
The dashboard provides:
- Risk level distribution trends
- Condition prevalence analysis
- Demographic insights
- Prediction confidence metrics
- Historical data analysis
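For instance, the risk-level distribution reduces to a frequency count over stored predictions before it is charted (the chart itself uses Plotly in the app). A stdlib-only sketch with `collections.Counter`; the sample labels are illustrative, not real data:

```python
from collections import Counter

# Risk labels as they might come back from the predictions table.
predictions = ["Low", "High", "Medium", "Low", "Low", "High"]

distribution = Counter(predictions)
total = sum(distribution.values())
shares = {level: count / total for level, count in distribution.items()}
# distribution == Counter({'Low': 3, 'High': 2, 'Medium': 1})
```

Grouping the same counts by prediction date gives the risk-level trend lines, and slicing `predictions` by demographic filters yields the filtered views described above.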