Skip to content

An advanced AI-powered tool for predicting chronic disease risks and providing personalized medical recommendations. The system utilizes machine learning to analyze patient data and generate risk assessments for various chronic conditions.

Notifications You must be signed in to change notification settings

ajitonelsonn/chronic_disease_predictor

Repository files navigation

LAFAEK AI

Project Status Demo Open In Colab

An advanced AI-powered healthcare analytics platform for predicting chronic disease risks and providing personalized medical recommendations.

🌟 Key Features

  • Advanced Risk Prediction: Utilizes XGBoost model with multi-condition analysis
  • Interactive Dashboard: Real-time analytics and population health insights
  • AI-Powered Recommendations: Personalized health guidance using Llama 3.2 model
  • Secure Data Management: MySQL integration for robust data storage
  • Dynamic Visualizations: Interactive charts and metrics using Plotly
  • Modern UI: Responsive design with Streamlit components

πŸ”§ Technology Stack

Frontend

  • Streamlit (v1.40.2)
  • Plotly for interactive visualizations
  • Custom CSS styling

Backend

AI/ML Components

  • XGBoost for risk prediction
  • Together AI (Llama-3.2-3B-Instruct-Turbo) for recommendations
  • Scikit-learn for data preprocessing

Data Processing

  • Pandas (v2.2.3)
  • NumPy (v2.1.3)
  • Joblib (v1.4.2)

πŸ“Š Dashboard Features

  • Real-time Analytics: Monitor patient risk levels and trends
  • Population Health Metrics: Track key health indicators
  • Condition Distribution: Analyze prevalence of chronic conditions
  • Advanced Filtering: Customize views by demographics and conditions
  • Interactive Charts: Dynamic visualization of health trends

πŸš€ Getting Started

Prerequisites

Installation

  1. Clone the repository:

    git clone https://github.com/ajitonelsonn/chronic_disease_predictor.git
    cd chronic_disease_predictor
  2. Set up a virtual environment:

    python -m venv venv
    source venv/bin/activate  # Linux/Mac
    # or
    venv\Scripts\activate     # Windows
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure Environment: Create .streamlit/secrets.toml:

    [api_keys]
    togetherapi = "your_together_api_key"
    
    [database]
    db_host = "your_db_host"
    db_username = "your_db_username"
    db_password = "your_db_password"
    db_name = "your_db_name"
    db_port = "your_db_port"

πŸ“ Project Structure

chronic_disease_predictor/
β”œβ”€β”€ .streamlit/
β”‚   β”œβ”€β”€ config.toml
β”‚   └── secrets.toml
β”œβ”€β”€ components.py
β”œβ”€β”€ pages/
β”‚   β”œβ”€β”€ dashboard.py
β”œβ”€β”€ database/
β”‚   └── schema.sql
β”œβ”€β”€ model/
β”‚   β”œβ”€β”€ best_chronic_disease_model.joblib
β”‚   β”œβ”€β”€ feature_scaler.joblib
β”‚   └── label_encoder.joblib
β”œβ”€β”€ utils.py
β”œβ”€β”€ styles.py
β”œβ”€β”€ database.py
β”œβ”€β”€ model_utils.py
β”œβ”€β”€ recommend.py
β”œβ”€β”€ streamlit_app.py
└── requirements.txt

πŸ€– Model Development

Records Patients Accuracy

Detailed documentation of our model development process can be found in our Jupyter Notebook or Create Model.

Data Processing

  • Processed 450,000 patient records with 37.5M entries
  • Integrated data from multiple sources:
    • Member demographics
    • Enrollment history
    • Service records
    • Provider information

Model Selection Process

We evaluated three different models:

Model Accuracy Precision Recall F1-Score
Random Forest 74.67% 70.41% 52.86% 33762.05
XGBoost 81.76% 57.59% 59.66% 33762.05
LightGBM 77.72% 73.18% 56.16% 33762.05

Why XGBoost?

We selected XGBoost as our final model due to:

  • Highest accuracy (81.76%)
  • Better handling of complex feature relationships
  • Efficient prediction time
  • Good balance of performance metrics

Feature Engineering

Key features used in the model:

features = [
    'MEM_GENDER_ENCODED',
    'MEM_RACE_ENCODED',
    'MEM_ETHNICITY_ENCODED',
    'MEM_AGE_NUMERIC',
    'DIAGNOSTIC_CONDITION_CATEGORY_DESC_ENCODED',
    # Disease flags
    'HAS_HYPERTENSION',
    'HAS_DIABETES',
    'HAS_RENAL_FAILURE',
    # ... and more
]

Model Training Process

  1. Data Preparation

    • Feature encoding
    • Handling missing values
    • Data normalization
  2. Model Development

    • Cross-validation
    • Hyperparameter tuning
    • Performance evaluation
  3. Model Optimization

    • Feature importance analysis
    • Model compression
    • Inference optimization

For detailed implementation and analysis, check our model development notebook.


πŸ’» Application Workflow

graph TD
    A[User Input] --> B[Data Processing]
    B --> C[Risk Assessment]
    C --> D[Database Storage]
    D --> E[Dashboard Analytics]
    E --> F[Visualization]
    C --> G[LLM Analysis]
    G --> H[Medical Recommendations]

    subgraph "Backend Processing"
    B
    C
    D
    end

    subgraph "Frontend Display"
    E
    F
    H
    end
Loading

πŸ”’ Security Features

  • Secure database connections
  • API key management
  • Error logging and monitoring
  • Data validation and sanitization

πŸ“ˆ Dashboard Analytics

The dashboard provides:

  • Risk level distribution trends
  • Condition prevalence analysis
  • Demographic insights
  • Prediction confidence metrics
  • Historical data analysis

πŸ‘₯ Author

Ajito Nelson Lucio da Costa

Facebook LinkedIn


Built with ❀️ in Timor-Leste πŸ‡ΉπŸ‡±

About

An advanced AI-powered tool for predicting chronic disease risks and providing personalized medical recommendations. The system utilizes machine learning to analyze patient data and generate risk assessments for various chronic conditions.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published