This challenge builds on the NIH Chest X-Ray dataset, which contains more than 112,000 chest X-ray images from over 30,000 patients. Participants will explore how federated learning can enable robust diagnostic models that generalize across hospitals without sharing sensitive patient data.
- Privacy-Preserving: Hospital data stays local; only model updates are shared
- Non-IID Data: Realistic simulation of diverse hospital environments
- Multi-Hospital Setup: Three distinct hospital silos with unique characteristics
- Binary Classification: Detect presence of any pathological finding
- Large-Scale Dataset: 112,000+ medical images across distributed nodes
- GPU-Optimized Training: Cluster-based distributed learning with resource management
In real healthcare systems, hospitals differ in their imaging devices, patient populations, and clinical practices. A model trained in one hospital often struggles in another because the data distributions differ.
Your task is to design a model that performs reliably across diverse hospital environments. By simulating a federated setup, where each hospital trains on local data and only model updates are shared, you'll investigate how distributed AI can improve performance and robustness under privacy constraints.
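The "train locally, share only updates" loop can be illustrated with Federated Averaging (FedAvg), where the server combines client weights in proportion to how many local examples each client trained on. This is a minimal sketch, assuming each hospital's model is flattened into a plain list of floats; the function name `fedavg` and the numbers are hypothetical, and the repository's `server_app.py` may use a different aggregation strategy. Flower's built-in FedAvg strategy follows the same example-weighted idea.

```python
from typing import List, Sequence, Tuple


def fedavg(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Example-weighted average of client parameter vectors (FedAvg-style).

    Each update is (parameters, num_local_examples). Only these values
    leave a hospital; the raw images never do.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(updates[0][0])
    if any(len(params) != dim for params, _ in updates):
        raise ValueError("all clients must send parameters of the same shape")

    global_params = [0.0] * dim
    for params, n in updates:
        weight = n / total  # hospitals with more data contribute proportionally more
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params


# Hypothetical round with three hospital silos (parameters, num_examples)
print(fedavg([([2.0, 0.0], 100), ([0.0, 2.0], 100), ([1.0, 1.0], 200)]))  # [1.0, 1.0]
```

Weighting by example count keeps a small silo from pulling the global model as hard as a large one; with strongly non-IID silos, that weighting is itself a design choice worth experimenting with.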
Chest X-rays are among the most common and cost-effective imaging exams, yet diagnosing them remains challenging. For this challenge, the dataset has been artificially partitioned into hospital silos to simulate a federated learning scenario with strong non-IID characteristics: age, sex, view position, and pathology distributions all vary across silos.
Each patient appears in only one hospital. All splits (train/eval/test) are patient-disjoint to prevent data leakage.
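A patient-disjoint split assigns whole patients, not individual images, to train/eval/test, so no patient's images can appear in more than one split. The sketch below assumes each image record is an `(image_id, patient_id)` pair (the original NIH metadata does carry a `Patient ID` column); the function name, split fractions, and seed are illustrative and not the challenge's actual split procedure.

```python
import random
from typing import Dict, Iterable, List, Tuple


def patient_disjoint_split(
    records: Iterable[Tuple[str, int]],
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Split (image_id, patient_id) records so each patient lands in exactly one split."""
    # Group images by patient first; the split operates on patients.
    by_patient: Dict[int, List[str]] = {}
    for image_id, patient_id in records:
        by_patient.setdefault(patient_id, []).append(image_id)

    patients = sorted(by_patient)          # deterministic order before shuffling
    random.Random(seed).shuffle(patients)  # reproducible shuffle of patients, not images

    n = len(patients)
    n_train = int(fractions[0] * n)
    n_eval = int(fractions[1] * n)
    groups = {
        "train": patients[:n_train],
        "eval": patients[n_train:n_train + n_eval],
        "test": patients[n_train + n_eval:],  # remainder goes to test
    }
    return {name: [img for p in pids for img in by_patient[p]] for name, pids in groups.items()}


# Hypothetical data: 20 patients with 3 images each
records = [(f"{p:05d}_{k:03d}.png", p) for p in range(20) for k in range(3)]
splits = patient_disjoint_split(records)
print({name: len(imgs) for name, imgs in splits.items()})
```

Splitting at the image level instead would let near-identical follow-up scans of the same patient land in both train and test, which inflates AUROC.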
- Demographics: Elderly males (age 60+)
- Equipment: AP (anterior-posterior) view dominant
- Common findings: Fluid-related conditions (Effusion, Edema, Atelectasis)
- Demographics: Younger females (age 20-65)
- Equipment: PA (posterior-anterior) view dominant
- Common findings: Nodules, masses, pneumothorax
- Demographics: Mixed age and gender
- Equipment: PA view preferred
- Common findings: Rare conditions (Hernia, Fibrosis, Emphysema)
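To see how skewed the silos actually are, it helps to compare simple per-silo statistics such as the share of AP views and the prevalence of "any finding". This sketch assumes records are plain dicts with hypothetical keys `silo`, `view`, and `label`; adapt the keys to whatever the provided data loaders expose.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def silo_summary(records: Iterable[Mapping[str, object]]) -> Dict[str, Dict[str, float]]:
    """Per-silo sample count, share of AP views, and share of positive ('any finding') labels."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        c = counts[str(r["silo"])]
        c["n"] += 1
        c["ap"] += int(r["view"] == "AP")
        c["pos"] += int(int(r["label"]) == 1)
    return {
        silo: {"n": c["n"], "ap_share": c["ap"] / c["n"], "positive_rate": c["pos"] / c["n"]}
        for silo, c in sorted(counts.items())
    }


# Hypothetical records from two silos
records = [
    {"silo": "A", "view": "AP", "label": 1},
    {"silo": "A", "view": "AP", "label": 0},
    {"silo": "B", "view": "PA", "label": 0},
    {"silo": "B", "view": "AP", "label": 1},
]
print(silo_summary(records))
```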
Binary classification: Detect presence of any pathological finding
- Class 0: No Finding
- Class 1: Any Finding present
Pathologies (14 types, plus the No Finding label): Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural_Thickening, Hernia
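The binary target collapses the multi-label annotations into a single bit. In the original NIH metadata, the `Finding Labels` field stores pipe-separated labels (for example `Effusion|Infiltration`) or the single value `No Finding`; the sketch below assumes that format, and the helper name `to_binary_label` is hypothetical. The challenge data may already ship with a precomputed binary label.

```python
PATHOLOGIES = {
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass", "Nodule",
    "Pneumonia", "Pneumothorax", "Consolidation", "Edema", "Emphysema",
    "Fibrosis", "Pleural_Thickening", "Hernia",
}


def to_binary_label(finding_labels: str) -> int:
    """Return 0 for 'No Finding', 1 if any of the 14 pathologies is present."""
    labels = {lab.strip() for lab in finding_labels.split("|") if lab.strip()}
    unknown = labels - PATHOLOGIES - {"No Finding"}
    if unknown:
        # Fail loudly on unexpected strings instead of silently mislabeling them.
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return int(bool(labels & PATHOLOGIES))


print(to_binary_label("Effusion|Infiltration"))  # 1
print(to_binary_label("No Finding"))             # 0
```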
Evaluation Metric: AUROC
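AUROC is the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one, with ties counting as half. The pure-Python sketch below computes it from that definition via the rank-sum (Mann-Whitney U) formulation, which is handy for sanity-checking; in practice `sklearn.metrics.roc_auc_score` returns the same value. Whether `evaluate.py` computes it this way is not stated here.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic, using average ranks for tied scores.

    labels: 1 for "Any Finding", anything else is treated as negative.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUROC depends only on the ranking of scores, it is insensitive to a global shift or rescaling of the model's outputs, which makes it a reasonable choice when positive-class prevalence differs from silo to silo.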
```bash
# Clone your team's repository
git clone https://github.com/YOUR_ORG/hackathon-2025-team-YOUR_TEAM.git
cd hackathon-2025-team-YOUR_TEAM

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -e .
```

```bash
python local_train.py --hospital A
```

Note: Full datasets are only available on the cluster.
```bash
# Submit training job
./submit-job.sh "flwr run . cluster --stream" --gpu

# Submit with custom name for easier tracking
./submit-job.sh "flwr run . cluster --stream" --gpu --name exp_lr001

# Test evaluation pipeline
./submit-job.sh "python evaluate.py" --gpu --name eval_v5
```

```bash
# Check job status
squeue -u $USER

# View logs
tail -f ~/logs/exp_lr001_*.out

# View W&B dashboard
# https://wandb.ai/coldstart2025-teamXX/coldstart2025
```

Datasets on cluster:
- Raw: /shared/hackathon/datasets/xray_fl_datasets/
- Preprocessed (128x128): /shared/hackathon/datasets/xray_fl_datasets_preprocessed_128/
These are automatically linked in your job workspace.
Per job:
- 1 GPU
- 32GB RAM
- 20 minutes runtime
- Max 4 concurrent jobs per team
All metrics are automatically logged to W&B: https://wandb.ai/coldstart2025-teamXX/coldstart2025
Login with your team's service account credentials (provided by organizers).
| Component | Technology |
|---|---|
| Language | Python 3.8+ |
| Federated Learning | Flower Framework |
| Deep Learning | PyTorch |
| Experiment Tracking | Weights & Biases |
| Data Processing | NumPy, Pandas, OpenCV |
| Infrastructure | HPC Cluster with GPU (NVIDIA) |
```
federation-x/
├── cold_start_hackathon/
│   ├── server_app.py      # Federated server implementation
│   ├── client_app.py      # Client-side training logic
│   ├── models/            # Neural network architectures
│   └── utils/             # Helper utilities
├── local_train.py         # Local testing script
├── evaluate.py            # Evaluation pipeline
├── submit-job.sh          # Cluster job submission script
├── requirements.txt       # Python dependencies
├── setup.py               # Package setup
└── README.md              # This file
```
- Flower Framework Documentation - Federated learning framework reference
- AUROC Explanation - Understanding the evaluation metric
- Federated Learning Overview - Academic foundation paper
- NIH Chest X-Ray Dataset - Original dataset information
By completing this challenge, you'll master:
- Federated Learning fundamentals and architectures
- Non-IID data challenges and mitigation strategies
- Distributed training at scale
- Privacy-preserving machine learning
- Medical image analysis and classification
- Experiment tracking and reproducibility
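```bibtex
@inproceedings{wang2017chestxray,
  title={ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases},
  author={Wang, Xiaosong and Peng, Yifan and Lu, Le and Lu, Zhiyong and
          Bagheri, Mohammadhadi and Summers, Ronald M},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017}
}
```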
We welcome contributions! Please feel free to:
- Report bugs via GitHub Issues
- Submit improvements via Pull Requests
- Share your results and insights
- Repository: https://github.com/niranjanxprt/federation-x
- Issues: https://github.com/niranjanxprt/federation-x/issues
- Organizers: Contact the hackathon team for cluster access and credentials
Good luck, and happy hacking!
Last Updated: November 15, 2025