Chest X-ray binary classification with explicit distribution shift awareness.
Designed for research-grade screening experiments under distribution shift. Not intended for clinical use.
XAND-Ray trains per-disease binary classifiers on CheXpert using DenseNet121, covering 9 pathologies with a vanilla architecture: no custom pooling, label smoothing, or uncertainty handling.
Splits:
- Split (2K): best AUC on the validation split (2,000 images, automatic partition from train)
- Valid (202): best AUC on the official validation set (202 images, radiologist labels)
- Test† (518): AUC on the test set at the epoch where Valid was best (the deployed model)
- PadChest: external validation on the PadChest gold subset (~24K images, PA projection only, physician-verified labels). CheXpert → PadChest label mapping applied per disease
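The per-disease CheXpert → PadChest label mapping can be sketched as a simple lookup. The finding names below are illustrative assumptions for the sketch, not the repository's actual mapping table:

```python
# Illustrative CheXpert -> PadChest mapping (names are assumptions, not the
# project's real table). Each CheXpert disease maps to one or more PadChest
# finding strings; a study is positive if any mapped finding is present.
CHEXPERT_TO_PADCHEST = {
    "Cardiomegaly": ["cardiomegaly"],
    "Pleural Effusion": ["pleural effusion"],
    "Pneumothorax": ["pneumothorax"],
    "Edema": ["pulmonary edema"],
}

def padchest_binary_label(padchest_findings, chexpert_disease):
    """Return 1 if any mapped PadChest finding appears in the study, else 0."""
    mapped = CHEXPERT_TO_PADCHEST[chexpert_disease]
    findings = {f.strip().lower() for f in padchest_findings}
    return int(any(m in findings for m in mapped))

padchest_binary_label(["Cardiomegaly", "aortic elongation"], "Cardiomegaly")  # -> 1
```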
| Disease | Labels | CheXpert Split (2K) | CheXpert Valid (202) | CheXpert Test† (518) | PadChest |
|---|---|---|---|---|---|
| Pneumothorax | CXB | 0.8791 | 0.9553 | 0.9814 | 0.8688 |
| Pleural Effusion | VCB | 0.9423 | 0.9424 | 0.9468 | 0.9587 |
| Pneumonia | VCB | 0.8740 | 0.9304 | 0.8982 | 0.7729 |
| Lung Opacity | VCB | 0.9345 | 0.9252 | 0.9294 | - |
| Cardiomegaly | CXB | 0.9240 | 0.8651 | 0.9218 | 0.9239 |
| Edema | CXB | 0.8451 | 0.9540 | 0.9165 | 0.9484 |
| Consolidation | VCB | 0.9355 | 0.9316 | 0.8770 | 0.8565 |
| Enlarged Cardiom. | CXB | 0.8612 | 0.8855 | 0.8781 | - |
| Atelectasis | VCB | 0.9009 | 0.8995 | 0.8691 | 0.7213 |
- † = test metric at the epoch of best official validation (selected model)
- VCB = VisualCheXbert labels (189K uniform train), CXB = CheXbert labels (filtered train, varies per disease)
- Each label type (VCB, CXB) trains an independent model; the one with the best official validation AUC (202) is reported
- All splits are patient-level (no patient appears in more than one split).
- Confidence intervals planned for a future release.
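The selection rule described above can be sketched in a few lines (with invented numbers): pick the epoch with the best official-validation AUC, then report the test AUC from that same epoch.

```python
# Model selection sketch: the reported Test AUC comes from the epoch that
# maximized the official 202-image validation AUC, never from the test peak.
# All numbers below are invented for illustration.
history = [
    {"epoch": 1, "valid_auc": 0.88, "test_auc": 0.90},
    {"epoch": 2, "valid_auc": 0.93, "test_auc": 0.91},
    {"epoch": 3, "valid_auc": 0.91, "test_auc": 0.95},  # test peak, not selected
]

best = max(history, key=lambda h: h["valid_auc"])      # epoch 2 wins on valid
reported_test_auc = best["test_auc"]                   # 0.91, the deployed model
peak_test_auc = max(h["test_auc"] for h in history)    # 0.95, transparency only
```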
Highest test AUC observed during training, regardless of epoch. Not used for model selection; reported for transparency only.
| Disease | CheXpert Test Peak‡ |
|---|---|
| Pneumothorax | 0.9814 |
| Pleural Effusion | 0.9504 |
| Pneumonia | 0.9498 |
| Lung Opacity | 0.9390 |
| Cardiomegaly | 0.9227 |
| Edema | 0.9221 |
| Consolidation | 0.8958 |
| Enlarged Cardiom. | 0.8853 |
| Atelectasis | 0.8715 |
Attention maps have been validated against CheXlocalize ground truth segmentations (IoU, Dice, point-hit metrics). The model is trained purely for classification without localization loss, yet activation maps consistently fall on clinically relevant anatomical regions for each pathology.
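The localization metrics mentioned above reduce to simple set arithmetic on binary masks. A minimal sketch on flattened masks (illustrative, not the evaluation code used for the CheXlocalize comparison):

```python
def iou(pred, gt):
    """Intersection-over-Union between two flattened binary masks (0/1 lists)."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter / union if union else 0.0

def dice(pred, gt):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(p & g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2 * inter / total if total else 0.0

def point_hit(peak_index, gt):
    """1 if the attention peak lands inside the ground-truth mask."""
    return int(gt[peak_index] == 1)

pred = [1, 1, 0, 0]
gt   = [0, 1, 1, 0]
iou_val  = iou(pred, gt)     # 1 shared pixel / 3 in the union  -> 1/3
dice_val = dice(pred, gt)    # 2*1 / (2 + 2)                    -> 0.5
hit      = point_hit(1, gt)  # peak at index 1 lies inside gt   -> 1
```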
Each pathology is visualized with four views:
| View | Description |
|---|---|
| Original | Input chest X-ray from the official test split (blind evaluation) |
| Attention Map | Grad-CAM heatmap; red zones indicate where the model focused to decide about that specific pathology |
| CheXlocalize Mask | Academic reference segmentation from CheXlocalize (Stanford ML Group), included for approximate visual orientation |
| Overlay | Combined view for easier comparison |
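The Grad-CAM heatmaps follow the standard formulation: average the gradients of each feature map to get per-channel weights, take the weighted sum of activations, and keep only positive evidence. A minimal NumPy sketch, not necessarily the repository's exact implementation:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM over the last conv block.
    activations, gradients: (K, H, W) arrays for K feature maps."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k, one per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over K -> (H, W)
    cam = np.maximum(cam, 0)                          # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for overlay
    return cam

# Toy example: two 3x3 feature maps with uniform positive gradients.
acts = np.stack([np.ones((3, 3)), 2 * np.ones((3, 3))])
grads = np.ones((2, 3, 3))
cam = grad_cam(acts, grads)  # uniform map, normalized to 1.0 everywhere
```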
The following maps were generated from the same chest X-ray, each querying a different pathology. Notice how the attention region shifts depending on what the model is being asked:
⚠️ Interpreting attention maps
- Each heatmap answers one question: "where does the model look to decide about this specific disease?" It is not a comprehensive scan of the image.
- This model is a binary classifier that detects the minimum evidence needed to decide if a condition is present or absent. It does not delineate the full extent of a lesion, nor does it prioritize across multiple co-existing findings.
Note: The visual examples below are composite research visualizations. The original dataset is not redistributed and remains subject to its respective license.
- Per-disease binary classifiers: one independent model per pathology, with no multilabel compromises. Each is optimized for its own label source, training-set size, and class balance
- Pure-label training: only explicit positives (1) and negatives (0); uncertain and NaN samples are excluded entirely. Focal Loss (γ=2.0) handles class imbalance across 2%–46% positive rates
- Hybrid labeling: CheXbert for structural conditions (Cardiomegaly, Pneumothorax), VisualCheXbert for diffuse pathologies (Pneumonia, Atelectasis). Selected per disease by empirical validation AUC
- Medical-aware augmentation:
  - `SmallestMaxSize(512)` → `CenterCrop(512)`: preserves thoracic proportions; the crop removes only ~18 px of non-diagnostic margins (shoulders, arms)
  - `Rotate(limit=10°)`: mild angle perturbation that preserves anatomical orientation (neck up, diaphragm down)
  - `CLAHE`: adaptive contrast equalization that compensates for exposure differences across X-ray devices and hospitals
- Multitask regularization: auxiliary heads for Device (0.3), Projection (0.005), Sex (0.01). Device detection acts as a severity proxy (tubes and catheters correlate with patient acuity), providing an orthogonal signal without competing with the disease head
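The focal-loss behavior described above can be illustrated with a single-sample binary version of the Lin et al. formulation; this is a sketch, not the project's training code:

```python
import math

def binary_focal_loss(p, y, gamma=2.0):
    """Binary focal loss for one sample (Lin et al., 2017).
    p: predicted probability of the positive class, y: label in {0, 1}.
    The (1 - p_t)^gamma factor down-weights easy examples, so the rare
    positives (2-46% here) dominate the gradient."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# A confident correct prediction contributes almost nothing...
easy = binary_focal_loss(0.95, 1)  # ~0.00013
# ...while a confident mistake is penalized heavily.
hard = binary_focal_loss(0.05, 1)  # ~2.70
```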
| Component | Choice |
|---|---|
| Backbone | DenseNet121 (ImageNet pretrained) |
| Input | 512 × 512 |
| Loss | Focal Loss (Ξ³ = 2.0) |
| Optimizer | AdamW + Cosine decay |
| Aux tasks | Device (0.3), Projection (0.005), Sex (0.01) |
| Dataset | CheXpert 189K (VCB) / filtered (CXB) |
| Early stopping | Patience 3 on split AUC |
| Tracking | MLflow |
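The cosine decay in the optimizer row follows the usual half-cosine schedule. A sketch with illustrative base and minimum learning rates; the project's actual values live in the YAML configs:

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over total_steps.
    base_lr / min_lr are illustrative placeholders, not the project's config."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Starts at base_lr, crosses the midpoint halfway, ends at min_lr.
lr_start = cosine_lr(0, 100)    # 1e-4
lr_mid   = cosine_lr(50, 100)   # ~5e-5
lr_end   = cosine_lr(100, 100)  # 0.0
```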
| GPU | Role |
|---|---|
| RTX 4060 Ti 16GB | Primary training & reported results |
| 2 × RTX A5000 24GB | Validation & parallel-scalability testing |
Recommended: > 12 GB VRAM (~13.4 GB typical usage).
Experiments are tracked with MLflow (outputs/mlflow.db), logging:
- AUC / AUC-PR
- Loss and learning rate curves
- GPU memory usage
- Dataset statistics and class balance
Checkpoints and runs are fully reproducible.
git clone https://github.com/XOREngine/xand-ray.git
cd xand-ray
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

For exact reproducibility, use requirements-lock.txt instead.
- Access the CheXpert dataset via the official Stanford ML Group portal (license required) https://stanfordmlgroup.github.io/competitions/chexpert/ (train + valid images, train.csv, train_cheXbert.csv, train_visualCheXbert.csv, valid.csv)
- Place all files under your configured data directory and set $DATA to its path
See QUICKSTART.md for full setup instructions, $DATA configuration and details.
Training requires preprocessed CSVs generated by prepareDataset.py. Do not use raw CheXpert CSVs directly.
python scripts/prepareDataset.py \
--config configs/densenet_rtx4060.yaml \
--disease "Cardiomegaly" "Pneumothorax" \
--csv-train $DATA/chexpert/CheXpert-v1.0/train_cheXbert.csv \
--csv-valid-real $DATA/chexpert/CheXpert-v1.0/valid.csv \
--csv-test-real $DATA/chexpert_test_labels/groundtruth.csv \
--train-images-root $DATA/chexpert/CheXpert-v1.0/train \
--valid-images-root $DATA/chexpert/CheXpert-v1.0/valid \
--test-images-root $DATA/chexlocalize/CheXpert/test

python scripts/trainModel.py \
--config configs/densenet_rtx4060.yaml \
--disease "Cardiomegaly" \
--gpu 0 \
--csv-train outputs/processedCSVs/chexpert_cardiomegaly_train.csv \
--csv-valid-split outputs/processedCSVs/chexpert_cardiomegaly_valid.csv \
--csv-valid-real outputs/processedCSVs/chexpert_cardiomegaly_valid_real.csv \
--csv-test-real outputs/processedCSVs/chexpert_cardiomegaly_test_real.csv

- CheXpert → https://stanfordmlgroup.github.io/competitions/chexpert/
- CheXbert (labeler) → https://github.com/stanfordmlgroup/CheXbert
- VisualCheXbert → https://github.com/stanfordmlgroup/VisualCheXbert
- CheXlocalize → https://github.com/rajpurkarlab/CheXlocalize
- CheXpert Test Set Labels → https://github.com/rajpurkarlab/cheXpert-test-set-labels
- PadChest (BIMCV) → https://bimcv.cipf.es/bimcv-projects/padchest/
- CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison → https://arxiv.org/abs/1901.07031
- CheXbert: Combining Automatic Labelers and Expert Annotations → https://arxiv.org/abs/2004.09167
- VisualCheXbert: Addressing the Discrepancy Between Radiology Report Labels and Image Labels → https://arxiv.org/abs/2102.11467
- PadChest: A Large Chest X-ray Image Dataset with Multi-Label Annotated Reports → https://arxiv.org/abs/1901.07441
- CheXternal: Generalization Across Institutions → https://arxiv.org/abs/2102.08660
- Comparing Deep Neural Networks on CheXpert (Scientific Reports, 2020) → https://www.nature.com/articles/s41598-020-70479-z
- Benchmarking saliency methods for chest X-ray interpretation → https://doi.org/10.1038/s42256-022-00536-x
- jfhealthcare: DenseNet121 + PCAM pooling on CheXpert → https://github.com/jfhealthcare/Chexpert · PCAM paper
- Pham et al.: Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels → https://arxiv.org/abs/1911.06475
- DenseNet: Densely Connected Convolutional Networks → https://arxiv.org/abs/1608.06993
- Focal Loss → https://arxiv.org/abs/1708.02002
- Decoupled Weight Decay (AdamW) → https://arxiv.org/abs/1711.05101
- Grad-CAM → https://arxiv.org/abs/1610.02391
Final implementation and project coordination:
- JosΓ© Artusa (@WallyByte)
External methodology review:
- RubΓ©n J.R. (@rubenjr0)
Early-stage architectural exploration (ResNet / DenseNet / ConvNeXt, baseline phase):
- RubΓ©n Solano (@rubensolano2)
An independent technical review and full reproducibility verification of XAND-Ray Baseline v0.1 was conducted by:
RubΓ©n J.R.
External Reviewer
🔍 Independent Technical Review: v0.1 (14/02/2026)
- Verification of proper train / validation / test separation (patient-level split)
- Absence of data leakage or cross-dependencies
- Independent full replication of the training pipeline from scratch
- Reproduction of reported AUC-ROC metrics
- Evaluation flow integrity (no test reuse or post-hoc bias)
- Architectural consistency review (DenseNet121 integration and pipeline coherence)
The reported metrics were independently reproduced and remain within the variation expected from non-deterministic training environments.
No structural inconsistencies, data leakage, or improper use of the test set were identified during the review.
XAND-Ray: Chest X-Ray Screening Research.
💪 If you adapt it, extend it, or improve it: go for it.
© 2026 XOREngine · Open Source Commitment