Chest X-ray binary classification with explicit distribution shift awareness.
Designed for research-grade screening experiments under distribution shift. Not intended for clinical use.
XAND-Ray trains per-disease binary classifiers on CheXpert using DenseNet121, covering 9 pathologies with a vanilla architecture: no custom pooling, label smoothing, or uncertainty handling.
Splits:
- Split (2K): best AUC on the validation split (2,000 images, automatic partition from train)
- Valid (202): best AUC on the official validation set (202 images, radiologist labels)
- Test† (518): AUC on the test set at the epoch where Valid was best (the deployed model)
- PadChest: external validation on the PadChest gold subset (~24K images, PA projection only, physician-verified labels). CheXpert → PadChest label mapping applied per disease
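The per-disease CheXpert → PadChest label mapping can be sketched as a simple lookup. The finding names below are illustrative assumptions for the sketch, not the repository's actual mapping table:

```python
# Illustrative CheXpert -> PadChest mapping (names are assumptions, not the
# project's real table). Each CheXpert disease maps to one or more PadChest
# finding strings; a study is positive if any mapped finding is present.
CHEXPERT_TO_PADCHEST = {
    "Cardiomegaly": ["cardiomegaly"],
    "Pleural Effusion": ["pleural effusion"],
    "Pneumothorax": ["pneumothorax"],
    "Edema": ["pulmonary edema"],
}

def padchest_binary_label(padchest_findings, chexpert_disease):
    """Return 1 if any mapped PadChest finding appears in the study, else 0."""
    mapped = CHEXPERT_TO_PADCHEST[chexpert_disease]
    findings = {f.strip().lower() for f in padchest_findings}
    return int(any(m in findings for m in mapped))

padchest_binary_label(["Cardiomegaly", "aortic elongation"], "Cardiomegaly")  # -> 1
```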
| Disease | Labels | CheXpert Split (2K) | CheXpert Valid (202) | CheXpert Test† (518) | PadChest |
|---|---|---|---|---|---|
| Pneumothorax | CXB | 0.8791 | 0.9553 | 0.9814 | 0.8688 |
| Pleural Effusion | VCB | 0.9423 | 0.9424 | 0.9468 | 0.9587 |
| Pneumonia | VCB | 0.8740 | 0.9304 | 0.8982 | 0.7729 |
| Lung Opacity | VCB | 0.9345 | 0.9252 | 0.9294 | - |
| Cardiomegaly | CXB | 0.9240 | 0.8651 | 0.9218 | 0.9239 |
| Edema | CXB | 0.8451 | 0.9540 | 0.9165 | 0.9484 |
| Consolidation | VCB | 0.9355 | 0.9316 | 0.8770 | 0.8565 |
| Enlarged Cardiom. | CXB | 0.8612 | 0.8855 | 0.8781 | - |
| Atelectasis | VCB | 0.9009 | 0.8995 | 0.8691 | 0.7213 |
- † = test metric at the epoch of best official validation (selected model)
- VCB = VisualCheXbert labels (189K uniform train), CXB = CheXbert labels (filtered train, varies per disease)
- Each label type (VCB, CXB) trains an independent model; the one with the best official validation AUC (202) is reported
- All splits are patient-level (no patient appears in more than one split).
- Confidence intervals planned for a future release.
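The selection rule described above can be sketched in a few lines (with invented numbers): pick the epoch with the best official-validation AUC, then report the test AUC from that same epoch.

```python
# Model selection sketch: the reported Test AUC comes from the epoch that
# maximized the official 202-image validation AUC, never from the test peak.
# All numbers below are invented for illustration.
history = [
    {"epoch": 1, "valid_auc": 0.88, "test_auc": 0.90},
    {"epoch": 2, "valid_auc": 0.93, "test_auc": 0.91},
    {"epoch": 3, "valid_auc": 0.91, "test_auc": 0.95},  # test peak, not selected
]

best = max(history, key=lambda h: h["valid_auc"])      # epoch 2 wins on valid
reported_test_auc = best["test_auc"]                   # 0.91, the deployed model
peak_test_auc = max(h["test_auc"] for h in history)    # 0.95, transparency only
```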
Highest test AUC observed during training, regardless of epoch. Not used for model selection; reported for transparency only.
| Disease | CheXpert Test Peak‡ |
|---|---|
| Pneumothorax | 0.9814 |
| Pleural Effusion | 0.9504 |
| Pneumonia | 0.9498 |
| Lung Opacity | 0.9390 |
| Cardiomegaly | 0.9227 |
| Edema | 0.9221 |
| Consolidation | 0.8958 |
| Enlarged Cardiom. | 0.8853 |
| Atelectasis | 0.8715 |
Attention maps have been validated against CheXlocalize ground truth segmentations (IoU, Dice, point-hit metrics). The model is trained purely for classification without localization loss, yet activation maps consistently fall on clinically relevant anatomical regions for each pathology.
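The localization metrics mentioned above reduce to simple set arithmetic on binary masks. A minimal sketch on flattened masks (illustrative, not the evaluation code used for the CheXlocalize comparison):

```python
def iou(pred, gt):
    """Intersection-over-Union between two flattened binary masks (0/1 lists)."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter / union if union else 0.0

def dice(pred, gt):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(p & g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2 * inter / total if total else 0.0

def point_hit(peak_index, gt):
    """1 if the attention peak lands inside the ground-truth mask."""
    return int(gt[peak_index] == 1)

pred = [1, 1, 0, 0]
gt   = [0, 1, 1, 0]
iou_val  = iou(pred, gt)     # 1 shared pixel / 3 in the union  -> 1/3
dice_val = dice(pred, gt)    # 2*1 / (2 + 2)                    -> 0.5
hit      = point_hit(1, gt)  # peak at index 1 lies inside gt   -> 1
```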
Each pathology is visualized with four views:
| View | Description |
|---|---|
| Original | Input chest X-ray from the official test split (blind evaluation) |
| Attention Map | Grad-CAM heatmap; red zones indicate where the model focused to decide about that specific pathology |
| CheXlocalize Mask | Academic reference segmentation from CheXlocalize (Stanford ML Group), included for approximate visual orientation |
| Overlay | Combined view for easier comparison |
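The Grad-CAM heatmaps follow the standard formulation: average the gradients of each feature map to get per-channel weights, take the weighted sum of activations, and keep only positive evidence. A minimal NumPy sketch, not necessarily the repository's exact implementation:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM over the last conv block.
    activations, gradients: (K, H, W) arrays for K feature maps."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k, one per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over K -> (H, W)
    cam = np.maximum(cam, 0)                          # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for overlay
    return cam

# Toy example: two 3x3 feature maps with uniform positive gradients.
acts = np.stack([np.ones((3, 3)), 2 * np.ones((3, 3))])
grads = np.ones((2, 3, 3))
cam = grad_cam(acts, grads)  # uniform map, normalized to 1.0 everywhere
```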
The following maps were generated from the same chest X-ray, each querying a different pathology. Notice how the attention region shifts depending on what the model is being asked:
⚠️ Interpreting attention maps
- Each heatmap answers one question: "where does the model look to decide about this specific disease?" It is not a comprehensive scan of the image.
- This model is a binary classifier that detects the minimum evidence needed to decide if a condition is present or absent. It does not delineate the full extent of a lesion, nor does it prioritize across multiple co-existing findings.
Note: The visual examples below are composite research visualizations. The original dataset is not redistributed and remains subject to its respective license.
- Per-disease binary classifiers: one independent model per pathology, with no multilabel compromises. Each is optimized for its own label source, training-set size, and class balance
- Pure-label training: only explicit positives (1) and negatives (0); uncertain and NaN samples are excluded entirely. Focal Loss (γ=2.0) handles class imbalance across 2%–46% positive rates
- Hybrid labeling: CheXbert for structural conditions (Cardiomegaly, Pneumothorax), VisualCheXbert for diffuse pathologies (Pneumonia, Atelectasis). Selected per disease by empirical validation AUC
- Medical-aware augmentation:
  - `SmallestMaxSize(512)` → `CenterCrop(512)`: preserves thoracic proportions; the crop removes only ~18 px of non-diagnostic margins (shoulders, arms)
  - `Rotate(limit=10°)`: mild angle perturbation that preserves anatomical orientation (neck up, diaphragm down)
  - `CLAHE`: adaptive contrast equalization that compensates for exposure differences across X-ray devices and hospitals
- Multitask regularization: auxiliary heads for Device (0.3), Projection (0.005), Sex (0.01). Device detection acts as a severity proxy (tubes and catheters correlate with patient acuity), providing an orthogonal signal without competing with the disease head
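The focal-loss behavior described above can be illustrated with a single-sample binary version of the Lin et al. formulation; this is a sketch, not the project's training code:

```python
import math

def binary_focal_loss(p, y, gamma=2.0):
    """Binary focal loss for one sample (Lin et al., 2017).
    p: predicted probability of the positive class, y: label in {0, 1}.
    The (1 - p_t)^gamma factor down-weights easy examples, so the rare
    positives (2-46% here) dominate the gradient."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# A confident correct prediction contributes almost nothing...
easy = binary_focal_loss(0.95, 1)  # ~0.00013
# ...while a confident mistake is penalized heavily.
hard = binary_focal_loss(0.05, 1)  # ~2.70
```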
| Component | Choice |
|---|---|
| Backbone | DenseNet121 (ImageNet pretrained) |
| Input | 512 × 512 |
| Loss | Focal Loss (Ξ³ = 2.0) |
| Optimizer | AdamW + Cosine decay |
| Aux tasks | Device (0.3), Projection (0.005), Sex (0.01) |
| Dataset | CheXpert 189K (VCB) / filtered (CXB) |
| Early stopping | Patience 3 on split AUC |
| Tracking | MLflow |
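The cosine decay in the optimizer row follows the usual half-cosine schedule. A sketch with illustrative base and minimum learning rates; the project's actual values live in the YAML configs:

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over total_steps.
    base_lr / min_lr are illustrative placeholders, not the project's config."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Starts at base_lr, crosses the midpoint halfway, ends at min_lr.
lr_start = cosine_lr(0, 100)    # 1e-4
lr_mid   = cosine_lr(50, 100)   # ~5e-5
lr_end   = cosine_lr(100, 100)  # 0.0
```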
| GPU | Role |
|---|---|
| RTX 4060 Ti 16GB | Primary training & reported results |
| 2 × RTX A5000 24GB | Validation & parallel-scalability testing |
Recommended: > 12 GB VRAM (~13.4 GB typical usage).
Experiments are tracked with MLflow (outputs/mlflow.db), logging:
- AUC / AUC-PR
- Loss and learning rate curves
- GPU memory usage
- Dataset statistics and class balance
Checkpoints and runs are fully reproducible.
git clone https://github.com/XOREngine/xand-ray.git
cd xand-ray
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

For exact reproducibility, use requirements-lock.txt instead.
- Access the CheXpert dataset via the official Stanford ML Group portal (license required) https://stanfordmlgroup.github.io/competitions/chexpert/ (train + valid images, train.csv, train_cheXbert.csv, train_visualCheXbert.csv, valid.csv)
- Place all files under your configured data directory and set $DATA to its path
See QUICKSTART.md for full setup instructions, $DATA configuration and details.
Training requires preprocessed CSVs generated by prepareDataset.py. Do not use raw CheXpert CSVs directly.
python scripts/prepareDataset.py \
--config configs/densenet_rtx4060.yaml \
--disease "Cardiomegaly" "Pneumothorax" \
--csv-train $DATA/chexpert/CheXpert-v1.0/train_cheXbert.csv \
--csv-valid-real $DATA/chexpert/CheXpert-v1.0/valid.csv \
--csv-test-real $DATA/chexpert_test_labels/groundtruth.csv \
--train-images-root $DATA/chexpert/CheXpert-v1.0/train \
--valid-images-root $DATA/chexpert/CheXpert-v1.0/valid \
--test-images-root $DATA/chexlocalize/CheXpert/test

python scripts/trainModel.py \
--config configs/densenet_rtx4060.yaml \
--disease "Cardiomegaly" \
--gpu 0 \
--csv-train outputs/processedCSVs/chexpert_cardiomegaly_train.csv \
--csv-valid-split outputs/processedCSVs/chexpert_cardiomegaly_valid.csv \
--csv-valid-real outputs/processedCSVs/chexpert_cardiomegaly_valid_real.csv \
--csv-test-real outputs/processedCSVs/chexpert_cardiomegaly_test_real.csv

- CheXpert → https://stanfordmlgroup.github.io/competitions/chexpert/
- CheXbert (labeler) → https://github.com/stanfordmlgroup/CheXbert
- VisualCheXbert → https://github.com/stanfordmlgroup/VisualCheXbert
- CheXlocalize → https://github.com/rajpurkarlab/CheXlocalize
- CheXpert Test Set Labels → https://github.com/rajpurkarlab/cheXpert-test-set-labels
- PadChest (BIMCV) → https://bimcv.cipf.es/bimcv-projects/padchest/
- CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison → https://arxiv.org/abs/1901.07031
- CheXbert: Combining Automatic Labelers and Expert Annotations → https://arxiv.org/abs/2004.09167
- VisualCheXbert: Addressing the Discrepancy Between Radiology Report Labels and Image Labels → https://arxiv.org/abs/2102.11467
- PadChest: A Large Chest X-ray Image Dataset with Multi-Label Annotated Reports → https://arxiv.org/abs/1901.07441
- CheXternal: Generalization Across Institutions → https://arxiv.org/abs/2102.08660
- Comparing Deep Neural Networks on CheXpert (Scientific Reports, 2020) → https://www.nature.com/articles/s41598-020-70479-z
- Benchmarking saliency methods for chest X-ray interpretation → https://doi.org/10.1038/s42256-022-00536-x
- jfhealthcare: DenseNet121 + PCAM pooling on CheXpert → https://github.com/jfhealthcare/Chexpert · PCAM paper
- Pham et al.: Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels → https://arxiv.org/abs/1911.06475
- DenseNet: Densely Connected Convolutional Networks → https://arxiv.org/abs/1608.06993
- Focal Loss → https://arxiv.org/abs/1708.02002
- Decoupled Weight Decay (AdamW) → https://arxiv.org/abs/1711.05101
- Grad-CAM → https://arxiv.org/abs/1610.02391
Final implementation and project coordination:
- JosΓ© Artusa (@WallyByte)
External methodology review:
- RubΓ©n J.R. (@rubenjr0)
Early-stage architectural exploration (ResNet / DenseNet / ConvNeXt, baseline phase):
- RubΓ©n Solano (@rubensolano2)
An independent technical review and full reproducibility verification of XAND-Ray Baseline v0.1 was conducted by:
RubΓ©n J.R.
External Reviewer
🔍 Independent Technical Review: v0.1 (14/02/2026)
- Verification of proper train / validation / test separation (patient-level split)
- Absence of data leakage or cross-dependencies
- Independent full replication of the training pipeline from scratch
- Reproduction of reported AUC-ROC metrics
- Evaluation flow integrity (no test reuse or post-hoc bias)
- Architectural consistency review (DenseNet121 integration and pipeline coherence)
The reported metrics were independently reproduced and remain within the variation expected from non-deterministic training environments.
No structural inconsistencies, data leakage, or improper use of the test set were identified during the review.
XAND-Ray: Chest X-Ray Screening Research.
💪 If you adapt it, extend it, or improve it: go for it.
© 2026 XOREngine · Open Source Commitment