XAND-Ray

OneShot · Single Model · DenseNet121 · 512×512 px · Binary Classification


Chest X-ray binary classification with explicit distribution shift awareness.
Designed for research-grade screening experiments under distribution shift. Not intended for clinical use.

XAND-Ray trains per-disease binary classifiers on CheXpert using DenseNet121, covering 9 pathologies with a vanilla architecture β€” no custom pooling, label smoothing, or uncertainty handling.


📊 Results & Evaluation Protocol

Splits:

  • Split (2K) – best AUC on the internal validation split (2,000 images, automatically partitioned from train)
  • Valid (202) – best AUC on the official validation set (202 images, radiologist labels)
  • Test† (518) – AUC on the test set at the epoch where Valid was best → the deployed model
  • PadChest – external validation on the PadChest gold subset (~24K images, PA projection only, physician-verified labels); a CheXpert → PadChest label mapping is applied per disease
| Disease | Labels | CheXpert Split (2K) | CheXpert Valid (202) | CheXpert Test† (518) | PadChest |
|---|---|---|---|---|---|
| Pneumothorax | CXB | 0.8791 | 0.9553 | 0.9814 | 0.8688 |
| Pleural Effusion | VCB | 0.9423 | 0.9424 | 0.9468 | 0.9587 |
| Pneumonia | VCB | 0.8740 | 0.9304 | 0.8982 | 0.7729 |
| Lung Opacity | VCB | 0.9345 | 0.9252 | 0.9294 | - |
| Cardiomegaly | CXB | 0.9240 | 0.8651 | 0.9218 | 0.9239 |
| Edema | CXB | 0.8451 | 0.9540 | 0.9165 | 0.9484 |
| Consolidation | VCB | 0.9355 | 0.9316 | 0.8770 | 0.8565 |
| Enlarged Cardiom. | CXB | 0.8612 | 0.8855 | 0.8781 | - |
| Atelectasis | VCB | 0.9009 | 0.8995 | 0.8691 | 0.7213 |

† = test metric at the epoch of best official validation (selected model)

  • VCB = VisualCheXbert labels (189K uniform train), CXB = CheXbert labels (filtered train, varies per disease)
  • Each label type (VCB, CXB) trains an independent model; the one with best official validation (202) is reported
  • All splits are patient-level (no patient appears in more than one split).
  • Confidence intervals planned for a future release.
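The patient-level constraint above can be sketched in a few lines. This is an illustrative helper assuming CheXpert-style paths (`train/patientXXXXX/studyY/...`); the function name, split fraction, and seed are hypothetical, not taken from the repository.

```python
import random
from collections import defaultdict

def patient_level_split(paths, val_frac=0.1, seed=42):
    """Split image paths so that no patient spans both partitions.

    Assumes CheXpert-style paths: train/patientXXXXX/studyY/view.jpg.
    The split fraction and seed here are illustrative only.
    """
    by_patient = defaultdict(list)
    for p in paths:
        patient = p.split("/")[1]          # e.g. "patient00001"
        by_patient[patient].append(p)

    # Shuffle patients (not images) so all studies of a patient move together
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_val = max(1, int(len(patients) * val_frac))
    val_patients = set(patients[:n_val])

    train = [p for pid in by_patient if pid not in val_patients for p in by_patient[pid]]
    val = [p for pid in by_patient if pid in val_patients for p in by_patient[pid]]
    return train, val
```

Splitting at the image level instead would leak near-duplicate studies of the same patient across partitions and inflate validation AUC.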

Peak Test AUC (Upper Bounds)

Highest test AUC observed during training, regardless of epoch. Not used for model selection β€” reported for transparency only.

| Disease | CheXpert Test Peak |
|---|---|
| Pneumothorax | 0.9814 |
| Pleural Effusion | 0.9504 |
| Pneumonia | 0.9498 |
| Lung Opacity | 0.9390 |
| Cardiomegaly | 0.9227 |
| Edema | 0.9221 |
| Consolidation | 0.8958 |
| Enlarged Cardiom. | 0.8853 |
| Atelectasis | 0.8715 |

πŸ” Grad-CAM Attention Maps

Attention maps have been validated against CheXlocalize ground truth segmentations (IoU, Dice, point-hit metrics). The model is trained purely for classification without localization loss, yet activation maps consistently fall on clinically relevant anatomical regions for each pathology.

Each pathology is visualized with four views:

| View | Description |
|---|---|
| Original | Input chest X-ray from the official test split (blind evaluation) |
| Attention Map | Grad-CAM heatmap; red zones indicate where the model focused when deciding about that specific pathology |
| CheXlocalize Mask | Academic reference segmentation from CheXlocalize (Stanford ML Group), included as an approximate visual reference |
| Overlay | Combined view for easier comparison |
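For reference, the core computation behind the Attention Map view can be sketched in a few lines. This is a generic NumPy illustration of the standard Grad-CAM algorithm (Selvaraju et al.); the repository's actual implementation, including the framework hooks that capture activations and gradients, is not shown here.

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Classic Grad-CAM from cached activations and gradients.

    feature_maps: (C, H, W) activations of the last conv block
    gradients:    (C, H, W) gradients of the class score w.r.t. those maps
    Returns a heatmap in [0, 1] of shape (H, W).
    """
    weights = gradients.mean(axis=(1, 2))              # alpha_k: GAP of gradients
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0.0)                         # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalize for overlay
    return cam
```

Because each binary model is queried separately, the same image yields a different gradient signal, and therefore a different heatmap, per pathology.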

Same Patient, Different Questions

The following maps were generated from the same chest X-ray, each querying a different pathology. Notice how the attention region shifts depending on what the model is being asked:

⚠️ Interpreting attention maps

  • Each heatmap answers one question: "where does the model look to decide about this specific disease?" It is not a comprehensive scan of the image.
  • This model is a binary classifier that detects the minimum evidence needed to decide if a condition is present or absent. It does not delineate the full extent of a lesion, nor does it prioritize across multiple co-existing findings.

(Figure panels: Cardiomegaly, Pleural Effusion, Edema, Lung Opacity, Consolidation.)


Note: The visual examples above are composite research visualizations. The original dataset is not redistributed and remains subject to its respective license.


🧠 Methodology & Model Design

  • Per-disease binary classifiers – an independent model per pathology, with no multilabel compromises. Each is optimized for its own label source, training-set size, and class balance
  • Pure-label training – only explicit positives (1) and negatives (0); uncertain and NaN samples are excluded entirely. Focal Loss (γ = 2.0) handles class imbalance across 2%–46% positive rates
  • Hybrid labeling – CheXbert for structural conditions (Cardiomegaly, Pneumothorax), VisualCheXbert for diffuse pathologies (Pneumonia, Atelectasis). Selected per disease by empirical validation AUC
  • Medical-aware augmentation
    • SmallestMaxSize(512) → CenterCrop(512) – preserves thoracic proportions; CenterCrop removes only ~18 px of non-diagnostic margins (shoulders, arms)
    • Rotate(limit=10°) – mild angle perturbation that preserves anatomical orientation (neck up, diaphragm down, ...)
    • CLAHE – adaptive contrast equalization that compensates for exposure differences across X-ray devices and hospitals
  • Multitask regularization – auxiliary heads for Device (0.3), Projection (0.005), and Sex (0.01). Device detection acts as a severity proxy (tubes and catheters correlate with patient acuity), providing an orthogonal signal without competing with the disease head
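As a rough illustration of the loss choice above, here is a minimal NumPy sketch of binary focal loss (Lin et al.); the function name and signature are illustrative, not the repository's API.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Binary focal loss; gamma=2.0 matches the training config.

    probs:   predicted P(disease present), shape (N,)
    targets: 0/1 ground-truth labels, shape (N,)
    With gamma = 0 this reduces to plain binary cross-entropy.
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    p_t = np.where(targets == 1, probs, 1.0 - probs)  # prob assigned to the true class
    # (1 - p_t)^gamma down-weights easy, well-classified samples, so the
    # scarce positives (2%-46% positive rates) are not drowned out in training
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```

The down-weighting term is what lets a single loss cope with positive rates that vary by more than an order of magnitude across the nine diseases.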

βš™οΈ Training, Hardware & Reproducibility

Core Training Configuration

| Component | Choice |
|---|---|
| Backbone | DenseNet121 (ImageNet pretrained) |
| Input | 512 × 512 |
| Loss | Focal Loss (γ = 2.0) |
| Optimizer | AdamW + cosine decay |
| Aux tasks | Device (0.3), Projection (0.005), Sex (0.01) |
| Dataset | CheXpert 189K (VCB) / filtered (CXB) |
| Early stopping | Patience 3 on split AUC |
| Tracking | MLflow |
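The cosine decay paired with AdamW can be expressed with the schedule formula alone. The base and minimum learning rates below are placeholders; the actual hyperparameters live in configs/densenet_rtx4060.yaml.

```python
import math

def cosine_lr(step: int, total_steps: int,
              base_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr over total_steps.

    base_lr and min_lr are illustrative values, not the repo's config.
    """
    progress = min(step, total_steps) / total_steps
    # Standard cosine annealing: starts at base_lr, ends at min_lr
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With early stopping at patience 3, training typically halts before the schedule fully anneals, so the effective final learning rate depends on the stopping epoch.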

Hardware Used

| GPU | Role |
|---|---|
| RTX 4060 Ti 16 GB | Primary training & reported results |
| 2 × RTX A5000 24 GB | Validation & parallel-scalability testing |

Recommended: > 12 GB VRAM (~13.4 GB typical usage).

Experiment Traceability

Experiments are tracked with MLflow (outputs/mlflow.db), logging:

  • AUC / AUC-PR
  • Loss and learning rate curves
  • GPU memory usage
  • Dataset statistics and class balance

Checkpoints and runs are fully reproducible.


🚀 Quick Start

Installation

```shell
git clone https://github.com/XOREngine/xand-ray.git
cd xand-ray
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

For exact reproducibility, use requirements-lock.txt instead.

Data Setup

  1. Request access to the CheXpert dataset via the official Stanford ML Group portal (license required): https://stanfordmlgroup.github.io/competitions/chexpert/ (train + valid images, train.csv, train_cheXbert.csv, train_visualCheXbert.csv, valid.csv)
  2. Place all files under your configured data directory and set $DATA to its path

See QUICKSTART.md for full setup instructions, $DATA configuration and details.

Prepare CSVs

Training requires preprocessed CSVs generated by prepareDataset.py. Do not use raw CheXpert CSVs directly.

```shell
python scripts/prepareDataset.py \
  --config configs/densenet_rtx4060.yaml \
  --disease "Cardiomegaly" "Pneumothorax" \
  --csv-train $DATA/chexpert/CheXpert-v1.0/train_cheXbert.csv \
  --csv-valid-real $DATA/chexpert/CheXpert-v1.0/valid.csv \
  --csv-test-real $DATA/chexpert_test_labels/groundtruth.csv \
  --train-images-root $DATA/chexpert/CheXpert-v1.0/train \
  --valid-images-root $DATA/chexpert/CheXpert-v1.0/valid \
  --test-images-root $DATA/chexlocalize/CheXpert/test
```

Train

```shell
python scripts/trainModel.py \
  --config configs/densenet_rtx4060.yaml \
  --disease "Cardiomegaly" \
  --gpu 0 \
  --csv-train outputs/processedCSVs/chexpert_cardiomegaly_train.csv \
  --csv-valid-split outputs/processedCSVs/chexpert_cardiomegaly_valid.csv \
  --csv-valid-real outputs/processedCSVs/chexpert_cardiomegaly_valid_real.csv \
  --csv-test-real outputs/processedCSVs/chexpert_cardiomegaly_test_real.csv
```

📚 References

Datasets & Labels

External Benchmarks & Comparative Works

Optimization & Methods


👥 Contributors & Acknowledgements

Final implementation and project coordination:

External methodology review:

Early-stage architectural exploration (ResNet / DenseNet / ConvNeXt; baseline phase):


πŸ›‘οΈ Independent Technical Review & Reproducibility Audit

An independent technical review and full reproducibility verification of XAND-Ray Baseline v0.1 was conducted by:

RubΓ©n J.R.
External Reviewer
📄 Independent Technical Review – v0.1 (14/02/2026)

Scope of the Review

  • Verification of proper train / validation / test separation (patient-level split)
  • Absence of data leakage or cross-dependencies
  • Independent full replication of the training pipeline from scratch
  • Reproduction of reported AUC-ROC metrics
  • Evaluation flow integrity (no test reuse or post-hoc bias)
  • Architectural consistency review (DenseNet121 integration and pipeline coherence)

Conclusion

The reported metrics were independently reproduced and remain within expected technical variation margins inherent to non-deterministic computational environments.

No structural inconsistencies, data leakage, or improper use of the test set were identified during the review.

⚠️ This audit is strictly limited to technical analysis and computational reproducibility. It does not constitute clinical validation or medical certification.



XAND-Ray – Chest X-Ray Screening Research.
💪 If you adapt it, extend it, or improve it – go for it.

© 2026 XOREngine · Open Source Commitment
