A comprehensive, production-ready framework for multi-task deep learning in surgical video analysis, featuring instance segmentation, phase recognition, skill assessment, and video processing capabilities.
Cataract-LMM is an enterprise-grade AI framework designed for large-scale, multi-center surgical video analysis. Built on modern software engineering principles, this repository provides state-of-the-art deep learning models for comprehensive analysis of cataract surgery videos.
This framework implements methodologies from cutting-edge research in computer-assisted surgery, providing validated approaches for:
- Surgical Instance Segmentation using YOLO, Mask R-CNN, and SAM architectures
- Surgical Phase Recognition with Video Transformers, 3D CNNs, and temporal models
- Surgical Skill Assessment through multi-modal analysis and performance metrics
- Video Processing with GPU-accelerated pipelines for medical video data
- Production-Ready: Enterprise-grade architecture with comprehensive testing and CI/CD
- Multi-Task Learning: Unified framework supporting four core surgical analysis tasks
- Scalable Design: Microservices-ready architecture with containerization support
- Medical Compliance: HIPAA-aware design patterns and secure data handling
- Research-to-Production: Seamless transition from research notebooks to production deployment
- Quick Start
- Features
- Architecture
- Installation
- Usage Examples
- Development
- Model Zoo
- Configuration
- Testing
- Documentation
- Contributing
- License
- Citation
- Author
- Support & Community
- Roadmap
- Python 3.8+
- CUDA 11.8+ (for GPU acceleration)
- FFmpeg (for video processing)
- Docker (optional, for containerized deployment)
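Before installing, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper and not part of the repository, and it does not verify the CUDA version.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum Python version from the prerequisites list


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False (missing)."""
    return {
        "python>=3.8": sys.version_info >= MIN_PYTHON,
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "docker (optional)": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A missing `docker` entry is not an error, since Docker is only needed for containerized deployment.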
# Clone the repository
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM
# Install using Poetry (recommended)
cd codes
poetry install
# Activate virtual environment
poetry shell
# Or install using pip
pip install -r requirements.txt
# Validate installation
python setup.py --validate-only

# Video processing
cd surgical-video-processing
python main.py --input path/to/video.mp4 --output ./results --config configs/default.yaml
# Instance segmentation
cd surgical-instance-segmentation
python inference/predictor.py --model yolo --input data/images/
# Phase recognition
cd surgical-phase-recognition
python validation/training_framework.py --config configs/default.yaml --mode train
# Skill assessment
cd surgical-skill-assessment
python main.py --config configs/comprehensive.yaml --mode evaluate

| Component | Models | Key Features |
|---|---|---|
| Instance Segmentation | YOLO v8/11, Mask R-CNN, SAM | Real-time surgical instrument detection and segmentation |
| Phase Recognition | Video Transformers, 3D CNNs, TeCNO | 11-phase surgical workflow analysis |
| Skill Assessment | Multi-modal CNNs, Attention Models | Objective surgical skill evaluation |
| Video Processing | GPU-Accelerated Pipelines | Medical-grade video preprocessing and enhancement |
- Modular Architecture: Microservices-ready design with clear separation of concerns
- Security First: HIPAA-compliant patterns, secure credential management
- Comprehensive Testing: 85%+ test coverage with unit, integration, and E2E tests
- CI/CD Pipeline: Automated testing, security scanning, and deployment workflows
- Monitoring & Observability: Structured logging, metrics collection, and health checks
- Containerization: Multi-stage Docker builds with security hardening
- Rich Documentation: Comprehensive guides, API references, and examples
- Configuration Management: YAML-based configuration with validation
- Development Tools: Pre-commit hooks, linting, formatting, and type checking
- Dependency Management: Poetry-based modern Python packaging
- Development Environment: VS Code integration with debugging support
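To illustrate what "YAML-based configuration with validation" might involve, here is a minimal sketch of validating the `processing` section of a config. `ProcessingConfig` and `load_processing_config` are hypothetical names, not the repository's API. The plain dict stands in for an already parsed YAML file, and the allowed range for `quality_threshold` is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProcessingConfig:
    """Hypothetical schema for the `processing` section of a config file."""
    target_resolution: Tuple[int, int]
    fps: int
    quality_threshold: float

    def __post_init__(self):
        width, height = self.target_resolution
        if width <= 0 or height <= 0:
            raise ValueError("target_resolution must contain positive integers")
        if self.fps <= 0:
            raise ValueError("fps must be positive")
        # Assumption: the threshold is a fraction between 0 and 1
        if not 0.0 <= self.quality_threshold <= 1.0:
            raise ValueError("quality_threshold must be in [0, 1]")


def load_processing_config(raw: dict) -> ProcessingConfig:
    """Build a validated config from a parsed YAML document (a plain dict)."""
    section = raw["processing"]
    return ProcessingConfig(
        target_resolution=tuple(section["target_resolution"]),
        fps=section["fps"],
        quality_threshold=section["quality_threshold"],
    )


config = load_processing_config(
    {"processing": {"target_resolution": [1920, 1080], "fps": 30, "quality_threshold": 0.75}}
)
print(config)
```

Validating at load time means a malformed config fails before any video is processed, rather than partway through a long run.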
graph TB
A[Video Input] --> B[Video Processing Pipeline]
B --> C[Frame Extraction & Preprocessing]
C --> D[Multi-Task Analysis Engine]
D --> E[Instance Segmentation]
D --> F[Phase Recognition]
D --> G[Skill Assessment]
E --> H[Surgical Instruments]
F --> I[Surgery Phases]
G --> J[Skill Metrics]
H --> K[Clinical Decision Support]
I --> K
J --> K
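The data flow in the diagram above can be sketched in a few lines of Python: preprocessed frames fan out to the three analysis tasks, and their outputs are merged into one result. All names here (`AnalysisResult`, `analyze`, `stub_tasks`) are hypothetical and the task functions are placeholders that return fixed values. The repository's actual interfaces may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Frame = bytes  # stand-in for a decoded video frame


@dataclass
class AnalysisResult:
    instruments: List[str]
    phase: str
    skill_score: float


def analyze(frames: List[Frame], tasks: Dict[str, Callable[[List[Frame]], object]]) -> AnalysisResult:
    """Run every task on the same preprocessed frames and merge the outputs."""
    outputs = {name: task(frames) for name, task in tasks.items()}
    return AnalysisResult(
        instruments=outputs["instance_segmentation"],
        phase=outputs["phase_recognition"],
        skill_score=outputs["skill_assessment"],
    )


# Placeholder tasks; real models would replace these lambdas.
stub_tasks = {
    "instance_segmentation": lambda frames: ["forceps"],
    "phase_recognition": lambda frames: "phase_1",
    "skill_assessment": lambda frames: 0.0,
}

result = analyze([b"frame-0", b"frame-1"], stub_tasks)
print(result)
```

Keeping each task behind the same callable signature is one way to let the modules be developed and tested separately.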
Cataract_LMM/
├── README.md  # Project overview and documentation
├── LICENSE  # CC-BY-4.0 license
├── CONTRIBUTING.md  # Contribution guidelines
├── .gitignore  # Git ignore patterns
├── codes/  # Main codebase
│   ├── surgical-video-processing/  # Video preprocessing and enhancement
│   │   ├── core/  # Core processing algorithms
│   │   ├── pipelines/  # Processing pipelines
│   │   ├── metadata/  # Video metadata management
│   │   ├── quality_control/  # Quality assurance tools
│   │   └── configs/  # Configuration files
│   ├── surgical-instance-segmentation/  # Instance segmentation models
│   │   ├── models/  # YOLO, Mask R-CNN, SAM implementations
│   │   ├── training/  # Training pipelines
│   │   ├── inference/  # Real-time inference engines
│   │   ├── evaluation/  # Model evaluation tools
│   │   └── data/  # Dataset utilities
│   ├── surgical-phase-recognition/  # Phase classification models
│   │   ├── models/  # Video Transformers, 3D CNNs, TeCNO
│   │   ├── validation/  # Training and validation frameworks
│   │   ├── preprocessing/  # Video preprocessing
│   │   ├── analysis/  # Result analysis tools
│   │   └── configs/  # Model configurations
│   ├── surgical-skill-assessment/  # Skill evaluation framework
│   │   ├── models/  # Skill assessment models
│   │   ├── engine/  # Training and inference engines
│   │   ├── utils/  # Analysis utilities
│   │   └── configs/  # Assessment configurations
│   ├── tests/  # Comprehensive test suite
│   ├── docs/  # Documentation source
│   ├── docker/  # Docker configurations
│   ├── reports/  # Analysis reports
│   ├── pyproject.toml  # Python project configuration
│   ├── Dockerfile  # Container definition
│   ├── Makefile  # Development automation
│   └── setup.py  # Project setup script
├── .github/  # GitHub configurations
│   └── workflows/  # CI/CD pipelines
└── security_scanning_demo.ipynb  # Security analysis notebook
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8 | 3.11+ |
| RAM | 16GB | 32GB+ |
| GPU Memory | 8GB | 24GB+ |
| Storage | 50GB | 500GB+ |
| CUDA | 11.8 | 12.0+ |
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -
# Clone and setup
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes
# Install dependencies
poetry install --extras "dev docs"
# Activate environment
poetry shell

# Create environment
conda create -n cataract-lmm python=3.11
conda activate cataract-lmm
# Clone and install
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes
pip install -r requirements.txt

# Build container
docker build -t cataract-lmm:latest .
# Run interactive container
docker run -it --gpus all -v $(pwd)/data:/app/data cataract-lmm:latest

# Run comprehensive validation
python setup.py --validate-only
# Run tests
pytest tests/ -v
# Check GPU availability
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"

from surgical_video_processing import VideoProcessor, QualityController
# Initialize processor with configuration
processor = VideoProcessor("configs/high_quality.yaml")
# Process surgical video
result = processor.process_video(
    input_path="data/surgery_video.mp4",
    output_dir="outputs/processed/",
    apply_deidentification=True,
    quality_threshold=0.8
)
print(f"Processed {result.frame_count} frames")
print(f"Quality score: {result.average_quality:.3f}")

from surgical_instance_segmentation import SegmentationPredictor
# Load pre-trained model
predictor = SegmentationPredictor(
    model_type="yolo_v8",
    device="cuda"
)
# Segment surgical instruments
results = predictor.predict_batch(
    image_paths=["frame001.jpg", "frame002.jpg"],
    confidence_threshold=0.7,
    save_visualizations=True
)
# Extract detections
for result in results:
    print(f"Detected {len(result.boxes)} instruments")
    print(f"Classes: {result.class_names}")

from surgical_phase_recognition import PhaseClassifier
# Initialize phase recognition model
classifier = PhaseClassifier(
    model_name="video_transformer",
    config_path="configs/phase_recognition.yaml"
)
# Classify surgical phases in video sequence
phases = classifier.classify_sequence(
    video_path="data/surgery_complete.mp4",
    sequence_length=16,
    overlap=0.5
)
# Display phase timeline
for phase in phases:
    print(f"Time: {phase.timestamp:.2f}s - Phase: {phase.name}")

from surgical_skill_assessment import SkillEvaluator
# Initialize skill assessment framework
evaluator = SkillEvaluator("configs/skill_assessment.yaml")
# Assess surgical performance
assessment = evaluator.evaluate_surgery(
    video_path="data/complete_surgery.mp4",
    phase_annotations="data/phases.json",
    surgeon_level="resident"  # resident, fellow, attending
)
# Generate skill report
report = evaluator.generate_report(assessment)
print(f"Overall Score: {report.overall_score}/100")
print(f"Efficiency: {report.efficiency_score}/10")
print(f"Precision: {report.precision_score}/10")

# Clone repository
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes
# Install development dependencies
poetry install --extras "dev"
# Setup pre-commit hooks
pre-commit install
# Run development server
make dev-server

# Format code
make format
# Run linting
make lint
# Type checking
make type-check
# Security scanning
make security-scan
# Run all quality checks
make quality

# Run unit tests
make test
# Run with coverage
make test-coverage
# Run integration tests
make test-integration
# Run end-to-end tests
make test-e2e
# Generate coverage report
make coverage-report

make help # Show all available commands
make install # Install dependencies
make clean # Clean build artifacts
make build # Build distribution packages
make docker-build # Build Docker image
make docker-run # Run Docker container
make docs-build # Build documentation
make docs-serve # Serve documentation locally

| Model | mAP@0.5:0.95 |
|---|---|
| YOLOv11 ⭐ | 73.9% |
| YOLOv8 | 73.8% |
| SAM | 56.0% |
| SAM2 | 55.2% |
| Mask R-CNN | 53.7% |
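The metric in the table above, mAP@0.5:0.95, averages average precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention). The sketch below shows only the building block: the IoU of two binary masks and the thresholds at which a prediction would count as a match. It is an illustration on toy data, not the evaluation code used to produce the reported numbers, and `mask_iou` is a hypothetical helper.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return intersection / union if union else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Toy 3x3 masks: the prediction covers 4 of the 6 ground-truth pixels
pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
truth = [[1, 1, 1],
         [1, 1, 1],
         [0, 0, 0]]

iou = mask_iou(pred, truth)  # 4 overlapping pixels / 6 pixels in the union
matched_at = [t for t in IOU_THRESHOLDS if iou >= t]
print(f"IoU = {iou:.3f}; counts as a match at thresholds {matched_at}")
```

A prediction with IoU of about 0.67 is a true positive only at the four loosest thresholds, which is why mAP@0.5:0.95 is noticeably stricter than mAP@0.5.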
| Model | Backbone | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|---|
| MViT-B ⭐ | - | 85.7% | 77.1% | 77.1% | 78.5% |
| Swin-T | - | 85.5% | 76.2% | 77.5% | 77.2% |
| CNN + GRU | EfficientNet-B5 | 82.1% | 71.3% | 76.0% | 70.4% |
| CNN + TeCNO | EfficientNet-B5 | 81.7% | 71.2% | 75.1% | 71.2% |
| CNN + LSTM | EfficientNet-B5 | 81.5% | 70.0% | 76.4% | 69.4% |
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| TimeSformer ⭐ | 82.5% | 86.0% | 82.0% | 83.9% |
| R3D-18 | 81.7% | 82.4% | 84.9% | 83.6% |
| Slow R50 | 80.0% | 81.8% | 81.8% | 81.8% |
| X3D-M | 80.0% | 83.9% | 78.8% | 81.3% |
| R(2+1)D-18 | 72.9% | 79.3% | 76.7% | 78.0% |
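As a quick sanity check on the table above, the F1-Score column can be compared with the harmonic mean of the Precision and Recall columns. The sketch below copies those values from the table. Small differences are expected because the table reports rounded percentages, and `f1_score` is a hypothetical helper rather than the repository's evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
rows = {
    "TimeSformer": (86.0, 82.0, 83.9),
    "R3D-18": (82.4, 84.9, 83.6),
    "Slow R50": (81.8, 81.8, 81.8),
    "X3D-M": (83.9, 78.8, 81.3),
    "R(2+1)D-18": (79.3, 76.7, 78.0),
}

for model, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    print(f"{model}: computed {computed:.2f}, reported {reported}")
```

Every row agrees with the reported value to within 0.1 percentage points, which is consistent with F1 having been computed from unrounded precision and recall.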
The framework uses YAML-based configuration for all components:
processing:
  target_resolution: [1920, 1080]
  fps: 30
  quality_threshold: 0.75
deidentification:
  enabled: true
  blur_faces: true
  remove_text: true
output:
  format: "mp4"
  compression: "h264"
  quality: "high"

model:
  architecture: "yolov8"
  size: "medium"
  pretrained: true
training:
  epochs: 100
  batch_size: 16
  learning_rate: 0.001
data:
  classes: ["forceps", "scissors", "needle_holder", "suction"]
  augmentation:
    enabled: true
    rotation: 15
    scaling: [0.8, 1.2]

# Create .env file
cp .env.example .env
# Edit configuration
CUDA_VISIBLE_DEVICES=0,1
WANDB_PROJECT=cataract-lmm
DATA_ROOT=/path/to/data
OUTPUT_DIR=/path/to/outputs
LOG_LEVEL=INFO

tests/
├── unit/  # Unit tests for individual components
├── integration/  # Integration tests for module interactions
├── e2e/  # End-to-end workflow tests
├── performance/  # Performance and benchmarking tests
├── security/  # Security and vulnerability tests
├── fixtures/  # Test data and fixtures
└── conftest.py  # Pytest configuration
# Run all tests
pytest
# Run specific test category
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/
# Run with coverage
pytest --cov=. --cov-report=html
# Run performance tests
pytest tests/performance/ --benchmark-only
# Run with specific markers
pytest -m "gpu" --gpu-required
pytest -m "slow" --timeout=300

# pytest.ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
markers =
    unit: Unit tests
    integration: Integration tests
    e2e: End-to-end tests
    gpu: Tests requiring GPU
    slow: Slow running tests
    security: Security tests
addopts =
    --strict-markers
    --verbose
    --tb=short
    --cov-report=term-missing

- User Guide: Getting started, tutorials, and examples
- API Reference: Comprehensive API documentation
- Developer Guide: Contributing, architecture, and development setup
- Model Documentation: Model architectures, performance metrics, and usage
- Security Guide: Security considerations and best practices
# Install documentation dependencies
poetry install --extras "docs"
# Build documentation
cd docs
make html
# Serve documentation locally
make serve
# Build PDF documentation
make latexpdf

- Documentation Site: https://cataract-lmm.readthedocs.io
- API Reference: https://cataract-lmm.readthedocs.io/api/
- Tutorials: https://cataract-lmm.readthedocs.io/tutorials/
- Model Zoo: https://cataract-lmm.readthedocs.io/models/
We welcome contributions from the surgical AI community! Please see our CONTRIBUTING.md for detailed guidelines.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
# Setup development environment
make dev-setup
# Run pre-commit checks
pre-commit run --all-files
# Run tests before committing
make test-all
# Submit pull request
gh pr create --title "Feature: Add amazing feature"

- Python Style: Black formatter
- Import Sorting: isort
- Linting: Flake8 with medical AI conventions
- Type Checking: MyPy for type safety
- Documentation: Google style docstrings
This project framework and code are licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0). See the LICENSE file for details.
The dataset has specific ownership and licensing requirements. See DATA_LICENSE.md for detailed information about:
- Data ownership by Farabi Eye Hospital and Noor Eye Hospital
- Annotation ownership by participating institutions
- Attribution requirements under CC-BY 4.0
- Proper usage guidelines
If you use this benchmark dataset or framework in your research, please cite our work. The benchmark has been submitted to Scientific Data (Nature Portfolio).
@misc{ahmadi2025cataractlmmlargescalemultitask,
title={Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis},
author={Mohammad Javad Ahmadi and Iman Gandomi and Parisa Abdi and Seyed-Farzad Mohammadi and Amirhossein Taslimi and Mehdi Khodaparast and Hassan Hashemi and Mahdi Tavakoli and Hamid D. Taghirad},
year={2025},
eprint={2510.16371},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.16371},
doi={10.48550/arXiv.2510.16371}
}

Ahmadi, M. J., Gandomi, I., Abdi, P., Mohammadi, S.-F., Taslimi, A., Khodaparast, M., Hashemi, H., Tavakoli, M., & Taghirad, H. D. (2025). Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis. arXiv. https://doi.org/10.48550/arXiv.2510.16371
M. J. Ahmadi et al., "Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis," 2025, arXiv:2510.16371. [Online]. Available: https://arxiv.org/abs/2510.16371
Ahmadi, Mohammad Javad, Iman Gandomi, Parisa Abdi, Seyed-Farzad Mohammadi, Amirhossein Taslimi, Mehdi Khodaparast, Hassan Hashemi, Mahdi Tavakoli, and Hamid D. Taghirad. 2025. "Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis." arXiv. https://doi.org/10.48550/arXiv.2510.16371.
@software{cataract_lmm_repo_2025,
title={{Cataract-LMM}: Large-Scale, Multi-Source, Multi-Task Benchmark and Framework for Surgical Video Analysis},
author={Ahmadi, Mohammad Javad and Gandomi, Iman and Abdi, Parisa and Mohammadi, Seyed-Farzad and Taslimi, Amirhossein and Khodaparast, Mehdi and Hashemi, Hassan and Tavakoli, Mahdi and Taghirad, Hamid D.},
year={2025},
url={https://github.com/MJAHMADEE/Cataract-LMM},
version={1.0.0}
}

Mohammad Javad Ahmadi
- Documentation: Refer to individual README files in each module
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: mjahmadee@gmail.com
- ✅ Multi-task surgical video analysis framework
- ✅ Instance segmentation with YOLO/Mask R-CNN/SAM
- ✅ Phase recognition with Video Transformers
- ✅ Skill assessment framework
- ✅ Production-ready CI/CD pipeline
- 🔄 Real-time inference optimization
- 🔄 Multi-GPU distributed training
- 🔄 Model quantization and pruning
- 🔄 REST API and web interface
- 🔄 Advanced analytics dashboard
- 🔮 Multi-modal learning (video + audio + sensor data)
- 🔮 Federated learning across institutions
- 🔮 Real-time surgical guidance system
- 🔮 Integration with surgical robots
- 🔮 Multi-language support