Processing and Analysis Pipeline for
Fluorescence Microscopy Imaging Data
> Single particle tracking (mRNA) and translocation through NPCs in S. cerevisiae
Production-quality pipeline for automated nuclear envelope morphology analysis from dual-channel fluorescence microscopy with statistical rigor and FAIR compliance.
This pipeline converts fluorescence microscopy imaging data into quantitative measurements of the nuclear pore complex (NPC), localizing pairs of labeled nucleoporins (Nups) with sub-pixel precision. Originally developed to study mRNA translocation through NPCs in yeast, its modular architecture supports other membrane analysis applications.
Key Innovation: Systematic transformation from research prototype to production-ready, FAIR-compliant software demonstrating research software engineering best practices.
Documentation Status: Complete Configuration Reference and Bibliography available now. Additional user guides and tutorials in development.
- ✅ Automated NE Detection - U-Net++ neural network segmentation with adaptive thresholding
- ✅ Sub-pixel Refinement - Spline fitting to intensity profiles described by a Richards-Gaussian model, fitted by non-linear least squares (Levenberg-Marquardt algorithm)
- ✅ Multi-channel Registration - Phase correlation with sub-pixel accuracy
- ✅ Statistical Rigor - Likelihood ratio testing, AIC model selection, robust outlier filtering
- ✅ Quality Control - Comprehensive validation with automated PDF reports
- ✅ Reproducible Analysis - Complete parameter justification with citations
- ✅ Modular Architecture - Object-oriented design with clear separation of concerns
- ✅ Checkpoint/Resume - FOV-level checkpoints support distributed HPC runs
- ✅ FAIR Compliance - Findable, Accessible, Interoperable, Reusable
- ✅ Production Quality - Adaptive algorithms reduce optimization failures
Create a JSON configuration file (see Configuration Guide for complete reference):
{
  "pipe globals": {
    "strains": ["experiment_001"],
    "directories": {
      "imaging root": "/path/to/your/data",
      "camera root": "/path/to/calibration/images",
      "output root": "/path/to/output",
      "model root": "./trained_models"
    },
    "pixelsize": 128,
    "frame_duration": 0.02
  },
  "image processor": {
    "model_NE": "Modelweights_NE_segmentation.pt",
    "responsivity": {
      "ch1": {
        "bright": "bright_Ch1.tif",
        "dark": "dark_Ch1.tif"
      },
      "ch2": {
        "bright": "bright_Ch2.tif",
        "dark": "dark_Ch2.tif"
      }
    }
  }
}

Pipeline generates:
- PDF Reports - Visual quality control and registration statistics
- Refined Splines - Sub-pixel accurate nuclear envelope boundaries
- Distance Measurements - Quantified separation between fluorescent labels
- Quality Metrics - Comprehensive validation statistics
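Before a run, a configuration file like the example above can be sanity-checked. The sketch below is a minimal illustration using only the Python standard library. The key names come from the example configuration; `validate_config`, `load_config`, and the specific checks are hypothetical and are not part of the pipeline's documented API.

```python
import json
from pathlib import Path

# Directory keys as they appear in the example configuration.
REQUIRED_DIRECTORIES = ("imaging root", "camera root", "output root", "model root")


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found.

    The rules below are illustrative assumptions, not the pipeline's own checks.
    """
    problems = []
    pipe_globals = config.get("pipe globals", {})
    if not pipe_globals.get("strains"):
        problems.append("'pipe globals' -> 'strains' must list at least one strain")
    directories = pipe_globals.get("directories", {})
    for key in REQUIRED_DIRECTORIES:
        if key not in directories:
            problems.append(f"missing directory entry: {key!r}")
    for key in ("pixelsize", "frame_duration"):
        value = pipe_globals.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value <= 0:
            problems.append(f"{key!r} must be a positive number")
    responsivity = config.get("image processor", {}).get("responsivity", {})
    for channel, files in responsivity.items():
        for kind in ("bright", "dark"):
            if kind not in files:
                problems.append(f"responsivity -> {channel!r} lacks a {kind!r} image")
    return problems


def load_config(path: str) -> dict:
    """Read a JSON configuration file from disk."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `validate_config(load_config("config.json"))` and printing the returned list gives a quick check before submitting a long job; the Configuration Guide remains the authoritative reference for which keys are required.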
Output structure:
output_root/
├── responsivity/ # Camera calibration results
├── initial_fit/ # U-Net++ detection and initial splines
├── registration/ # Multi-channel alignment + PDF reports
├── refined_fit/ # Sub-pixel refined splines
└── distances/ # Dual-label distance measurements
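The `responsivity/` folder holds the camera calibration results (gain, offset, and read noise per channel) derived from the bright and dark images named in the configuration. As a rough illustration of how such values can be estimated, the sketch below applies the standard mean-variance (photon transfer) relation to flat lists of pixel values. This is an assumption about the general technique, not a description of the pipeline's own calibration code; `CameraCalibration`, `calibrate`, and `to_photons` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class CameraCalibration:
    offset: float      # mean dark level, in ADU
    gain: float        # ADU per detected photon (photoelectron)
    read_noise: float  # dark-frame noise, in photoelectrons


def calibrate(dark: list[float], bright: list[float]) -> CameraCalibration:
    """Estimate offset, gain, and read noise from dark and bright pixel values."""
    offset = mean(dark)
    dark_variance = variance(dark)
    signal = mean(bright) - offset
    if signal <= 0:
        raise ValueError("bright pixels must be brighter than dark pixels")
    # Shot noise is Poisson distributed, so the excess variance (ADU^2)
    # divided by the mean signal (ADU) gives the gain in ADU per photoelectron.
    gain = (variance(bright) - dark_variance) / signal
    if gain <= 0:
        raise ValueError("bright pixels show no excess variance over dark pixels")
    read_noise = dark_variance ** 0.5 / gain
    return CameraCalibration(offset=offset, gain=gain, read_noise=read_noise)


def to_photons(adu: float, calibration: CameraCalibration) -> float:
    """Convert a raw camera value (ADU) to an estimated photon count."""
    return (adu - calibration.offset) / calibration.gain
```

In practice calibration is usually done per pixel over stacks of frames; the flat-list version here only shows the arithmetic.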
Complete Configuration Reference
- Every parameter documented with citations
- Scientific justification for defaults
- When to change settings
- Example configurations for different use cases
- Validation and troubleshooting

Complete Bibliography
- 50+ peer-reviewed citations
- Organized by research context
- Cross-referenced by parameter
- Statistical method justifications
- Installation guide
- User guide (data preparation, running pipeline, troubleshooting)
- Methodology documentation (detailed algorithm descriptions)
- Developer guide (architecture, contributing, testing)
- API reference
Testing is ongoing on the following systems. A Singularity recipe is provided.
- Local: macOS (Apple Silicon M4 Max)
- HPC: LSF cluster
- Camera Calibration
  - Derives gain, offset, read noise per channel
  - Enables accurate photon counting
- Initial NE Detection
  - U-Net++ neural network segmentation (Zhou et al. 2018)
  - Initial B-spline fitting to boundaries
  - Segment merging for continuous envelopes
- Multi-channel Registration
  - Phase correlation with sub-pixel precision (Guizar-Sicairos et al. 2008)
  - Quality control filtering (MAD-based robust statistics)
  - Drift correction and stability analysis
- Spline Refinement
  - Richards-Gaussian profile fitting (Richards 1959; Zhang et al. 2007)
  - Levenberg-Marquardt optimization with adaptive step sizing
  - Likelihood ratio test for outlier detection (Smith et al. 2010)
  - AIC-based model selection (Akaike 1974)
- Bezier Bridging (Optional)
  - Interpolates gaps in segmented data
  - Creates continuous, periodic splines
  - Maintains biological curvature constraints (Zimmerberg & Kozlov 2006)
- Distance Calculation
  - IoU-based label pairing between channels
  - Perpendicular distance sampling (1000 points per nucleus)
  - Statistical distance metrics (mean, std, distribution)
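The distance calculation step pairs labels between channels by intersection over union (IoU). A minimal sketch of that idea follows, assuming each label is represented as a set of pixel coordinates and that pairing is greedy and one-to-one. The function names, the greedy strategy, and the `min_iou` default are illustrative assumptions; the pipeline's actual pairing rule and threshold are not stated here.

```python
Pixel = tuple[int, int]


def iou(mask_a: set[Pixel], mask_b: set[Pixel]) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def pair_labels(
    ch1: dict[int, set[Pixel]],
    ch2: dict[int, set[Pixel]],
    min_iou: float = 0.5,  # assumed cutoff, for illustration only
) -> dict[int, int]:
    """Greedily pair channel-1 labels with channel-2 labels, best IoU first."""
    candidates = sorted(
        ((iou(a, b), label_a, label_b)
         for label_a, a in ch1.items()
         for label_b, b in ch2.items()),
        key=lambda item: item[0],
        reverse=True,
    )
    pairs: dict[int, int] = {}
    used_ch2: set[int] = set()
    for score, label_a, label_b in candidates:
        if score < min_iou:
            break  # remaining candidates overlap even less
        if label_a in pairs or label_b in used_ch2:
            continue  # each label is paired at most once
        pairs[label_a] = label_b
        used_ch2.add(label_b)
    return pairs
```

Labels that find no partner above the cutoff are simply left out of the result, so downstream distance sampling would only see nuclei detected in both channels.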
    Raw Images → Camera Calibration → NE Detection → Registration
                          ↓                                ↓
                     Quantified                    Aligned Channels
                       Photons                             ↓
                                                   Spline Refinement
                                                           ↓
                                                 Distance Calculation
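Registration quality control is described as using MAD-based robust statistics. The sketch below shows one common form of that technique, the modified z-score built on the median absolute deviation, applied to a list of per-FOV shift magnitudes. The 0.6745 scaling and the 3.5 cutoff follow a widely used convention (Iglewicz and Hoaglin); whether the pipeline uses these exact values is an assumption, and `mad_outliers` is a hypothetical name.

```python
from statistics import median


def mad_outliers(values: list[float], threshold: float = 3.5) -> list[bool]:
    """Flag values whose modified z-score exceeds the threshold.

    The threshold default is a common convention, assumed here for illustration.
    """
    values = list(values)
    center = median(values)
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        # No spread estimate is available, so nothing is flagged.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score when the data are normally distributed.
    return [abs(0.6745 * (value - center) / mad) > threshold for value in values]
```

Because the median and MAD are barely affected by a few extreme values, a single badly registered FOV does not inflate the spread estimate the way it would with a mean and standard deviation.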
This pipeline enables quantitative analysis of:
- Nuclear pore complex (NPC) spatial organization
- Protein co-localization at nuclear membranes
- Membrane morphology in disease models
- mRNA transport dynamics through NPCs
- Sub-pixel Localization: protein positioning relative to membrane
- Statistical Rigor: LRT-based quality control ensures data validity
- Reproducibility: Complete parameter justification enables method validation
- Scalability: Checkpoint/resume enables analysis of 100+ nuclei per experiment
| Aspect | Research Prototype | Production Pipeline | Improvement |
|---|---|---|---|
| Code Duplication | ~1,130 lines | 0 lines | 100% reduction |
| Parameter Citations | 0/78 | 78/78 | 100% |
| Configuration | 12+ variants | 1 standard | Unified |
| Statistical Rigor | Arbitrary thresholds | Citation-backed methods | Validated |
See Evolution Document (in development) for detailed technical comparison.
- Proof-of-concept demonstrating algorithmic feasibility
- Validated approach on biological data
- Generated initial publication results
Systematic transformation for open-source release:
Architecture:
- Monolithic scripts → Modular object-oriented design
- Hard-coded parameters → JSON configuration system
- Minimal docs → Comprehensive user/developer guides

Statistical Rigor:
- Arbitrary thresholds → Citation-backed methods
- No justifications → Peer-reviewed citations
- Fixed parameters → Adaptive optimization

Reproducibility:
- "Works on my machine" → FAIR-compliant
- No checkpointing → FOV-level resume capability
- Manual validation → Automated quality control
- Scattered code → DRY principles
This evolution demonstrates research software engineering expertise: transforming exploratory code into sustainable, community-ready tools.
If you use this pipeline in your research, please cite:
@software{ne_pipeline_2025,
author = {Petitto, Jocelyn},
title = {Pipeline for the Analysis of Fluorescence Microscopy Imaging Data with Respect to Single Particle Tracking},
year = {2025},
version = {2.0.0}
}

For specific methods, see Bibliography for complete citations.
See Configuration Guide for complete example configurations including:
- Dual-label analysis - Standard workflow configuration
- Cluster computing - HPC batch processing setup
Each example includes complete JSON configuration with explanations.
- CPU-bound: Spline refinement optimization
- I/O-bound: Large dataset loading
- GPU-accelerated: U-Net++ inference (10× speedup)
- Enable GPU for U-Net++ (automatic if detected by config.py)
- Use checkpoint/resume for cluster computing
- Reduce `frames_per_average` for faster testing
- Process experiments in parallel with job arrays

"CUDA not available"
- Verify that GPU drivers are installed
- Install a CUDA-compatible PyTorch build
- The pipeline runs on CPU if no GPU is available (slower but functional)

"Directory not found"
- Check that the `strains` names match the actual folder names
- Verify that all paths in the configuration are absolute
- Ensure experiment folders follow the expected structure

"Registration failed for FOV_XXXX"
- Normal for 5-10% of FOVs (stage drift, focus issues)
- The pipeline continues processing the remaining FOVs
- Check the PDF report for quality metrics

Pipeline crashes mid-refinement
- Use checkpoint/resume: rerun the same command
- The pipeline automatically resumes from the last completed FOV
- Check available RAM (refinement is memory-intensive)
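To make the resume behavior concrete, here is a minimal sketch of FOV-level checkpointing using only the standard library. It assumes a single JSON file listing completed FOV identifiers; the file layout and the names `load_completed`, `save_completed`, and `run_with_checkpoints` are hypothetical, and the pipeline's real checkpoint format may differ.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_completed(checkpoint: Path) -> set[str]:
    """Return the FOV identifiers recorded as finished (empty if no file yet)."""
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text(encoding="utf-8")))


def save_completed(checkpoint: Path, completed: set[str]) -> None:
    """Write the checkpoint through a temporary file, then rename it into place."""
    temporary = checkpoint.with_suffix(".tmp")
    temporary.write_text(json.dumps(sorted(completed)), encoding="utf-8")
    temporary.replace(checkpoint)  # avoids leaving a half-written checkpoint


def run_with_checkpoints(
    fov_ids: Iterable[str],
    process: Callable[[str], None],
    checkpoint: Path,
) -> list[str]:
    """Process every FOV not yet in the checkpoint; return those handled now."""
    completed = load_completed(checkpoint)
    handled = []
    for fov_id in fov_ids:
        if fov_id in completed:
            continue  # finished in an earlier run
        process(fov_id)
        completed.add(fov_id)
        save_completed(checkpoint, completed)  # record progress after each FOV
        handled.append(fov_id)
    return handled
```

Saving after every FOV means a crash costs at most the FOV that was in progress, which is why rerunning the same command is enough to continue.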
- Configuration Questions: See Configuration Guide
- Method/Citation Questions: See Bibliography
- Bug Reports: Open an issue on GitHub
- Direct Contact: jocelyn.tourtellotte@umassmed.edu
We welcome contributions! Areas of particular interest:
- Testing: Additional unit/integration tests
- Documentation: Usage examples, tutorials, walkthroughs
- Validation: Benchmark datasets with ground truth
- Extensions: 3D imaging support, additional profile models
- Performance: GPU acceleration, parallel processing optimization
Contribution guidelines and a code of conduct are in development.
- ✅ Core Pipeline: Stable, production-ready
- ✅ Dual-Label Analysis: Fully implemented and tested
- ✅ Checkpoint/Resume: Robust FOV-level checkpointing
- ✅ Quality Control: Automated PDF reporting
- 🚧 Quality Assurance: Automated testing with code coverage tracking and regression testing
- 🚧 mRNA Tracking: In development (particle detection implemented)
Architecture:
- Object-oriented design with single responsibility principle
- Dependency injection for testability
- Configuration-driven (no hard-coded parameters)
- Modular components for reusability

Reproducibility:
- Complete parameter documentation
- FAIR data principles
- Version control with semantic versioning
- Deterministic results (fixed random seeds where applicable)

HPC Optimization:
- Checkpoint/resume for long-running jobs
- Batch processing with job arrays
- Memory-efficient streaming for large datasets
- Parallel-ready architecture
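On an LSF cluster, one way to combine job arrays with the `strains` list from the configuration is to let each array element handle one strain. The sketch below reads LSF's `LSB_JOBINDEX` environment variable, which LSF sets to the 1-based array index and to 0 for non-array jobs. The function name and the one-strain-per-element mapping are assumptions for illustration; the pipeline's own batch entry point is not documented here.

```python
import os
from typing import Mapping, Optional, Sequence


def strains_for_this_job(
    strains: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Pick the strain(s) this job should process from the LSF array index."""
    env = os.environ if env is None else env
    index = int(env.get("LSB_JOBINDEX", "0"))
    if index == 0:
        # Not running as an array element: process everything in one job.
        return list(strains)
    if not 1 <= index <= len(strains):
        raise IndexError(
            f"array index {index} is outside the {len(strains)} configured strains"
        )
    return [strains[index - 1]]  # LSF array indices start at 1
```

With three strains configured, submitting the job with an array specification such as `bsub -J "name[1-3]"` would start three elements, each selecting a different strain, and FOV-level checkpoints let any element that is killed pick up where it stopped.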
Decision parameters have, where applicable:
1. Citation(s) - Peer-reviewed scientific justification
2. Validation - Tested on real biological data
3. Alternative analysis - Why this method over others
4. Parameter sensitivity - When to adjust defaults
Example: Outlier Detection
- Method: Likelihood Ratio Test with AIC (Smith et al. 2010)
- Replaces: Hard-coded 5% threshold (arbitrary)
- Decision rule: ΔAIC > 2 indicates significant model improvement
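The ΔAIC decision rule can be written out in a few lines. The sketch below assumes two nested models with known maximized log-likelihoods and parameter counts, defines ΔAIC as the simpler model's AIC minus the more complex model's AIC, and adds a likelihood ratio test for the special case of one extra parameter. The function names and the sign convention are assumptions; how the pipeline combines the two tests is not spelled out here.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(ll_simple: float, k_simple: int, ll_complex: float, k_complex: int) -> float:
    """AIC of the simpler model minus AIC of the more complex one.

    Positive values favor the more complex model (assumed sign convention).
    """
    return aic(ll_simple, k_simple) - aic(ll_complex, k_complex)


def lrt_p_value_one_extra_param(ll_simple: float, ll_complex: float) -> float:
    """Likelihood ratio test p-value when the models differ by one parameter.

    The statistic 2 * (ll_complex - ll_simple) is compared with a chi-square
    distribution with 1 degree of freedom, whose survival function is
    erfc(sqrt(x / 2)).
    """
    statistic = max(0.0, 2 * (ll_complex - ll_simple))
    return math.erfc(math.sqrt(statistic / 2))


def prefers_complex_model(ll_simple: float, k_simple: int,
                          ll_complex: float, k_complex: int) -> bool:
    """Apply the ΔAIC > 2 rule stated in the text."""
    return delta_aic(ll_simple, k_simple, ll_complex, k_complex) > 2
```

For example, log-likelihoods of -120 (3 parameters) and -115 (4 parameters) give ΔAIC = 8, so the rule selects the more complex model, while an improvement of only 1 log-likelihood unit gives ΔAIC = 0 and keeps the simpler one.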
See Bibliography for complete citation list.
Institution:
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School

Collaborators:
- Grunwald Lab (microscopy infrastructure)

Original Proof-of-Concept:
- Graduate student research (2021-2023) demonstrated algorithmic feasibility

Software Engineering:
- Systematic refactoring for open-source release (2024-2025)
- Transformation from research prototype to production software
This project is licensed under the MIT License - see LICENSE file for details.
- ✅ Free for academic and commercial use
- ✅ Modification and distribution permitted
- ✅ Attribution required
- ✅ No warranty (use at your own risk)
- Documentation: See the `docs/` directory in the repository
- Complete production refactoring
- Modular object-oriented architecture
- Comprehensive testing suite
- Statistical rigor (LRT, AIC, robust methods)
- FAIR compliance
- Checkpoint/resume system
- Complete documentation
- Research prototype
- Proof-of-concept implementation
- Initial results
Maintainer: Jocelyn Petitto (formerly Tourtellotte)
Email: jocelyn.petitto@gmail.com
Institution: RNA Therapeutics Institute, UMass Chan Medical School
Last Updated: January 2, 2026
Pipeline Version: 2.0.0
Python Version: 3.11+
This pipeline demonstrates research software engineering expertise: transforming exploratory research code into production-quality, FAIR-compliant tools that enable reproducible science and community adoption.