This repository contains a progressive series of exercises designed to teach computational methods and high-performance computing (HPC) concepts through a bike-sharing simulation model.
You will implement a stochastic simulation of a bike-sharing system between two stations (Mailly and Moulin). The simulation models probabilistic bike movements and tracks various metrics like unmet demand and station imbalances.
By completing these exercises, you will learn:
- Stochastic simulation fundamentals
- Serial vs parallel computation concepts
- Local multiprocessing with Python
- HPC cluster computing with SLURM
- Containerization for reproducible computing
- Data aggregation and visualization techniques
```
├── 1_basic_single_sim/   # Single simulation run
├── 2_serial_param_sweep/ # Serial parameter sweeps
├── 3_parallel_local/     # Local parallel processing
├── 4_cluster_slurm/      # SLURM cluster execution
└── 5_containers/         # Container deployment
```
- State: Tracks bike counts at Mailly and Moulin stations
- step(): Simulates one time step with probabilistic bike movements
- run_simulation(): Executes complete simulation and returns results
- `p1`: Probability of movement from Mailly → Moulin
- `p2`: Probability of movement from Moulin → Mailly
- `steps`: Number of simulation time steps
- `seed`: Random seed for reproducibility
```mermaid
flowchart TD
    A(["Start"]) --> B["Initialize State (mailly, moulin, rng, params)"]
    B --> C{"t < steps?"}
    C -- yes --> D["step(): random draws with p1, p2; move bikes; track unmet demand"]
    D --> E["Record time, mailly, moulin"]
    E --> C
    C -- no --> F["Compute metrics (unmet_mailly, unmet_moulin, final_imbalance)"]
    F --> G["Return DataFrame + metrics"]
```
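The loop in the flowchart can be sketched in a few lines. This is a minimal reference sketch, not the repository's code: it assumes one potential rider per station per time step, and the default initial counts are illustrative.

```python
import numpy as np
import pandas as pd

def run_simulation(p1, p2, steps, seed, init_mailly=10, init_moulin=5):
    """Minimal sketch of the simulation loop (movement rules are assumptions)."""
    rng = np.random.default_rng(seed)
    mailly, moulin = init_mailly, init_moulin
    unmet_mailly = unmet_moulin = 0
    times, mailly_counts, moulin_counts = [], [], []
    for t in range(steps):
        # A rider at Mailly wants to ride to Moulin with probability p1
        if rng.random() < p1:
            if mailly > 0:
                mailly -= 1
                moulin += 1
            else:
                unmet_mailly += 1  # no bike available: demand goes unmet
        # A rider at Moulin wants to ride to Mailly with probability p2
        if rng.random() < p2:
            if moulin > 0:
                moulin -= 1
                mailly += 1
            else:
                unmet_moulin += 1
        times.append(t)
        mailly_counts.append(mailly)
        moulin_counts.append(moulin)
    df = pd.DataFrame({"time": times, "mailly": mailly_counts, "moulin": moulin_counts})
    metrics = {
        "unmet_mailly": unmet_mailly,
        "unmet_moulin": unmet_moulin,
        "final_imbalance": mailly - moulin,
    }
    return df, metrics
```

Note that the total bike count is conserved: a bike only moves when the source station has one, which is exactly when unmet demand is *not* recorded.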
```mermaid
flowchart LR
    P1["1_basic_single_sim<br/>run_single.py"] --> P2["2_serial_param_sweep<br/>run_serial.py"]
    P2 --> P3["3_parallel_local<br/>run_parallel.py (ProcessPoolExecutor)"]
    P3 --> P4["4_cluster_slurm<br/>ON HPC CLUSTER<br/>sweep_array.sbatch -> run_one.py -> results/<br/>collect_results.py"]
    P4 --> P5["5_containers<br/>Create docker or singularity container<br/>(Dockerfile / velo.def)"]
```
Implement:

- `model.py`:
  - `step()`: Handle probabilistic bike movements and track unmet demand
  - `run_simulation()`: Run the simulation loop and collect timeseries data
- `run_single.py`:
  - `parse_args()`: Parse command-line arguments
  - `main()`: Execute the simulation and save results
Test your implementation:
```
cd 1_basic_single_sim/
python run_single.py --steps 100 --p1 0.3 --p2 0.2 --init-mailly 10 --init-moulin 5 --out-csv results.csv --plot
```

Implement:
- `model.py`: Same as Phase 1
- `run_serial.py`:
  - Process multiple parameter combinations from a CSV file
  - Aggregate results across runs
  - Generate comparative visualizations
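A serial sweep can be sketched as a plain loop over the parameter rows. This assumes a `params.csv` with columns `p1`, `p2`, `steps`, `seed`, and a Phase 1 `run_simulation` that returns `(DataFrame, metrics)` — both assumptions about the repository's layout:

```python
import pandas as pd

def run_serial_sweep(params_df, run_simulation):
    """Run one simulation per parameter row, serially, and collect metrics.

    params_df: DataFrame with columns p1, p2, steps, seed (assumed layout).
    run_simulation: the Phase 1 function, returning (timeseries_df, metrics).
    """
    rows = []
    for _, p in params_df.iterrows():
        _, metrics = run_simulation(p["p1"], p["p2"], int(p["steps"]), int(p["seed"]))
        # Keep the input parameters next to the output metrics for comparison
        rows.append({**p.to_dict(), **metrics})
    return pd.DataFrame(rows)
```

Keeping the input parameters in the same row as the metrics makes the comparative plots (e.g. `final_imbalance` vs `p1`) a one-liner later.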
Test your implementation:
```
cd 2_serial_param_sweep/
python run_serial.py --params params.csv --out-dir results/ --plot
```

Implement:
- `model.py`: Extended with unmet-demand metrics
- `run_*.py`:
  - `main()`: Implement the logic for parallel execution
  - Use threads, `multiprocessing`, or MPI for CPU-bound parallelization
  - Handle result collection from multiple processes
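With `ProcessPoolExecutor` (named in the pipeline overview), the parallel driver can be sketched as below. The worker body here is a toy stand-in for the real `model.run_simulation`; what matters is that the worker is a top-level function so it can be pickled and shipped to worker processes:

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def simulate_one(task):
    """Worker: run one simulation from a (p1, p2, steps, seed) tuple.

    Must be defined at module level so ProcessPoolExecutor can pickle it.
    The body is a toy stand-in for the real model.run_simulation.
    """
    p1, p2, steps, seed = task
    rng = np.random.default_rng(seed)
    # Toy metric: net bike flow Mailly -> Moulin over the run
    flow = int((rng.random(steps) < p1).sum()) - int((rng.random(steps) < p2).sum())
    return {"p1": p1, "p2": p2, "seed": seed, "net_flow": flow}

def run_parallel(tasks, workers=4):
    # executor.map preserves the order of the input tasks in its results
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(simulate_one, tasks))

if __name__ == "__main__":
    tasks = [(0.3, 0.2, 100, s) for s in range(8)]
    for row in run_parallel(tasks):
        print(row)
```

Seeding each task from its own `seed` keeps runs reproducible regardless of which worker process executes them, or in what order.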
Test your implementation:
```
cd 3_parallel_local/
python run_parallel.py --params params.csv --out-dir results/ --workers 4 --plot
```

We will not implement this phase.
- `run_one.py`: Execute a single simulation from one parameter-file row
- `collect_results.py`: Aggregate distributed results
- `sweep_array.sbatch`: Configure the SLURM array job
Test your implementation:
```
cd 4_cluster_slurm/
# Submit array job
sbatch sweep_array.sbatch
# After completion, collect results
python collect_results.py --in-dir results/ --out-dir aggregated/ --plot
```

Each phase should produce:
- Metrics: CSV files with bike counts over time
- Visualizations: Plots showing simulation results
- Aggregated results: Combined data from multiple runs in the metrics file
- mailly_counts: Bike counts at Mailly station
- moulin_counts: Bike counts at Moulin station
- unmet_mailly: Unmet demand at Mailly station
- unmet_moulin: Unmet demand at Moulin station
- final_imbalance: Final difference in bike counts between the two stations
- Start simple: Implement Phase 1 completely before moving to Phase 2
- Test incrementally: Verify each function works before proceeding
- Use small parameters: Test with small step counts initially
- Check data formats: Ensure CSV outputs match expected structure
- Debug with prints: Add logging to understand simulation behavior
```python
rng = np.random.default_rng(seed)
if rng.random() < probability:
    # Execute action
```

Instead of a dictionary, you can use a pandas.DataFrame, which makes plotting easier:
```python
# Accumulate the time and count data in lists, then construct the DataFrame
df = pd.DataFrame({
    "time": times,
    "mailly": mailly_counts,
    "moulin": moulin_counts,
    "unmet_mailly": unmet_mailly,
    "unmet_moulin": unmet_moulin
})
```

Using a dictionary:
```python
metrics = {
    "mailly": 0,
    "moulin": 0,
    "unmet_mailly": 0,
    "unmet_moulin": 0,
    "final_imbalance": 0
}
```

- Import errors: Ensure you're in the correct directory
- Missing dependencies: Install required packages (pandas, numpy, matplotlib)
- File not found: Check file paths and create output directories
- SLURM issues: Verify cluster access and module availability
Required Python packages:
- numpy: Numerical computations and random number generation
- pandas: Data manipulation and CSV I/O
- matplotlib: Plotting and visualization
- concurrent.futures: Parallel processing (built-in)
- threading: Multithreading (built-in)
- multiprocessing: Local multiprocessing (built-in)
- mpi4py: MPI support (optional)
Install with:
```
pip install numpy pandas matplotlib
```

- Clone this repository
- Install dependencies
- Start with Phase 1 (`1_basic_single_sim/`)
- Read the docstrings carefully for implementation guidance
- Test each function individually before making a Pull request
- Progress through phases sequentially
Good luck with your implementation!