Hybrid methodology for numerical simulation and ecological risk classification
2D advection–diffusion model + machine learning to map risk zones (Low, Medium, High)
- Overview
- Features
- Installation & Setup
- Quick Start
- Usage Guide
- Visualizations
- API Documentation
- Data Formats
- Project Architecture
- Mathematical Model
- Dataset Structure
- Performance Benchmarks
- Contributing
- Research Team
- Industry Partners Supporting Innovation
- Scientific References
- Citation & License
- Acknowledgments
- Contact
- FAQ
This repository implements a hybrid methodology to assess ecological risk associated with contaminants in water bodies (e.g., rivers and channels), integrating:
- Numerical simulation of transport (2D advection–diffusion model with decay) to generate concentration fields over the domain.
- Feature engineering from results (spatial, temporal, and hydrodynamic).
- Machine learning to classify risk into three levels: Low (0), Medium (1), High (2).
- Visualizations (maps, confusion matrices, feature importance, and metrics dashboards).
- Scenario-based execution to build diverse and reproducible datasets.
- 🧮 Numerical model: explicit finite-difference scheme with stability checks (CFL and diffusion).
- 🔬 Scenarios: multiple predefined scenarios (velocities/discharges and source positions) configurable via YAML.
- 🤖 Risk classification: Random Forest, SVM, Gradient Boosting, Logistic Regression; cross-validation and hyperparameter tuning.
- 🎨 Reports and plots: comparative dashboards, detailed confusion matrices, and feature-importance ranking.
- 💾 Export: outputs in NPY and CSV for interoperability (Excel/R/MATLAB/Python).
- Solves the 2D advection–diffusion equation with first-order decay.
- Supports Dirichlet, Neumann, and mixed boundary conditions.
- Models sources with configurable location, intensity, and duration.
- Stores time history and final state for analysis and training.
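The source feature above can be illustrated with a field S(x, y, t) for a release confined to the grid cell nearest a chosen point and switched on for a finite duration. This is a minimal sketch. The function name `point_source_term`, its arguments, and the single-cell spatial shape are hypothetical, and the solver may distribute or parameterize the source differently.

```python
import numpy as np


def point_source_term(x, y, t, x_src, y_src, intensity, t_start, duration):
    """Return S(x, y, t) on the grid for a hypothetical single-cell source.

    x, y: 1D coordinate arrays [m]; intensity in mg/(L·s).
    The source is injected into the grid cell closest to (x_src, y_src)
    and is active only for t_start <= t < t_start + duration.
    """
    S = np.zeros((len(y), len(x)))  # rows index y, columns index x (assumed layout)
    if t_start <= t < t_start + duration:
        i = int(np.argmin(np.abs(y - y_src)))  # nearest row
        j = int(np.argmin(np.abs(x - x_src)))  # nearest column
        S[i, j] = intensity
    return S


x = np.linspace(0.0, 100.0, 51)   # 2 m spacing
y = np.linspace(0.0, 20.0, 11)    # 2 m spacing
S_on = point_source_term(x, y, t=5.0, x_src=10.0, y_src=10.0,
                         intensity=2.0, t_start=0.0, duration=60.0)
S_off = point_source_term(x, y, t=120.0, x_src=10.0, y_src=10.0,
                          intensity=2.0, t_start=0.0, duration=60.0)
print(S_on.sum(), S_off.sum())  # 2.0 0.0
```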
- Supports multiple algorithms and selects the best via cross-validation.
- Hyperparameter tuning with GridSearchCV.
- Two feature sets:
  - Fundamental (8): base parameters (source, velocities, position, normalized time).
  - Complete (16): fundamental + derived variables (distances, travel times, Péclet numbers, etc.).
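The model-selection step described above can be sketched with scikit-learn: score each of the four listed classifiers by 5-fold cross-validated accuracy, then tune the winner with GridSearchCV. The synthetic data (`make_classification`), the fold count, and the parameter grids are assumptions for illustration, not the repository's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 8 features and 3 risk classes (0/1/2).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

candidates = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
# Hypothetical grids; pipeline step names are the lowercased class names.
param_grids = {
    "RandomForest": {"n_estimators": [50, 100], "max_depth": [None, 5]},
    "SVM": {"svc__C": [0.1, 1.0, 10.0]},
    "GradientBoosting": {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    "LogisticRegression": {"logisticregression__C": [0.1, 1.0, 10.0]},
}

# Mean 5-fold cross-validated accuracy for each candidate.
cv_scores = {name: cross_val_score(model, X, y, cv=5).mean()
             for name, model in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)

# Tune only the best-scoring model.
grid = GridSearchCV(candidates[best_name], param_grids[best_name], cv=5)
grid.fit(X, y)
print(best_name, grid.best_params_, round(grid.best_score_, 3))
```

In a real run, selection and tuning should use only the training split (`X_train_*`, `y_train_*`) so that the held-out test split still gives an unbiased estimate.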
- GIFs and snapshots for spatio-temporal evolution of concentration and risk.
- Model comparison plots and metrics dashboards.
- Confusion matrix (absolute and normalized) for detailed inspection.
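For reference, the absolute and normalized confusion matrices mentioned above correspond to scikit-learn's `confusion_matrix` without and with `normalize="true"` (each row divided by the number of samples in that true class). The labels below are toy values chosen only to show the mechanics; plotting is omitted.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = [0, 1, 2]  # Low, Medium, High
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])
y_pred = np.array([0, 1, 1, 1, 2, 2, 1, 0])

# Rows are true classes, columns are predicted classes.
cm_abs = confusion_matrix(y_true, y_pred, labels=labels)
cm_norm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
print(cm_abs)             # counts
print(cm_norm.round(2))   # per-row fractions (recall per class on the diagonal)
```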
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.9+ |
| RAM | 8 GB | 16 GB+ (if exporting full history) |
| CPU | 4 cores | 8+ cores |
| Storage | 2 GB | 10 GB+ (datasets + results) |
| OS | Windows/Linux/macOS | Linux (better for large batches) |
# Scientific computing
numpy>=1.21.0
pandas>=1.3.0
scipy>=1.7.0
# Machine learning
scikit-learn>=1.0.0
joblib>=1.1.0
# Visualization
matplotlib>=3.5.0
seaborn>=0.11.0
# Utilities
PyYAML>=6.0
tqdm>=4.62.0

# Method 1: direct install
git clone https://github.com/gstinoco/mGFD_EcoRisk_Simulator.git
cd mGFD_EcoRisk_Simulator
pip install -r requirements.txt
# Method 2: virtual environment (recommended)
python -m venv contaminant_env
source contaminant_env/bin/activate # Windows: contaminant_env\Scripts\activate
pip install -r requirements.txt

python -c "import numpy, pandas, sklearn, matplotlib, seaborn, yaml; print('✅ OK')"
python main.py --help

| Step | What to do |
|---|---|
| 1) Install | `pip install -r requirements.txt` |
| 2) Run | `python main.py --complete` |
| 3) Check outputs | Simulations: `data/simulations/`<br>Dataset: `data/processed/`<br>Metrics / model: `data/results/`<br>Visualizations: `data/visualizations/` (or `docs/` for demos) |
# Full pipeline (complete features = default)
python main.py --complete
# Full pipeline using only fundamental features (8)
python main.py --complete --fundamental-features
# Simulate a specific scenario
python main.py --simulate --scenario baseline_left_center
# Simulate all scenarios defined in config/parameters.yaml
python main.py --simulate --all-scenarios
# Preprocess, train, and visualize (separately)
python main.py --preprocess
python main.py --train
python main.py --visualize
# Generate GIFs and snapshots
python main.py --create-videos
python main.py --create-snapshots --snapshots-count 6

Practical workflows for simulation, preprocessing, training, and visual analysis
| Step | What it does |
|---|---|
| 1) Configure | Edit config/parameters.yaml (domain, physics, source, boundaries, scenarios). |
| 2) Run | `python main.py --simulate --scenario <scenario>` |
| 3) Outputs | NPY/CSV are saved to data/simulations/<scenario>/. |
| Step | What it does |
|---|---|
| 1) Run batch | `python main.py --simulate --all-scenarios` |
| 2) Dataset | Scenarios create diversity (source position, flow/discharge). |
| Step | What it does |
|---|---|
| 1) Run | `python main.py --preprocess [--fundamental-features]` |
| 2) Outputs | `data/processed/X_train_*.npy`, `X_test_*.npy`, `y_train_*.npy`, `y_test_*.npy`, and `feature_names_*.txt` |
| Step | What it does |
|---|---|
| 1) Train | `python main.py --train [--fundamental-features]` |
| 2) Evaluation | Computes per-model metrics and saves the best classifier. |
| 3) Outputs | data/results/ (metrics CSV and classifier PKL). |
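Saving and reloading a trained classifier as a PKL file can be done with joblib, which is listed in the requirements. This is a minimal sketch on random stand-in data. Whether the repository's `risk_classifier_model.pkl` holds only the estimator or additional objects (for example a scaler or feature names) is not specified here. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_train = rng.random((60, 8))   # stand-in for 8 fundamental features
y_train = np.arange(60) % 3     # stand-in risk labels 0/1/2

model = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

with tempfile.TemporaryDirectory() as out_dir:
    path = os.path.join(out_dir, "risk_classifier_model.pkl")
    joblib.dump(model, path)        # persist the trained classifier
    restored = joblib.load(path)    # reload it later for inference
    same = np.array_equal(model.predict(X_train), restored.predict(X_train))
print(same)  # True
```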
| Step | What it does |
|---|---|
| 1) Visualize | `python main.py --visualize [--fundamental-features]` |
| 2) Outputs | data/visualizations/ with PNGs (model comparison, confusion matrix, feature importance, dashboard). |
This project is primarily used as a command-line tool (CLI) via main.py.
| Entry point | Command | Purpose |
|---|---|---|
| main.py | `python main.py --complete` | Runs the full flow (simulation → dataset → training → visualization) |
| main.py | `python main.py --simulate [--scenario <name> \| --all-scenarios]` | Runs one scenario or all scenarios defined in config/parameters.yaml |
| main.py | `python main.py --preprocess [--fundamental-features]` | Builds the ML dataset |
| main.py | `python main.py --train [--fundamental-features]` | Trains models and saves the best one |
| main.py | `python main.py --visualize [--fundamental-features]` | Generates plots/dashboards from results |
| main.py | `python main.py --create-videos` | Generates time-evolution GIFs |
| main.py | `python main.py --create-snapshots --snapshots-count N` | Generates snapshots at selected times |
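The flags documented above map naturally onto Python's argparse. The parser below is a hypothetical reconstruction for illustration only: main.py's actual argument definitions, defaults, and help texts may differ (`python main.py --help` is the authoritative list). Treating `--scenario` and `--all-scenarios` as mutually exclusive is an assumption based on the `[--scenario | --all-scenarios]` notation.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented main.py flags."""
    p = argparse.ArgumentParser(description="Contaminant transport + ecological risk pipeline")
    p.add_argument("--complete", action="store_true",
                   help="run simulation → dataset → training → visualization")
    p.add_argument("--simulate", action="store_true", help="run numerical simulations")
    group = p.add_mutually_exclusive_group()
    group.add_argument("--scenario", help="name of a scenario in config/parameters.yaml")
    group.add_argument("--all-scenarios", action="store_true", help="simulate every scenario")
    p.add_argument("--preprocess", action="store_true", help="build the ML dataset")
    p.add_argument("--train", action="store_true", help="train and save the best model")
    p.add_argument("--visualize", action="store_true", help="generate plots and dashboards")
    p.add_argument("--fundamental-features", action="store_true",
                   help="use the 8-feature set instead of the 16-feature set")
    p.add_argument("--create-videos", action="store_true", help="generate GIFs")
    p.add_argument("--create-snapshots", action="store_true", help="generate snapshots")
    p.add_argument("--snapshots-count", type=int, help="number of snapshots")
    return p


args = build_parser().parse_args(["--simulate", "--scenario", "baseline_left_center"])
print(args.simulate, args.scenario, args.all_scenarios)  # True baseline_left_center False
```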
For full help:
python main.py --help

Each scenario stores (at minimum) the following in data/simulations/<scenario>/:
- final_concentration.npy: final field C(x, y, t_final)
- concentration_history.npy: time history (can be large)
- x_coordinates.npy, y_coordinates.npy: spatial axes
- times.npy: time vector
- parameters.yaml: effective parameters used (base + scenario overrides)
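The "base + scenario overrides" rule for parameters.yaml can be pictured as a recursive dictionary merge. This sketch assumes that nested sections are merged key by key and that values given in a scenario replace the base values. The keys shown are hypothetical; in practice both dictionaries would come from the YAML file (for example via `yaml.safe_load` from PyYAML).

```python
import copy


def merge_params(base, overrides):
    """Recursively apply scenario overrides on top of base parameters.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the base value. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys; the actual layout of config/parameters.yaml may differ.
base = {"physics": {"u": 0.5, "v": 0.0, "D": 0.1, "k": 1e-4},
        "source": {"x": 10.0, "y": 10.0, "intensity": 2.0}}
scenario = {"source": {"y": 5.0}}

effective = merge_params(base, scenario)
print(effective["source"])  # {'x': 10.0, 'y': 5.0, 'intensity': 2.0}
```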
If output.export_csv: true in config/parameters.yaml, it also exports:
- final_concentration.csv
- coordinates.csv
- times.csv
- concentration_history.csv (only if output.csv_include_history: true; can be very large)
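For interoperability with tools such as Excel or R, a 2D field can be flattened into a long-format table with one row per grid node. The sketch below assumes the array layout C[i, j] at (x[j], y[i]) and writes columns x, y, C. The actual column layout of final_concentration.csv and coordinates.csv is not documented here and may differ, and the output file name is hypothetical.

```python
import os
import tempfile

import numpy as np

# Small stand-in grid; in the repository these arrays would come from
# x_coordinates.npy, y_coordinates.npy and final_concentration.npy.
x = np.linspace(0.0, 4.0, 3)
y = np.linspace(0.0, 2.0, 2)
C = np.arange(6, dtype=float).reshape(len(y), len(x))  # C[i, j] at (x[j], y[i])

X, Y = np.meshgrid(x, y)                 # both have shape (ny, nx), like C
table = np.column_stack([X.ravel(), Y.ravel(), C.ravel()])

with tempfile.TemporaryDirectory() as out_dir:
    path = os.path.join(out_dir, "final_concentration_long.csv")  # hypothetical name
    np.savetxt(path, table, delimiter=",", header="x,y,C", comments="")
    back = np.loadtxt(path, delimiter=",", skiprows=1)
print(back.shape)  # (6, 3)
```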
In data/processed/ it stores the ML matrices, with a file-name suffix identifying the feature set:
- Complete features (16): `*_complete.npy` and `feature_names_complete.txt`
- Fundamental features (8): `*_fundamental.npy` and `feature_names_fundamental.txt`

Targets:
- `y_*`: risk labels 0/1/2 for Low/Medium/High.
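One simple way to turn concentrations into the 0/1/2 labels is thresholding with `np.digitize`. The two thresholds below are hypothetical placeholders: the criteria actually used in src/ml_model/data_preprocessing.py are not stated in this README and may involve more than a single concentration value.

```python
import numpy as np

# Hypothetical thresholds in mg/L (not the repository's values).
LOW_MEDIUM = 0.1
MEDIUM_HIGH = 1.0


def risk_labels(concentration):
    """Map concentrations to 0 (Low), 1 (Medium) or 2 (High).

    np.digitize returns 0 for c < 0.1, 1 for 0.1 <= c < 1.0, and 2 for c >= 1.0.
    """
    return np.digitize(concentration, bins=[LOW_MEDIUM, MEDIUM_HIGH])


print(risk_labels(np.array([0.0, 0.05, 0.1, 0.5, 1.0, 3.0])))  # [0 0 1 1 2 2]
```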
.
├─ config/
│ └─ parameters.yaml # Model, ML, visualization, and scenario parameters
├─ data/
│ ├─ simulations/ # Per-scenario outputs (NPY/CSV + parameters)
│ ├─ processed/ # ML matrices (X/y + feature names)
│ └─ results/ # Metrics and trained models (CSV/PKL)
├─ docs/
│ ├─ images/ # Dashboards and snapshots
│ ├─ videos/ # Concentration and risk GIFs
│ └─ logo/ # Logos
├─ src/
│ ├─ numerical_model/ # Advection–diffusion equation (FD)
│ ├─ ml_model/ # Preprocessing and risk classifier
│ └─ visualization/ # Plots, dashboards, GIFs and snapshots
└─ main.py # CLI and full-flow orchestration
Core components:
- src/numerical_model/advection_diffusion.py: 2D transport solver.
- src/ml_model/data_preprocessing.py: feature extraction + risk labels.
- src/ml_model/risk_classifier.py: model training/evaluation/persistence.
- src/visualization/visualization.py: visualization and export.
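Several of the derived variables in the complete feature set (distances, travel times, Péclet numbers) follow from basic transport scales. The sketch below computes them for one receptor point, using the source–receptor distance L as the length scale, the advective travel time L/|U|, and Pe = |U|L/D. The function and key names, and the choice of length scale, are assumptions; the repository's 16 features may be defined differently.

```python
import numpy as np


def derived_features(x, y, x_src, y_src, u, v, D):
    """Hypothetical derived features for a receptor point (x, y).

    Returns the distance to the source [m], the advective travel time [s]
    and a Péclet number Pe = |U| * L / D with L = source-receptor distance.
    """
    distance = float(np.hypot(x - x_src, y - y_src))
    speed = float(np.hypot(u, v))
    travel_time = distance / speed if speed > 0 else float("inf")
    peclet = speed * distance / D
    return {"distance": distance, "travel_time": travel_time, "peclet": peclet}


feats = derived_features(x=40.0, y=10.0, x_src=10.0, y_src=10.0, u=0.5, v=0.0, D=0.1)
print(feats)
```

A large Pe means transport between source and receptor is dominated by advection rather than diffusion.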
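Contaminant transport is modeled with the 2D advection–diffusion equation with first-order decay:

$$\frac{\partial C}{\partial t} + u\,\frac{\partial C}{\partial x} + v\,\frac{\partial C}{\partial y} = D\left(\frac{\partial^2 C}{\partial x^2} + \frac{\partial^2 C}{\partial y^2}\right) - kC + S$$

Where:
- C: contaminant concentration [mg/L]
- u, v: advection velocities [m/s]
- D: diffusion coefficient [m²/s]
- S: source (injection) [mg/(L·s)]
- k: decay rate [1/s]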
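A single explicit time step of the equation above can be written with first-order upwind differences for advection, second-order central differences for diffusion, and explicit treatment of decay and source. This is a minimal sketch under those assumptions (constant u, v, D, k on a uniform grid, boundary nodes left to a separate routine); the repository's solver may use different stencils or array layouts, and `ftcs_upwind_step` is a hypothetical name.

```python
import numpy as np


def ftcs_upwind_step(C, u, v, D, k, S, dx, dy, dt):
    """Advance C one explicit time step on the interior nodes.

    Advection: first-order upwind; diffusion: central differences;
    decay (-k*C) and source S are explicit. Boundary nodes are left
    unchanged and must be set by a boundary-condition routine.
    Layout: C[i, j] with i along y and j along x.
    """
    Cn = C.copy()
    c = C[1:-1, 1:-1]
    # Upwind first derivatives: difference toward the side the flow comes from.
    dCdx = np.where(u >= 0, (c - C[1:-1, :-2]) / dx, (C[1:-1, 2:] - c) / dx)
    dCdy = np.where(v >= 0, (c - C[:-2, 1:-1]) / dy, (C[2:, 1:-1] - c) / dy)
    # Five-point Laplacian.
    lap = ((C[1:-1, 2:] - 2 * c + C[1:-1, :-2]) / dx**2
           + (C[2:, 1:-1] - 2 * c + C[:-2, 1:-1]) / dy**2)
    Cn[1:-1, 1:-1] = c + dt * (-u * dCdx - v * dCdy + D * lap - k * c + S[1:-1, 1:-1])
    return Cn


C = np.zeros((21, 41))
C[10, 10] = 1.0            # unit pulse, far from the boundaries
S = np.zeros_like(C)       # no source in this example
for _ in range(20):
    C = ftcs_upwind_step(C, u=0.5, v=0.0, D=0.1, k=0.0, S=S, dx=1.0, dy=1.0, dt=0.1)
print(C.sum())  # total mass stays ~1.0 with k = 0 while the plume is away from the edges
```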
Boundary conditions are configurable in config/parameters.yaml as:
- Dirichlet: fixed concentration at the boundary.
- Neumann: fixed gradient/flux (open outflow).
- Mixed: per-side combination.
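Per-side boundary handling can be sketched as follows: a Dirichlet side fixes the boundary values, while a Neumann side prescribes the outward normal derivative g through the first-order relation C_boundary = C_interior + g·h (h = dx or dy), so g = 0 gives the zero-gradient open outflow. Choosing different types per side gives the mixed case. The configuration format (side name mapped to a `(type, value)` tuple) and the function name are hypothetical, not the repository's YAML schema.

```python
import numpy as np


def apply_boundaries(C, bc, dx, dy):
    """Apply per-side boundary conditions in place and return C.

    bc maps 'left', 'right', 'bottom', 'top' to ('dirichlet', value) or
    ('neumann', g), with g the outward normal derivative dC/dn.
    Layout: C[i, j], i along y (bottom = row 0), j along x.
    Sides are applied in dict order, so later sides overwrite shared corners.
    """
    sides = {  # (boundary slice, adjacent interior slice, spacing)
        "left": (np.s_[:, 0], np.s_[:, 1], dx),
        "right": (np.s_[:, -1], np.s_[:, -2], dx),
        "bottom": (np.s_[0, :], np.s_[1, :], dy),
        "top": (np.s_[-1, :], np.s_[-2, :], dy),
    }
    for side, (kind, value) in bc.items():
        edge, inner, h = sides[side]
        if kind == "dirichlet":
            C[edge] = value                   # fixed concentration
        elif kind == "neumann":
            C[edge] = C[inner] + value * h    # fixed outward gradient
        else:
            raise ValueError(f"unknown boundary type: {kind}")
    return C


# Mixed example: clean inflow on the left, open outflow elsewhere.
C = apply_boundaries(np.ones((4, 5)),
                     {"left": ("dirichlet", 0.0), "right": ("neumann", 0.0),
                      "bottom": ("neumann", 0.0), "top": ("neumann", 0.0)},
                     dx=1.0, dy=1.0)
print(C[:, 0], C[:, -1])
```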
The solver prints the standard stability checks:
- CFL condition for advection.
- Stability condition for diffusion.
If violated, adjust dt, dx, dy, or physical parameters in the configuration.
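The two checks can be computed directly from the configuration. For an explicit scheme with upwind advection and central-difference diffusion, common criteria are a Courant number |u|Δt/Δx + |v|Δt/Δy ≤ 1 and a diffusion number DΔt(1/Δx² + 1/Δy²) ≤ 1/2; when both act together, the stricter bound Courant + 2·diffusion ≤ 1 keeps all update coefficients non-negative. The exact thresholds the solver prints are not documented here, so treat these as assumptions. The hypothetical helper `suggest_dt` returns a time step that satisfies the combined bound.

```python
def stability_numbers(u, v, D, dx, dy, dt):
    """Return (Courant number, diffusion number) for an explicit 2D scheme."""
    courant = abs(u) * dt / dx + abs(v) * dt / dy      # upwind advection: <= 1
    diffusion = D * dt * (1.0 / dx**2 + 1.0 / dy**2)   # explicit diffusion: <= 1/2
    return courant, diffusion


def suggest_dt(u, v, D, dx, dy, safety=0.9):
    """Largest dt meeting Courant + 2 * diffusion <= 1, scaled by a safety factor."""
    rate = abs(u) / dx + abs(v) / dy + 2.0 * D * (1.0 / dx**2 + 1.0 / dy**2)
    return safety / rate if rate > 0 else float("inf")


u, v, D, dx, dy = 0.5, 0.1, 0.05, 1.0, 0.5
courant, diffusion = stability_numbers(u, v, D, dx, dy, dt=0.5)
print(courant <= 1.0, diffusion <= 0.5)  # True True
dt = suggest_dt(u, v, D, dx, dy)
print(dt)
```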
This project can operate as:
- A dataset generator (scenario-driven) from simulations.
- An ML pipeline for risk classification using previously generated datasets.
Main layout:
data/
├─ simulations/
│ ├─ baseline_left_center/
│ ├─ baseline_lower/
│ └─ baseline_upper/
├─ processed/
│ ├─ X_train_complete.npy
│ ├─ X_test_complete.npy
│ ├─ y_train_complete.npy
│ ├─ y_test_complete.npy
│ ├─ feature_names_complete.txt
│ ├─ X_train_fundamental.npy
│ ├─ X_test_fundamental.npy
│ ├─ y_train_fundamental.npy
│ ├─ y_test_fundamental.npy
│ └─ feature_names_fundamental.txt
└─ results/
├─ all_models_metrics_report.csv
├─ all_models_metrics_report (fundamental features).csv
├─ risk_classifier_model.pkl
└─ risk_classifier_model (fundamental features).pkl
Reference results (files in data/results/):
| Feature set | Best model (Accuracy) | File |
|---|---|---|
| Complete (16) | GradientBoosting (0.9997) | all_models_metrics_report.csv |
| Fundamental (8) | GradientBoosting (0.9893) | all_models_metrics_report (fundamental features).csv |
Note: Results depend on configuration, sampling, and the available scenarios.
- Search existing issues: Check if the bug has already been reported
- Create a detailed report: Include steps to reproduce and expected vs actual behavior
- Provide context: Operating system, Python version, and relevant parameters (scenario, feature set, and settings from config/parameters.yaml)
- Describe the feature: Clear and concise description of the proposed functionality
- Justify the need: Explain how it benefits research, reproducibility, or usability
- Provide examples: Use cases, expected inputs/outputs, and acceptance criteria
git clone https://github.com/gstinoco/mGFD_EcoRisk_Simulator.git
cd mGFD_EcoRisk_Simulator
python -m venv dev_env
source dev_env/bin/activate # On Windows: dev_env\Scripts\activate
pip install -r requirements.txt
git checkout -b feature/your-feature-name

| Photo | Student | Institution | Contact |
|---|---|---|---|
| | Gabriela Pedraza-Jiménez | | |
| | Eli Chagolla-Inzunza | | |

| Photo | Student | Institution | Contact |
|---|---|---|---|
| | Jorge L. González-Figueroa | | |
| | Christopher N. Magaña-Barocio | | |

| Photo | Student | Institution | Contact |
|---|---|---|---|
| | Maria Goretti Fraga-Lopez | | |
Collaboration between academia and industry to accelerate real-world impact
- Tinoco-Guerrero, G., Domínguez-Mota, F. J., Guzmán-Torres, J. A., & Tinoco-Ruiz, J. G. (2022). "Numerical Solution of Diffusion Equation using a Method of Lines and Generalized Finite Differences." Revista Internacional de Métodos Numéricos para Cálculo y Diseño en Ingeniería, 38(2). DOI: 10.23967/j.rimni.2022.06.003
- Contour-to-cloud pipeline: interactive image-based contour extraction and multi-region management
- Cloud generation methods: Regular (grid-like) and Natural (Poisson disk sampling) distributions
- Region-aware analysis: neighbor computation constrained by region labels for disconnected domains and holes
If you use this software in your research, please cite:
@software{tinoco2025mGFD_cloudgenerator,
title={mGFD CloudGenerator 2.0: Web platform for generating 2D unstructured point clouds},
author={Tinoco-Guerrero, Gerardo and
Domínguez-Mota, Francisco Javier and
Guzmán-Torres, José Alberto and
Arias-Rojas, Heriberto},
year={2025},
institution={Universidad Michoacana de San Nicolás de Hidalgo},
organization={SIIIA MATH: Soluciones en ingeniería},
url={https://github.com/gstinoco/mGFD_EcoRisk_Simulator},
version={2.0},
note={Web-based preprocessing tool for meshless mGFD workflows: image-to-contour extraction, multi-region handling, point-cloud generation (regular/Poisson), node classification, and region-constrained neighbor analysis}
}

This project is licensed under the MIT License - see the full license text below:
MIT License
Copyright (c) 2025 Gerardo Tinoco-Guerrero, Francisco Javier Domínguez-Mota,
José Alberto Guzmán-Torres, Heriberto Árias Rojas
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Academic Use: This software is developed for research and educational purposes. Commercial use is permitted under the MIT License terms.
We extend our gratitude to the institutions and partners supporting this research and open-source development
Primary Contact
Research group coordination: Dr. Gerardo Tinoco Guerrero, Morelia, Michoacán, México

Technical Support
Bug reports, questions, and collaboration requests

Collaboration Opportunities
Research and engineering partnerships

Student Opportunities
Projects and training in scientific computing

Institutional Affiliations
Which image formats are supported?
PNG, JPG/JPEG, GIF, and BMP. Maximum request size is 16 MB.
Where are outputs saved when running locally?
Generated files are written to output/. Uploaded files are stored in uploads/. Both folders are created automatically on startup.
What is the expected CSV format for CloudGenerator?
Contours: x,y,region (region is optional, but recommended for multi-region). Clouds: x,y,region,classification.
Can I use this in commercial projects?
Yes. The project is released under the MIT License.
How should I cite this work?
Use the BibTeX entry in the Citation section and the referenced DOI in Scientific References.













