Seabed Classification using Integrated Acoustic Data and Machine Learning

Authors (of original research concept): Esraa E. Abouelmaaty, Obed Omane Okyere, José Manuel Echevarría Rubio
Date (of original research): April 8th, 2025
Script Author/Maintainer: [b10nics]

Project Overview

This project implements a seabed classification workflow using integrated multibeam echosounder (MBES) data and machine learning algorithms.

The primary goal is to classify the seabed into distinct categories based on bathymetry, backscatter, and derived terrain features. This script automates the following processes:

Terrain Analysis: Calculation of morphological features (Slope, Aspect, TRI, TPI, Roughness, Hillshade) from a bathymetry GeoTIFF.
Data Alignment: Optional alignment of a backscatter GeoTIFF to the bathymetry grid.
Feature Engineering: Stacking of bathymetry, (aligned) backscatter, and terrain features into a multi-band GeoTIFF.
Ground Truth Integration: Extraction of feature values at ground truth point locations (from a CSV file), including reprojection to match the raster CRS.
Supervised Classification: Training and application of a Random Forest classifier, including hyperparameter tuning via GridSearchCV.
Unsupervised Classification: Application of K-Means clustering for an exploratory perspective.
Output Generation: Creation of GeoTIFF classification maps for both Random Forest and K-Means results.
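
The README does not say which tool computes the terrain derivatives. As a rough illustration of what two of them measure, the NumPy sketch below derives slope and TPI from a small synthetic depth grid. The function names, the 3x3 window, and the central-difference gradient are assumptions made for illustration, so values may differ from the script's rasters.

```python
import numpy as np


def slope_degrees(depth, cell_size):
    """Slope in degrees from a 2-D depth grid, using central differences."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(depth.astype(float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def tpi_3x3(depth):
    """Topographic Position Index: cell value minus the mean of its 8 neighbours."""
    z = depth.astype(float)
    rows, cols = z.shape
    # Repeat edge values so border cells also have eight neighbours.
    padded = np.pad(z, 1, mode="edge")
    neighbour_sum = np.zeros_like(z)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the centre cell itself
            neighbour_sum += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return z - neighbour_sum / 8.0


# Synthetic plane that rises 10 m per 10 m cell along x (made-up data).
plane = np.tile(np.arange(5) * 10.0, (5, 1))
slope = slope_degrees(plane, cell_size=10.0)
tpi = tpi_3x3(plane)
print(slope[2, 2], tpi[2, 2])
```

On a uniform plane the slope is constant and the interior TPI is zero, which makes this a convenient sanity check before running on real bathymetry.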

Summary of Latest Run Results

The script successfully processed the example dataset (bathy_cube_10_filled_5x5.tiff, back_10_filled_5x5.tiff, ground_truth_samples_removed.csv):

Input Data:
- Bathymetry: 3299x1907 pixels, 10m resolution, UTM zone 15S.
- Backscatter: Successfully aligned to the bathymetry grid.
- Ground Truth: 292 points loaded and reprojected from EPSG:4326 to EPSG:32715.
Feature Generation:
- All terrain derivatives (Slope, Aspect, TRI, TPI, Roughness, Hillshade) were successfully generated.
- The final stacked raster for classification contained 7 features: Depth, Backscatter, Slope, Aspect, TRI, TPI, and Roughness.
Training Data:
- Feature values were extracted for all 292 ground truth points; no points were dropped due to NoData values.
- Classes were mapped to integers (e.g., 'Biogenic mat': 0, 'Lava flows': 4).
Random Forest Classification:
- The model was trained on 204 samples and tested on 88 samples (about a 70/30 split of the 292 points).
- Best Parameters (GridSearchCV): {'class_weight': 'balanced', 'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
- Test Set Performance:
  - Overall Accuracy: 77.27%
  - Kappa Coefficient: 0.727
- Feature Importances (Top 3):
  1. Depth: ~29.22%
  2. TPI: ~14.32%
  3. Backscatter: ~12.46%
- The full classification map (classification_rf.tif) was generated for 485,932 valid pixels.
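
The confusion matrix behind these figures is not part of this summary, so the reported values cannot be recomputed here. The sketch below only illustrates how overall accuracy and Cohen's kappa follow from a confusion matrix. The function name and the 2x2 matrix are made up for illustration. For reference, an accuracy of 77.27% on 88 test samples corresponds to 68 correct predictions.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: the fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix (not from the run above).
example = [[20, 5],
           [10, 15]]
accuracy, kappa = accuracy_and_kappa(example)
print(f"accuracy={accuracy:.2f} kappa={kappa:.2f}")
```

Kappa is lower than accuracy because it discounts the agreement a classifier would reach by chance given the class frequencies.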
K-Means Clustering:
- Data was scaled, and K-Means (k=7) was successfully applied.
- The K-Means classification map (classification_kmeans.tif) was generated.
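
A minimal sketch of this step, assuming the stacked raster has already been read into a NumPy array of shape (bands, rows, cols). The function name, the NoData value, and the fill value are hypothetical, and the script's own implementation may differ. It masks invalid pixels, scales the valid ones, clusters them, and writes the labels back into the raster shape.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_valid_pixels(stack, nodata, k, fill=-1, seed=0):
    """Cluster a (bands, rows, cols) stack; return a (rows, cols) label map."""
    bands, rows, cols = stack.shape
    pixels = stack.reshape(bands, -1).T  # one row per pixel, one column per band
    # A pixel is valid only if every band is finite and not NoData.
    valid = np.all(np.isfinite(pixels) & (pixels != nodata), axis=1)
    # Scaling keeps large-valued bands (e.g. depth) from dominating the distances.
    scaled = StandardScaler().fit_transform(pixels[valid])
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)
    out = np.full(rows * cols, fill, dtype=np.int16)
    out[valid] = labels
    return out.reshape(rows, cols)


# Synthetic two-band stack with two well-separated halves and one NoData pixel.
rng = np.random.default_rng(0)
demo = rng.normal(size=(2, 20, 20))
demo[:, :, 10:] += 50.0
demo[0, 0, 0] = -9999.0
cluster_map = cluster_valid_pixels(demo, nodata=-9999.0, k=2)
print(np.unique(cluster_map))
```

K-Means labels are arbitrary cluster IDs, so they do not correspond to the Random Forest class integers without a separate comparison.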
Execution Time: Approximately 17-19 seconds.
Known Issues (from log):
- Plotting the final classification maps failed with the Matplotlib error "This method only works with the ScalarFormatter." Matplotlib raises this when ticklabel_format is applied to an axis that uses a different formatter, here apparently the colorbar of the discrete integer maps. The GeoTIFF outputs are not affected.
- A scikit-learn UserWarning (X does not have valid feature names, but RandomForestClassifier was fitted with feature names) appeared during prediction. It is harmless as long as the prediction array holds the same features in the same order as the training data; passing a DataFrame with matching column names avoids it.

Input Data Requirements

The script expects the following input files to be placed in the input_data directory (relative to the script's location):

Bathymetry File (bathy_file_name):
- Format: GeoTIFF (.tiff, .tif)
- Example: bathy_cube_10_filled_5x5.tiff
Backscatter File (backscatter_file_name): (Optional)
- Format: GeoTIFF (.tiff, .tif)
- Example: back_10_filled_5x5.tiff
Ground Truth CSV File (ground_truth_csv_name):
- Format: CSV (.csv)
- Required Columns: Longitude, Latitude, Class
- Example: ground_truth_samples_removed.csv
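
A quick pre-flight check of the ground truth file can catch column and coordinate problems before a full run. The helper below is a hypothetical sketch using only the standard library, not part of the script. It assumes Longitude and Latitude are in decimal degrees, which matches the EPSG:4326 source CRS reported in the run summary. The sample rows are made up.

```python
import csv
import io

REQUIRED_COLUMNS = ("Longitude", "Latitude", "Class")


def check_ground_truth(handle):
    """Return (row_count, problems) for an open ground truth CSV file."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    problems = []
    count = 0
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        count += 1
        try:
            lon, lat = float(row["Longitude"]), float(row["Latitude"])
        except (TypeError, ValueError):
            problems.append((line_no, "non-numeric coordinate"))
            continue
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append((line_no, "coordinate out of range"))
        if not (row["Class"] or "").strip():
            problems.append((line_no, "empty Class"))
    return count, problems


# Made-up sample rows; real files would be opened with open(path, newline="").
sample = io.StringIO(
    "Longitude,Latitude,Class\n"
    "-91.5,-0.5,Lava flows\n"
    "200,0.1,Biogenic mat\n"
    "-91.4,-0.6,\n"
)
count, problems = check_ground_truth(sample)
print(count, problems)
```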

Output Files

The script generates output files in the Outputs_SeabedClassification directory:

Terrain Feature Rasters: slope.tif, aspect.tif, tri.tif, tpi.tif, roughness.tif, hillshade.tif.
Processed Backscatter: backscatter_aligned_to_bathy.tif.
Stacked Features: stacked_features_for_classification.tif.
Classification Maps: classification_rf.tif, classification_kmeans.tif.
Intermediate VRT files may also be present.

Software and Libraries

Python 3 with the following major libraries: GDAL/OGR, Rasterio, GeoPandas, Pandas, NumPy, Scikit-learn, Matplotlib. (See requirements.txt for a more detailed list).

Setup and Installation

Clone the repository.
Install GDAL: A system-wide or Conda installation is recommended (e.g., sudo apt install gdal-bin libgdal-dev python3-gdal or conda install -c conda-forge gdal).

Create a Python Virtual Environment:

python3 -m venv venv
source venv/bin/activate

Install Python Dependencies:
```
pip install -r requirements.txt
```
(Ensure requirements.txt reflects the necessary packages).
PROJ_LIB Environment Variable: The script attempts to set this. If CRS errors persist, ensure it points to your PROJ data directory (e.g., /usr/share/proj, /opt/conda/envs/your_env/share/proj).
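
How the script sets this variable is not described here. The sketch below shows one possible approach: pick the first candidate directory that contains proj.db and export it before any GDAL, Rasterio, or GeoPandas import. The function name is hypothetical, and the candidate paths are the examples given above. PROJ_DATA is set as well because PROJ 9.1 and later read that name.

```python
import os
from pathlib import Path


def configure_proj(candidates):
    """Point PROJ_LIB and PROJ_DATA at the first directory containing proj.db."""
    for candidate in candidates:
        path = Path(candidate)
        if (path / "proj.db").is_file():
            os.environ["PROJ_LIB"] = str(path)
            os.environ["PROJ_DATA"] = str(path)  # name read by PROJ 9.1+
            return path
    return None  # nothing found; leave the environment untouched


# Call this before importing GDAL, Rasterio, or GeoPandas.
found = configure_proj(["/usr/share/proj", "/opt/conda/envs/your_env/share/proj"])
print("PROJ data directory:", found)
```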

How to Run

Prepare Data: Place input files in the input_data sub-directory.
Configure Script:
- Open Seabed_classification_mod.py.
- Verify data_dir_relative and output_dir_relative if your project structure differs.
- Adjust bathy_file_name, backscatter_file_name, ground_truth_csv_name if needed.
- Review other parameters like n_kmeans_clusters, rf_cv_folds, etc.
Execute:
```
python Seabed_classification_mod.py
```
Check Outputs: Results are written to the Outputs_SeabedClassification sub-directory.
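
Before executing, it can help to confirm that the configured input files exist. The helper below is a hypothetical sketch, not part of Seabed_classification_mod.py. The file names are the example dataset from this README, and the current working directory stands in for the script's location.

```python
from pathlib import Path


def missing_inputs(base_dir, data_dir_relative, file_names):
    """Return the configured input files not found under base_dir/data_dir_relative."""
    data_dir = Path(base_dir) / data_dir_relative
    # Empty or None entries are skipped so an unset optional file is not reported.
    return [name for name in file_names if name and not (data_dir / name).is_file()]


configured = [
    "bathy_cube_10_filled_5x5.tiff",
    "back_10_filled_5x5.tiff",
    "ground_truth_samples_removed.csv",
]
missing = missing_inputs(Path.cwd(), "input_data", configured)
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```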
