Predicting Species and Structural Diversity of Temperate Forests with Satellite Remote Sensing and Deep Learning
This repository provides the code and datasets used for the paper submission to the Remote Sensing Journal and for the Bachelor Thesis of Janik Hoffmann on "Predicting Species and Structural Diversity of Temperate Forests with Satellite Remote Sensing and Deep Learning".
Based on forest inventory data from the Biodiversity Exploratories we built spatial models of biodiversity indicators (tree species diversity and the standard deviation of tree diameter) using Deep Neural Networks and Sentinel-1 and -2 image metrics. Our work contributes to current research by testing a novel approach for the regression analysis of in-situ forest biodiversity and satellite observations based on a heterogeneous dataset covering different environmental and forest management conditions throughout Germany.
- Gathering of field data on selected forest variables and calculation of Shannon's Diversity Index from forest composition dataset
- Sentinel-2 Preprocessing of Surface Reflectance satellite data and extraction of plot statistics
- Sentinel-1 Preprocessing and extraction of plot statistics
- Computation of further image metrics
- Setup of the DNN
- Model validation and variable importance
- Test for spatial autocorrelation
- Applying the model on raster data
Forest data were accessed via the Biodiversity Exploratories Information System (BExis).
The study sites:
- Schorfheide-Chorin (Brandenburg)
- Hainich-Dün (Thuringia)
- Swabian Alb (Baden-Wuerttemberg)
Dataset No. | Description | Period |
---|---|---|
22766 | standard deviation of tree diameter (DBH_sd) | 2014-2018 |
22766 | tree basal area per hectare (BA) | 2014-2018 |
22766 | Reineke's Stand Density Index (SDI) | 2014-2018 |
22907 | abundance of individuals for each tree species | 2014-2018 |
19986 | standard deviation of tree height (h_sd) | 2014 |
17706 | forest type (dominant species, management) | 2008-2014 |
Calculation of Shannon's Diversity Index
As a measure of tree species diversity, the Shannon Index was calculated from the species composition dataset using the Diversity function of the Python library EcoPy.
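For readers without EcoPy installed, the index itself can be sketched in plain NumPy as H' = -sum(p_i * ln(p_i)), where p_i is the relative abundance of species i in a plot. This is a minimal illustration and not the repository's code: the function name `shannon_index` and the abundance table are hypothetical, and the natural logarithm is an assumption.

```python
import numpy as np

def shannon_index(abundances):
    """Shannon's diversity index H' = -sum(p_i * ln(p_i)) for one plot."""
    counts = np.asarray(abundances, dtype=float)
    counts = counts[counts > 0]          # absent species contribute nothing
    if counts.size == 0:
        return 0.0
    p = counts / counts.sum()            # relative abundance per species
    return float(-np.sum(p * np.log(p)))

# Hypothetical plot-by-species abundance table (rows = plots, columns = species)
plots = np.array([
    [10, 10, 10, 10],   # four equally abundant species
    [40, 0, 0, 0],      # monoculture
    [30, 5, 5, 0],      # one dominant species with two minor ones
])
shannon = np.array([shannon_index(row) for row in plots])
```

A plot with four equally abundant species reaches the maximum for four species, ln(4), while a monoculture yields 0.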
Optical satellite data were obtained from the Sentinel-2 Surface Reflectance archive in Google Earth Engine. Cloud masking is an essential preprocessing step for making satellite data analysis-ready. We used the s2cloudless algorithm, which assigns a cloud probability to each pixel, to mask out clouds and cloud shadows. For each of the three study sites, we created Sentinel-2 composites from images acquired during the growing season (March to October) of 2017. Band statistics were extracted for all 150 plot areas and stored in a CSV file.
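The masking and compositing idea can be illustrated offline with NumPy arrays standing in for an image time series. This is only a sketch under stated assumptions: the function name `cloud_free_median`, the 50 % probability threshold, and the use of a median composite are illustrative choices that this README does not confirm, and cloud shadows are not handled here.

```python
import numpy as np

def cloud_free_median(images, cloud_prob, max_prob=50):
    """Median composite over time after masking likely cloudy pixels.

    images:     array (time, rows, cols) with reflectance of one band
    cloud_prob: array (time, rows, cols) with cloud probability in percent
    max_prob:   assumed threshold; pixels above it are treated as cloudy
    """
    images = np.asarray(images, dtype=float)
    masked = np.where(np.asarray(cloud_prob) > max_prob, np.nan, images)
    # nanmedian ignores masked observations; a pixel that is cloudy in
    # every scene stays NaN in the composite
    return np.nanmedian(masked, axis=0)

# Three hypothetical scenes of a 1 x 2 pixel area
images = np.array([[[0.10, 0.20]], [[0.90, 0.22]], [[0.12, 0.24]]])
cloud_prob = np.array([[[5, 10]], [[95, 20]], [[0, 15]]])
composite = cloud_free_median(images, cloud_prob)
```

In the example, the bright value 0.90 in the second scene is flagged as cloud and excluded, so the first pixel's composite is the median of the two remaining observations.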
Radar data were obtained from the C-band Sentinel-1 SAR GRD collection in Google Earth Engine. We computed different backscatter image products for the whole year 2017 and for the winter season. We extracted band statistics at the plot locations and stored them in a CSV file.
Based on an evaluation of previous studies, additional model variables were extracted from the satellite imagery besides the raw band data. In total, 31 predictors were used for modeling.
To test its predictive power, we computed the Enhanced Vegetation Index (EVI) from Sentinel-2 image data in Google Earth Engine.
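For reference, the standard EVI formula is EVI = 2.5 * (NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1), with reflectance on a 0 to 1 scale. The sketch below applies it to NumPy arrays rather than Earth Engine images. The band mapping (B8 = NIR, B4 = red, B2 = blue) and the scale factor of 10000 are the usual Sentinel-2 conventions and are assumed here; the sample values are made up.

```python
import numpy as np

def evi(nir, red, blue):
    """Enhanced Vegetation Index from reflectance values scaled to 0..1."""
    nir, red, blue = (np.asarray(b, dtype=float) for b in (nir, red, blue))
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Hypothetical Sentinel-2 digital numbers for two pixels
b8 = np.array([4000, 3000])   # NIR
b4 = np.array([500, 1500])    # red
b2 = np.array([300, 800])     # blue
evi_values = evi(b8 / 10000, b4 / 10000, b2 / 10000)
```

The first pixel, with high NIR and low red reflectance, gets the higher EVI, as expected for dense green vegetation.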
In addition, as a measure of spectral diversity, Rao's Q index was calculated from the Sentinel-2 composite using the Rao's Q Diversity Index tool in ArcGIS.
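To make the index concrete: Rao's Q sums the pairwise distances d_ij between pixel values in a window, weighted by their proportions p_i and p_j. The following is a minimal single-window sketch, not the ArcGIS tool's implementation. It assumes equal weights p = 1/N, absolute difference as distance, and the full double sum over all ordered pairs; implementations that count each pair only once give half this value.

```python
import numpy as np

def rao_q(window):
    """Rao's Q for one window: sum over i, j of d_ij * p_i * p_j."""
    values = np.asarray(window, dtype=float).ravel()
    n = values.size
    # absolute difference between every ordered pair of pixel values
    distances = np.abs(values[:, None] - values[None, :])
    # each pixel has proportion 1/n, so every pair is weighted by 1/n**2
    return float(distances.sum() / n**2)

homogeneous = rao_q([[1, 1], [1, 1]])   # identical pixels, no diversity
contrasting = rao_q([0, 2])             # two pixels that differ by 2
```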
Based on the Sentinel-2 EVI composite and a Sentinel-1 composite of the normalized VH and VV backscatter for the winter period, four image texture metrics (entropy, dissimilarity, homogeneity, contrast) were calculated in Google Earth Engine using the GLCM function.
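The four texture metrics can be illustrated with a small NumPy version of the grey-level co-occurrence matrix (GLCM). This sketch is not the Earth Engine implementation: it assumes a single horizontal neighbour offset, a symmetric matrix, an image already quantised to integer grey levels, and the natural logarithm for entropy. The function name `glcm_metrics` is hypothetical.

```python
import numpy as np

def glcm_metrics(image, levels):
    """Texture metrics from a symmetric, normalised GLCM of horizontal neighbours.

    image:  2-D integer array with grey levels in [0, levels)
    levels: number of grey levels
    """
    image = np.asarray(image)
    left, right = image[:, :-1].ravel(), image[:, 1:].ravel()
    glcm = np.zeros((levels, levels), dtype=float)
    np.add.at(glcm, (left, right), 1.0)   # count co-occurring grey-level pairs
    glcm = glcm + glcm.T                  # make the matrix symmetric
    p = glcm / glcm.sum()                 # convert counts to probabilities
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]                    # avoid log(0) in the entropy term
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "dissimilarity": float(np.sum(p * np.abs(i - j))),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "entropy": float(-np.sum(nonzero * np.log(nonzero))),
    }

uniform = glcm_metrics(np.zeros((3, 3), dtype=int), levels=2)
stripes = glcm_metrics(np.array([[0, 1, 0], [1, 0, 1]]), levels=2)
```

A uniform patch has zero contrast, dissimilarity and entropy and a homogeneity of 1, while the alternating pattern scores high on contrast and low on homogeneity.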
For modeling the biodiversity variables, we used a feed-forward deep neural network implemented with the Keras Sequential model in Python. Sentinel-1 and -2 composites, together with the additionally computed metrics, served as predictors. The predictors were divided into four groups: Sentinel-2 bands + EVI + Q, Sentinel-1 backscatter, Sentinel-2 texture, and Sentinel-1 texture.
Model validation was based on a set of common accuracy metrics: the coefficient of determination (r2), which measures the correlation between predicted and in-situ values of the target variable, and the root-mean-squared error (RMSE) and relative root-mean-squared error (RRMSE), which measure the difference between the two. Furthermore, variable importance was calculated for each predictor based on model runs with specific groups of predictors.
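The three accuracy metrics can be sketched as follows. Two definitions are assumptions, since this README does not spell them out: r2 is taken as the squared Pearson correlation between observed and predicted values (the alternative 1 - SS_res / SS_tot can give different numbers), and RRMSE is the RMSE divided by the mean of the observed values. The sample values are made up.

```python
import numpy as np

def regression_metrics(observed, predicted):
    """r2 (squared Pearson correlation), RMSE and RRMSE (RMSE / mean observed)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = np.corrcoef(observed, predicted)[0, 1]
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))
    return {
        "r2": float(r ** 2),
        "rmse": float(rmse),
        "rrmse": float(rmse / observed.mean()),
    }

# Hypothetical in-situ values and predictions with a constant offset of 1
metrics = regression_metrics([2, 4, 6, 8], [3, 5, 7, 9])
```

The example shows why r2 alone is not sufficient: predictions with a constant bias are perfectly correlated with the observations (r2 = 1) but still have a non-zero RMSE.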
Spatial autocorrelation is a common phenomenon in spatial models based on remote sensing and indicates spatial dependence between model training and validation data. We tested for it by calculating Moran's I in R.
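For illustration, Moran's I can also be written in a few lines of Python. This is not the R code used in the study. It assumes binary distance-band weights (1 for plots within `max_distance` of each other, 0 otherwise); the function name, the weighting scheme and the sample data are hypothetical, and which values were tested (for example model residuals) is not stated here.

```python
import numpy as np

def morans_i(values, coords, max_distance):
    """Global Moran's I with binary distance-band weights."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # pairwise Euclidean distances between plots
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # weight 1 for distinct plots within max_distance, 0 otherwise
    weights = ((dist > 0) & (dist <= max_distance)).astype(float)
    z = values - values.mean()            # deviations from the mean
    n = values.size
    return float(n / weights.sum() * (z @ weights @ z) / (z @ z))

# Four hypothetical plots along a transect, one distance unit apart
coords = [[0, 0], [1, 0], [2, 0], [3, 0]]
gradient = morans_i([1, 2, 3, 4], coords, max_distance=1)       # similar neighbours
alternating = morans_i([1, -1, 1, -1], coords, max_distance=1)  # dissimilar neighbours
```

Positive values indicate that neighbouring plots have similar values, negative values that they differ, and values near the expectation of -1 / (n - 1) indicate no spatial autocorrelation.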
We applied the calibrated model of structural diversity to raster data to assess the performance of the model outside the test areas. We then recorded patterns for patches of known forest type to assess the model's capability to generalize across different species and forest management regimes.