Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes.
This repository contains the materials used for my land cover change analysis paper published in Remote Sensing journal on February 2018. The paper, both the main paper and supplementary material, is published open-access and can be downloaded for free under a CC BY 4.0 license. The materials provided in this repository serve to supplement the published paper by creating a backup of the materials used in and developed through the study, to encourage doing better science by fostering reproducibility and transparency, and to improve the overall impact of the research—both for other researchers and my future self.
Robust quantitative estimates of land use and land cover change are necessary to develop policy solutions and interventions aimed towards sustainable land management. Here, we evaluated the combination of Landsat and L-band Synthetic Aperture Radar (SAR) data to estimate land use/cover change in the dynamic tropical landscape of Tanintharyi, southern Myanmar. We classified Landsat and L-band SAR data, specifically Japan Earth Resources Satellite (JERS-1) and Advanced Land Observing Satellite-2 Phased Array L-band Synthetic Aperture Radar-2 (ALOS-2/PALSAR-2), using Random Forests classifier to map and quantify land use/cover change transitions between 1995 and 2015 in the Tanintharyi Region. We compared the classification accuracies of single versus combined sensor data, and assessed contributions of optical and radar layers to classification accuracy. Combined Landsat and L-band SAR data produced the best overall classification accuracies (92.96% to 93.83%), outperforming individual sensor data (91.20% to 91.93% for Landsat-only; 56.01% to 71.43% for SAR-only). Radar layers, particularly SAR-derived textures, were influential predictors for land cover classification, together with optical layers. Landscape change was extensive (16,490 km2; 39% of total area), as well as total forest conversion into agricultural plantations (3,214 km2). Gross forest loss (5,133 km2) in 1995 was largely from conversion to shrubs/orchards and tree (oil palm, rubber) plantations, and gross gains in oil palm (5,471 km2) and rubber (4,025 km2) plantations by 2015 were mainly from conversion of shrubs/orchards and forests. Analysis of combined Landsat and L-band SAR data provides an improved understanding of the associated drivers of agricultural plantation expansion and the dynamics of land use/cover change in tropical forest landscapes.
The following scripts were used for implementing image processing, classification, and accuracy assessment processes; for executing functions and statistical tests; and generating figures.
The overall workflow figure (Fig.2 in the paper) was designed using the yEd Graph Editor software, which uses an XML-based GraphML file format for graphs. The GraphML file used for generating the overall workflow figure is provided.
Scripts: yEd graphml for designing the overall workflow diagram.
Landsat image composites for 1995 and 2015 were generated by extracting the best available observations from multiple Landsat images using pixel-based image compositing, implemented through a user-defined rule-based criteria to generate the best-available-pixel image composites. (Note: 1995 Landsat image composite shown in Fig.1 of the paper.) To generate the best-available-pixel image composites using Landsat data, an image compositing script within the Google Earth Engine (Gorelick et al. 2017) environment was adapted and modified from the contributions of its user community, specifically the modified scripts by Mariano Gonzalez Roglich (Conservation International) and Juan Doblas (Instituto Socioambiental) on the original Temporal Dark Outlier Mask Compositing script developed by Carson Stam and Ian Housman (Red Castle Resources Inc; United States Forest Service—Remote Sensing Applications Center).
Scripts: GEE scripts for generating Landsat 1995 and Landsat 2015 image composites.
Once the image stacks of the combined Landsat and L-band SAR data and the regions-of-interest (ROI) polygons of land cover types were completed, the image values of all predictor variables (all data layers from the image stacks) within the delineated ROI polygons were extracted for backscatter/reflectance analysis. The image statistics were extracted using the Google Earth Engine platform and exported as csv files. The csv files were subsequently used as input data to generate box-whisker plots using ggplot2
package (Wickham 2016) in R software (R Core Team 2016) for the purpose visualising and analysing the distribution of SAR backscatter and Landsat TOA reflectance values for each predictor variable consisting of the image channels/bands, derived indices, and texture measures. Each boxplot showed land cover types (x-axis) against backscatter/reflectance/index values (y-axis) for each predictor variable.
Scripts: GEE scripts for extracting image statistics and exporting as csv files, which are part of the overall image classification scripts; R scripts for generating Set A box-whisker plots and Set B box-whisker plots.
To aid in the delineation of ROIs to classify the 1995 image stack, binary mask layers were needed for two land cover types: oil palm and rubber. The mask layers for these two land cover types were generated—first through a visual evaluation of the boxplots to select predictor variables in which the two land cover types could be discriminated, and second by implementing a decision tree algorithm using the selected predictors and the 2015 ROI polygons to determine the threshold values for each predictor. The specific predictors (channels/bands) selected through visual assessment of boxplots from the 2015 image statistics of Landsat-8/PALSAR-2 were as follows (Fig.S1, Fig.S2, Fig.S3, Fig.S4 in the paper):
Predictor | Land Cover Type |
---|---|
HH | vegetation; non-vegetation |
B4 | shrubs/orchards; mangroves/rubber/oil palm; forest |
B5 | oil palm/rubber; mangroves |
B6 | rubber; oil palm |
The following scripts were used for the decision tree and mask generation process. First, an R script was developed to generate the decision tree using the tree
package (Ripley 2015) in R software. The R script used the csv file containing the extracted 2015 image values of predictor variable layers for all ROI polygons as input data to produce tree dendrogram images and summary text files as outputs. Second, based on the tree dendrograms and summary text files, a simplified decision tree flowchart (Fig.S5 in the paper) was designed again using the yEd Graph Editor for easily depicting the decision tree with the predictor variable and corresponding thresholds for creating binary mask layers for the selected land cover types. Third, the binary mask layers were generated through a Google Earth Engine script using the information from the decision tree flowchart. Once the mask layers were produced, ROIs for the selected land cover types were delineated for classifying the 1995 image stack.
Scripts: R script for implementing decision tree classifier; yEd graphml for designing decision tree flowchart; GEE script for generating binary mask layers.
Image classification was implemented for both 1995 and 2015 individual sensor data and combined sensor data using Google Earth Engine. Three scripts were prepared for each image data group to implement the Random Forests machine learning classifier: Set A 1995, Set A 2015, and Set B 2015; and in each script, classification is done for Landsat-only, SAR-only, and combined Landsat+SAR data. The scripts also implemented calculation of texture measures for SAR data, image stacking, accuracy assessment, mode filtering of land cover maps, and exporting of both the output land cover rasters in TIF file format (Fig.3 in the paper) and computed image statistics for all ROI polygons over combined sensor data. In addition, an R script using the randomForest
package (Liaw & Wiener 2014) in R software, particularly the mean decrease in accuracy (permutation importance), was executed to extract the important predictor variables contributing to improved accuracies of both individual and combined sensor data, in turn producing variable importance plots and summary text files as outputs.
Scripts: GEE scripts for executing image classification on Set A 1995, Set A 2015, and Set B 2015 image data groups; and a separate script for masking classified land cover maps to exclude pixels outside of the study area in preparation for change analysis; R script for extracting variable importance from the Random Forests classifier.
The original accuracy assessments, including error matrices, overall accuracies, user's and producer's accuracies, and f-scores, were implemented in Google Earth Engine, calculated based on pixel counts. From these results, the unbiased accuracy estimates and confidence intervals were calculated, following the good practice recommendations for assessing classification accuracies (Olofsson et al. 2014). The unbiased accuracy estimates and confidence intervals were calculated using Microsoft Excel. In addition, McNemar's tests were implemented to evaluate the statistical significance of the difference between the overall accuracies of classification results using the exact2x2
package (Fay & Hunsberger 2017) in R software.
Scripts: R script for implementing McNemar's tests.
The mode-filtered land cover maps were used for constructing a cross-tabulation matrix to summarise the area and percentage of land cover change from 1995 to 2015 using the Land Cover Change
function of the Semi-Automatic Classification Plugin
in QGIS software (QGIS Development Team 2017), which implements map comparison to calculate the difference between a reference and a new classification raster map. In addition, the change values from the cross-tabulation matrix were used to form a Sankey diagram using an online generator to visualise the land cover transitions (Fig.4 in the paper; Cuba 2015).
Scripts: JSON script for generating the Sankey diagram.
Extraction of Image Statistics
- csv files containing extracted image statistics for Set A 1995, Set A 2015, and Set B 2015 using ROI polygons over individual and combined sensor data
- boxplots showing distribution of backscatter/reflectance values per land cover type for each predictor variable for Set A 1995 and 2015 and Set B 2015
Decision Tree and Mask Generation
- tree dendrograms and summary text files
- decision tree flowchart
Image Classification
- output land cover rasters in TIF file format, including original, mode-filtered, and masked raster files
- csv files containing extracted image statistics (see Extraction of Image Statistics above)
- variable importance plots and summary text files
Accuracy Assessment
- summary text file containing results of McNemar's test
- Excel spreadsheet with unbiased accuracy estimates and confidence intervals
Change Analysis
- csv file output of land cover change analysis
- output land cover change raster in TIF file format
Extras
- QML files for displaying land cover and land cover change raster maps in QGIS
De Alban, J.D.T., G.M. Connette, P. Oswald, E.L. Webb (2018). Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes. Remote Sens. 10(2), 306. doi:10.3390/rs10020306
BibTeX entry:
@article{de_alban_combined_2018,
title = {Combined {Landsat} and {L}-{Band} {SAR} data improves land cover classification and change detection in dynamic tropical landscapes},
volume = {10},
copyright = {http://creativecommons.org/licenses/by/4.0/},
url = {http://www.mdpi.com/2072-4292/10/2/306},
doi = {10.3390/rs10020306},
language = {en},
number = {2},
urldate = {2018-02-16},
journal = {Remote Sensing},
author = {De Alban, Jose Don T. and Connette, Grant M. and Oswald, Patrick and Webb, Edward L.},
month = feb,
year = {2018},
keywords = {Landsat, tropical forest, rubber, JERS-1 SAR, Google Earth Engine, oil palm, land change, agricultural plantation, ALOS-2 PALSAR-2},
pages = {306}
}
Fay, M.P. & Hunsberger, S.A. (2017) exact2x2: Exact Tests and Confidence Intervals for 2x2 Tables.
Cuba, N. (2015) Research note: Sankey diagrams for visualizing land cover dynamics. Landscape and Urban Planning, 139, 163–167. doi:10.1016/j.landurbplan.2015.03.010
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D. & Moore, R. (2017) Google Earth Engine: planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. doi:10.1016/j.rse.2017.06.031
Liaw, A. & Wiener, M. (2014) randomForest: Breiman and Cutler’s random forests for classification and regression.
Olofsson, P., Foody, G.M., Herold, M., Stehman, S.V., Woodcock, C.E. & Wulder, M.A. (2014) Good practices for estimating area and assessing accuracy of land change. Remote Sensing of Environment, 148, 42–57. doi:10.1016/j.rse.2014.02.015
QGIS Development Team (2017) QGIS Geographic Information System. Open Source Geospatial Foundation Project.
R Core Team (2016) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Ripley, B. (2015) tree: Classification and Regression Trees.
Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis, 1st ed. 2009. Corr. 3rd printing 2010 edition. Springer, Dordrecht, Netherlands.
Creative Commons Attribution 4.0 International CC BY 4.0.