Demystifying normalized difference vegetation index (NDVI) for greenness exposure assessments and policy interventions in urban greening
Project made in collaboration with S.M. Labib from the Department of Human Geography and Spatial Planning at Utrecht University.
Most nature and health research uses the normalized difference vegetation index (NDVI) to measure greenness exposure. However, little is known about what NDVI measures in terms of vegetation types (e.g., grass, canopy coverage) within a given analysis zone (e.g., a 500 m buffer). Additionally, more exploration is needed to understand how to interpret changes in NDVI (e.g., per 0.1 increment) for policy interventions in urban greening. This study aims to address these gaps in the literature.
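For reference, NDVI is conventionally computed from the near-infrared (NIR) and red reflectance bands as (NIR - Red) / (NIR + Red), which bounds it between -1 and 1. The snippet below is a minimal sketch of that formula only; the function name, the sample reflectance values, and the choice to return NaN where both bands are zero are assumptions for illustration, not taken from the project notebooks.

```python
import numpy as np

def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red), with NaN where both bands are zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical reflectance values for three pixels (illustration only).
nir_band = np.array([0.50, 0.30, 0.0])
red_band = np.array([0.10, 0.30, 0.0])
ndvi_values = ndvi(nir_band, red_band)
```

Under this definition, a 0.1 increment in NDVI is a change in the normalized band contrast, which is why its meaning in terms of actual vegetation cover is not obvious from the index alone.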
- Programming Language: Python v3.9.10
- Environment: Conda v4.11.0
- Computing Platform: Jupyter Notebook v6.4.8
- Geographic Information System (GIS): QGIS v3.16.16-Hannover
- Machine: macOS Big Sur v11.6.5 (OS) and 2.3 GHz Dual-Core Intel Core i5 (processor)
- Running time: approximately 25 minutes
First, run 1_data_processing_ndvi.ipynb to perform the required steps to process greenness metrics for different buffer zones.
The methodology starts by processing remotely sensed satellite images into five greenness metrics. From left to right: NDVI, overall green space, tree canopy, forbs and shrubs, and grass presence:
Then, we assessed greenness exposure by averaging NDVI, green space, and vegetation presence within 100, 300, and 500 m buffer distances using focal statistics. The filters (i.e., buffer zones) convolved over the input images were rectangular windows measuring 21, 61, and 101 pixels per side, respectively. At the 10 x 10 m image resolution, each window spans the buffer distance on both sides of the centre pixel (e.g., 2 x 10 + 1 = 21 pixels for 100 m).
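The focal-statistics step can be sketched as a moving-window mean. The example below is a minimal illustration using only NumPy and a summed-area table; it is not the notebook's code. The function names, the synthetic input raster, and the reflective handling of image edges are all assumptions.

```python
import numpy as np

def window_size(buffer_m, resolution_m=10):
    """Side length in pixels of a window reaching buffer_m on each side of the centre."""
    return 2 * int(round(buffer_m / resolution_m)) + 1

def focal_mean(image, size):
    """Mean over a size x size window centred on every pixel.

    Edges are handled by reflecting the image (an assumption of this sketch).
    """
    half = size // 2
    padded = np.pad(np.asarray(image, dtype=float), half, mode="reflect")
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    rows, cols = np.asarray(image).shape
    window_sum = (
        sat[size:size + rows, size:size + cols]
        - sat[:rows, size:size + cols]
        - sat[size:size + rows, :cols]
        + sat[:rows, :cols]
    )
    return window_sum / (size * size)

# Synthetic stand-in for an NDVI raster at 10 m resolution (illustration only).
rng = np.random.default_rng(0)
ndvi_map = rng.uniform(-0.1, 0.9, size=(120, 120))

# One exposure layer per buffer distance, keyed by distance in metres.
exposure = {b: focal_mean(ndvi_map, window_size(b)) for b in (100, 300, 500)}
```

Each entry of `exposure` has the same shape as the input, so the layers can be compared cell by cell across spatial scales.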
Here, we created greenness exposure maps (i.e., output layers) for different buffer zones (i.e., spatial scales). From top to bottom: NDVI, and forbs and shrubs density. From left to right: the input image, followed by greenness exposure at 100, 300, and 500 m:
Then, we performed data reduction by randomly sampling locations within the Greater Manchester boundaries for the 100, 300, and 500 m buffer distances, so that the sample remains representative of the study area.
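A minimal sketch of this sampling step, assuming the study-area boundary has been rasterized into a boolean mask on the same grid as the exposure maps. The mask shape, the number of points, uniform sampling without replacement, and all names below are assumptions for illustration; the notebook may sample differently.

```python
import numpy as np

def sample_locations(mask, n_points, seed=42):
    """Draw n_points distinct (row, col) cells uniformly from the True cells of mask."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(mask)
    chosen = rng.choice(rows.size, size=n_points, replace=False)
    return rows[chosen], cols[chosen]

# Hypothetical study-area mask: a disc inside a 200 x 200 grid.
yy, xx = np.mgrid[:200, :200]
study_area = (yy - 100) ** 2 + (xx - 100) ** 2 <= 80 ** 2

# Synthetic exposure map on the same grid (illustration only).
exposure_map = np.random.default_rng(0).uniform(0, 1, size=study_area.shape)

sample_rows, sample_cols = sample_locations(study_area, n_points=500)
sampled_values = exposure_map[sample_rows, sample_cols]
```

Reusing the same sampled cells for every greenness metric keeps the resulting columns aligned row by row.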
Finally, we sampled the greenness exposure maps at the randomly generated locations to extract their raster cell values into three data frames (one per buffer zone). For instance, the data frame of greenness metrics for the 100 m buffer should look as follows:
Next, we conducted an exploratory data analysis using the script 2_exploratory_data_analysis_ndvi.ipynb, where we mainly investigated:
- Statistics describing different greenness metrics
- Data distributions for greenness metrics
- Linear regression models
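As an illustration of the last item, the sketch below fits an ordinary least squares line and reports R-squared using NumPy only. The data are synthetic, and the variable names and the choice of tree canopy as the predictor are assumptions; this is not the notebook's model.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x; returns (intercept, slope, r2)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef[0], coef[1], r2

# Synthetic data: mean NDVI rising linearly with canopy fraction plus noise.
rng = np.random.default_rng(1)
canopy = rng.uniform(0, 1, 300)
ndvi_mean = 0.2 + 0.5 * canopy + rng.normal(0, 0.05, 300)

intercept, slope, r2 = fit_line(canopy, ndvi_mean)
```

A straight line like this serves as a baseline; a nonlinear relationship would appear as structure in the residuals.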
Following the methodology, run 3_exploratory_gam_analysis_ndvi.ipynb to get a better understanding of GAMs:
1.1. Introduction to Generalized Additive Models (GAMs)
1.2. Components and parameters of GAMs (distribution, link function, functional form, lambda, and splines)
1.3. How to select and tune the best model (GCV, effective DoF, AIC, pseudo R-squared, and grid search)
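To make the roles of lambda, effective degrees of freedom, and GCV concrete, here is a minimal sketch of a single-term additive model with a Gaussian distribution and identity link, fitted as a penalized regression spline in NumPy. It is a simplified stand-in and not the notebook's implementation, which may rely on a dedicated GAM library. The truncated-linear basis, the knot positions, the lambda grid, and the synthetic data are all assumptions.

```python
import numpy as np

def spline_basis(x, knots):
    """Intercept, linear term, and one truncated-linear term per knot."""
    return np.column_stack([np.ones_like(x), x] + [np.clip(x - k, 0, None) for k in knots])

def fit_penalized(x, y, knots, lam):
    """Fit the spline with a ridge penalty lam on the knot coefficients only."""
    X = spline_basis(x, knots)
    penalty = np.diag([0.0, 0.0] + [1.0] * len(knots))
    # Hat matrix H = X (X'X + lam * P)^-1 X', so fitted values are H @ y.
    hat = X @ np.linalg.solve(X.T @ X + lam * penalty, X.T)
    fitted = hat @ y
    edof = np.trace(hat)                 # effective degrees of freedom
    rss = np.sum((y - fitted) ** 2)
    n = len(y)
    gcv = n * rss / (n - edof) ** 2      # generalized cross-validation score
    return {"lam": lam, "edof": edof, "gcv": gcv, "fitted": fitted}

def grid_search(x, y, knots, lams):
    """Return the fit with the lowest GCV score over the candidate lambdas."""
    return min((fit_penalized(x, y, knots, lam) for lam in lams), key=lambda f: f["gcv"])

# Synthetic nonlinear relationship (illustration only).
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = 0.4 + 0.3 * np.sin(2 * np.pi * x) + rng.normal(0, 0.05, 200)
knots = np.linspace(0.1, 0.9, 9)
lams = np.logspace(-4, 2, 7)

best = grid_search(x, y, knots, lams)
```

A larger lambda shrinks the knot coefficients and lowers the effective degrees of freedom towards those of a straight line, while GCV balances that smoothness against the residual error.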
Execute the script 4_multivariate_analysis_ndvi.ipynb to explore the sensitivity of NDVI to vegetation types and amounts of vegetation for different buffer zones in a multivariate model. The multivariate model explaining NDVI for a buffer distance of 100 meters should be as follows:
Finally, run 5_univariate_analysis_ndvi.ipynb to explore the sensitivity of vegetation types and amounts of vegetation to increments in mean NDVI for different buffer zones in a univariate model. Univariate models for a buffer distance of 100 meters should be as follows:
Our results suggest that NDVI is sensitive to vegetation types, and that the types and quantities of vegetation are in turn sensitive to increments in mean NDVI (i.e., 0.1 increments), across the different buffer zones. Furthermore, these sensitivities usually follow nonlinear patterns. Overall, the results help demystify NDVI for greenness exposure assessment and might be translated into actionable policy interventions in urban greening.
We submitted this project to the Department of Information and Computing Sciences in collaboration with the Department of Human Geography and Spatial Planning in candidacy for the Master of Science degree in Applied Data Science, Utrecht University.