NDVI is a quantitative measurement derived from the red and near-infrared bands of satellite images. It is useful as an indicator of vegetation density and drought conditions.
This demo data pipeline project extracts subsets of Landsat data within a specified date range across 4 regions surrounding Tucson, Arizona. For each subsetted image, cloudy areas are masked out, the NDVI is calculated, and the mean NDVI per image is saved to a BigQuery database. The dashboard below shows the mean NDVI over time from 2018 through early 2023 across the 4 study areas.
The live dashboard is available at this link.
Run the following:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
-
Set up a project in Google Cloud, create a service account for it, and generate/download keys in JSON format
-
Create an .env file with the following:
export TF_VAR_project=<your_project_name> export GOOGLE_APPLICATION_CREDENTIALS=<path/to/your service credentials json file>
-
Run
source .env
-
Authenticate with google cloud via the CLI
gcloud auth application-default login
-
Go to the terraform folder and run:
terraform apply
-
Login to Prefect Cloud (unless you want to run locally)
prefect cloud login
-
Run each of the python files in the blocks folder.
python prefect/blocks/make_gcp_blocks.py python prefect/blocks/make_github_block.py
-
Run the prefect pipelines, you can deploy them to the cloud and run from there:
prefect deployment build prefect/flows/aws_to_gcp_subset.py:landsat_parent_flow -n landsat_aws_to_gcp_subset -q default -sb github/eo-climate-github -a prefect deployment build prefect/flows/gcp_cloudmask_ndvi.py:ndvi_parent_flow -n landsat_gcp_cloudmask_ndvi -q default -sb github/eo-climate-github -a prefect agent start -q 'default'
-
or run locally:
python prefect/flows/aws_to_gcp_subset.py python prefect/flows/gcp_cloudmask_ndvi.py
The aws_to_gcp_subset.py
pipeline searches for Landsat satellite images covering the study area(s) within a specified time period and save the intersecting subset of each image to a data lake (Google Cloud bucket).
The gcp_cloudmask_ndvi.py
pipeline masks out cloud cover in each image, calculates the NDVI from the red and near-infrared bands, and saved the mean NDVI per image in a BigQuery data warehouse.
You can schedule these to run automcatically in Prefect Cloud if you wish. You can adjust the date parameters either via the Prefect Cloud UI or by adjusting the default parameters in the python files.
Follow these instructions to run the dbt scripts in the cloud, and as described in the data engineering zoomcamp videos. You will need to fork this repo.
The dbt scripts transform the NDVI data from the BigQuery data warehouse, creating a new table free of duplicates, partitioned by date, and clustered by study area. This table is used by the dashboard, which was created via Google's Looker Studio.