Collection of notebooks for post-processing GHG emission results obtained from RE-Emission

Tomasz Janus

University of Manchester

Short description

This repository contains a collection of notebooks used to perform a study on "Planning low-carbon reservoir investments with spatially-explicit emission models" that was carried out using irrigation, multi-purpose and hydroelectric reservoirs in Myanmar. The manuscript shall be linked shortly. The work relied on data generated by three pieces of software:

  • Pywr was used to generate mean annual hydropower production and firm power estimates for some of the unreported existing dams and for the future dams. It is not called here directly. Instead, we use already prepared outputs from water resources simulations in Pywr.

  • Geocaret was used to create reservoir and catchment delineations for each reservoir and to derive the reservoir and catchment properties required for subsequent estimation of gas emissions. It is also not called directly; instead, we rely on its outputs.

  • Reemission was used to produce GHG emission estimates for each reservoir based on the outputs of Geocaret. Reemission is called directly from some of the notebooks.

The study processes the inputs from Geocaret, estimates GHG emissions using RE-Emission, visualises the results, performs data analysis, trains surrogate machine learning (ML) models to facilitate explainability of GHG emission predictions, prepares inputs to a multiobjective optimization (MOO) algorithm, runs the MOO algorithm and processes the results, and finally creates figures and computes various statistics for the manuscript. A short explanation of what each individual notebook does is provided at the end of this document in the section The list of notebooks in the order of execution.

NOTE:

Non-dominated sorting in Pygmo, which is used to sort the optimization results, does not work in Python 3.11 and has not been tested in Python 3.12+. We recommend using Python 3.10 or lower.
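A minimal sketch of a version guard that could be placed at the top of a notebook to surface the note above early. The function name and the warning text are illustrative and not part of this repository; the 3.11 threshold is taken from the note.

```python
import sys

# Assumption: the incompatibility starts at Python 3.11, as stated in the note.
FIRST_UNSUPPORTED = (3, 11)


def python_version_recommended(version=None):
    """Return True if the (major, minor) version is below Python 3.11."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) < FIRST_UNSUPPORTED


if not python_version_recommended():
    # Only a warning: other notebooks may still run on newer interpreters.
    print(
        "Warning: running Python %d.%d; non-dominated sorting with Pygmo "
        "may fail. Consider Python 3.10 or lower." % sys.version_info[:2]
    )
```

The check only prints a warning rather than raising, since only the sorting step is reported as affected.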

Installation

The repository does not require installation but relies on a large number of packages. We recommend that you set up a virtual environment dedicated to this repository before attempting to install all the dependencies. There are several packages for creating virtual environments such as venv, virtualenv, pyenv, etc. Please refer to web resources and find out what works best for you, e.g. in https://www.freecodecamp.org/news/how-to-setup-virtual-environments-in-python/.

The majority of the dependencies are listed in requirements.txt. Additionally, requirements_frozen.txt specifies the exact versions of the dependencies used in the last working installation.

You can install these dependencies by running:

pip install -r requirements.txt

or

pip install -r requirements_frozen.txt

from the root folder of this repository within your virtual environment.
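If a notebook misbehaves, it can help to compare the installed package versions against the pinned ones. The sketch below assumes that requirements_frozen.txt contains plain `name==version` lines (the usual `pip freeze` format); the helper names are hypothetical and not part of this repository.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(text):
    """Parse 'name==version' lines; skip comments, blanks and unpinned entries."""
    pins = {}
    for raw in text.splitlines():
        # Drop trailing comments and environment markers (text after ';').
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req_file = Path("requirements_frozen.txt")
    if req_file.exists():
        report = find_mismatches(parse_pins(req_file.read_text()))
        for name, (pinned, installed) in report.items():
            print(f"{name}: pinned {pinned}, installed {installed}")
```

Run it from the root folder of the repository inside the virtual environment; an empty report means the environment matches the frozen versions.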

Additionally, you need to install reemission which, as of this date, is not yet on the Python Package Index (PyPI). You can instead install it directly from its dedicated GitHub repository, as shown below.

With HTTP connection to GitHub:

pip install git+https://github.com/tomjanus/reemission.git

With SSH connection:

pip install git+ssh://git@github.com/tomjanus/reemission.git

The repository generates a large quantity of final and intermediate outputs, stored as text and binary files representing data and figures. They are generated as you run consecutive notebooks. Alternatively, if you would like to view this data without running the notebooks, you can download it from the web. To do this, run:

python download_ext_data.py

in the root of the repository. WARNING: This will overwrite any of the files already existing in the following directories: bin, intermediate, outputs, and saved_models.
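Because the download overwrites existing files, you may want to keep a copy of your own results first. The following is a minimal sketch, assuming it is run from the root of the repository; the function name and the backup folder naming scheme are hypothetical, and only the four directory names come from the warning above.

```python
import shutil
import time
from pathlib import Path

# Directory names taken from the warning above.
OVERWRITTEN_DIRS = ("bin", "intermediate", "outputs", "saved_models")


def backup_generated_dirs(repo_root=".", backup_root=None):
    """Copy existing generated directories into a backup folder.

    Returns the backup folder path, or None if there was nothing to copy.
    """
    repo_root = Path(repo_root)
    if backup_root is None:
        # Hypothetical naming scheme: backup_YYYYmmdd_HHMMSS inside the repo.
        backup_root = repo_root / f"backup_{time.strftime('%Y%m%d_%H%M%S')}"
    backup_root = Path(backup_root)
    copied = False
    for name in OVERWRITTEN_DIRS:
        src = repo_root / name
        if src.is_dir():
            # copytree creates the destination and any missing parents.
            shutil.copytree(src, backup_root / name)
            copied = True
    return backup_root if copied else None
```

Calling `backup_generated_dirs()` before `python download_ext_data.py` leaves the originals in place and writes the copies next to them.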

You can also clean the repository from generated files by running:

python clean_repo.py

Usage

The repository uses a mix of Python and R. The code is organised into Jupyter notebooks (extension .ipynb), written in either Python or R, and R Markdown files (extension .Rmd). We used Visual Studio Code to execute all files, but alternatives, such as Jupyter Notebook or JupyterLab in the browser for .ipynb notebooks and RStudio for .Rmd files, should work as well.

Known issues

The repository relies on data that is either generated by running the notebooks consecutively or downloaded in precalculated form. The precalculated data is hosted on Google Drive behind publicly shareable links and is fetched using a Python API for Google Drive called gdown. In some instances, gdown may complain that a file cannot be downloaded because it is not shared with sufficient permissions. This is a bug that can be fixed by upgrading gdown with the --no-cache-dir flag, as shown below:

pip install --upgrade --no-cache-dir gdown

The list of notebooks in the order of execution

  • Notebook_1_run_batch_simulation.ipynb - This Jupyter notebook in Python runs a series of simulations with RE-Emission - a tool for estimating GHG emissions from reservoirs. It creates a number of RE-Emission input files in inputs/reemission, performs calculations for each input file and saves the outputs to outputs/reemission. The output files are saved in .xlsx and .json formats. The input data for RE-Emission is stored in inputs/reemission/reemission_inputs.json. The inputs are obtained from the GEOspatial CAtchment and REservoir analysis Tool GEOCaret.

  • Notebook_2_process_hydropower.Rmd - This notebook, written in R Markdown, compares hydropower (HP) production figures for Myanmar's existing and future hydroelectric power plants, as provided in the IFC database of dams, against the simulated values from the national water resources model in Pywr. The map of the water resources model, the delineated reservoirs and emission estimates can be found here. Note that the model on the map may not be 100% accurate, especially with regard to reservoir parameters, as it has recently been undergoing changes independently of this work and these changes may not be reflected in the visualisations. However, the topology and most of the parameter values should nevertheless be correct. The comparison figures produced in the notebook are saved to figures/ifc_pywr_power_comparison. The merged data from the IFC database and the simulations is saved to intermediate/merged_table.xlsx.

  • Notebook_3_combine_outputs.ipynb - This Jupyter notebook written in R combines multiple files in outputs/reemission, containing outputs from batch RE-Emission simulations, into a single 'combined' tabular dataset and saves this combined data in .csv and .xlsx formats. Both files, i.e. combined_outputs.csv and combined_outputs.xlsx, are saved to the folder outputs/reemission/combined/.

  • Notebook_4_ghg_emission_plots.ipynb - This Jupyter notebook in Python creates statistical plots of the emission outputs. The plots summarize net (areal) emissions for different values of categorical variables, such as land use intensity (low/high), soil type (mineral/organic) and water intake depth (shallow/deep), and visualise the distributions of emissions for all reservoir types: hydroelectric, irrigation and multipurpose. The output figures are saved to figures/ghg_visualisation.

  • Notebook_4b_comparison_of_emissions_against_gridded_data.ipynb - This notebook loads and processes reservoir emissions in Myanmar calculated explicitly with ReEmission and estimated from the simplified parameterizations obtained by Harrison et al. 2021 [1] and Soued et al. 2022 [2]. Since the parameterization of Harrison et al. is fitted to emission estimates from a single year (2020), while this study relies on average emissions over the entire life-span of the reservoir (e.g. 100 years), we do not proceed with this parameterization and instead adopt the emission factors of Soued et al. (emission factors vs. climatic zones). We then compare the emission fluxes and total emissions, overall and for each emission pathway, between the explicitly calculated outputs from ReEmission (G-res) and those of Soued et al. We visualise the distributions of errors between the two approaches and provide overall statistics of fit/correspondence between both datasets. Finally, we fit the coefficients of the regressions adopted in the papers of Almeida et al. 2019 [3] and Carlino et al. 2024 [4], which are used to disentangle net anthropogenic emissions from total reservoir emissions. We show the differences between the original (default) and the fitted coefficients.

  • Notebook_5_create_reservoir_tile_maps.Rmd - This notebook, written in R Markdown, creates tile plots of all delineated reservoirs. The tile plots of reservoir contours are saved to figures/maps/.

  • Notebook_6_create_reservoir_maps.ipynb - This notebook in R creates maps showing emissions and emission intensities of the reservoirs in Myanmar. The plots are saved to figures/maps.

  • Notebook_7_run_catboost_lightgbm_xgboost_regressions.ipynb - This notebook in Python visualises the RE-Emission input data and explores the data structure, checks feature scores for CO2 and CH4 regression using conventional statistical methods, and then fits the CO2 and CH4 estimates from RE-Emission with boosted tree regression models using pre-selected RE-Emission input data. Three boosted tree models are used: XGBoost, CatBoost and LightGBM. The regression models are then used to explain model predictions at the model level as well as at the instance level. The notebook saves figures to figures/data_exploration and figures/model_explanation. The fitted models are saved to outputs/model_explanations.

  • Notebook_8_dim_red_and_clustering_of_feature_importances.ipynb - This notebook in Python clusters reservoirs based on their similarities in the feature importance space. The feature importance space is first reduced using various dimensionality reduction methods. The notebook implements several clustering and dimensionality reduction algorithms. The figures are saved to figures/clustering. Cluster data used for mapping with Notebook_9_create_clustering_maps.ipynb in R is saved to intermediate/density_mapping.

  • Notebook_9_create_clustering_maps.ipynb - This notebook in R plots three types of maps and saves them to figures/maps. The first map shows emission intensity with a digital elevation map in the background. The other two types of maps are cluster maps, based on clustering on features and on feature ranks, coloured using Voronoi polygons.

  • Notebook_9b_process_additional_information_from_water_resource_models.ipynb - This Jupyter notebook in Python processes outputs from the water resources models containing reservoir levels, turbine flows and recorded spill flows, and creates input data for designing an ML model and for subsequent explanations using xAI.

  • Notebook_9c_generate_breakdowns_for_selected_reservoirs_for_figure3.ipynb - This Jupyter notebook in Python trains a boosted tree model predicting the emission intensity of HP reservoirs and uses DALEX to interpret the results. The breakdown plots for selected reservoirs are saved to figures in .svg format.

  • Notebook_10_generate_inputs_for_MOO.ipynb - This notebook written in Python creates dataframes required for preparing inputs to a multiobjective optimisation (MOO) algorithm. The dataframes are saved as a collection of .csv files.

  • Notebook_11_create_input_files_for_MOO.ipynb - This notebook written in Python prepares input .txt files for the MOO algorithm.

  • Notebook_12_run_MOO.ipynb - This notebook written in Python runs the optimal dam portfolio selection study for Myanmar and processes the optimization results by converting the results in .sol files to .json and .csv and performing non-dominated sorting.

  • Notebook_12b_Visualise_MOO_results - Visualise MOO results as Pareto fronts and interactive parallel axis plots embedded in html files.

  • Notebook_13_plot_maps_of_MOO_results.ipynb - This notebook written in Python visualises the results of the optimization study in a composite figure consisting of 6 maps showing the selected dams for 6 different optimization scenarios indicated on the Pareto front plots generated in Notebook_12. The figure additionally features a number of statistical plots showing the distributions of dams with respect to HP generation and elevation across all solutions. The objectives for each solution are plotted in radar plots.

  • calculate_statistical_figures_for_the_manuscript.ipynb - This short notebook is used for calculating statistical numbers for reporting in the manuscript either directly in text or in tables.
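Notebook_12 relies on non-dominated sorting to process the optimization results. To illustrate the idea only, here is a minimal pure-Python sketch that extracts the first non-dominated (Pareto) front. It is not the Pygmo routine used in the notebook, it assumes that every objective is minimised (negate an objective that should be maximised), and the example values are made up rather than taken from the study.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the indices of points not dominated by any other point.

    Simple O(n^2) pairwise comparison; adequate for small result sets.
    """
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical two-objective values, not study data.
solutions = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.0)]
front = non_dominated(solutions)
print(front)
```

Identical solutions do not dominate each other under this definition, so duplicates of a non-dominated point all remain on the front.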

References:

  • [1] Harrison, J. A., Prairie, Y. T., Mercier-Blais, S., & Soued, C. (2021). Year-2020 global distribution and pathways of reservoir methane and carbon dioxide emissions according to the greenhouse gas from reservoirs (G-res) model. Global Biogeochemical Cycles, 35, e2020GB006888. https://doi.org/10.1029/2020GB006888
  • [2] Soued, C., Harrison, J.A., Mercier-Blais, S. et al. Reservoir CO2 and CH4 emissions and their climate impact over the period 1900–2060. Nat. Geosci. 15, 700–705 (2022). https://doi.org/10.1038/s41561-022-01004-2
  • [3] Almeida, R.M., Shi, Q., Gomes-Selman, J.M. et al. Reducing greenhouse gas emissions of Amazon hydropower with strategic dam planning. Nat Commun 10, 4281 (2019). https://doi.org/10.1038/s41467-019-12179-5
  • [4] Carlino, A., Schmitt, R., Clark, A. et al. Rethinking energy planning to mitigate the impacts of African hydropower. Nat Sustain 7, 879–890 (2024). https://doi.org/10.1038/s41893-024-01367-x