This repository provides code for gap-filling eddy covariance carbon flux data with two methods: Marginal Distribution Sampling (MDS) and Extreme Gradient Boosting (XGB).

YujieLiu666/fluxgapfill_XGB_vs_MDS

Environment

  • Folder Environment: Python environment for XGBoost

Input data (input.csv):

Workflow

STEP 01 MDS_EProc_object.R

  • IQR filtering
  • u* filtering
  • save RDS object
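
The IQR filtering step can be illustrated with a minimal sketch. The repository performs this step in R; the Python below is only an illustration, and the function name `iqr_filter`, the multiplier `k=1.5` (the conventional Tukey value), and the sample values are assumptions rather than settings taken from MDS_EProc_object.R.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with None.

    Existing gaps (None) are ignored when the quartiles are computed.
    k=1.5 is an assumed default, not a value from the repository.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [
        v if v is not None and lower <= v <= upper else None
        for v in values
    ]


# Hypothetical flux values with one existing gap and one obvious spike.
fco2 = [-2.1, -1.8, None, -2.4, 35.0, -1.9, -2.0, -2.2, -1.7]
filtered = iqr_filter(fco2)
print(filtered)
```

Flagged values become gaps, so they are filled later together with the gaps that were already present. In practice the filter may be applied per season or per moving window rather than to the whole series at once.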

STEP 02 train_XGB.ipynb

The script uses BayesSearchCV from scikit-optimize to tune XGBoost (XGB) and gap-fills a 13-year FCO2 time series in approximately 20 minutes. Model performance is evaluated with 10-fold cross-validation. The script consists of the following steps:

  • 01: Finding the best hyperparameters for XGBoost.
  • 02: Training the model using the best hyperparameters determined in Step 1.
  • 03: Evaluating the model performance using 10-fold cross-validation. Several model performance metrics (RMSE, R2, and bias) are computed, and learning curves are plotted.
  • 04: Plotting feature (variable) importance.
  • 05: Computing annual sums of FCO2.
  • 06: Computing monthly sums of FCO2.
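
The three metrics named in step 03 can be written out explicitly. This is a sketch in plain Python under stated assumptions: bias is taken as the mean of predicted minus observed, and R2 as the coefficient of determination (1 - SS_res / SS_tot). The notebook may define either differently (for example, R2 as a squared correlation), and the sample numbers are made up.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def bias(obs, pred):
    """Mean error, predicted minus observed (assumed sign convention)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out fold: every prediction is 0.5 too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

print(rmse(observed, predicted), bias(observed, predicted), r2(observed, predicted))
```

In a 10-fold setting these functions would be applied to each held-out fold and then averaged or pooled.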

STEP 03 MDS_10_CV.Rmd

  • Gap-filling using MDS with the same 10-fold cross-validation scheme as in STEP 02, so that MDS and XGB are compared on equal terms.
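
One way to keep the cross-validation identical across the two methods is to fix the fold assignment once and reuse it. The sketch below assumes randomly shuffled, balanced folds with a fixed seed. Whether the repository uses random folds or contiguous blocks is not stated here, and the names `assign_folds` and `split` are hypothetical.

```python
import random


def assign_folds(n_samples, n_folds=10, seed=42):
    """Return one fold ID (0..n_folds-1) per sample, balanced and shuffled.

    The seed is fixed so the same assignment can be regenerated later.
    """
    fold_ids = [i % n_folds for i in range(n_samples)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids


def split(fold_ids, test_fold):
    """Return (train_indices, test_indices) for one held-out fold."""
    train = [i for i, f in enumerate(fold_ids) if f != test_fold]
    test = [i for i, f in enumerate(fold_ids) if f == test_fold]
    return train, test


fold_ids = assign_folds(95)
train_idx, test_idx = split(fold_ids, test_fold=0)
print(len(train_idx), len(test_idx))
```

Storing the fold IDs as an extra column in the input table would let both the R and the Python scripts mask exactly the same records in each fold.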

STEP 04 ANN_create_synthetic_data.ipynb

  • This script, run on Google Colab, generates synthetic data using an Artificial Neural Network (ANN). It is adapted from Vekuri et al. 2023, https://doi.org/10.1038/s41598-023-28827-2, with GCC added as an input feature.

STEP 05 generate_data_for_scenarios.Rmd

  • Generates data for experimental scenarios 1, 2, and 3, as described in the manuscript.
  • Experimental scenario 1: different gap lengths
  • Experimental scenario 2: different gap timing or locations
  • Experimental scenario 3: different subsets (by year) of input data
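
The three scenarios can be sketched as simple operations on a gap-free series. The repository implements this step in R; the Python below is only an illustration. The helper names `insert_gap` and `subset_years`, and all gap lengths, positions, and years, are hypothetical and are not the values used in the manuscript.

```python
def insert_gap(series, start, length):
    """Return a copy of series with `length` consecutive values set to None."""
    if start < 0 or length < 0 or start + length > len(series):
        raise ValueError("gap must lie inside the series")
    gapped = list(series)
    gapped[start:start + length] = [None] * length
    return gapped


def subset_years(records, years):
    """Keep only (year, value) records whose year is in `years`."""
    return [r for r in records if r[0] in years]


series = [float(i) for i in range(20)]

# Scenario 1: same start position, different gap lengths.
scenario1 = {length: insert_gap(series, 5, length) for length in (2, 4, 8)}

# Scenario 2: same gap length, different start positions.
scenario2 = {start: insert_gap(series, start, 4) for start in (0, 8, 16)}

# Scenario 3: train on a subset of years only.
records = [(2010, 1.0), (2011, 2.0), (2012, 3.0)]
scenario3 = subset_years(records, {2010, 2012})
print(scenario1[2], scenario3)
```

Because `insert_gap` returns a copy, the original series stays available as the reference against which the gap-filled values are scored.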

STEP 06 Repeat STEP 02 and STEP 03 for each scenario
