This repository provides code for gap-filling eddy covariance carbon flux (FCO2) data with two methods, Marginal Distribution Sampling (MDS) and Extreme Gradient Boosting (XGB), and for comparing their performance.

Environment

The Environment folder contains the Python environment used to run XGBoost.

Input data (input.csv):

Flux data from the Bartlett Research Forest (AmeriFlux site ID: US-Bar), https://ameriflux.lbl.gov/sites/siteinfo/US-Bar

Workflow

STEP 01 MDS_EProc_object.R

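IQR filtering (outlier removal based on the interquartile range)
u* filtering (removal of records with low friction velocity)
Save the processed object as an RDS file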
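For illustration, the two filters can be sketched in Python as below. This is not the R script's code (the file name suggests it builds a REddyProc-style object in R), and it rests on assumptions: half-hourly records in a pandas DataFrame with hypothetical columns FC (CO2 flux) and USTAR (friction velocity), a 1.5 × IQR outlier rule, and a fixed placeholder u* threshold rather than one estimated from the data.

```python
import numpy as np
import pandas as pd


def iqr_filter(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Set values outside [Q1 - k*IQR, Q3 + k*IQR] to NaN (k = 1.5 is an assumed default)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    outside = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values.mask(outside)


def ustar_filter(flux: pd.Series, ustar: pd.Series, threshold: float) -> pd.Series:
    """Keep flux only where u* >= threshold; weak turbulence or missing u* becomes NaN."""
    return flux.where(ustar >= threshold)


# Hypothetical half-hourly records: FC = CO2 flux, USTAR = friction velocity (m s-1)
df = pd.DataFrame({
    "FC": [-5.0, -4.0, -6.0, 3.0, 2.5, 80.0, -5.5, 2.0],
    "USTAR": [0.40, 0.35, 0.50, 0.05, 0.30, 0.45, 0.60, np.nan],
})
# 0.2 m s-1 is a placeholder threshold, not the value used for US-Bar
df["FC_filtered"] = ustar_filter(iqr_filter(df["FC"]), df["USTAR"], threshold=0.2)
print(df)
```

Here the 80.0 spike is removed by the IQR rule, and the rows with u* = 0.05 and missing u* are removed by the u* filter.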

STEP 02 train_XGB.ipynb

This notebook gap-fills FCO2 with XGBoost (XGB), tuning its hyperparameters with BayesSearchCV from scikit-optimize. Gap-filling the 13-year FCO2 time series takes approximately 20 minutes. Model performance is evaluated with 10-fold cross-validation. The notebook consists of the following steps:

01: Finding the best hyperparameters for XGBoost.
02: Training the model with the best hyperparameters found in step 01.
03: Evaluating model performance with 10-fold cross-validation. Several performance metrics (RMSE, R2, and bias) are computed, and learning curves are plotted.
04: Plotting feature (variable) importance.
05: Computing annual sums of FCO2.
06: Computing monthly sums of FCO2.
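Once gap-filled predictions exist, steps 03, 05, and 06 reduce to simple arithmetic. The sketch below is an illustration under stated assumptions, not the notebook's code: FCO2 is half-hourly in µmol CO2 m-2 s-1, bias is defined as mean(predicted - observed), R2 is the coefficient of determination (the notebook may use the squared correlation instead), and each half-hour is converted to g C m-2 with 1800 s × 12.011 × 10^-6 g C per µmol. The function names and data are hypothetical.

```python
import numpy as np
import pandas as pd

# One half-hour of flux: umol CO2 m-2 s-1 -> g C m-2 (assumes 30-min records)
SECONDS_PER_HALF_HOUR = 1800
G_C_PER_UMOL = 12.011e-6  # molar mass of carbon, g per umol
HALF_HOUR_TO_G_C = SECONDS_PER_HALF_HOUR * G_C_PER_UMOL


def performance_metrics(obs, pred) -> dict:
    """RMSE, R2 (1 - SS_res / SS_tot) and bias (mean of pred - obs)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    resid = pred - obs
    return {
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "R2": float(1 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2)),
        "bias": float(np.mean(resid)),
    }


def carbon_sums(fc_filled: pd.Series, freq: str) -> pd.Series:
    """Sum a gap-filled half-hourly flux to g C m-2 per period ("YS" = annual, "MS" = monthly)."""
    return (fc_filled * HALF_HOUR_TO_G_C).resample(freq).sum()


# Usage with made-up values
print(performance_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 4]))
idx = pd.date_range("2020-01-01", periods=96, freq="30min")  # two days
fc_filled = pd.Series(1.0, index=idx)  # constant 1 umol CO2 m-2 s-1
print(carbon_sums(fc_filled, "MS"))  # about 2.08 g C m-2 for January 2020
```

Summing per-record carbon amounts (rather than averaging fluxes) keeps the annual and monthly totals directly in g C m-2.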

STEP 03 MDS_10_CV.Rmd

Gap-filling with MDS, using the same 10-fold cross-validation as in STEP 02 so that MDS and XGB are compared on identical data splits.
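One way to make sure the R (MDS) and Python (XGB) workflows see identical splits is to assign fold IDs once, to the observed records only, and store them in a file that both scripts read. Below is a minimal sketch with scikit-learn's KFold; the seed, the function name, and the idea of a shared folds file are assumptions, and the repository may share its folds differently (for example with blocked rather than shuffled folds).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def assign_folds(flux: pd.Series, n_splits: int = 10, seed: int = 42) -> pd.Series:
    """Fold ID (0..n_splits-1) for every observed record; existing gaps get -1."""
    folds = pd.Series(-1, index=flux.index, dtype=int)
    observed = np.flatnonzero(flux.notna().to_numpy())  # positions of measured values
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold_id, (_, test_pos) in enumerate(kf.split(observed.reshape(-1, 1))):
        folds.iloc[observed[test_pos]] = fold_id
    return folds


fc = pd.Series([1.0, np.nan, 2.0, 3.0, np.nan] + [0.5] * 15)  # made-up flux with two gaps
folds = assign_folds(fc)
# Writing the IDs once, e.g. folds.to_csv("folds.csv"), lets both scripts reuse the same split
print(folds.value_counts().sort_index())
```

Because the seed is fixed, rerunning the function reproduces the same assignment, and records that were already gaps never enter a test fold.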

STEP 04 ANN_create_synthetic_data.ipynb

This notebook was run on Google Colab to generate synthetic flux data with an artificial neural network (ANN). It is adapted from Vekuri et al. (2023), https://doi.org/10.1038/s41598-023-28827-2, with GCC (green chromatic coordinate) added as an input feature.
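In outline, the synthetic data come from a network trained to predict the measured flux from its drivers; its predictions then serve as a gap-free "truth" into which artificial gaps can be inserted. The sketch below uses scikit-learn's MLPRegressor on made-up data. The driver names (Rg, Tair, VPD, GCC), the network size, and the data are assumptions, not the settings of the notebook or of Vekuri et al. (2023).

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Made-up drivers; GCC enters as one more input feature
drivers = pd.DataFrame({
    "Rg": rng.uniform(0, 900, n),       # global radiation, W m-2
    "Tair": rng.uniform(-5, 30, n),     # air temperature, deg C
    "VPD": rng.uniform(0, 25, n),       # vapour pressure deficit, hPa
    "GCC": rng.uniform(0.32, 0.45, n),  # green chromatic coordinate
})
# Made-up "measured" flux with roughly 20 % gaps
flux = (-0.2 * drivers["Rg"] * (drivers["GCC"] - 0.3)
        + 0.1 * drivers["Tair"] + rng.normal(0, 0.5, n))
flux = flux.mask(rng.random(n) < 0.2)


def synthetic_flux(drivers: pd.DataFrame, flux: pd.Series, seed: int = 0) -> pd.Series:
    """Fit an MLP on the observed records and predict a gap-free synthetic series."""
    observed = flux.notna()
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=seed),
    )
    model.fit(drivers[observed], flux[observed])
    return pd.Series(model.predict(drivers), index=drivers.index, name="FC_synthetic")


fc_syn = synthetic_flux(drivers, flux)
```

Scaling the inputs before the network matters here because the drivers span very different ranges (hundreds of W m-2 for Rg versus tenths for GCC).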

STEP 05 generate_data_for_scenarios.Rmd

Generates the data for experimental scenarios 1, 2, and 3, as described in the manuscript:
Experimental scenario 1: different gap lengths
Experimental scenario 2: different gap timings (positions within the time series)
Experimental scenario 3: different subsets (by year) of the input data
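The three scenarios can be pictured as operations on a gap-free (for example synthetic) series. The sketch below is illustrative only: the function names are hypothetical, and the gap lengths, start positions, and years are placeholders rather than the values used in the manuscript.

```python
import numpy as np
import pandas as pd


def insert_gap(series: pd.Series, start: int, length: int) -> pd.Series:
    """Scenarios 1 and 2: blank out `length` consecutive records starting at position `start`."""
    out = series.copy()
    out.iloc[start:start + length] = np.nan
    return out


def subset_years(series: pd.Series, years) -> pd.Series:
    """Scenario 3: keep only the records from the listed calendar years."""
    return series[np.isin(series.index.year, list(years))]


# Made-up gap-free half-hourly series for 2018-2020
idx = pd.date_range("2018-01-01", "2020-12-31 23:30", freq="30min")
fc = pd.Series(np.sin(np.arange(len(idx)) / 48 * 2 * np.pi), index=idx)

# Scenario 1: same start, different gap lengths (1, 7 and 30 days of half-hours)
by_length = {days: insert_gap(fc, start=10_000, length=days * 48) for days in (1, 7, 30)}

# Scenario 2: same gap length (7 days), different positions in the series
by_timing = {start: insert_gap(fc, start=start, length=7 * 48) for start in (5_000, 20_000, 40_000)}

# Scenario 3: different subsets of years used as input
by_years = {yrs: subset_years(fc, yrs) for yrs in [(2018,), (2018, 2019)]}
```

Because the original series stays gap-free, each gap-filled result can be scored against the values that were removed.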

STEP 06 Repeat STEP 02 and STEP 03 for each scenario

Repository: YujieLiu666/fluxgapfill_XGB_vs_MDS