Adsorption isotherm and error map plots
folder: After active learning is complete, these plots can be used to visualize the adsorption isotherm and the error heat map for the final GP fits. All data files needed for the visualization are provided.
P-X
folder: Active learning code for adsorption prediction in P-X phase space for 3 gas mixtures.
- training.csv and complete.csv are the training data and ground-truth data for each system. The training.csv files provided in these folders contain the initial training set as defined in the paper. Once the AL algorithm starts, it writes each new set of training points to this file only.
- mean.csv is the data file generated when the AL algorithm starts. It contains the GP-MRE for both species, the maximum GP relative errors, the MREs, the $R^2$ values, and the perceived accuracies (with $\beta$ values of 2% and 5% for P-X-T, and 1% and 2% for P-X, for comparison purposes).
- GP_mixtures.py is the active learning engine (with the dual-GP model).
- Active-learning-sh is the model update handler, which runs in a Linux environment. It was written for the computing resources of the Notre Dame Center for Research Computing, which use a grid-engine system. Modify the first few lines to suit your specific Linux environment; to run it locally, the first line can be removed.
P-X-T
folder: Active learning code for adsorption prediction in P-X-T phase space for 3 gas mixtures.
The process is the same as for the P-X phase space.
Kernel_opt
folder: This includes the kernel optimization Python code, together with the data files for the three gas mixtures in the P-X and P-X-T cases, covering different kernel combinations for up to 500 iterations (39 for P-X-T and 12 for P-X).
raspa2_May_2018
folder: This is the submodule containing the RASPA version used to generate the ground-truth data, dated May 8, 2018. It was originally cloned from the University of Amsterdam GitHub repository (original developers: David Dubbeldam and co-workers).
The goal was to predict the adsorption isotherms of binary mixtures in the Cu-BTC MOF (also known as HKUST-1) using an active learning protocol.
We investigated three gas mixtures (CO2-CH4, Xe-Kr, and H2S-CO2) at different pressure, temperature, and mole-fraction conditions (hence two separate studies, P-X and P-X-T). The low-pressure limit was the same for all mixtures (
Ground-truth data were generated with grand-canonical Monte Carlo (GCMC) simulations performed in the open-source software RASPA. The TraPPE force field was used for CO2, CH4, Xe, Kr, and H2S, and the Universal Force Field (UFF) for Cu-BTC. The continuous fractional component Monte Carlo (CFC-GCMC) method was also used to sample the GCMC simulations. All ground-truth data can be found in complete.csv in the P-X and P-X-T folders.
A general workflow outline is shown below:
Active learning workflow for predicting adsorption using Gaussian process regression (GPR). Learning starts by pre-processing the prior data: pressure and temperature are standardised, while the mole fraction is scaled linearly to the range -1 to 1, i.e. $X^* = (X - 1/2) \times 25/12$. The data are then passed through the dual GPs, one for each species. Predictions are made and the associated uncertainties are extracted. The perceived accuracies (PAC) for both species are tested for convergence. If any PAC criterion is not met, learning continues and the point with the highest uncertainty is added to the prior data. Active learning continues until the convergence condition is satisfied. The PAC parameter is defined as follows:
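The pre-processing and point-selection steps of this workflow can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in GP_mixtures.py: the function names, the fixed-length-scale RBF kernel, the zero-mean GP, and the rule of ranking candidates by the larger of the two GP standard deviations are all assumptions made for the example. The PAC convergence test is not included.

```python
import numpy as np

def preprocess(P, T, X):
    """Standardise P and T; scale mole fraction via X* = (X - 1/2) * 25/12."""
    P, T, X = (np.asarray(a, dtype=float) for a in (P, T, X))
    P_std = (P - P.mean()) / P.std()
    T_std = (T - T.mean()) / T.std()
    X_scaled = (X - 0.5) * 25.0 / 12.0
    return np.column_stack([P_std, T_std, X_scaled])

def rbf(A, B, length=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X_train, y_train, X_query, length=1.0, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP (assumed form)."""
    K = rbf(X_train, X_train, length) + noise * np.eye(len(X_train))
    Ks = rbf(X_train, X_query, length)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - (v**2).sum(axis=0)  # prior variance of the RBF kernel is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def select_next(features, train_idx, y1, y2, lengths=(1.0, 1.0)):
    """Index of the untrained point with the highest uncertainty.

    One GP per species (dual GP); the two uncertainties are combined by
    taking the larger one, which is an assumption of this sketch.
    """
    train_idx = np.asarray(train_idx)
    sigmas = []
    for y, length in zip((y1, y2), lengths):
        _, sigma = gp_posterior(features[train_idx], np.asarray(y, dtype=float),
                                features, length)
        sigmas.append(sigma)
    score = np.maximum(*sigmas)
    score[train_idx] = -np.inf  # never re-select an existing training point
    return int(np.argmax(score))
```

In a full loop, the selected point would be appended to the training set, both GPs refitted, and the process repeated until the convergence criterion is met.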
Also, the
Mean Relative Error (MRE):
The
Data requirement (% of ground truth):
Note
$R^2$ (the coefficient of determination):
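As a minimal sketch, the metrics named above can be computed as follows, assuming their standard textbook forms: MRE as the mean of |prediction - truth| / |truth| in percent, $R^2$ as one minus the ratio of the residual sum of squares to the total sum of squares, and the data requirement as the number of training points divided by the number of ground-truth points, in percent. The function names are illustrative and are not taken from GP_mixtures.py; where the paper's definitions differ, they take precedence.

```python
import numpy as np

def mean_relative_error(y_true, y_pred):
    """MRE in percent (assumed form: mean of |pred - true| / |true|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def data_requirement(n_training, n_ground_truth):
    """Training-set size as a percentage of the ground-truth data set."""
    return 100.0 * n_training / n_ground_truth
```

Note that this form of the MRE is undefined for ground-truth values of exactly zero, so such points would need separate handling.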
The performance of the active learning workflow is summarised below for both phase spaces and all three systems:
Results for the P-X phase space:
Mixture | Kernel | Data requirement (% of ground truth) | MRE(species 1) (%) | MRE(species 2) (%) | R2(species 1) | R2(species 2) |
---|---|---|---|---|---|---|
CO2-CH4 | RBF | 3.001 | 5.263 | 5.417 | 0.986 | 0.999 |
Xe-Kr | RQ | 2.601 | 6.526 | 6.394 | 0.985 | 0.998 |
H2S-CO2 | RQ | 2.561 | 7.149 | 7.154 | 0.982 | 0.995 |
Results for the P-X-T phase space:
Mixture | Kernel | Data requirement (% of ground truth) | MRE(species 1) (%) | MRE(species 2) (%) | R2(species 1) | R2(species 2) |
---|---|---|---|---|---|---|
CO2-CH4 | RBF+RBF+RBF | 6.611 | 5.461 | 9.256 | 0.988 | 0.990 |
Xe-Kr | RBF+RBF+RBF | 6.650 | 4.850 | 7.025 | 0.990 | 0.990 |
H2S-CO2 | RQ | 5.549 | 8.276 | 11.682 | 0.976 | 0.986 |
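
The kernel labels in the tables above (RBF, RQ, RBF+RBF+RBF) refer to standard GP covariance functions. The sketch below writes them out as functions of the distance r between two inputs, assuming the usual textbook forms of the radial basis function and rational quadratic kernels and reading "+" as a kernel sum. The length scales and the alpha value are placeholders, not the optimised hyperparameters from Kernel_opt.

```python
import numpy as np

def rbf_kernel(r, length=1.0):
    """Radial basis function kernel: exp(-r^2 / (2 * length^2))."""
    r = np.asarray(r, dtype=float)
    return np.exp(-0.5 * (r / length) ** 2)

def rq_kernel(r, length=1.0, alpha=1.0):
    """Rational quadratic kernel: (1 + r^2 / (2 * alpha * length^2))^(-alpha)."""
    r = np.asarray(r, dtype=float)
    return (1.0 + r**2 / (2.0 * alpha * length**2)) ** (-alpha)

def sum_of_rbfs(r, lengths=(0.5, 1.0, 2.0)):
    """Sum of three RBF kernels with different (placeholder) length scales."""
    return sum(rbf_kernel(r, length) for length in lengths)
```

The RQ kernel can be viewed as a mixture of RBF kernels over many length scales, and it approaches a single RBF kernel as alpha grows large.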
Update: This work has been published in the RSC Digital Discovery journal and is free to download:
https://pubs.rsc.org/en/content/articlehtml/2023/dd/d3dd00106g