Sample datasets from the central repository wrapped into modules for convenience.

nfdi4cat/data4cat

Usage of the data4cat module

For convenience, for example for use in lectures, datasets from the central NFDI4Cat repository (Dataverse) were wrapped into modules. The convenience functions are meant to give a smooth start to working with published remote data. The datasets included so far are:

  • The BasCat DinoRun dataset on the syngas to ethanol reaction

Installation of the data4cat module

For the installation you can clone or download the repository:

git clone https://github.com/nfdi4cat/data4cat.git

Then change into the directory and install data4cat:

pip install .

Or you can directly install the module from the remote source:

python -m pip install git+https://github.com/nfdi4cat/data4cat.git@main

To uninstall, run:

pip uninstall data4cat

With the package installed you first need to import the module:

from data4cat import dino_run

And create an instance:

dinodat = dino_run.dino_offline()

These two steps are always required before using any of the functions described below.

The dino_run dataset from the NFDI4Cat Dataverse instance

One dataset is the BasCat performance dataset on the syngas to ethanol reaction.

Download the dino_run dataset

If no offline version of the dataset is available (e.g. after a fresh install), a copy of the dataset can be downloaded like this:

dinodat.one_shot_dumb()

Create a dataset from the offline data

The data is available either as a pandas DataFrame or as a Bunch object in the style of scikit-learn datasets. The original data can be loaded as follows:

original = dinodat.original_data()
original.head()
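
To make the two container types concrete, the following minimal sketch shows how a pandas DataFrame relates to a scikit-learn-style Bunch. The column names and values are invented for illustration and are not taken from the DinoRun dataset, and the sketch does not show how data4cat itself builds its Bunch objects.

```python
import pandas as pd
from sklearn.utils import Bunch

# Hypothetical toy table; column names and values are made up for illustration.
frame = pd.DataFrame(
    {
        "temperature": [240.0, 260.0, 280.0],
        "pressure": [30.0, 30.0, 50.0],
        "selectivity": [0.12, 0.18, 0.25],
    }
)

feature_names = ["temperature", "pressure"]

# A Bunch is a dict whose keys are also accessible as attributes.
toy = Bunch(
    data=frame[feature_names].to_numpy(),
    target=frame["selectivity"].to_numpy(),
    feature_names=feature_names,
)

print(toy.data.shape)      # attribute access, same object as toy["data"]
print(toy.feature_names)
```

A DataFrame keeps features and target in one labeled table, while a Bunch separates them into arrays, which is the layout scikit-learn estimators usually expect.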

Create a subset of the offline data for the startup phase

A subset for the startup phase, with a time on stream (TOS) < 85, is also available, again both as a pandas DataFrame and as a Bunch object.

startup = dinodat.startup_data()
startup.head()
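
A threshold-based subset like this can be sketched in plain pandas. The example below is only an illustration of the idea: the column name `TOS` and all values are assumptions, and it is not the code that `startup_data()` runs internally.

```python
import pandas as pd

# Hypothetical measurements; "TOS" as a column name is an assumption.
runs = pd.DataFrame(
    {
        "TOS": [10.0, 50.0, 84.9, 85.0, 120.0],
        "selectivity": [0.05, 0.11, 0.14, 0.15, 0.16],
    }
)

# A boolean mask keeps only the rows recorded below the TOS threshold.
startup_like = runs[runs["TOS"] < 85]

print(len(startup_like))  # 3 rows remain; the row with TOS == 85.0 is excluded
```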

Create a subset of the offline data for the selectivity

For unsupervised learning tasks in particular, a prepared subset contains only the selectivity data. Requesting this subset also returns the reactors, which are stored here in the clusters object.

selectivity, clusters = dinodat.selectivity()
selectivity.head()
clusters.head()
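
As a rough illustration of how such a pair could be used in an unsupervised task, the sketch below clusters a synthetic table with k-means and then compares the result with known group labels. All data, column names, and the choice of k-means with two clusters are assumptions made for the example; none of it comes from the DinoRun dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for a selectivity table: two well-separated groups.
# Column names and values are invented, not taken from the DinoRun dataset.
group_a = rng.normal(loc=[0.2, 0.7], scale=0.02, size=(20, 2))
group_b = rng.normal(loc=[0.7, 0.2], scale=0.02, size=(20, 2))
toy_selectivity = pd.DataFrame(
    np.vstack([group_a, group_b]), columns=["product_1", "product_2"]
)

# Synthetic stand-in for the reactor labels.
toy_reactors = pd.Series([1] * 20 + [2] * 20, name="reactor")

# Cluster without looking at the labels.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
found = pd.Series(model.fit_predict(toy_selectivity), name="cluster")

# Compare the found clusters with the known labels afterwards.
table = pd.crosstab(toy_reactors, found)
print(table)
```

With real data, the same cross-tabulation against the clusters object would show how well the selectivity patterns separate the individual reactors.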

Create a subset of the offline data for the selectivity without reactor 5

If needed, setting the r5 argument to False excludes the empty reactor 5.

selectivity_wo5, clusters = dinodat.selectivity(r5=False)
selectivity_wo5.head()
clusters.head()

Create a subset of the offline data for the reaction conditions

For supervised tasks, a subset is provided that contains the reaction conditions as features and the selectivity to ethanol as the target.

react_cond, selectivity_EtOH = dinodat.react_cond()
react_cond.head()
selectivity_EtOH.head()
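
A features/target pair of this shape plugs directly into a typical scikit-learn workflow. The sketch below uses synthetic data with invented column names and a linear model chosen purely for illustration; whether a linear model suits the real reaction data is not something this example claims.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for reaction conditions (features); names are invented.
toy_cond = pd.DataFrame(
    {
        "temperature": rng.uniform(240.0, 300.0, size=100),
        "pressure": rng.uniform(20.0, 60.0, size=100),
    }
)

# Synthetic target: a linear function of the features plus small noise.
toy_target = pd.Series(
    0.002 * toy_cond["temperature"]
    + 0.001 * toy_cond["pressure"]
    + rng.normal(0.0, 0.005, size=100),
    name="selectivity",
)

# Hold out a quarter of the rows to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    toy_cond, toy_target, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
score = model.score(X_test, y_test)  # R^2 on the held-out rows
print(round(score, 3))
```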

Create a subset of the offline data for the reaction conditions without reactor 5

As before, the empty reactor 5 can be excluded by setting the r5 argument to False.

react_cond, selectivity_EtOH = dinodat.react_cond(r5=False)
react_cond.tail()
selectivity_EtOH.tail()
