Each dataset directory (`gdb` for GDB7-22-TS, `cyclo` for Cyclo-23-TS, `proparg` for Proparg-21-TS) contains:
- `xyz/`: the original (DFT) geometries.
- `xyz-xtb/`: GFN2-xTB geometries.
- `{dataset}.csv`: the CSV file, with the following columns:
  - `idx` / `rxn_id` / (`mol`, `enan`): reaction indices used to find the corresponding xyz files.
  - `dE0` / `G_act` / `Eafw`: target property.
  - `rxn_smiles`: unmapped reaction SMILES.
  - `rxn_smiles_mapped`: the original ("true") atom-mapped SMILES.
  - `rxn_smiles_rxnmapper`: SMILES mapped by RXNMapper.
  - `rxn_smiles_rxnmapper_full`: SMILES mapped by RXNMapper, including hydrogens.
  - `bad_xtb`: whether the reaction is excluded from the geometry quality tests (xTB optimization failed).

Additionally,
- `cyclo/matches`: xyz atom-maps (see `../data-curation/cyclo-atom-mapping/`).
- `proparg/proparg-weird-smiles.csv`: "bad" SMILES for Proparg-21-TS automatically obtained from xyz taken from doi:10.1039/d3dd00175j (repo). They are also mapped by RXNMapper but were not used to produce the results of the paper.
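
As a rough illustration of how the per-dataset CSV described above might be read, here is a minimal sketch using only the Python standard library. The function name `load_reactions` is hypothetical and not part of this repository. Two things are assumptions: that each file contains exactly one of the three target columns, and that `bad_xtb` is written as a boolean-like string such as `True` or `1`. Check both against the actual files before relying on this.

```python
import csv

# Target property column names listed in this README.
TARGET_COLUMNS = ("dE0", "G_act", "Eafw")

# Assumption: values that mark a reaction as flagged in the bad_xtb column.
TRUTHY = {"1", "true", "yes"}


def load_reactions(csv_path, drop_bad_xtb=True):
    """Read a {dataset}.csv file and return (target_column, rows).

    Hypothetical helper: each row is a dict keyed by column name, with the
    target property converted to float.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    # Detect the target column instead of hard-coding it per dataset.
    present = [name for name in TARGET_COLUMNS if name in fieldnames]
    if len(present) != 1:
        raise ValueError(f"expected exactly one target column, found {present}")
    target = present[0]

    # Optionally skip reactions flagged by bad_xtb (xTB optimization failed).
    if drop_bad_xtb and "bad_xtb" in fieldnames:
        rows = [r for r in rows if r["bad_xtb"].strip().lower() not in TRUTHY]

    for row in rows:
        row[target] = float(row[target])
    return target, rows
```

For example, `load_reactions("cyclo/cyclo.csv")` would return the target column name and the unflagged rows, assuming `{dataset}.csv` expands to `cyclo.csv` inside the `cyclo` directory. Mapping the index columns to xyz file names is left out on purpose, since the naming scheme is not described here.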