Name Formatting: type_size_name_num_of_classes.csv
- type: R for regression, C for classification
- size: Number of instances in the dataset
- name: Name of dataset
- num_of_classes: Number of classes (classification only; omitted for regression files)
- BBBP dataset [6] (Blood-brain barrier penetration) -> C_2039_BBBP_2.csv
- SAMPL dataset [8] (Hydration free energy) -> R_642_SAMPL.csv
- AQSOLDB dataset [11] (Aqueous solubility) -> R_9982_AQSOLDB.csv
Note: The BBBP and SAMPL datasets are edited versions of those in the MoleculeNet repository [12].
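
The naming convention above can be decoded programmatically. The following is a minimal sketch, not part of the original repository: the names `DatasetInfo` and `parse_dataset_filename` are hypothetical, and it assumes that regression files simply omit the trailing num_of_classes field, as the R_ examples suggest.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class DatasetInfo(NamedTuple):
    """Fields encoded in a dataset file name (hypothetical helper type)."""
    task: str                   # "classification" or "regression"
    size: int                   # number of instances in the dataset
    name: str                   # dataset name
    num_classes: Optional[int]  # None for regression datasets


def parse_dataset_filename(filename: str) -> DatasetInfo:
    """Split type_size_name[_num_of_classes].csv into its fields."""
    parts = Path(filename).stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"Unexpected file name: {filename!r}")
    task_code, size = parts[0], int(parts[1])
    if task_code == "C":
        if len(parts) < 4:
            raise ValueError(f"Missing class count: {filename!r}")
        # The last field is the class count; the name is everything between.
        return DatasetInfo("classification", size,
                           "_".join(parts[2:-1]), int(parts[-1]))
    if task_code == "R":
        # Assumption: regression files carry no class-count field.
        return DatasetInfo("regression", size, "_".join(parts[2:]), None)
    raise ValueError(f"Unknown task type {task_code!r} in {filename!r}")


if __name__ == "__main__":
    for fname in ("C_2039_BBBP_2.csv", "R_642_SAMPL.csv", "R_9982_AQSOLDB.csv"):
        print(parse_dataset_filename(fname))
```

Joining the middle fields back together keeps the sketch tolerant of dataset names that themselves contain underscores.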
[6] Martins, Ines Filipa, et al. "A Bayesian approach to in silico blood-brain barrier penetration modeling." Journal of chemical information and modeling 52.6 (2012): 1686-1697.
[8] Mobley, David L., and J. Peter Guthrie. "FreeSolv: a database of experimental and calculated hydration free energies, with input files." Journal of computer-aided molecular design 28.7 (2014): 711-720.
[11] Sorkun, M. C., A. Khetan, and S. Er. "AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds." Scientific data 6.1 (2019): 1-8.
[12] Wu, Zhenqin, et al. "MoleculeNet: a benchmark for molecular machine learning." Chemical science 9.2 (2018): 513-530.