A Julia package for easily downloading and accessing popular cheminformatics datasets.
using Pkg
Pkg.add("MoleculeDatasets")

using MoleculeDatasets
# Download and load a dataset
data = get_mol_dataset("esol")

See src/dataset_info.jl for the available datasets.
To add a new dataset to the package, edit the MOL_DATASETS dictionary in src/dataset_info.jl. Each dataset entry should include:
For local datasets:
"dataset_key" => Dict(
"name" => "Dataset Display Name",
"description" => "Brief description of the dataset",
"filepath" => "data/filename.csv",
"format" => "csv",
"size" => "file size",
"type" => "local",
"reference" => "Full citation",
"doi" => "DOI if available",
"website" => "URL if available"
)

For remote datasets:
"dataset_key" => Dict(
"name" => "Dataset Display Name",
"description" => "Brief description of the dataset",
"url" => "https://example.com/dataset.csv",
"format" => "csv",
"size" => "file size",
"type" => "remote",
"reference" => "Full citation",
"doi" => "DOI if available",
"website" => "URL if available"
)

get_mol_dataset(name; output_dir="data", force_download=false, verbose=true): Download and load a dataset as a DataFrame.
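
When adding an entry to MOL_DATASETS, a quick check of its keys can catch typos early. The sketch below is only an illustration based on the two templates above: `missing_keys` and `COMMON_KEYS` are hypothetical names that are not part of MoleculeDatasets, and treating "doi" and "website" as optional is an assumption drawn from their "if available" wording.

```julia
# Hypothetical helper, not part of MoleculeDatasets.
# Keys shared by the local and remote templates. "doi" and "website" are
# marked "if available" in the templates, so this sketch treats them as optional.
const COMMON_KEYS = ["name", "description", "format", "size", "type", "reference"]

# Return the template keys that are absent from `entry`.
function missing_keys(entry::AbstractDict)
    required = copy(COMMON_KEYS)
    kind = get(entry, "type", nothing)
    if kind == "local"
        push!(required, "filepath")   # local template has a "filepath" key
    elseif kind == "remote"
        push!(required, "url")        # remote template has a "url" key
    end
    return [k for k in required if !haskey(entry, k)]
end

# A remote entry that is missing its "url" key.
entry = Dict(
    "name" => "Dataset Display Name",
    "description" => "Brief description of the dataset",
    "format" => "csv",
    "size" => "file size",
    "type" => "remote",
    "reference" => "Full citation",
)

println(missing_keys(entry))
```

An empty result means the entry has every key the sketch checks for. It does not check that the file or URL actually exists.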