Dataset Todo
- Add "synthetic" data New Task: Add "synthetic" data #13 [nice-to-have]
- Run ChemDataExtractor on Free Text New Task: Run ChemDataExtractor on Free Text #18 [needs-discussion]
- Prepare PubChem dataset New Task: Prepare PubChem dataset #19 [priority-high]
- Add ChEMBL dataset New Task: Add ChEMBL dataset #24 [priority-high]
- Add ESOL dataset New task: Add ESOL dataset #33 [priority-high]
Dataset In Progress
- Add Papyrus dataset Add Papyrus dataset #335 Add Papyrus 3 Million data point pchembl for 7k protein #340
- Add papyrus protein targets Add papyrus protein targets #336
- Adding data from the Human Metabolome Database (HMDB) New Task: Adding data from the Human Metabolome Database (HMDB) #136 [adamoyoung]
- Adding Data from MassBank of North America (MoNA) New Task: Adding Data from MassBank of North America (MoNA) #137 [adamoyoung]
- Add Open Targets datasets for drug information Add Open Targets datasets for drug information #138 FDA Adverse reactions datasets - We can add a dataset containing the frequency of adverse reaction events for individual drugs identified by their CHEMBL ID. #139 Drug disease indications - Add a dataset of drug disease indications by CHEMBL ID. #140 Mechanisms of action - Add a dataset of mechanism for multiple CHEMBL ID's. #141 Drug descriptions & approval - Add a dataset containing metadata identifiers (SMILES, size, names, years, etc) and approval results in different countries. #142 [jackapbutler]
- Adding the europepmc dataset Adding the europepmc dataset #162 [hssn-20]
- Adding Uniprot, X-linking to reaction DBs for enzymes Adding Uniprot, X-linking to reaction DBs for enzymes #191 [hypnopump]
- Add DrugChat data Add DrugChat data #293 [alxfgh]
- Adding Suzuki Miyaura yield prediction dataset Adding Suzuki Miyaura yield prediction dataset #212 [pschwllr]
- Add QMOF dataset Add QMOF dataset #235 [kjappelbaum]
- Add SuperCon dataset Add SuperCon dataset #236 [kjappelbaum]
- Add QMUG dataset Add QMUG dataset #237 [kjappelbaum]
- Add Enamine dataset Add Enamine dataset #238 [kjappelbaum]
- Add ORD dataset Add ORD dataset #239 [kjappelbaum]
- Refactor rhea_db into csv files refactor rhea_db into csv files #242 [kjappelbaum]
- Add Drug-Target Interaction data New Task: Add Drug-Target Interaction data #68 [strubeyj]
- H2_storage_materials_database H2_storage_materials_database #64 [bethanyconnolly] created h2 storage dataset #76
- Add EuroPMC Dataset New Task: Add EuroPMC Dataset #32 [abhinav-kashyap-asus]
- Add Buchwald Hartwig dataset [pschwllr] Add Buchwald Hartwig dataset #81
- Add Drug-Drug Interaction Data from nSIDES [apoorvasrinivasan26] New Task: Add Drug-Drug Interaction Data #89
- Add uspto data from drfp Add uspto data from drfp #95
- Add NLMChem Add NLMChem #114 [apoorvasrinivasan26]
- Add ThermoML Archive dataset Add ThermoML Archive dataset #118
- Adding the Chemistry textbooks from LibreTexts library New Task: Adding the Chemistry textbooks from LibreTexts library #134
- Add Therapeutic Data Commons dataset New Task: Add Therapeutic Data Commons dataset #27 [priority-high]
- [ ] Single-instance [phalem] New Task | Finish Single-instance remaining data & Generation Datasets from TDC #90
- Add ADME Property [phalem] New Task : Add ADME Property data from TDC #84
- Absorption Adding ADME absorption TDC #85
- Caco-2 (Cell Effective Permeability), Wang et al.[MicPie] New task: caco2_wang via tdcommons.ai #37
- PAMPA Permeability, NCATS [MicPie] New task: Add PAMPA Permeability, NCATS dataset #41
- HIA (Human Intestinal Absorption), Hou et al. Adding ADME absorption TDC #85
- Pgp (P-glycoprotein) Inhibition, Broccatelli et al. Adding ADME absorption TDC #85
- Bioavailability, Ma et al. Adding ADME absorption TDC #85
- Lipophilicity, AstraZeneca [MicPie] New task: Add lipophilicity dataset #22
- Solubility, AqSolDB Adding ADME absorption TDC #85
- Hydration Free Energy, FreeSolv Adding ADME absorption TDC #85
- Distribution Add ADME Distribution data from TDC #86
- BBB (Blood-Brain Barrier), Martins et al. Add ADME Distribution data from TDC #86
- PPBR (Plasma Protein Binding Rate), AstraZeneca Add ADME Distribution data from TDC #86
- VDss (Volume of Distribution at steady state), Lombardo et al. Add ADME Distribution data from TDC #86
- Metabolism Add ADME metabolism from TDC #88
- CYP P450 2C19 Inhibition, Veith et al. Add ADME metabolism from TDC #88
- CYP P450 2D6 Inhibition, Veith et al. Add ADME metabolism from TDC #88
- CYP P450 3A4 Inhibition, Veith et al. Add ADME metabolism from TDC #88
- CYP P450 1A2 Inhibition, Veith et al. Add ADME metabolism from TDC #88
- CYP P450 2C9 Inhibition, Veith et al. Add ADME metabolism from TDC #88
- CYP2C9 Substrate, Carbon-Mangels et al. Add ADME metabolism from TDC #88
- CYP2D6 Substrate, Carbon-Mangels et al. Add ADME metabolism from TDC #88
- CYP3A4 Substrate, Carbon-Mangels et al. Add ADME metabolism from TDC #88
- Excretion Add ADME Excretion from tdc #87
- Half Life, Obach et al. Add ADME Excretion from tdc #87
- Clearance, AstraZeneca Add ADME Excretion from tdc #87
- Add Toxicity [phalem]
- Acute Toxicity LD50 feat: add acute toxicity ld50 from TDC #54
- hERG blockers Add h erg blocker from TDC #53
- hERG Central Add hERG Central with multiple target #61
- hERG Karim et al. Add h erg karim et al from TDC #52
- Ames Mutagenicity Add Ames Mutagenicity from TDC #56
- DILI (Drug Induced Liver Injury) Add DILI from TDC #51
- Skin Reaction Add Skin Reaction from TDC #49
- Carcinogens Add Carcinogens from TDC #55
- Tox21 Add Tox21 consist of 12 assay with URI include #77
- ToxCast Tabular data issues | ToxCast consist of 615 columns toward 615 dataset #79 Add Toxcast dataset for 8k compound against 615 target #343 Add ToxCast 1.5m datapoint for 8K mol on 617 assay #345 Add Toxcast with an assay description 90k datapoint for 158 assay #346
- ClinTox feat: add ClinTox from TDC #50
- Add High-throughput Screening [phalem]
- SARS-CoV-2 In Vitro, Touret et al. Add HTS SARSCoV2 Vitro Touret et.al data #59
- SARS-CoV-2 3CL Protease, Diamond. Add HTS SARSCoV2 3CLPro Diamond from TDC #94
- HIV Add HTS HIV data #60
- Butkiewicz et al. feat: add HTS Butkiewicz et al of 9 Target #62
- Add Quantum Mechanics Modeling Tabular data issues | Complex data structure identifier #78
- QM7b
- QM8
- QM9
- Add Reaction Yields Tabular data issues | Complex data structure identifier #78
- Buchwald-Hartwig Add Buchwald Hartwig dataset #81
- USPTO
- Add Epitope (Immunotherapy under Target discovery) Issue | Need to know Epitope data given active indices start 0 or 1 #97
- IEDB, Jespersen et al. Add Epitope data from TDC #96
- PDB, Jespersen et al. Add Epitope data from TDC #96
- Add Antibody Developability Tabular data issues | Complex data structure identifier #78
- TAP add antibody developability from TDC #99
- SAbDab, Chen et al. add antibody developability from TDC #99
- Add CRISPR Repair Outcome [apoorvasrinivasan26]
- Leenay et al.
- [ ] Multi-instance
- Add Drug-Target Interaction data New Task: Add Drug-Target Interaction data #68 [strubeyj]
- BindingDB
- DAVIS
- KIBA
- Add Drug-Drug Interaction
- DrugBank Multi-Typed DDI
- TWOSIDES Polypharmacy Side Effects
- Add Gene-Disease Association
- DisGeNET
- Add Drug Response
- GDSC1
- GDSC2
- Add Peptide-MHC Binding
- MHC Class I, IEDB-IMGT, Nielsen et al.
- MHC Class II, IEDB, Jensen et al.
- Add Antibody-antigen Affinity
- SAbDab
- Add MicroRNA-Target Interaction
- miRTarBase
- Add Catalyst
- USPTO
- Add TCR-Epitope Binding Affinity [strubeyj] Add tcr epitope binding dataset #67
- Weber et al.
- [ ] Generation data [phalem] New Task | Finish Single-instance remaining data & Generation Datasets from TDC #90
- Add Molecule Generation Added Molegen datasets #178 [arkadiusz-czerwinski]
- MOSES Added Molegen datasets #178 [arkadiusz-czerwinski]
- ZINC Added Molegen datasets #178 [arkadiusz-czerwinski]
- ChEMBL Added Molegen datasets #178 [arkadiusz-czerwinski]
- Add Retrosynthesis
- USPTO-50K
- USPTO
- Add Reaction Outcome
- USPTO
- Add Structure-based Drug Design
- PDBBind
- DUD-E
- scPDB
Done ✓
- Add flashpoint dataset feat: add flashpoint dataset #43 [othertea]
- add initial model pipeline [maw501] [bethanyconnolly] [kjappelbaum] [MicPie] feat: add initial model pipeline #71
- Add iupac goldbook Add iupac goldbook #187 add iupac goldbook #188 [MicPie]
- Add RXN-SMILES as identifier type Add RXN-SMILES as identifier type #113 [kjappelbaum]
- Add benchmark field feat: implement benchmark field #116
- Add entos protonation energy add entos protonation energy #244 feat: entos dataset #233 [kjappelbaum]
- Add chebi-20 dataset New Task: Add chebi-20 dataset #63 feat: add chebi 20 to datasets #108 [jackapbutler]
- Add FDA Adverse reactions datasets FDA Adverse reactions datasets - We can add a dataset containing the frequency of adverse reaction events for individual drugs identified by their CHEMBL ID. #139 feat: add fda dataset #143 [jackapbutler]
- Add Natural text dataset elsevier_oa_cc-by_corpus Natural text dataset elsevier_oa_cc-by_corpus #216