Initial commit
filipsPL committed Sep 4, 2024
1 parent 15c03aa commit 47a85ab
Showing 7 changed files with 269 additions and 2 deletions.
70 changes: 68 additions & 2 deletions README.md
@@ -1,2 +1,68 @@
# tox24challenge
Dataset used in Tox24 challenge
Tox24 Challenge Dataset
===========

This repository contains molecular structures and descriptors for the Tox24 challenge, prepared by me (team name: **filipsPL**). The goal of the challenge was to predict the in vitro activity of compounds against [Transthyretin (TTR)](https://en.wikipedia.org/wiki/Transthyretin) using chemical structure data.

- [Tox24 Challenge Dataset](#tox24-challenge-dataset)
  - [Dataset](#dataset)
  - [Descriptors](#descriptors)
  - [Importance of features](#importance-of-features)
  - [The Challenge Results](#the-challenge-results)
  - [References](#references)


## Dataset

This repository includes:

- The [chemical structures](data/smiles_org+fixed.csv) provided by the organizers and curated by me using my RDKit pipeline.
- **Training set**: a diversified set of 1000 compounds used for training models: [data/train.csv.xz](data/train.csv.xz)
- **Validation set**: a diversified set of 100 compounds used for the final validation of models: [data/validation.csv.xz](data/validation.csv.xz)
- **Test set**: 500 compounds for which predictions were made, consisting of a leaderboard set (200 compounds) and a blind set (300 compounds): [data/test.csv.xz](data/test.csv.xz)
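
The files listed above can be read without any third-party packages. The following is a minimal sketch, not the pipeline used for the challenge: `read_table` and `changed_structures` are hypothetical helper names, and the only column names assumed are `SMILES_org` and `SMILES_fixed` from the header of `data/smiles_org+fixed.csv`.

```python
import csv
import lzma
from pathlib import Path


def read_table(path):
    """Read a CSV file (plain or .xz-compressed) into a list of dicts."""
    path = Path(path)
    # lzma.open in text mode decompresses on the fly; plain files use open().
    opener = lzma.open if path.suffix == ".xz" else open
    with opener(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def changed_structures(rows):
    """Return rows whose curated SMILES string differs from the original."""
    return [r for r in rows if r["SMILES_org"] != r["SMILES_fixed"]]
```

For example, `changed_structures(read_table("data/smiles_org+fixed.csv"))` lists the entries whose SMILES text was rewritten during curation. A differing string does not by itself mean a different molecule: many entries differ only in notation (for instance Kekulé versus aromatic form), while others had counter-ions removed.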

## Descriptors

The csv files contain 2D descriptors of molecules, including:

- RDKitDescriptors (2D)
- molecular fingerprints:
  - CDK:
    - CDKECFP4
    - CDKEState
    - CDKFCFP4
    - CDKmolprop
    - CDKpubchem
    - CDKstandard
  - Indigo fingerprints:
    - IndigoResonanceSubstructure
    - IndigoSimilarity
  - RDKit fingerprints:
    - RDkitFP-AtomPair
    - RDkitFP-Avalon
    - RDkitFP-FeatMorgan4
    - RDkitFP-Layered
    - RDkitFP-MACCS
    - RDkitFP-Morgan2
    - RDkitFP-Morgan3
    - RDkitFP-Morgan4
    - RDkitFP-Pattern
    - RDkitFP-RDKit
    - RDkitFP-Torsion
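
When working with this many descriptor blocks, it can help to group the columns by family. The sketch below rests on an unverified assumption: that each column name in the csv files starts with one of the fingerprint family names listed above. The actual column naming is not documented here, so check the file header first. `group_columns` is a hypothetical helper, and only a few family names are included for brevity.

```python
from collections import defaultdict

# A few family names copied from the list above; extend as needed.
FINGERPRINT_FAMILIES = [
    "CDKECFP4",
    "CDKFCFP4",
    "IndigoSimilarity",
    "RDkitFP-MACCS",
    "RDkitFP-Morgan2",
]


def group_columns(columns, families=FINGERPRINT_FAMILIES):
    """Map each family name to the columns whose names start with it.

    ASSUMPTION: column names begin with the family name. Columns that
    match no family are collected under the key "other".
    """
    groups = defaultdict(list)
    # Try longer names first so a family is never shadowed by a shorter prefix.
    ordered = sorted(families, key=len, reverse=True)
    for column in columns:
        family = next((f for f in ordered if column.startswith(f)), "other")
        groups[family].append(column)
    return dict(groups)
```

Grouping like this makes it straightforward to train on one family at a time or to sum feature importances per family.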

## Importance of features

Feature importances according to the final CatBoost model:

![bar plot](feature_importance.png)


## The Challenge Results

Bar plot (prepared by me from the official results) showing the RMSE of the submitted predictions. Congratulations to the winning team Amidoff 🎉!

![rank](ranking.png)
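
For reference, RMSE (root-mean-square error) is the square root of the mean squared difference between measured and predicted values, so lower is better. Below is a minimal sketch of the standard definition, not the organizers' scoring code; `rmse` is an illustrative helper name.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.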

## References

1. [OCHEM Platform for Tox24](https://ochem.eu/static/challenge.do)
2. [Chem. Res. Toxicol. 2024, 37, 6, 825–826](https://pubs.acs.org/doi/10.1021/acs.chemrestox.4c00192)
201 changes: 201 additions & 0 deletions data/smiles_org+fixed.csv
@@ -0,0 +1,201 @@
SMILES_org,SMILES_fixed
CC1=CC[C@H]2C[C@@H]1C2(C)C,CC1=CC[C@H]2C[C@@H]1C2(C)C
CC[C@H](C)[C@H](N1SC2=CC=CC=C2C1=O)C(O)=O,CC[C@H](C)[C@@H](C(=O)O)n1sc2ccccc2c1=O
CCCCCC[C@@H](O)C/C=C\CCCCCCCC(O)=O,CCCCCC[C@@H](O)C/C=C\CCCCCCCC(=O)O
Cl/C=C/Cl,Cl/C=C/Cl
NC1=C(Cl)C=C(C=C1Cl)[N+]([O-])=O,Nc1c(Cl)cc([N+](=O)[O-])cc1Cl
FC1=CC=C(Br)C=C1,Fc1ccc(Br)cc1
CCCCCCCCCCCCCCCO,CCCCCCCCCCCCCCCO
OCCN1CCNCC1,OCCN1CCNCC1
CN(C)N,CN(C)N
[Cl-].C[N+]1(C)CCCCC1,C[N+]1(C)CCCCC1
ClCC(Cl)CCl,ClCC(Cl)CCl
NCC1=CC(CN)=CC=C1,NCc1cccc(CN)c1
OC(CCl)CCl,OC(CCl)CCl
CC(C)C1=CC(=CC=C1)C(C)C,CC(C)c1cccc(C(C)C)c1
OCNC(=O)NCO,O=C(NCO)NCO
CCCCC(CC)COCCO,CCCCC(CC)COCCO
O=CC(CC1=CC=C(C(C)(C)C)C=C1)C,CC(C=O)Cc1ccc(C(C)(C)C)cc1
OCC(CO)(CO)[N+]([O-])=O,O=[N+]([O-])C(CO)(CO)CO
S1C=2C(N=C1SC(SC#N)([H])[H])=C(C(=C(C2[H])[H])[H])[H],N#CSCSc1nc2ccccc2s1
CC(C)(CS(O)(=O)=O)NC(=O)C=C,C=CC(=O)NC(C)(C)CS(=O)(=O)O
COC1=CC=CC=C1N,COc1ccccc1N
ClCC(=O)NC1=CC=CC=C1,O=C(CCl)Nc1ccccc1
CCOCCO,CCOCCO
CCCCC(CC)C=O,CCCCC(C=O)CC
CCCCC(CC)COC(=O)C1=CC=C(C=C1)N(C)C,CCCCC(CC)COC(=O)c1ccc(N(C)C)cc1
OC(=O)CF,O=C(O)CF
CCCCCCCCC(CO)CCCCCC,CCCCCCCCC(CO)CCCCCC
CC(C)(O)C#N,CC(C)(O)C#N
CC(C)C1=C(O)C=CC=C1,CC(C)c1ccccc1O
COC1=CC(=CC=C1N)[N+]([O-])=O,COc1cc([N+](=O)[O-])ccc1N
O=C1OC(=O)C2C3CC(C=C3)C12,O=C1OC(=O)C2C3C=CC(C3)C12
CC(C)(C)C1=C(O)C=CC=C1,CC(C)(C)c1ccccc1O
CC(C)(CO)CO,CC(C)(CO)CO
O1C=CC2=C1C=CC=C2,c1ccc2occc2c1
CCCCOC(=O)COC1=C(Cl)C=C(Cl)C=C1,CCCCOC(=O)COc1ccc(Cl)cc1Cl
CC(C)(C)C1=CC(=C(O)C=C1)C(C)(C)C,CC(C)(C)c1ccc(O)c(C(C)(C)C)c1
CC1=CC=C(C)C(=C1)S(O)(=O)=O,Cc1ccc(C)c(S(=O)(=O)O)c1
CC1=CC(O)=C(C)C=C1,Cc1ccc(C)c(O)c1
CC(C)CCCC(C)(C)O,CC(C)CCCC(C)(C)O
CC1=CC=CC(C)=C1N,Cc1cccc(C)c1N
CC1=C(C=CC=C1[N+]([O-])=O)[N+]([O-])=O,Cc1c([N+](=O)[O-])cccc1[N+](=O)[O-]
OCC(O)CCl,OCC(O)CCl
CC1=C(Cl)C=C(N)C=C1,Cc1ccc(N)cc1Cl
CC1=CC(N)=CC=C1,Cc1cccc(N)c1
CC1=CC=CN=C1,Cc1cccnc1
OCC=CC1=CC=CC=C1,OCC=Cc1ccccc1
CC(CCOC(C)=O)CC(C)(C)C,CC(=O)OCCC(C)CC(C)(C)C
BrC1=CC=C(OC(=O)N2CCN3CCC2CC3)C=C1,O=C(Oc1ccc(Br)cc1)N1CCN2CCC1CC2
CCCCOC1=CC=C(N)C=C1,CCCCOc1ccc(N)cc1
NC1=CC=C(Cl)C=C1,Nc1ccc(Cl)cc1
OC1CCC(CC1)C2CCCCC2,OC1CCC(C2CCCCC2)CC1
CCC(=C(C1=CC=C(O)C=C1)C2=CC=C(OCCN(C)C)C=C2)C3=CC=CC=C3,CCC(=C(c1ccc(O)cc1)c1ccc(OCCN(C)C)cc1)c1ccccc1
COC1=CC=C(O)C=C1,COc1ccc(O)cc1
CN1CCOCC1,CN1CCOCC1
NCCCN1CCOCC1,NCCCN1CCOCC1
CCCC1=CC=C(N)C=C1,CCCc1ccc(N)cc1
C1=CC(=CC=N1)C2=CC=NC=C2,c1cc(-c2ccncc2)ccn1
ClC1=CC=C(C=C1)S(=O)(=O)C2=CC=C(Cl)C=C2,O=S(=O)(c1ccc(Cl)cc1)c1ccc(Cl)cc1
CCC1=CC(CC2=CC(CC)=C(N)C(CC)=C2)=CC(CC)=C1N,CCc1cc(Cc2cc(CC)c(N)c(CC)c2)cc(CC)c1N
CN1SC(Cl)=CC1=O,Cn1sc(Cl)cc1=O
NC1=NC2=C(NC=N2)C(=S)N1,Nc1nc2nc[nH]c2c(=S)[nH]1
C=CC#N,C=CC#N
C=CCN=C=S,C=CCN=C=S
CC1=CCC2CC1C2(C)C,CC1=CCC2CC1C2(C)C
CCC1(CCC(=O)NC1=O)C2=CC=C(N)C=C2,CCC1(c2ccc(N)cc2)CCC(=O)NC1=O
O[C@@H]([C@@H](O)CO)[C@@H](O)C=O,O=C[C@H](O)[C@@H](O)[C@@H](O)CO
OC(=O)C1CCN(CC1)C2=C(NC(=O)NC(=O)C3=CC(F)=C(F)C=C3Cl)C=C(F)C=C2,O=C(NC(=O)c1cc(F)c(F)cc1Cl)Nc1cc(F)ccc1N1CCC(C(=O)O)CC1
CC1COC2=C(C=CC=C2)N1C(=O)C(Cl)Cl,CC1COc2ccccc2N1C(=O)C(Cl)Cl
COC1=CC=C(C=C1)C2=COC3=C(C(O)=CC(O)=C3)C2=O,COc1ccc(-c2coc3cc(O)cc(O)c3c2=O)cc1
OC1=C(Br)C=C(C=C1Br)C2(OS(=O)(=O)C3=C2C=CC=C3)C4=CC(Br)=C(O)C(Br)=C4,O=S1(=O)OC(c2cc(Br)c(O)c(Br)c2)(c2cc(Br)c(O)c(Br)c2)c2ccccc21
CC(C)N1C(SCN(C1=O)C2=CC=CC=C2)=NC(C)(C)C,CC(C)N1C(=O)N(c2ccccc2)CSC1=NC(C)(C)C
CCC(C)NC1=C(C=C(C=C1[N+]([O-])=O)C(C)(C)C)[N+]([O-])=O,CCC(C)Nc1c([N+](=O)[O-])cc(C(C)(C)C)cc1[N+](=O)[O-]
CCCCOC(=O)[C@H](C)O,CCCCOC(=O)[C@H](C)O
CCCCOC(=O)C(C)O,CCCCOC(=O)C(C)O
CCCC[Sn](Cl)(Cl)Cl,CCCC[Sn](Cl)(Cl)Cl
[Na+].COC1=CC=C(C=C1)N=NC2=C(OC)C=C(N=NC3=CC=C(C=C3)S([O-])(=O)=O)C(C)=C2,COc1ccc(N=Nc2cc(C)c(N=Nc3ccc(S(=O)(=O)[O-])cc3)cc2OC)cc1
[Na+].OC1=C(N=NC2=CC=C(C=C2)S([O-])(=O)=O)C3=CC=CC=C3C=C1,O=S(=O)([O-])c1ccc(N=Nc2c(O)ccc3ccccc23)cc1
CN1C=NC2=C1C(=O)N(C)C(=O)N2C,Cn1c(=O)c2c(ncn2C)n(C)c1=O
CC(=O)C1=CC2=C(OC(C)(C)[C@H](O)[C@H]2NC(=O)C3=CC=C(F)C=C3)C=C1,CC(=O)c1ccc2c(c1)[C@H](NC(=O)c1ccc(F)cc1)[C@@H](O)C(C)(C)O2
COC1=CC(Cl)=C(OC)C=C1Cl,COc1cc(Cl)c(OC)cc1Cl
CC(C)=CCCC(C)=CC#N,CC(C)=CCCC(C)=CC#N
ClC1=CC=CC=C1C2=NN=C(N=N2)C3=C(Cl)C=CC=C3,Clc1ccccc1-c1nnc(-c2ccccc2Cl)nn1
NCCO.OC(=O)C1=C(Cl)C=CC(Cl)=N1,O=C(O)c1nc(Cl)ccc1Cl
O=C1OC2=C(C=CC=C2)C=C1,O=c1ccc2ccccc2o1
ClC=1C=C2N(C(=O)C(C2=CC1F)C(=O)C=3SC=C(Cl)C3)C(=O)N,NC(=O)N1C(=O)C(C(=O)c2cc(Cl)cs2)c2cc(F)c(Cl)cc21
COC1=C(CN[C@H]2CCCN[C@H]2C3=CC=CC=C3)C=C(OC(F)(F)F)C=C1,COc1ccc(OC(F)(F)F)cc1CN[C@H]1CCCN[C@H]1c1ccccc1
CC(C)(O)C1=COC(=C1)S(=O)(=O)NC(=O)NC2=C3CCCC3=CC4=C2CCC4,CC(C)(O)c1coc(S(=O)(=O)NC(=O)Nc2c3c(cc4c2CCC4)CCC3)c1
CNC(=O)[C@H]1O[C@H]([C@H](O)[C@@H]1N)N2C=NC3=C2N=CN=C3NCC4=CC(Cl)=CC=C4OCC5=CC(C)=NO5,CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(Cl)ccc4OCc4cc(C)no4)ncnc32)[C@H](O)[C@@H]1N
CC(C)(OO)C1=CC=CC=C1,CC(C)(OO)c1ccccc1
NC#N,N#CN
NC(=N)NC#N,N#CNC(=N)N
OC1=NC(O)=NC(O)=N1,Oc1nc(O)nc(O)n1
OC1CCCCC1,OC1CCCCC1
OC1CCCC1,OC1CCCC1
CC(C1CC1)C(O)(CN2C=NC=N2)C3=CC=C(Cl)C=C3,CC(C1CC1)C(O)(Cn1cncn1)c1ccc(Cl)cc1
CC(=O)O[C@@]1(CC[C@H]2[C@@H]3C=C(Cl)C4=CC(=O)[C@@H]5C[C@@H]5[C@]4(C)[C@H]3CC[C@]12C)C(C)=O,CC(=O)O[C@]1(C(C)=O)CC[C@H]2[C@@H]3C=C(Cl)C4=CC(=O)[C@@H]5C[C@@H]5[C@]4(C)[C@H]3CC[C@@]21C
OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO,OC[C@@H](O)[C@@H](O)[C@H](O)[C@@H](O)CO
OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)CO,OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)CO
O[C@@H]([C@H](O)CO)[C@@H](O)C=O,O=C[C@H](O)[C@@H](O)[C@H](O)CO
CCCCCCCCCCO[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O,CCCCCCCCCCO[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O
BrC(Br)C#N,N#CC(Br)Br
ClC1=C(Cl)C(=O)C2=C(C=CC=C2)C1=O,O=C1C(Cl)=C(Cl)C(=O)c2ccccc21
OC(=O)C(Cl)Cl,O=C(O)C(Cl)Cl
COP(=O)(OC)O/C(/C)=C/C(=O)N(C)C,COP(=O)(OC)O/C(C)=C/C(=O)N(C)C
CCOC(=O)CC(=O)OCC,CCOC(=O)CC(=O)OCC
COCCOCCOC,COCCOCCOC
COC1=C(OC)C=C(C=C1)C(=CC(=O)N2CCOCC2)C3=CC=C(Cl)C=C3,COc1ccc(C(=CC(=O)N2CCOCC2)c2ccc(Cl)cc2)cc1OC
COP(=O)OC,CO[PH](=O)OC
CNC,CNC
CCCCCCCCOC(=O)CCCCCCCCC(=O)OCCCCCCCC,CCCCCCCCOC(=O)CCCCCCCCC(=O)OCCCCCCCC
CN(C)C(=O)C(C1=CC=CC=C1)C2=CC=CC=C2,CN(C)C(=O)C(c1ccccc1)c1ccccc1
[Hg](C1=CC=CC=C1)C2=CC=CC=C2,c1ccc([Hg]c2ccccc2)cc1
C[Si](C)(O[Si](C)(C)C=C)C=C,C=C[Si](C)(C)O[Si](C)(C)C=C
CC(C)[C@@H]1CC[C@@H](C)C[C@H]1O,CC(C)[C@@H]1CC[C@@H](C)C[C@H]1O
CCCCCCCCCCCC(O)=O,CCCCCCCCCCCC(=O)O
O=C(OCC1CCC2OC2C1)C3CCC4OC4C3,O=C(OCC1CCC2OC2C1)C1CCC2OC2C1
CN([C@H]1CC[C@@]2(CCCO2)C[C@@H]1N3CCCC3)C(=O)CC4=CC=CC5=C4C=CO5,CN(C(=O)Cc1cccc2occc12)[C@H]1CC[C@@]2(CCCO2)C[C@@H]1N1CCCC1
CCCN(CCC)C(=O)SCC,CCCN(CCC)C(=O)SCC
CCOP(=S)(OCC)SCSP(=S)(OCC)OCC,CCOP(=S)(OCC)SCSP(=S)(OCC)OCC
CCCSP(=O)(OCC)SCCC,CCCSP(=O)(OCC)SCCC
CCOC(=O)C1=CC=C(C)C=C1,CCOC(=O)c1ccc(C)cc1
CCOC(=O)C1=NOC(C1)(C2=CC=CC=C2)C3=CC=CC=C3,CCOC(=O)C1=NOC(c2ccccc2)(c2ccccc2)C1
CCCC(=O)OCC,CCCC(=O)OCC
CCOC(=O)C1OC1(C)C2=CC=CC=C2,CCOC(=O)C1OC1(C)c1ccccc1
[Na+].[Fe+3].[O-]C(=O)CN(CCN(CC([O-])=O)CC([O-])=O)CC([O-])=O,O=C([O-])CN(CCN(CC(=O)[O-])CC(=O)[O-])CC(=O)[O-]
CC1(OC(=O)N(NC2=CC=CC=C2)C1=O)C3=CC=C(OC4=CC=CC=C4)C=C3,CC1(c2ccc(Oc3ccccc3)cc2)OC(=O)N(Nc2ccccc2)C1=O
CCOC(=O)C(C)OC1=CC=C(OC2=NC3=C(O2)C=C(Cl)C=C3)C=C1,CCOC(=O)C(C)Oc1ccc(Oc2nc3ccc(Cl)cc3o2)cc1
CCOC(=O)[C@@H](C)OC1=CC=C(OC2=NC3=C(O2)C=C(Cl)C=C3)C=C1,CCOC(=O)[C@@H](C)Oc1ccc(Oc2nc3ccc(Cl)cc3o2)cc1
CC1(C)C(C(=O)OC(C#N)C2=CC=CC(OC3=CC=CC=C3)=C2)C1(C)C,CC1(C)C(C(=O)OC(C#N)c2cccc(Oc3ccccc3)c2)C1(C)C
CN1N=C(C)C(C=NOCC2=CC=C(C=C2)C(=O)OC(C)(C)C)=C1OC3=CC=CC=C3,Cc1nn(C)c(Oc2ccccc2)c1C=NOCc1ccc(C(=O)OC(C)(C)C)cc1
[Na+].COC1=NN(C(=O)[N-]S(=O)(=O)C2=C(OC(F)(F)F)C=CC=C2)C(=O)N1C,COc1nn(C(=O)[N-]S(=O)(=O)c2ccccc2OC(F)(F)F)c(=O)n1C
Cl.CNC(=O)OC1=CC=CC(=C1)N=CN(C)C,CNC(=O)Oc1cccc(N=CN(C)C)c1
O=CCCCC=O,O=CCCCC=O
C[Si](C)(C)N[Si](C)(C)C,C[Si](C)(C)N[Si](C)(C)C
ClC1=CC(Cl)=C(C=C1)C(CN2C=CN=C2)OCC=C,C=CCOC(Cn1ccnc1)c1ccc(Cl)cc1Cl
COCC1=CN=C(C2=NC(C)(C(C)C)C(=O)N2)C(=C1)C(O)=O,COCc1cnc(C2=NC(C)(C(C)C)C(=O)N2)c(C(=O)O)c1
CC(C)C1CCC(CC2=CC=C(Cl)C=C2)C1(O)CN3C=NC=N3,CC(C)C1CCC(Cc2ccc(Cl)cc2)C1(O)Cn1cncn1
CC(C)NC(=O)N1CC(=O)N(C1=O)C2=CC(Cl)=CC(Cl)=C2,CC(C)NC(=O)N1CC(=O)N(c2cc(Cl)cc(Cl)c2)C1=O
CC(C)CCOC(=O)C1=CC=CC=C1,CC(C)CCOC(=O)c1ccccc1
CC1=CC(=O)CC(C)(C)C1,CC1=CC(=O)CC(C)(C)C1
CCCN(CCC)C1=C(C=C(C=C1[N+]([O-])=O)C(C)C)[N+]([O-])=O,CCCN(CCC)c1c([N+](=O)[O-])cc(C(C)C)cc1[N+](=O)[O-]
CS(=O)(=O)C1=C(C=CC(=C1)C(F)(F)F)C(=O)C2=C(ON=C2)C3CC3,CS(=O)(=O)c1cc(C(F)(F)F)ccc1C(=O)c1cnoc1C1CC1
OC[C@H](O)[C@H](O)[C@@H](O)[C@H](O)CO,OC[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)CO
CCCCCCCCCCCCOC(=O)C1=CC(O)=C(O)C(O)=C1,CCCCCCCCCCCCOC(=O)c1cc(O)c(O)c(O)c1
CC(=C)C1CCC(C)=CC1,C=C(C)C1CC=C(C)CC1
COC1=CC2=C(NC=C2CCNC(C)=O)C=C1,COc1ccc2[nH]cc(CCNC(C)=O)c2c1
CC(C)C1CCC(C)CC1O,CC1CCC(C(C)C)C(O)C1
[Na+].CNC([S-])=S,CNC(=S)[S-]
SC(=S)NC.[Na].O,CNC(=S)S
COC(=O)C1=C(C)C=CC=C1,COC(=O)c1ccccc1C
CCCCC/C=C\C/C=C\CCCCCCCC(=O)OC,CCCCC/C=C\C/C=C\CCCCCCCC(=O)OC
N#CSCSC#N,N#CSCSC#N
COC(=O)C=C(C)OP(=O)(OC)OC,COC(=O)C=C(C)OP(=O)(OC)OC
OC(=O)C1=C(C=CC=C1)C(=O)OCC2=CC=CC=C2,O=C(O)c1ccccc1C(=O)OCc1ccccc1
CCCCC(CN1C=NC=N1)(C#N)C2=CC=C(Cl)C=C2,CCCCC(C#N)(Cn1cncn1)c1ccc(Cl)cc1
O=C1N(SC2CCCCC2)C(=O)C3=C1C=CC=C3,O=C1c2ccccc2C(=O)N1SC1CCCCC1
CCCCNS(=O)(=O)C1=CC=C(C)C=C1,CCCCNS(=O)(=O)c1ccc(C)cc1
CCNC1=CC(C)=CC=C1,CCNc1cccc(C)c1
O=NN1CCCC1,O=NN1CCCC1
[Cl-].CCCC[N+](C)(CCCC)CCCC,CCCC[N+](C)(CCCC)CCCC
CN(C)C(C)=O,CC(=O)N(C)C
CN(C)C1=CC=CC=C1,CN(C)c1ccccc1
NC(=O)C1=CC=CN=C1,NC(=O)c1cccnc1
CO[C@@H]1[C@@H](CC[C@]2(CO2)[C@H]1[C@@]3(C)O[C@@H]3CC=C(C)C)OC(=O)NC(=O)CCl,CO[C@@H]1[C@H](OC(=O)NC(=O)CCl)CC[C@]2(CO2)[C@H]1[C@@]1(C)O[C@@H]1CC=C(C)C
C[Si](C)(C)O[Si](C)(C)O[Si](C)(C)C,C[Si](C)(C)O[Si](C)(C)O[Si](C)(C)C
Cl.COC1=CC=C(N)C=C1,COc1ccc(N)cc1
CC1=CC=C(O)C=C1,Cc1ccc(O)cc1
CCCCN(CC)C(=O)SCCC,CCCCN(CC)C(=O)SCCC
CC1=C2N=C(C3=CC=CC=C3Cl)C4=C(NC2=NN1)C=CC(=C4)[N+]([O-])=O,Cc1[nH]nc2c1N=C(c1ccccc1Cl)c1cc([N+](=O)[O-])ccc1N2
[Na+].FC1=CC=C(C(=O)[N-]S(=O)(=O)/C=C/C2=CC=CC=C2)C(Cl)=C1,O=C([N-]S(=O)(=O)/C=C/c1ccccc1)c1ccc(F)cc1Cl
[K+].CCCCC(CC)C([O-])=O,CCCCC(CC)C(=O)[O-]
C[C@]12CC(=O)[C@H]3[C@@H](CCC4=CC(=O)C=C[C@]34C)[C@@H]1CC[C@]2(O)C(=O)CO,C[C@]12C=CC(=O)C=C1CC[C@@H]1[C@@H]2C(=O)C[C@@]2(C)[C@H]1CC[C@]2(O)C(=O)CO
CCCN(CCC)C1=C(C(N)=C(C=C1[N+]([O-])=O)C(F)(F)F)[N+]([O-])=O,CCCN(CCC)c1c([N+](=O)[O-])cc(C(F)(F)F)c(N)c1[N+](=O)[O-]
CC(C)N(C(=O)CCl)C1=CC=CC=C1,CC(C)N(C(=O)CCl)c1ccccc1
[Na+].CCCOC1=NN(C(=O)[N-]S(=O)(=O)C2=C(C=CC=C2)C(=O)OC)C(=O)N1C,CCCOc1nn(C(=O)[N-]S(=O)(=O)c2ccccc2C(=O)OC)c(=O)n1C
Cl.CC1=NC=C(CO)C(CO)=C1O,Cc1ncc(CO)c(CO)c1O
O=C1NS(=O)(=O)C2=C1C=CC=C2,O=C1NS(=O)(=O)c2ccccc21
C=CCC1=CC=C2OCOC2=C1,C=CCc1ccc2c(c1)OCO2
CCNC1=NC(NCC)=NC(Cl)=N1,CCNc1nc(Cl)nc(NCC)n1
[Na+].[O-]C1=CC=C(C=C1)[N+]([O-])=O,O=[N+]([O-])c1ccc([O-])cc1
[Na+].CCCCCCCCCOS([O-])(=O)=O,CCCCCCCCCOS(=O)(=O)[O-]
O.[Na+].O=C1[N-]S(=O)(=O)C2=CC=CC=C12,O=C1[N-]S(=O)(=O)c2ccccc21
CCCCCCCC/C=C\CCCCCCCC(=O)OC[C@@H](O)[C@H]1OC[C@H](O)[C@H]1O,CCCCCCCC/C=C\CCCCCCCC(=O)OC[C@@H](O)[C@H]1OC[C@H](O)[C@H]1O
CC1=CC(C)=C(C2=C(OC(=O)CC(C)(C)C)C3(CCCC3)OC2=O)C(C)=C1,Cc1cc(C)c(C2=C(OC(=O)CC(C)(C)C)C3(CCCC3)OC2=O)c(C)c1
C1OC1C2=CC=CC=C2,c1ccc(C2CO2)cc1
OC(=O)C1=CC(=CC=C1O)N=NC2=CC=C(C=C2)S(=O)(=O)NC3=NC=CC=C3,O=C(O)c1cc(N=Nc2ccc(S(=O)(=O)Nc3ccccn3)cc2)ccc1O
CC1=NN(C(=O)N1C(F)F)C2=CC(NS(C)(=O)=O)=C(Cl)C=C2Cl,Cc1nn(-c2cc(NS(C)(=O)=O)c(Cl)cc2Cl)c(=O)n1C(F)F
CCOP(=S)(OC(C)C)OC1=CN=C(N=C1)C(C)(C)C,CCOP(=S)(Oc1cnc(C(C)(C)C)nc1)OC(C)C
CC1=C(F)C(F)=C(COC(=O)C2C(/C=C(\Cl)/C(F)(F)F)C2(C)C)C(F)=C1F,Cc1c(F)c(F)c(COC(=O)C2C(/C=C(\Cl)C(F)(F)F)C2(C)C)c(F)c1F
OC(=O)CS,O=C(O)CS
NC(N)=S,NC(N)=S
CC(C)N(C(C)C)C(=O)SCC(Cl)=C(Cl)Cl,CC(C)N(C(=O)SCC(Cl)=C(Cl)Cl)C(C)C
CCCCOCCOC(=O)COC1=C(Cl)C=C(Cl)C(Cl)=N1,CCCCOCCOC(=O)COc1nc(Cl)c(Cl)cc1Cl
CCO[Si](C)(OCC)OCC,CCO[Si](C)(OCC)OCC
COCCOCCOCCOC,COCCOCCOCCOC
OC(=O)C1=CC=C2C(=O)OC(=O)C2=C1,O=C(O)c1ccc2c(c1)C(=O)OC2=O
CCCCC(CC)COC(=O)C1=CC=C(C(=O)OCC(CC)CCCC)C(=C1)C(=O)OCC(CC)CCCC,CCCCC(CC)COC(=O)c1ccc(C(=O)OCC(CC)CCCC)c(C(=O)OCC(CC)CCCC)c1
CCCC(CCC)C(O)=O,CCCC(CCC)C(=O)O
CCCSC(=O)N(CCC)CCC,CCCSC(=O)N(CCC)CCC
FC1(F)/C(=C\C(=O)N2CCC(N3CCCCC3)CC2)/C=4C(N(CC1)C(=O)C5=CC=C(NC(=O)C6=C(OC=C6)C)C=C5)=CC=CC4.FC1(F)/C(=C\C(=O)N2CCC(N3CCCCC3)CC2)/C=4C(N(CC1)C(=O)C5=CC=C(NC(=O)C6=C(OC=C6)C)C=C5)=CC=CC4.OC(=O)/C=C/C(O)=O,Cc1occc1C(=O)Nc1ccc(C(=O)N2CCC(F)(F)/C(=C\C(=O)N3CCC(N4CCCCC4)CC3)c3ccccc32)cc1
Binary file added data/test.csv.xz
Binary file not shown.
Binary file added data/train.csv.xz
Binary file not shown.
Binary file added data/validation.csv.xz
Binary file not shown.
Binary file added feature_importance.png
Binary file added ranking.png
