Predicting the half maximal inhibitory concentration of various drugs on tyrosine protein kinase receptor FLT3 using a machine learning model

Machine learning approaches provide a set of tools that can improve drug discovery and decision making for well-defined questions with abundant, high-quality data. Interpreting the model helps us understand how to design a better drug.

Machine learning is a workhorse of modern drug discovery and has been ever since the early days of QSAR (quantitative structure-activity relationship) modeling.

DATA COLLECTION

The data is downloaded from ChEMBL using the chembl_webresource_client library. ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity, and genomic data to aid the translation of genomic information into effective new drugs. The library is developed and supported by the ChEMBL group and provides programmatic access to ChEMBL data. The dataset comprises compounds that have been biologically tested for their activity towards the target, FLT3.
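The retrieval step could look roughly like the sketch below. It is not the notebook's actual code: the function name `fetch_ic50_records` and the `activity_api` parameter are hypothetical, and the target ID is passed in by the caller (it can be looked up with `new_client.target.search("FLT3")`). The record keys used here (`molecule_chembl_id`, `canonical_smiles`, `standard_value`) are assumed to match the fields returned by the ChEMBL activity endpoint.

```python
def fetch_ic50_records(target_chembl_id, activity_api=None):
    """Collect IC50 records for one ChEMBL target as a list of dicts.

    `activity_api` is a hypothetical injection point: any object with a
    `filter(**kwargs)` method that yields dict-like records.
    """
    if activity_api is None:
        # Imported lazily so this module can be loaded without the
        # library installed or a network connection available.
        from chembl_webresource_client.new_client import new_client
        activity_api = new_client.activity

    # Ask only for IC50 measurements against the chosen target.
    query = activity_api.filter(
        target_chembl_id=target_chembl_id,
        standard_type="IC50",
    )

    keep = ("molecule_chembl_id", "canonical_smiles", "standard_value")
    rows = []
    for record in query:
        # Skip records without a potency value or a structure.
        if record.get("standard_value") is None:
            continue
        if record.get("canonical_smiles") is None:
            continue
        rows.append({key: record.get(key) for key in keep})
    return rows
```

Passing the activity resource in as an argument is a design choice made for this sketch only, so the filtering logic can be exercised without contacting ChEMBL.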

[Figure: flowchart of the dataset preparation steps]

Labeling the compounds as Active/Inactive/Intermediate

Compounds are labeled (Active/Inactive/Intermediate) based on their potency value, IC50. IC50 is the half maximal inhibitory concentration, one of the most widely used and informative measures of a drug's efficacy. It indicates how much drug is needed to inhibit a biological process by half, thus providing a measure of the potency of an antagonist drug in pharmacological research. Compounds with IC50 < 1000 nM are considered active, those with IC50 > 10000 nM are considered inactive, and those in between are labeled intermediate. A function is created to label the molecules in the dataset.
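A minimal sketch of such a labeling function is shown here. The function name `label_bioactivity` is hypothetical, and the handling of the exact boundary values (1000 nM and 10000 nM fall into Intermediate) is an assumption read from the strict inequalities in the text.

```python
def label_bioactivity(ic50_nm):
    """Map an IC50 value in nM to Active, Inactive, or Intermediate."""
    if ic50_nm < 1000:
        return "Active"
    if ic50_nm > 10000:
        return "Inactive"
    # Everything from 1000 nM to 10000 nM inclusive.
    return "Intermediate"


# Label a few example potency values (illustrative numbers, not real data).
example_values = [50.0, 1000.0, 4500.0, 25000.0]
example_labels = [label_bioactivity(v) for v in example_values]
```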

Converting IC50 values to pIC50

Potency values are logarithmic in nature. Dose-response curves are sigmoidal when plotted against the logarithm of concentration.

pIC50, the negative base-10 logarithm of the IC50 expressed in molar units, is the appropriate scale for thinking about compound potency. If a compound's IC50 drops from micromolar to nanomolar, that is an exponential change, not a linear one: a thousandfold decrease in IC50 corresponds to a gain of 3 pIC50 units. A function is created to convert IC50 values to pIC50.
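The conversion can be sketched as follows, assuming the IC50 values are stored in nM (the unit used for the activity thresholds). Since 1 nM is 1e-9 M, pIC50 = -log10(IC50 in M) = 9 - log10(IC50 in nM). The function name `ic50_nm_to_pic50` is hypothetical.

```python
import math


def ic50_nm_to_pic50(ic50_nm):
    """Convert an IC50 in nM to pIC50 (negative log10 of the molar IC50)."""
    if ic50_nm <= 0:
        # The logarithm is undefined for zero or negative concentrations.
        raise ValueError("IC50 must be a positive concentration")
    # -log10(ic50_nm * 1e-9) rewritten as 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)
```

On this scale a more potent compound has a larger value, and each tenfold decrease in IC50 adds one pIC50 unit.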

Training model:

features = Molecular fingerprints (calculated using PaDEL for all molecules in the dataset)

label = pIC50 values

model = Random Forest regressor
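The setup above can be sketched with scikit-learn as shown below. The fingerprint matrix and pIC50 values are random synthetic placeholders, not the PaDEL output or the ChEMBL data, and the train/test split ratio, number of trees, and random seeds are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a binary fingerprint matrix (200 molecules, 64 bits).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64))

# Synthetic pIC50-like target that depends on the first 8 bits plus noise.
y = 5.0 + 0.3 * X[:, :8].sum(axis=1) + rng.normal(0.0, 0.1, size=200)

# Hold out 20% of the molecules for evaluation (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the regressor on fingerprints -> pIC50.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predicted pIC50 values for the held-out molecules.
y_pred = model.predict(X_test)
```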

Model Evaluation:

R^2 score = 0.77
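For reference, the R^2 score (coefficient of determination) compares the model's squared error with that of always predicting the mean. The sketch below computes it by hand. The function name `r2_manual` is hypothetical, and the source does not state which data split the reported value was computed on.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than predicting the mean pIC50 for every compound.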

Repository: J22160/ML_DRUG_DISCOVERY