Capstone Project at Devnation
We download and pre-process biological activity data from the ChEMBL database for use in computational drug discovery. The dataset consists of compounds (molecules) that have been biologically tested for their activity against the target organism or protein of interest.
The purpose of this project is to establish a relationship between the structure of a molecule and its biological activity. Exploratory data analysis will help identify differences between the active and inactive sets of compounds. We will then build a regression model to predict pIC50 values.
This drug design project will produce a bioinformatics tool that lets users predict whether a compound of interest has favorable biological activity against the target protein.
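Since the regression target is pIC50, it may help to see how it relates to a raw IC50 measurement. The sketch below assumes IC50 values are reported in nanomolar (nM) and uses the standard definition pIC50 = -log10(IC50 in molar). The activity cutoffs and the function names are assumptions made for illustration; this project does not state which thresholds separate active from inactive compounds.

```python
import math

# Assumed cutoffs (not stated in this project): a common convention treats
# IC50 <= 1,000 nM as active and IC50 >= 10,000 nM as inactive.
ACTIVE_MAX_NM = 1_000.0
INACTIVE_MIN_NM = 10_000.0


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) equals 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)


def bioactivity_class(ic50_nm: float) -> str:
    """Label a compound using the assumed cutoffs defined above."""
    if ic50_nm <= ACTIVE_MAX_NM:
        return "active"
    if ic50_nm >= INACTIVE_MIN_NM:
        return "inactive"
    return "intermediate"


if __name__ == "__main__":
    # Illustrative IC50 values in nM, not real measurements.
    for value in (50.0, 1_000.0, 25_000.0):
        print(value, round(ic50_nm_to_pic50(value), 2), bioactivity_class(value))
```

A higher pIC50 means a more potent compound, which is why the log-transformed value is usually a more convenient regression target than the raw IC50.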
Tools used:
- Python (IDE: PyCharm)
- PaDEL-Descriptor software (molecular descriptors)
- scikit-learn (ML model)
- Streamlit (for deployment)
Project steps:
- Data Collection and Preprocessing
- Exploratory Data Analysis
- Dataset Preparation
- Model Building
- Compare Models
- Deploy Model as Web App (Streamlit)
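For the model comparison step, one possible approach is to score each regression model's pIC50 predictions on the same held-out compounds. The sketch below hand-computes RMSE and R² in plain Python. The choice of metrics, the model names, and all numbers are assumptions for illustration only, not results from this project.

```python
import math
from typing import Dict, List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed pIC50 values and predictions from two made-up models.
observed: List[float] = [5.0, 6.0, 7.0, 8.0]
predictions: Dict[str, List[float]] = {
    "model_a": [5.1, 5.9, 7.2, 7.8],
    "model_b": [6.0, 6.0, 6.0, 6.0],
}

# Collect both metrics per model, then pick the model with the lowest RMSE.
scores = {
    name: {"rmse": rmse(observed, pred), "r2": r_squared(observed, pred)}
    for name, pred in predictions.items()
}
best_model = min(scores, key=lambda name: scores[name]["rmse"])
```

In practice, scikit-learn's `sklearn.metrics` module provides `r2_score` and `mean_squared_error`, so the hand-written functions here only serve to make the definitions explicit.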
We will use SMILES notation, which encodes the chemical structure of each compound, to compute molecular descriptors. The descriptors we will compute are Lipinski's descriptors: molecular weight, LogP, number of hydrogen bond donors, and number of hydrogen bond acceptors. We will perform exploratory data analysis using simple box plots and scatter plots to discern differences between the active and inactive sets of compounds. We will also prepare the dataset (X and Y data frames) that will be used in the next part for model building.
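The four Lipinski descriptors are commonly checked against the rule-of-five thresholds (molecular weight at most 500 Da, LogP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). The sketch below assumes the descriptor values have already been computed from SMILES by a cheminformatics toolkit. The class name, function name, and example values are hypothetical and only approximate.

```python
from dataclasses import dataclass


@dataclass
class LipinskiDescriptors:
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # number of hydrogen bond donors
    h_acceptors: int    # number of hydrogen bond acceptors


def rule_of_five_violations(d: LipinskiDescriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    # True counts as 1 and False as 0 when summed.
    return sum(checks)


# Illustrative, approximate values for a small drug-like molecule.
small_molecule = LipinskiDescriptors(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)

# Hypothetical large, lipophilic compound that breaks every threshold.
large_molecule = LipinskiDescriptors(mol_weight=720.9, logp=6.3, h_donors=6, h_acceptors=12)

small_violations = rule_of_five_violations(small_molecule)
large_violations = rule_of_five_violations(large_molecule)
```

Plotting each descriptor separately for the active and inactive sets, for example as box plots, is one way to see whether the two groups differ in these properties.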
This project uses data from databases such as ChEMBL.
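Raw activity records usually need cleaning before they can be used for modeling. The following is a minimal sketch of one possible pre-processing pass, assuming each record is a dictionary carrying the ChEMBL activity fields `molecule_chembl_id`, `canonical_smiles`, and `standard_value`. The function name, the placeholder IDs, and the sample records are hypothetical, and the exact cleaning rules for this project may differ.

```python
from typing import Any, Dict, List


def clean_activity_records(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop records with missing data and keep one record per SMILES string."""
    seen_smiles = set()
    cleaned = []
    for rec in records:
        value = rec.get("standard_value")
        smiles = rec.get("canonical_smiles")
        # Skip records with no measured value or no structure.
        if value is None or value == "" or not smiles:
            continue
        # Keep only the first record for each structure.
        if smiles in seen_smiles:
            continue
        seen_smiles.add(smiles)
        cleaned.append(
            {
                "molecule_chembl_id": rec.get("molecule_chembl_id"),
                "canonical_smiles": smiles,
                "standard_value": float(value),  # accepts strings or numbers
            }
        )
    return cleaned


# Hypothetical sample records with placeholder IDs, not real ChEMBL entries.
raw_records = [
    {"molecule_chembl_id": "EXAMPLE_1", "canonical_smiles": "CCO", "standard_value": "750.0"},
    {"molecule_chembl_id": "EXAMPLE_2", "canonical_smiles": "CCO", "standard_value": "800.0"},
    {"molecule_chembl_id": "EXAMPLE_3", "canonical_smiles": None, "standard_value": "120.0"},
    {"molecule_chembl_id": "EXAMPLE_4", "canonical_smiles": "c1ccccc1", "standard_value": None},
    {"molecule_chembl_id": "EXAMPLE_5", "canonical_smiles": "c1ccccc1", "standard_value": "30000"},
]

cleaned_records = clean_activity_records(raw_records)
```

Keeping the first record per SMILES string is a simple design choice; averaging replicate measurements for the same compound is another reasonable option.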