Training data --> Train Chemprop model --> Predict/Screen new molecules --> Dock molecules to HIV-1 proteins --> Predict binding affinity
It has 9 genes and encodes for 15 proteins
Further, these 15 proteins can be classified into 4 types:
- Structural --> 6 (gp120, gp41, Matrix p17, Capsid p24, Nucleocapsid p7)
- Enzymes --> 3 (Protease, Reverse Transcriptase, Integrase)
- Accessory --> 4 (Vif, Vpr, Vpu, Nef)
- Regulatory --> 2 (Tat, Rev)
Also, protein length in the number of amino acids:
- Reverse Transcriptase (p66 / p51 subunit) --> 1000 aa (560 aa/ 440 aa)
- Env (gp120 / gp 41) --> 835 aa (560 aa/ 440 aa)
- Integrase --> 288 aa
- Capsid --> 231 aa
- Nef --> 206 aa
- Protease --> 198 aa
- Vif --> 192 aa
- Matrix --> 132 aa
- Rev --> 116 aa
- Vpr --> 96 aa
- Tat --> (86 - 101) aa
- Vpu --> 81 aa
- Nucleocapsid --> 55 aa
- p6 --> 52 aa
More details of HIV Structural protein: gp120 --> surface glycoprotein gp41 --> transmembrane glycoprotein
Each pair of gp120 and gp41 proteins forms a trimer using a noncovalent bond. The envelope protein helps in the entry of the virus into the host cell. The gp120 protein has 2 regions namely the cd4 binding region and the v3 region. The gp120 subunit is also a key target for the host immune responses against HIV. As shown here, the outer surface of gp120 is heavily glycosylated. This glycan shield effectively cloaks many of the critical immune determinants that are on the surface of gp120, which helps HIV to evade host immune responses.
HIV has three enzymes namely protease, reverse transcriptase, and integrase. Each of these enzymes plays a critical role in HIV replication.
--> Expression of the viral gene products
--> Production of virus particles
--> Infection of a new cell
--> reintegration of the viral genome
step 1: Viral transcripts are expressed from the promotor located in the 5' long terminal repeat (LTR), with Tat greatly enhancing the rate of transcription.
step 2: a set of spliced and genomic-length RNAs are transported from the nucleus to the cytoplasm, where they can be translated or packaged. This step is regulated by Rev.
step 3: viral mRNAs are translated in the cytoplasm, and the Gag and Gag-Pol polyproteins become localized to the cell membrane. the Env mRNA is the translated at the endoplasmic reticulum (ER).
step 4: the core
Tat: It is an essential protein that plays a crucial role in transcription factors. it increases the production of viral mRNAs ~ 100-fold and consequently is essential for viral replication.
Tat binds not to a DNA site but rather to an RNA hairpin known as TAR (trans-activating response element), which is present at the 5' end of the nascent viral transcripts.
86 amino acids Tat protein has suggested that a hydrophobic core region of about 10 amino acids adopts a defined structure but the rest of the molecule, including the arginine-rich RNA-binding domain, is relatively disordered.
It seems likely that Tat requires interactions with cellular proteins in addition to TAR to adopt a stable structure.
References:
- Mini-Lectures: Introduction to the Biology of HIV-1 https://www.youtube.com/watch?v=XP8tMBm8zbY
- Mini-Lecture Series: Overview of the HIV Life Cycle https://www.youtube.com/watch?v=-sNjOZL_Sbg
AutoDock Vina documentation: https://autodock-vina.readthedocs.io/_/downloads/en/latest/pdf/
3D Printed Bioreactors Enable Cell Growth: https://www.youtube.com/watch?v=CmLvJ_Yxs9E
Molecular Docking | Autodock VINA Virtual Screening | VINA Docking tutorial | Bioinformatics: https://www.youtube.com/watch?v=tFFxNTvvoJI
Another video on Autodock: https://www.youtube.com/watch?v=2ewpDYW081Y
Autodock Docking of molecules: https://youtu.be/rrBsAzq0Fwg?si=d9jmJHKs_MA4zz1H https://youtu.be/_1zSS1bnHFk?si=mx_NULYeSzJVPeDh https://youtu.be/TyzjCUfPQj0?si=8ZV8E8nkIXlJpwVk https://youtu.be/dhs0VOeCo_s?si=0yIdmrvb_0EdQxc8
https://www.csb.pitt.edu/people/faculty/david-koes/ https://en.wikipedia.org/wiki/Pharmacophore https://pharmit.csb.pitt.edu/
Autodock Vina: https://github.com/ccsb-scripps/AutoDock-Vina
GNINA: https://www.youtube.com/watch?v=7w3SHcdn0gY
Another video on molecular docking: https://www.youtube.com/watch?v=UFuVL_0Tpi0
Part-2: https://www.youtube.com/watch?v=5SwSbb52PTU
More about protein can be read here: https://en.wikipedia.org/wiki/Structure_and_genome_of_HIV
https://www.ncbi.nlm.nih.gov/books/NBK513308/ https://github.com/gnina/gnina
https://github.com/deepchem/deepchem/tree/master?tab=readme-ov-file https://deepchem.io/tutorials/modeling-protein-ligand-interactions/
https://www.youtube.com/watch?v=AHVJv5RNqKs&t=870s
https://moleculenet.org/datasets-1 https://github.com/deepchem/deepchem/blob/master/examples/tutorials/DeepChemXAlphafold.ipynb https://github.com/chemprop/chemprop/tree/master https://chemprop.readthedocs.io/en/latest/args.html#train-arguments
https://journals.lww.com/jaids/abstract/1999/09010/nitric_oxide_modulates_hiv_1_replication.1.aspx
https://en.wikipedia.org/wiki/Abacavir https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102085/#:~:text=The%20increased%20production%20of%20cytokines,as%20the%20brain%20and%20lung. https://en.wikipedia.org/wiki/Valinomycin https://en.wikipedia.org/wiki/Chlorprothixene https://en.wikipedia.org/wiki/Foscarnet
https://en.wikipedia.org/wiki/HIV https://en.wikipedia.org/wiki/Antiviral_drug https://go.drugbank.com/structures/small_molecule_drugs/DB04542 https://go.drugbank.com/drugs/DB03150 https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/2DT https://zinc15.docking.org/substances/ZINC000169289767/ https://www.drugs.com/promethazine.html
https://www.cell.com/cell/fulltext/S0092-8674(20)30102-1#secsectitle0060
https://www.youtube.com/watch?v=U8rHBrdHBDE&t=147s
Zinc 15: https://www.youtube.com/watch?v=YnLMG6MHoYE
https://alphafold.ebi.ac.uk/entry/A0A0M3KKW8
https://en.wikipedia.org/wiki/Enfuvirtide https://en.wikipedia.org/wiki/Ritonavir https://www.webmd.com/hiv-aids/aids-hiv-medication
https://zinc15.docking.org/tranches/home/
Hiv 1 gp 120 protein structure: https://www.uniprot.org/uniprotkb/A0A0M3KKW8/entry https://alphafold.ebi.ac.uk/entry/A0A0M3KKW8 https://www.ebi.ac.uk/pdbe/entry/pdb/8F9Z
Different drugs for HIV 1: https://www.webmd.com/hiv-aids/aids-hiv-medication
6 types of drugs are used for HIV therapy. It primarily works by stopping viral multiplication. https://www.ncbi.nlm.nih.gov/books/NBK513308/
HIV type 1: Protein list:
Mechanism of HIV1:
https://www.annualreviews.org/doi/pdf/10.1146/annurev.biochem.67.1.1
https://en.wikipedia.org/wiki/HIV-1_protease https://en.wikipedia.org/wiki/Ritonavir
HIV1 protease protein structure: https://alphafold.ebi.ac.uk/entry/A0A0M3KKW8
About ZINC dataset: https://www.youtube.com/watch?v=YnLMG6MHoYE About AlphaFold: https://www.ebi.ac.uk/training/online/courses/alphafold/
From the zinc dataset, I selected two compounds, one is new: https://zinc15.docking.org/substances/ZINC000013597823/
Today, I was able to work with the model. So, Firstly I train model on HIV dataset to check model’s ability to classify molecules against HIV replication inhibition. I started with classic HIV dataset from the moleculeent. The dataset contains around 40 thousand molecules tested against the HIV virus (more can be found about that). Now, first time i trained on the whole dataset however, the result I got was not great. I certainly can’t say whether it was because of skewed dataset (Most of the molecules was not effective with 0 value in the dataset) or either it could be because of bad parameter choice.
However to keep things simple, I, first separated molecules that was active against the virus which is around <1450 molecules. Similarly, I separated non active molecules. Now, to make the dataset balanced I randomly selected 1500 molecules and trained the model on that.
I think the model worked surprisingly well against classifying the molecules against the virus. I found certain molecules that was approved drug for the HIV therapy, the molecules was not included in the training dataset. (I just matched the molecules, however, I didn’t match different representation of smiles in the training dataset).
Interesting find: Out of five times training with different dataset (remember randomly choosing inactive 1500 molecules). N2O nitrous-oxide being as effective against the HIV as Stavudine and Zidovudine! crazy! Right? Later I found a paper on effect of it on the HIV replication [ Tilted: Nitric Oxide Modulates HIV-1 Replication ] I haven’t gone through the study let’s see what they find, although, study was done in 1999!
https://drugs.ncats.io/drug/776E09GMHQ https://www.google.com/search?q=cefuroxime&sourceid=chrome&ie=UTF-8
Learning parameters: https://www.cell.com/cell/fulltext/S0092-8674(20)30102-1?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867420301021%3Fshowall%3Dtrue#secsectitle0030
Smiles representation of molecules: https://archive.epa.gov/med/med_archive_03/web/html/smiles.html
HIV list of antiviral therapy (List of drugs): https://www.ncbi.nlm.nih.gov/books/NBK513308/
All predicted drug active against HIV inhibition with probability with more than > 0.9 https://go.drugbank.com/drugs/DB00552 https://en.wikipedia.org/wiki/Nitrous_oxide https://moleculenet.org/datasets-1 https://en.wikipedia.org/wiki/Zidovudine https://en.wikipedia.org/wiki/Stavudine
https://zinc15.docking.org/substances/ZINC000169289767/ https://en.wikipedia.org/wiki/Trypan_blue https://en.wikipedia.org/wiki/Floxuridine
https://en.wikipedia.org/wiki/Sulfasalazine https://en.wikipedia.org/wiki/Zidovudine
https://pubchem.ncbi.nlm.nih.gov/compound/Rolitetracycline
Effect of NO (Nitrous Oxide) on HIV replication: https://journals.lww.com/jaids/abstract/1999/09010/nitric_oxide_modulates_hiv_1_replication.1.aspx#:~:text=Summary%3A,NO%20on%20HIV%2D1%20replication.
https://zinc15.docking.org/substances/ZINC000169289767/
https://en.wikipedia.org/wiki/HIV
https://www.science.org/doi/10.1126/science.adl2528 https://github.com/baker-laboratory/RoseTTAFold-All-Atom
(It was predicted by the model) https://en.wikipedia.org/wiki/Abacavir
Message Passing Neural Network: https://www.youtube.com/watch?v=ZgEMv-t1B_g