This project focuses on analyzing HIV-1 protease sequences to study variability, conservation, and drug resistance mutations (DRMs).
It was developed as part of my bioinformatics and machine learning studies (B.Tech Biotechnology @ NIT Raipur).
- Multiple Sequence Alignment (MSA) of HIV protease sequences
- Identification of conserved vs variable positions
- Overlay of known Drug Resistance Mutations (DRMs)
- Visualization with sequence logos and conservation plots
- Mutation frequency table generation for key positions
- Calculated per-position conservation scores.
- Plotted conservation heatmaps and sequence logos.
- Mapped known DRMs from the Stanford HIV Drug Resistance Database (HIVDB).
- Highlighted variable sites overlapping with DRMs.
- Generated a frequency table of amino acid substitutions at key residues.
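
The per-position conservation step can be sketched roughly as below. This is a minimal illustration, not necessarily the exact scoring used in this project: it assumes conservation is defined as 1 minus the normalized Shannon entropy of each alignment column, with gaps ignored. The function names and the toy alignment are invented for the example, and it uses only the standard library so it runs without Biopython.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Score one alignment column as 1 - normalized Shannon entropy.

    Gap characters are ignored. 1.0 means fully conserved; lower values
    mean more variability. (Assumed definition, see the note above.)
    """
    residues = [aa for aa in column if aa not in "-."]
    if not residues:
        return 0.0  # all-gap column: treat as uninformative
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def conservation_scores(aligned_seqs):
    """Return one conservation score per column of an aligned sequence set."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [
        column_conservation([seq[i] for seq in aligned_seqs])
        for i in range(length)
    ]


# Toy four-column alignment (invented data, not real protease sequences).
toy_alignment = ["PQIT", "PQVT", "PQIT", "PKIT"]
scores = conservation_scores(toy_alignment)
```

In a real run, the aligned sequences would come from the MAFFT or Clustal Omega output rather than a hard-coded list.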
- Identified highly conserved catalytic motifs.
- Highlighted hotspot residues linked to drug resistance.
- Sequence logos and conservation plots show how mutations are distributed across HIV protease positions.
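
As a rough sketch of how the frequency table and the DRM overlay fit together, the example below counts residues at selected protease positions and flags DRM positions that also show variability. Everything named here is illustrative: `DRM_POSITIONS` is a small hand-picked subset of major protease inhibitor resistance positions (the full list should come from Stanford HIVDB), the 0.95 consensus threshold is an arbitrary assumption, and the toy columns are invented data. Gap handling is omitted for brevity.

```python
from collections import Counter

# Illustrative subset of major PI resistance positions (1-based protease
# numbering). Assumption: replace with the full Stanford HIVDB list.
DRM_POSITIONS = {30, 46, 50, 82, 84, 90}


def frequency_table(columns):
    """Map each position to {amino acid: relative frequency}.

    `columns` maps a protease position to a string holding the residue
    observed at that position in each sequence.
    """
    table = {}
    for pos, residues in columns.items():
        counts = Counter(residues)
        total = sum(counts.values())
        table[pos] = {aa: n / total for aa, n in counts.most_common()}
    return table


def variable_drm_sites(table, drm_positions, max_consensus_freq=0.95):
    """Return DRM positions whose most common residue is below the threshold."""
    hits = []
    for pos in sorted(table):
        consensus_freq = max(table[pos].values())
        if pos in drm_positions and consensus_freq < max_consensus_freq:
            hits.append(pos)
    return hits


# Toy data for positions 82-84 across five sequences (invented).
toy_columns = {82: "VVAAV", 83: "NNNNN", 84: "IIIVI"}
table = frequency_table(toy_columns)
hits = variable_drm_sites(table, DRM_POSITIONS)
```

With this toy input, positions 82 and 84 are flagged because both are in the illustrative DRM set and show a minority residue, while position 83 is fully conserved and is not in the set.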
- Python (Biopython, Pandas, Matplotlib, Seaborn)
- MSA tools (MAFFT/Clustal Omega)
- Expand to machine learning models for predicting drug resistance.
- Incorporate phylogenetic tree analysis for evolutionary tracking.
- Package the pipeline as a lightweight web dashboard (MERN + ML).
👤 Shubham Thakur
B.Tech Biotechnology, NIT Raipur
Exploring Bioinformatics | AI in Healthcare | Full-Stack Development
This project showcases my bioinformatics and coding skills and is intended for learning and research purposes.





