This repository contains the Jupyter Notebooks used for data processing and analysis for the article "The role of epistasis in amikacin, kanamycin, bedaquiline, and clofazimine resistance in Mycobacterium tuberculosis complex" (Vargas et al. 2021, https://journals.asm.org/doi/full/10.1128/aac.01164-21).
The notebooks were run in order, from top to bottom, as listed below. All code was written in Python 2. Running the notebooks requires installing the necessary Python packages and bioinformatics pipelines, and changing the directory paths within the notebooks.
Note: If a notebook does not render on GitHub, you can view it by pasting its GitHub URL into https://nbviewer.jupyter.org/
Notebooks
(A) Filter and Process 12 isolates with eis C-14T mutation and AG MICs
(B) eis promoter & eis homoplasy visualization and 2.2.1.1.1.i3 cluster identification
(C) Phylogeny Construction for sub-lineage 2.2.1.1.1.i3 cluster
(C) Phylogeny Construction for sub-lineage 4.11
(D) ahpC and ahpC promoter analysis
(D) eis and eis promoter analysis
(D) eis and whiB7 promoter analysis
(D) mmpR mmpL5 mmpS5 frameshift indel analysis
(D) whiB7 and whiB7 promoter analysis
(E) Wrangle Genotypes Matrix for Insertions and Deletions with Mixed Allele Freqs
(F) Calculate Number of Isolates with 100x Coverage for Specific Loci
(G) mmpR mmpL5 mmpS5 eis whiB7 ahpC frameshift mixed indel analysis
(H) Check for atpE, rrs mutations in specific strains and analyze co-occurrence of eis promoter mutations with rrs AG resistance mutations
(I) Gene Regulator Regulated Schematic Figs
(J) Find frequency of Leu-Ile Start Codons & Drug Phenotypes for 4.11 isolates and eis promoter-eis double mutants
(K) Convergent Evolution Data - SNV and INDEL homoplasy count from SNPPar & TopDis
(L) Geographical & Lineage Distribution and Accession ID wrangle for Samples
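Because the notebooks are meant to be executed in the listed order, a small driver script can help re-run them in sequence. The sketch below is hypothetical and not part of this repository. It assumes that each notebook file name begins with its letter label (e.g. "(A) Filter and Process ....ipynb"), that nbformat and nbconvert are installed, and that a Jupyter kernel named "python2" exists. Notebooks sharing a letter are ordered by title, which reproduces the order shown above. Directory paths inside the notebooks still need to be edited by hand.

```python
import glob
import io
import os
import re

# Hypothetical naming convention: "(<LETTER>) <title>.ipynb", mirroring the list above.
LABEL_RE = re.compile(r"^\(([A-Z])\)\s*(.*)$")


def notebook_sort_key(filename):
    """Sort by letter label first, then by case-insensitive title."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    match = LABEL_RE.match(stem)
    if match is None:
        # Files without a letter label sort after all labelled notebooks.
        return ("~", stem.lower())
    return (match.group(1), match.group(2).lower())


def order_notebooks(filenames):
    """Return notebook file names in the assumed execution order."""
    return sorted(filenames, key=notebook_sort_key)


def run_notebooks(notebook_dir, output_dir, kernel_name="python2", timeout=3600):
    """Execute every notebook in notebook_dir in order; save executed copies to output_dir."""
    # Imported here so the ordering helpers above work without nbconvert installed.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    paths = order_notebooks(glob.glob(os.path.join(notebook_dir, "*.ipynb")))
    for path in paths:
        with io.open(path, encoding="utf-8") as handle:
            nb = nbformat.read(handle, as_version=4)
        executor = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
        # Run with the notebook's own folder as the working directory.
        executor.preprocess(nb, {"metadata": {"path": os.path.dirname(path) or "."}})
        out_path = os.path.join(output_dir, os.path.basename(path))
        with io.open(out_path, "w", encoding="utf-8") as handle:
            nbformat.write(nb, handle)
```

For example, `order_notebooks(["(D) whiB7 and whiB7 promoter analysis.ipynb", "(D) ahpC and ahpC promoter analysis.ipynb"])` puts the ahpC notebook first. Writing executed copies to a separate folder leaves the original notebooks untouched.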