Code for computational workflows and analyses relating to "Computational prediction of MHC anchor locations guide neoantigen identification and prioritization"
- Initial_peptide_database.ipynb
  - Combining all input data and selecting HLA alleles and their corresponding strong binding peptides
- Saturation_analysis.ipynb
  - Saturation analysis using HLA-A*02:01
- fasta_generator.py
  - Generating FASTA files for input into pVACbind
- pvacbind_run.sh
  - Running pVACbind in parallel
- Anchor Position Calculation.ipynb
  - Collecting pVACbind results and calculating anchor probabilities
- Anchor_cluster_analysis.ipynb
  - Summarizing anchor trends using hierarchical clustering and heatmaps
- Validation_pMHC_crystallography_analysis.ipynb
  - Using the MDTraj package to calculate distances and solvent-accessible surface area (SASA) for peptide-MHC PDB structures
  - Comparing predictions derived from structure data with our own predictions
- TCR validation data analysis.ipynb
  - Repeat of the structure analysis using TCR-peptide-MHC PDB structures
- Impact Analysis TCGA samples.ipynb
  - Selection of a balanced HLA population from remaining TCGA samples
  - Generating FASTA files and running pVACbind
  - Objective determination of anchor locations
  - Analyzing the entire cohort using three different filters (no anchor, conventional anchor, and allele-specific anchor)
- Impact analysis using different binding cutoffs.ipynb
  - Repeating the analysis using different binding cutoffs and inclusion criteria
- Generation of experimental validation candidates.ipynb
  - Anchor calculation performed for all good binding candidates
  - Selecting peptides for experimental validation
  - Prioritization of mutations and positions for validation experiments
- Validation Plots.ipynb
  - Evaluation of in vitro and in vivo experimental results
- Comparison between seed dataset and other random peptide sets.ipynb
  - Evaluating the seed peptide source by generating random peptide sequences from 3 different sources and repeating the analysis
- Reviewer response analysis (HLA distribution).ipynb
  - Bias analysis for HLA allele-specific anchor patterns
- Reviewer response - Scenario count.ipynb
  - Determining how many SNVs fell into each scenario
- For researchers wanting to incorporate our end results into their pipelines:
- For researchers looking to expand this database for particular HLA alleles, we recommend the following steps:
  - Identify strong binding peptides for the HLA allele(s) and peptide length(s) of interest.
  - Generate a dictionary of peptides where each position is mutated to all possible amino acids.
  - Use that dictionary to generate a FASTA file in the format required by pVACbind (www.pvactools.org).
  - Run pVACbind in parallel across the different HLA allele(s) and peptide length(s).
    - Note that you will likely have to run each combination as a separate command (we provide the scripts we used on our own cluster so that you can adapt them).
  - Assemble the prediction results and calculate the anchor scores for each position of each peptide (please refer to the helper functions in Anchor Position Calculation.ipynb).
    - This can be done for an individual peptide-HLA combination, but you can also aggregate and average across multiple peptides (of the same length, for the same HLA allele) to obtain an overall score.
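
The mutation, FASTA generation, and scoring steps above can be sketched in Python as follows. This is a minimal illustration, not the code from this repository: the function names, the FASTA header format, and the scoring rule (summed absolute change in predicted IC50 relative to the unmutated peptide, normalized across positions) are all assumptions made for the example. Please treat fasta_generator.py and the helper functions in Anchor Position Calculation.ipynb as the reference for the actual method.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate_all_positions(peptide):
    """Map (1-based position, new residue) -> peptide with that single substitution."""
    variants = {}
    for idx, original in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa != original:
                variants[(idx + 1, aa)] = peptide[:idx] + aa + peptide[idx + 1:]
    return variants


def to_fasta(peptide, variants):
    """Render the unmutated peptide and its variants as FASTA text.

    The header format (peptide.WT, peptide.<pos><aa>) is hypothetical.
    """
    lines = [f">{peptide}.WT", peptide]
    for (pos, aa), seq in sorted(variants.items()):
        lines.append(f">{peptide}.{pos}{aa}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


def anchor_scores(wt_ic50, mutant_ic50):
    """Assumed scoring rule: per-position sum of |mutant IC50 - WT IC50|,
    normalized so that the scores across positions sum to 1.

    mutant_ic50 maps (position, residue) -> predicted IC50 for that variant.
    """
    raw = defaultdict(float)
    for (pos, _aa), ic50 in mutant_ic50.items():
        raw[pos] += abs(ic50 - wt_ic50)
    total = sum(raw.values())
    return {pos: (val / total if total else 0.0) for pos, val in sorted(raw.items())}


def average_scores(score_dicts):
    """Average per-position scores across peptides of the same length and HLA allele."""
    positions = sorted(score_dicts[0])
    return {p: sum(d[p] for d in score_dicts) / len(score_dicts) for p in positions}


# Example with a 9-mer; the IC50 values below are made up for illustration only.
peptide = "GILGFVFTL"
variants = mutate_all_positions(peptide)
fasta_text = to_fasta(peptide, variants)
toy_scores = anchor_scores(100.0, {(1, "A"): 110.0, (2, "A"): 400.0, (2, "C"): 700.0})
```

In this toy example, position 2 receives a much higher score than position 1 because substitutions there change the predicted affinity more. In practice the IC50 values would be read from the pVACbind output for each HLA allele and peptide length.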
The project is licensed under the MIT license.