Studying the evolutionary trajectories of S2P6 for breadth expansion using a library-on-library screen that involves 27 unique betacoronavirus stem helix peptides and 1,024 S2P6 variants.
- ./Fasta/SH_pep_ref.fa: A list of stem helix peptides in this experiment
- ./data/file_names.tsv: Filenames for the merged read files
- Raw PacBio sequencing data in fastq format from NIH SRA database BioProject PRJNA1113356
- Raw Illumina sequencing data in fastq format from NIH SRA database BioProject PRJNA1064076
Identify the sequences of stem helix peptide, S2P6 variant, and barcode in each read
python3 script/PacBio_fastq2seq.py
- Input file:
- Fastq file from the PacBio sequencing
- Output file:
- data/barcode_SHpep_mutID.tsv
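As a rough illustration of the parsing step above, the sketch below pulls out the region between two constant flanking sequences in a read. This is not the code in script/PacBio_fastq2seq.py: the function name, the flanks, and the example read are all hypothetical, and real PacBio reads would also need orientation and quality handling.

```python
def extract_between(read, left_flank, right_flank):
    """Return the subsequence between two flanks, or None if either is missing."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


# Hypothetical read and flanks, for illustration only
example_read = "AAACCCGATTACATTTGG"
barcode = extract_between(example_read, "CCC", "TTT")
print(barcode)
```

The same helper could be called three times per read with different flank pairs to recover the stem helix peptide, the S2P6 variant region, and the barcode, assuming each is bounded by known constant sequence.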
Filter barcodes with low read counts and perform error correction
- Input files:
- data/barcode_SHpep_mutID.tsv
- ./Fasta/SH_pep_ref.fa
- Output file:
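The script for the filtering step is not named here, so the following is only a generic sketch of one way to filter low-count barcodes and correct sequencing errors. It assumes a barcode seen at least `min_count` times is trusted, and that a rarer barcode is reassigned only when exactly one trusted barcode lies within `max_dist` mismatches. Both thresholds and the function names are assumptions.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def filter_and_correct(barcodes, min_count=2, max_dist=1):
    """Drop rare barcodes, folding them into a unique close trusted barcode if any."""
    counts = Counter(barcodes)
    trusted = [bc for bc, n in counts.items() if n >= min_count]
    corrected = Counter()
    for bc, n in counts.items():
        if n >= min_count:
            corrected[bc] += n
            continue
        # Reassign a rare barcode only when the match is unambiguous
        hits = [t for t in trusted
                if len(t) == len(bc) and hamming(t, bc) <= max_dist]
        if len(hits) == 1:
            corrected[hits[0]] += n
    return dict(corrected)


# Hypothetical barcodes, for illustration only
reads = ["AAAA"] * 5 + ["AAAT"] + ["CCCC"] * 3 + ["GGGG"]
print(filter_and_correct(reads))
```

The all-against-all comparison is quadratic in the number of barcodes, which is fine for a sketch but would likely need an index or neighborhood lookup on a full library.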
Merging Illumina sequencing reads
python3 script/merge_reads.py
- Input file:
- All .fastq files in [fastq/]
- Output files:
- merged files in [fastq/merged]
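To show what merging paired-end reads involves, here is a minimal sketch that joins a forward read with the reverse complement of its mate at the longest exact overlap. It is not the logic of script/merge_reads.py: the function names and the `min_overlap` parameter are assumptions, and a real merger would tolerate mismatches and use base qualities.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10):
    """Merge a read pair at the longest exact overlap, or return None."""
    rev_rc = reverse_complement(rev)
    max_overlap = min(len(fwd), len(rev_rc))
    # Try the longest overlap first so the most specific match wins
    for overlap in range(max_overlap, min_overlap - 1, -1):
        if fwd[-overlap:] == rev_rc[:overlap]:
            return fwd + rev_rc[overlap:]
    return None


# Hypothetical 24-nt fragment sequenced from both ends with 16-nt reads
fragment = "ACGTTGCAAGGCTTAACCGGTACG"
fwd_read = fragment[:16]
rev_read = reverse_complement(fragment[8:])
print(merge_pair(fwd_read, rev_read, min_overlap=5))
```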
Counting unique barcode sequences
python3 script/fastq2count.py
- Input files:
- All merged fastq files in [fastq/merged]
- ./data/file_names.tsv
- Output files:
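Counting unique barcodes in a merged fastq file can be sketched as follows. This is an illustration rather than the contents of script/fastq2count.py: the function name and the `start`/`end` slice that selects the barcode within each read are hypothetical.

```python
import io
from collections import Counter
from itertools import islice


def count_sequences(handle, start=0, end=None):
    """Count the sequences (optionally sliced) in an uncompressed FASTQ stream."""
    counts = Counter()
    # The sequence is the 2nd line of every 4-line FASTQ record
    for line in islice(handle, 1, None, 4):
        counts[line.strip()[start:end]] += 1
    return counts


# Hypothetical three-read FASTQ, for illustration only
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTTT\n+\nIIII\n"
)
print(count_sequences(demo))
```

For a real file, the same function could be given an open file handle, for example `open(path)` for plain text or `gzip.open(path, "rt")` for compressed input.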
Splitting count file for faster processing
python3 script/split_count_df.py
- Input file:
- Output files:
Identifying pairs of stem helix peptide and S2P6 mutant
python3 script/mut_ID.py
- Input file:
- Output files:
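Conceptually, this step maps each counted barcode back to its stem helix peptide and S2P6 mutant. The sketch below shows that lookup with plain dictionaries. It is an assumption about the general approach, not a description of script/mut_ID.py, and the function name, the map layout, and the example identifiers are all hypothetical.

```python
from collections import defaultdict


def assign_pairs(barcode_counts, barcode_map):
    """Sum barcode counts per (peptide, mutant) pair and tally unmatched reads."""
    pair_counts = defaultdict(int)
    unmatched = 0
    for barcode, n in barcode_counts.items():
        pair = barcode_map.get(barcode)
        if pair is None:
            unmatched += n  # barcode absent from the PacBio-derived table
        else:
            pair_counts[pair] += n
    return dict(pair_counts), unmatched


# Hypothetical barcode table and counts, for illustration only
barcode_map = {
    "AAAA": ("pep_1", "mut_7"),
    "CCCC": ("pep_1", "mut_7"),
    "GGGG": ("pep_2", "mut_3"),
}
barcode_counts = {"AAAA": 4, "CCCC": 1, "GGGG": 2, "TTTT": 9}
print(assign_pairs(barcode_counts, barcode_map))
```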
Calculating the frequency of each variant
python3 script/count2freq.py
- Input file:
- Output files:
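Converting counts to frequencies usually means dividing each count by the sample total. The sketch below does that with an optional pseudocount. Whether script/count2freq.py uses a pseudocount, and which value, is not stated here, so the `pseudocount` parameter and the function name are assumptions.

```python
def counts_to_freq(counts, pseudocount=1):
    """Convert a {variant: count} dict to frequencies that sum to 1."""
    total = sum(counts.values()) + pseudocount * len(counts)
    if total == 0:
        return {}
    return {variant: (n + pseudocount) / total for variant, n in counts.items()}


# Hypothetical counts, for illustration only
print(counts_to_freq({"var_a": 1, "var_b": 3}, pseudocount=0))
```

A pseudocount keeps variants with zero reads in one sample from producing undefined ratios in later scoring, at the cost of slightly biasing low-count variants.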
Calculate the expression scores and binding scores
python3 script/freq2score.py
- Input file:
- Output files:
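The scoring formula is not given in this README. One common convention for sort-based screens is a log10 enrichment of the post-selection frequency over the input frequency, optionally expressed relative to a reference variant. The sketch below illustrates that convention only. It may differ from what script/freq2score.py computes, and the function name, the `reference` argument, and the example values are assumptions.

```python
import math


def enrichment_score(freq_selected, freq_input, reference=None):
    """log10(selected / input) per variant, optionally relative to a reference."""
    scores = {}
    for variant, f_in in freq_input.items():
        f_sel = freq_selected.get(variant, 0)
        # Skip variants whose ratio is undefined
        if f_in > 0 and f_sel > 0:
            scores[variant] = math.log10(f_sel / f_in)
    if reference is not None and reference in scores:
        ref_score = scores[reference]
        scores = {v: s - ref_score for v, s in scores.items()}
    return scores


# Hypothetical frequencies, for illustration only
freq_input = {"wt": 0.5, "m1": 0.05, "m2": 0.45}
freq_selected = {"wt": 0.5, "m1": 0.5, "m2": 0.0}
print(enrichment_score(freq_selected, freq_input, reference="wt"))
```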
Plot correlation between expression scores and binding scores
python3 script/plot_replicate_qc.py
- Input file:
- Output files:
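The correlation shown in such plots is typically a Pearson coefficient between two score vectors. A dependency-free sketch is given below. The function name is hypothetical, and script/plot_replicate_qc.py may use a library routine or a different statistic.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression and binding scores, for illustration only
expression = [0.1, 0.4, 0.8, 1.2]
binding = [0.2, 0.5, 0.7, 1.5]
print(round(pearson_r(expression, binding), 3))
```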
Plot correlation between binding scores and effect of different frequency cutoffs
Rscript script/plot_QC.R