This workflow analyzes the historical predictive value of escape scores per HA site and amino acid from the antigenic escape assays of Welsh et al. 2023. We produced one phylogenetic tree and measurements panel per past vaccine composition meeting season, starting with October 2020 and ending with February 2023. For each season, we calculated escape scores for each H3N2 HA sequence from the amino acid mutations present in that sequence, and we calculated the weighted amino acid distance of each sequence to the future population 12 months later. To understand how predictive the escape scores would have been historically, we scaled each strain's escape scores by its number of HA1 substitutions and calculated the correlation of the scaled escape scores with the distance to the future population. If escape scores were predictive of the future, we expected higher scores to correspond to lower distances to the future, that is, a negative correlation. For each tree, we assigned both the historical clade labels that were in use at the time and modern "subclade" labels that reflect a finer resolution of genetic diversity.
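As a rough illustration of the scaling and correlation step, the sketch below uses made-up values for three hypothetical strains. It assumes that "scaling" means dividing the escape score by the number of HA1 substitutions and that the correlation is Pearson's r; neither detail is confirmed here, and the field names and the zero-substitution handling are illustrative only, not taken from the workflow's code.

```python
from math import sqrt


def scaled_escape_score(escape_score, ha1_substitutions):
    """Scale an escape score by the strain's HA1 substitution count.

    Assumption: scaling is a simple division. Strains with zero
    substitutions get a scaled score of 0.0 (also an assumption).
    """
    if ha1_substitutions == 0:
        return 0.0
    return escape_score / ha1_substitutions


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical strains with made-up values, for illustration only.
strains = [
    {"name": "strain_A", "escape_score": 2.0, "ha1_substitutions": 4, "distance_to_future": 6.0},
    {"name": "strain_B", "escape_score": 6.0, "ha1_substitutions": 6, "distance_to_future": 4.0},
    {"name": "strain_C", "escape_score": 12.0, "ha1_substitutions": 8, "distance_to_future": 2.0},
]

scaled = [
    scaled_escape_score(s["escape_score"], s["ha1_substitutions"])
    for s in strains
]
distances = [s["distance_to_future"] for s in strains]

# A negative r means higher scaled escape scores go with lower
# distances to the future, the pattern expected if scores are predictive.
r = pearson(scaled, distances)
print(f"Pearson r = {r:.2f}")
```

In this toy data the scaled scores rise as the distances fall, so the correlation is strongly negative, which is the direction expected when escape scores are predictive of the future population.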
Install Snakemake (version 7.32.4 or later) and run the following command from the top level of this repository:
snakemake
This workflow produces Nextstrain phylogenetic trees and measurements panels (see Lee et al. 2023) for historical H3N2 HA sequences and the corresponding escape scores per serum for those sequences. Each link below corresponds to a Nextstrain view for one season. The left panel shows a scatterplot of distance to the future by scaled escape score, and the right panel shows the escape score measurements per serum sample, grouped by historical clade label.