Adding USDA genotype and lineage #80

jameshadfield · 2024-07-26T01:47:10Z

Here are some notes after trying out USDA's GenoFLU tool to assign lineages (per-segment) and genotypes (per-genome):

This tool uses BLAST to identify North American H5NX genomes in the 2.3.4.4b clade from a curated database. Pre-defined genotypes are cross-referenced with the top segment identifications, and a genotype is assigned.
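To make the cross-referencing step concrete, here is a minimal sketch of looking up a per-segment lineage constellation in a table of pre-defined genotypes. This is not GenoFLU's code: the function name `assign_genotype` is hypothetical, and the table holds only the two complete constellations that appear in the sample results in this issue. The real tool uses its own curated table and match thresholds.

```shell
# Hypothetical sketch, not GenoFLU's implementation.
# assign_genotype PB2 PB1 PA HA NP NA MP NS
# Looks up the eight per-segment lineages (in that order) in a tiny table.
# The two entries are copied from the sample results in this issue.
assign_genotype() {
    # "$*" joins the arguments with single spaces
    case "$*" in
        "ea3 ea3 ea3 ea3 ea3 ea3 ea3 ea3")      echo "A3" ;;
        "am2.2 am4 ea1 ea1 am8 ea1 ea1 am1.1")  echo "B3.13" ;;
        *)                                      echo "Not assigned" ;;
    esac
}
```

For example, `assign_genotype am2.2 am4 ea1 ea1 am8 ea1 ea1 am1.1` prints `B3.13`, and any constellation missing from the table falls through to `Not assigned`.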

The tool is slightly inconvenient to use in our pipeline because it requires a single FASTA file per strain (genome). We'd have to either (a) create these files on the fly, run GenoFLU, extract the results, and delete the temporary files, or (b) modify the tool to be more ergonomic for our usage. I couldn't install it via conda, so I used the provided Docker image instead.

Example usage

mkdir -p results/genoflu
cd results/genoflu

# create a FASTA file for a specific strain
echo 'A/muteswan/Austria/23169070001/2023' > id.txt
SEGMENTS=("pb2" "pb1" "pa" "ha" "np" "na" "mp" "ns")
# start from an empty file (a bare `echo >` would write a blank first line)
: > data.fasta
for s in "${SEGMENTS[@]}"; do
    # pull the strain's record from each segment file and append "/<segment>" to its header
    seqkit grep -nf id.txt "../../data/gisaid/sequences_${s}.fasta" | seqkit replace -p '$' -r "/${s}" >> data.fasta
done

# run GenoFLU from the repository root, which is bind-mounted at /avian-flu
cd ../..
docker container run --rm -it --mount type=bind,src="$(pwd)",target=/avian-flu \
    quay.io/biocontainers/genoflu:1.03--hdfd78af_0 \
    bash -c "cd /avian-flu/results/genoflu && genoflu.py -f data.fasta"
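The first option mentioned above (create per-strain files on the fly, run the tool, clean up) could be wrapped roughly as follows. This is a sketch under assumptions: the function names `write_strain_fasta` and `with_strain_fasta` are hypothetical, the per-segment files are assumed to be named `sequences_<segment>.fasta` with headers that are exactly the strain name, and `awk` stands in for `seqkit` so that the snippet has no dependencies.

```shell
# Hypothetical helpers, not part of the pipeline or of GenoFLU.
SEGMENTS="pb2 pb1 pa ha np na mp ns"

# write_strain_fasta STRAIN SEQ_DIR OUT_FASTA
# Copies the record whose header equals STRAIN from each segment file,
# rewriting the header to "STRAIN/<segment>".
write_strain_fasta() {
    strain=$1; seq_dir=$2; out=$3
    : > "$out"   # start from an empty file
    for s in $SEGMENTS; do
        awk -v id="$strain" -v seg="$s" '
            /^>/ { keep = (substr($0, 2) == id)
                   if (keep) print ">" id "/" seg
                   next }
            keep { print }
        ' "$seq_dir/sequences_${s}.fasta" >> "$out"
    done
}

# with_strain_fasta STRAIN SEQ_DIR COMMAND [ARGS...]
# Builds the FASTA in a temporary directory, runs COMMAND with the FASTA
# path appended as its last argument, then deletes the temporary directory.
with_strain_fasta() {
    wsf_strain=$1; wsf_dir=$2
    shift 2
    wsf_tmp=$(mktemp -d)
    write_strain_fasta "$wsf_strain" "$wsf_dir" "$wsf_tmp/data.fasta"
    "$@" "$wsf_tmp/data.fasta"
    wsf_status=$?
    rm -rf "$wsf_tmp"
    return "$wsf_status"
}
```

Usage would look like `with_strain_fasta 'A/muteswan/Austria/23169070001/2023' data/gisaid some_command`, where `some_command` is whatever wrapper ends up invoking `genoflu.py -f` on the path it is given. Any results the command writes next to the FASTA would need to be copied out before the cleanup step.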

Sample results:

A/muteswan/Austria/23169070001/2023
Genotype: Not assigned: Only 4 segments >98% match found of total 8 segments in input file
Lineages: PB1:ea3, HA:ea3, NP:ea6, MP:ea3

A/carrioncrow/Hokkaido/B081/2024/HA
Genotype: A3
Lineages: PB2:ea3, PB1:ea3, PA:ea3, HA:ea3, NP:ea3, NA:ea3, MP:ea3, NS:ea3

A/Dairycattle/Kansas/5/202 (NCBI)
Genotype: B3.13
Lineages: PB2:am2.2, PB1:am4, PA:ea1, HA:ea1, NP:am8, NA:ea1, MP:ea1, NS:am1.1
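To bring results like these into the pipeline they would need to be flattened into one row per strain. The sketch below assumes the three-line, blank-line-separated layout shown above, which is how the results are summarised in this issue and may not match the files GenoFLU itself writes. The function name `parse_genoflu_summary` and the `-` placeholder for segments without a lineage are choices made for illustration.

```shell
# Hypothetical parser for the summary layout shown in this issue.
# Reads records on stdin, prints TSV rows:
# strain, genotype, then lineages for PB2 PB1 PA HA NP NA MP NS ("-" if absent).
parse_genoflu_summary() {
    awk '
        BEGIN {
            RS = ""; FS = "\n"; OFS = "\t"   # paragraph mode: one record per block
            n = split("PB2 PB1 PA HA NP NA MP NS", segs, " ")
        }
        {
            strain = $1; genotype = ""
            split("", lin)                   # clear lineages from the previous record
            for (i = 2; i <= NF; i++) {
                line = $i
                if (sub(/^Genotype: /, "", line)) {
                    sub(/:.*/, "", line)     # keep "Not assigned", drop the explanation
                    genotype = line
                } else if (sub(/^Lineages: /, "", line)) {
                    m = split(line, pairs, ", ")
                    for (j = 1; j <= m; j++) {
                        split(pairs[j], kv, ":")
                        lin[kv[1]] = kv[2]
                    }
                }
            }
            out = strain OFS genotype
            for (k = 1; k <= n; k++)
                out = out OFS ((segs[k] in lin) ? lin[segs[k]] : "-")
            print out
        }
    '
}
```

Saving the sample results to a file and running `parse_genoflu_summary < results.txt` would give one tab-separated row per strain, with the unassigned strain keeping its four identified lineages and `-` in the other four columns.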

jameshadfield added the enhancement New feature or request label Jul 26, 2024
