Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to Nextclade v3 #1160

Merged
merged 6 commits into from
Oct 30, 2024
Merged

Migrate to Nextclade v3 #1160

merged 6 commits into from
Oct 30, 2024

Conversation

joverlee521
Copy link
Contributor

@joverlee521 joverlee521 commented Oct 23, 2024

Description of proposed changes

Main goal is to migrate to Nextclade v3. I discovered Nextclade related params and rules that were unused, so this PR also removes them to make the migration easier.

See commits for details.

Related issue(s)

Resolves #1157

Release checklist

If this pull request introduces backward incompatible changes, complete the following steps for a new release of the workflow:

  • Determine the version number for the new release by incrementing the most recent release (e.g., "v2" from "v1").
  • Update docs/src/reference/change_log.md in this pull request to document these changes and the new version number.
  • After merging, create a new GitHub release with the new version number as the tag and release title.

The config param `use_nextalign` has not been used since
<2839278>
This commit removes the rules `mutation_summary` and
`build_mutation_summary`, both of which called the removed script
`scripts/mutation_summary.py`.

I was looking into updating the script to work with Nextclade v3 outputs
but it seems like the rules using the script are not actively used in
the Snakemake workflow!

In Feb 2021, the script and rule `mutation_summary` were added and the
only use of the output `results/mutation_summary_{origin}.tsv.xz`
was to be uploaded it as an intermediate file of the builds.¹
In Jan 2022, the upload of the intermediate file was removed² and I
cannot find any other uses of the file so the rule `mutation_summary`
seems unused.

In March 2021, the rule `build_mutation_summary` was added to support
the rule `add_mutation_counts`.³ However, `add_mutation_counts` was
later updated to use `augur distance`⁴ which does not require the
mutation summary output. I cannot find any other uses of the output file
`results/{build_name}/mutation_summary.tsv`, so the rule
`build_mutation_summary` seems unused.

¹ <#562>
² <#814>
³ <3b23e8f>
⁴ <d24d531>
--jobs {threads} \
--input-dataset {input.nextclade_dataset} \
--output-tsv {output.nextclade_qc} \
--output-fasta {output.alignment} \
--output-translations {params.output_translations} \
--output-insertions {output.insertions} 2>&1 | tee {log}
--output-translations {params.output_translations} 2>&1 | tee {log}
"""

rule join_metadata_and_nextclade_qc:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was tempted to update this rule to use augur merge to join the metadata and Nextclade data. However, the custom join-metadata-and-clades.py script includes calculations for clock_deviation that I don't really want to touch on here. Will leave that for another time...

@ivan-aksamentov
Copy link
Member

ivan-aksamentov commented Oct 23, 2024

Emma used mutation summary in covariants back in the day. I am not sure if it is still used. If file upload was removed, then probably not. Or she could be running it outside of the official AWS-based flow.

Annotation is used in Nextclade during alignment (if provided) - this way it decides e.g. where to put a deletion in a triplet. Without annotation it might produce lower quality alignments and change what mutations are detected, so I think it's better to keep annotations, even if translation is not being run. If so decided, then might also add a comment or a description that even though annotation appears unused, it is in fact used for codon-aware nucleotide alignment (and perhaps a link to the docs):

https://docs.nextstrain.org/projects/nextclade/en/stable/user/algorithm/01-sequence-alignment.html

Nextclade can use a genome annotation to make the alignment more interpretable.
Sometimes, the placement of a sequence deletion or insertion is ambiguous as in the following example. The gap could be moved forward or backward by one base with the same number of matches:

@joverlee521
Copy link
Contributor Author

Emma used mutation summary in covariants back in the day. I am not sure if it is still used. If file upload was removed, then probably not. Or she could be running it outside of the official AWS-based flow.

Thanks for the reminder! I found an old Slack thread that confirms @emmahodcroft no longer used the mutation summary produced by ncov-ingest, but I could not find anything regarding the mutation summary produced here.

Without annotation it might produce lower quality alignments and change what mutations are detected, so I think it's better to keep annotations, even if translation is not being run. If so decided, then might also add a comment or a description that even though annotation appears unused, it is in fact used for codon-aware nucleotide alignment (and perhaps a link to the docs):

https://docs.nextstrain.org/projects/nextclade/en/stable/user/algorithm/01-sequence-alignment.html

Thanks for the pointer! That's good to know. I'll add back annotations.

The only downstream use of the insertions and translations outputs of
`align` rule was the `mutation_summary` rule which was removed in the
previous commit.

Even though this is no longer doing translations, we are still keeping
the `annotation` config param and the `defaults/annotation.gff` file
for codon-aware nucleotide alignment.
<#1160 (comment)>
Major change includes the removal of the insertions.csv output from the
`build_align` rule. This output was only used by the
`build_mutation_summary` rule, which was removed in a previous commit.

Note: Using `nextclade` instead of `nextclade3` to allow the workflow to
continue to work with Snakemake's `--use-conda` interface.
docs/src/reference/workflow-config-file.rst Outdated Show resolved Hide resolved
@joverlee521 joverlee521 merged commit 80c06c1 into master Oct 30, 2024
6 checks passed
@joverlee521 joverlee521 deleted the nextclade-v3 branch October 30, 2024 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to Nextclade v3
3 participants