-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate to Nextclade v3 #1160
Migrate to Nextclade v3 #1160
Conversation
The config param `use_nextalign` has not been used since <2839278>
This commit removes the rules `mutation_summary` and `build_mutation_summary`, both of which called the removed script `scripts/mutation_summary.py`. I was looking into updating the script to work with Nextclade v3 outputs but it seems like the rules using the script are not actively used in the Snakemake workflow! In Feb 2021, the script and rule `mutation_summary` were added and the only use of the output `results/mutation_summary_{origin}.tsv.xz` was to be uploaded it as an intermediate file of the builds.¹ In Jan 2022, the upload of the intermediate file was removed² and I cannot find any other uses of the file so the rule `mutation_summary` seems unused. In March 2021, the rule `build_mutation_summary` was added to support the rule `add_mutation_counts`.³ However, `add_mutation_counts` was later updated to use `augur distance`⁴ which does not require the mutation summary output. I cannot find any other uses of the output file `results/{build_name}/mutation_summary.tsv`, so the rule `build_mutation_summary` seems unused. ¹ <#562> ² <#814> ³ <3b23e8f> ⁴ <d24d531>
--jobs {threads} \ | ||
--input-dataset {input.nextclade_dataset} \ | ||
--output-tsv {output.nextclade_qc} \ | ||
--output-fasta {output.alignment} \ | ||
--output-translations {params.output_translations} \ | ||
--output-insertions {output.insertions} 2>&1 | tee {log} | ||
--output-translations {params.output_translations} 2>&1 | tee {log} | ||
""" | ||
|
||
rule join_metadata_and_nextclade_qc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was tempted to update this rule to use augur merge
to join the metadata and Nextclade data. However, the custom join-metadata-and-clades.py
script includes calculations for clock_deviation
that I don't really want to touch on here. Will leave that for another time...
Emma used mutation summary in covariants back in the day. I am not sure if it is still used. If file upload was removed, then probably not. Or she could be running it outside of the official AWS-based flow. Annotation is used in Nextclade during alignment (if provided) - this way it decides e.g. where to put a deletion in a triplet. Without annotation it might produce lower quality alignments and change what mutations are detected, so I think it's better to keep annotations, even if translation is not being run. If so decided, then might also add a comment or a description that even though annotation appears unused, it is in fact used for codon-aware nucleotide alignment (and perhaps a link to the docs): https://docs.nextstrain.org/projects/nextclade/en/stable/user/algorithm/01-sequence-alignment.html
|
Thanks for the reminder! I found an old Slack thread that confirms @emmahodcroft no longer used the mutation summary produced by ncov-ingest, but I could not find anything regarding the mutation summary produced here.
Thanks for the pointer! That's good to know. I'll add back annotations. |
The only downstream use of the insertions and translations outputs of `align` rule was the `mutation_summary` rule which was removed in the previous commit. Even though this is no longer doing translations, we are still keeping the `annotation` config param and the `defaults/annotation.gff` file for codon-aware nucleotide alignment. <#1160 (comment)>
Major change includes the removal of the insertions.csv output from the `build_align` rule. This output was only used by the `build_mutation_summary` rule, which was removed in a previous commit. Note: Using `nextclade` instead of `nextclade3` to allow the workflow to continue to work with Snakemake's `--use-conda` interface.
ee43a54
to
c5ca8fe
Compare
c5ca8fe
to
f269e69
Compare
Description of proposed changes
Main goal is to migrate to Nextclade v3. I discovered Nextclade related params and rules that were unused, so this PR also removes them to make the migration easier.
See commits for details.
Related issue(s)
Resolves #1157
Release checklist
If this pull request introduces backward incompatible changes, complete the following steps for a new release of the workflow:
docs/src/reference/change_log.md
in this pull request to document these changes and the new version number.