This is a minor release aimed towards a nextclade dataset upgrade from `2022-10-27` to `2023-01-09`, which adds nomenclature for the newly designated recombinants `XBH` - `XBP`. This release also adds initial support for the detection of "recursive recombination", including `XBL` and `XBN`, which are recombinants of `XBB`.
- Issue #24: Create documentation on Read The Docs
- Issue #210: Handle numeric strain names.
- Issue #185: Simplify creation of the pango-lineage nomenclature phylogeny to use the lineage_notes.txt file and the pango_aliasor library.
- Issue #195: Add bypass to intermission allele ratio for edge cases.
- Issue #204: Add special handling for XBB sequenced with ARTIC v4.1 and dropout regions.
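
Issue #185 builds the lineage phylogeny from pango lineage names. As a rough illustration only, the sketch below derives parent-child edges from names using a hypothetical, hard-coded alias subset (`ALIASES`). It is not the pango_aliasor API, and the helpers `parent` and `lineage_edges` are made up for this example. Recombinant roots such as `XBB` are left without a parent here, since they have two parents and need separate handling in a real tree.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical alias subset for illustration; the pipeline itself uses
# lineage_notes.txt and the pango_aliasor library, not this dict.
ALIASES: Dict[str, str] = {"BA": "B.1.1.529"}


def parent(lineage: str) -> Optional[str]:
    """Return the parent of a pango lineage name, or None for a root."""
    parts = lineage.split(".")
    if len(parts) == 1:
        # Top-level names (A, B) and recombinant roots (XBB) get no single parent
        return None
    if len(parts) == 2 and parts[0] in ALIASES:
        # BA.5 -> B.1.1.529: the alias prefix expands to the full ancestor
        return ALIASES[parts[0]]
    # Otherwise drop the last component: BA.5.2 -> BA.5, XBB.1.5 -> XBB.1
    return ".".join(parts[:-1])


def lineage_edges(lineages: Iterable[str]) -> List[Tuple[str, str]]:
    """Build (parent, child) edges for a lineage tree."""
    edges = []
    for lineage in lineages:
        p = parent(lineage)
        if p is not None:
            edges.append((p, lineage))
    return edges


print(lineage_edges(["BA.5", "BA.5.2", "XBB", "XBB.1", "XBB.1.5"]))
# [('B.1.1.529', 'BA.5'), ('BA.5', 'BA.5.2'), ('XBB', 'XBB.1'), ('XBB.1', 'XBB.1.5')]
```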
- Issue #205: Add new column `parents_conflict` to indicate whether the reported lineages from covSPECTRUM conflict with the reported parental clades from `sc2rf`.
- Issue #213: Add `XBK` to auto-pass lineages.
- Issue #222: Add new parameter `--gisaid-access-key` to `sc2rf` and `sc2rf_recombinants`.
- Issue #229: Fix bug where auto-pass lineages are missing when `exclude_negatives` is set to `true`.
- Issue #231: Fix bug where 'null' lineages in covSPECTRUM caused an error in `sc2rf` postprocess.
- The order of steps in `postprocessing.py` was rearranged to provide more comprehensive details for auto-pass lineages.
- Add `XAN` to auto-pass lineages.
- Issue #209: Restrict the palette for `rbd_level` to the range of `0:12`.
- Issue #218: Fix bug concerning data fragmentation with large numbers of sequences.
- Issue #221: Remove parameter `--singletons` in favor of `--min-cluster-size` to control cluster size in plots.
- Issue #224: Fix bug where plot crashed with extremely large datasets.
- Combine `plot` and `plot_historical` into one snakemake rule. Also add custom pattern `plot_NX` (ex. `plot_N10`) to adjust min cluster size.
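
A target such as `plot_N10` encodes the minimum cluster size in its name. A minimal sketch of parsing that pattern follows; the regular expression, the helper `min_cluster_size`, and the default of `1` for a plain `plot` target are assumptions, not the workflow's actual rule definitions.

```python
import re

# Hypothetical parser for the plot_NX target pattern (an assumption,
# not the snakemake wildcard logic used by the workflow).
PLOT_PATTERN = re.compile(r"plot_N(\d+)")
DEFAULT_MIN_CLUSTER_SIZE = 1  # assumed default for the plain `plot` target


def min_cluster_size(target: str) -> int:
    """Return the minimum cluster size encoded in a plot target name."""
    if target == "plot":
        return DEFAULT_MIN_CLUSTER_SIZE
    match = PLOT_PATTERN.fullmatch(target)
    if match is None:
        raise ValueError(f"Not a plot target: {target!r}")
    return int(match.group(1))


print(min_cluster_size("plot_N10"))  # 10
```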
- Combine `report` and `report_historical` into one snakemake rule.
- Issue #225: Fix bug where false negatives passed validation because the status column wasn't checked.
- Issue #217: `XBB.1.5`
- Issue #196: `XBF`
- Issue #206: `XBG`
- Issue #196: `XBH`
- Issue #199: `XBJ`
- Issue #213: `XBK`
- Issue #219: `XBL`
- Issue #215: `XBM`
- Issue #197: `XBN`
- Issue #203: `proposed1305`
- Issue #208: `proposed1340`
- Issue #212: `proposed1425`
- Issue #214: `proposed1440`
- Issue #216: `proposed1444`
- Issue #220: `proposed1576`
- 2964b4a1 docs: update notes to include 1576 proposed issue
- fdc874ab docs: add test summary package for v0.7.0
- 3f3d4438 docs: update docs v0.7.0
- 78696b36 script: add bug fix to sc2rf postprocess for #231
- 403777a0 script: lint plotting script
- 2a09c783 script: fix sc2rf postprocess bug in duplicate removal
- d44d5f90 data: add XBP to controls-gisaid
- 4293439c profile: add controls-gisaid to virusseq builds
- 91d6fb89 defaults: update nextclade dataset to 2023-02-01
- 630b2cd5 resources: update
- 49e6f598 profile: add virusseq profile
- 7e586d1d script: add extra logic for auto-passing lineages
- 0ebe5e9c script: fix bug in report where it didn't check that plots existed
- 25b2f243 docs: update developers guide
- 914d933f defaults: add XBN to controls-gisaid and validation
- 8eaf08a9 data: restore controls-gisaid strain list
- fa123009 script: defragment plot for 218
- 5f24f695 dataset: update controls-gisaid strain list
- efc5aab7 defaults: update validation to fix XBH dropout
- 5a76f81c sc2rf: update sc2rf to use gisaid access token
- 84a01cfc script: fix validate bug for #225
- cc031e37 env: version control all docs dependencies
- e1c18b91 env: add kaleido for static image export
- f63a254a docs: update configuration and faq
- 41c533fd docs: update links in developers guide
- 71d48037 defaults: update validation values
- 35a5d1ee script: change slurm job name to basename of conda env
- 04e425f8 docs: change links to html refs
- ccda5121 docs: add pipeline definiton example
- 18e7ae78 docs: remove README content and link to read the docs
- 5f7ec1a1 docs: rename sphinx pages
- d9cc87dd workflow: create new pattern plot_NX to customize min cluster size
- 1c3cc7a4 docs: success with read the docs, fix description include
- 5921240b env: add sphinx rtd theme
- cd789917 docs: try read the docs with requirements
- 6d63bcbf docs: remove sphinx conda too slow
- 47fe5a54 env: switch git from anaconda to conda-forge channel
- 7e8c4c6c docs: downgrade sphinx python to 3.8
- 28e7be0e docs: attempt conda 3 with sphinx readthedocs
- 985159b6 docs: attempt conda 2 with sphinx readthedocs
- 7ab1c5d9 docs: attempt conda with sphinx readthedocs
- 6d7079a8 docs: add sphinx for readthedocs #24
- 1809dad5 workflow: update validation for new controls-gisaid strains
- 71c0fb34 script: catch tight layout warnings for #224
- 690bbd7c resources: update issues
- 348264b4 resources: add proposed1444 and proposed1576 to breakpoints
- 3f657060 resources: add XBL and XBM to breakpoints
- a8528cc0 docs: add development section to README
- b0446f66 docs: add development section to README
- 03a65ebf ci: extend miniforge-variant to all jobs for #223
- f186a1c6 ci: use miniforge-variant to fix #223
- cb1bda73 script: add intermission ratio bypass for #195
- c0429ec3 parameter: update nextclade data
- 2377fce6 script: improve postprocess duplicate reconciliation for lineage matches
- 6b891c92 resources: add proposed1440 to breakpoints #214
- 7171dcfd resources: add proposed1425 to breakpoints #212
- d5a2bf1c resources: add proposed1393 to breakpoints #211
- 18961b51 script: improve postprocess duplicate reconciliation with multiple strain matches
- 1d7263f9 resources: add proposed1340 to breakpoints #208
- 9b06cff6 resources: update validation for controls
- 762fc77d script: fix postprocess bug in auto-pass parsing
- ea7cdcaf param: add XBK to resources and voc
- ccc08dba resources: upgrade sc2rf, upgrade Nextclade #176, update XBG breakpoints #206, parent conflict #205, rbd palette #209
- c0c9209b script: fix cluster str recode in plot_breakpoints
- 044040c2 param: add XAN to auto-pass
- d7e728d0 script: handle numeric strain names for #210
- c014cfb7 script: hardcode rbd palette range from 0 to 12 for #209
- 91f4bb05 script: fix missing X parent in lineage_tree for #185
- f341dd59 ci: switch flake8 repo from gitlab to github
- dea9eeb3 resources: update XBB validation lineage
- be32e69e workflow: fix typo in nextclade dataset_dir
- 575255b8 workflow: move lineage tree to resources as not dependent on nextclade for #185
- 447b0b8b env: add pango-aliasor for #185
- 823428a9 param: add XBB_ARTIC alt breakpoints for #204
This is a minor bugfix release aimed towards resolving network connectivity errors and catching false positives.
- Issue #195: Consider alleles outside of parental regions as intermissions (conflicts) to catch false positives.
- Issue #201: Make the LAPIS query of covSPECTRUM optional, to help users with network connectivity issues. This can be set with the flag `lapis: false` in builds under the rule `sc2rf_recombinants`.
- Issue #202: Document connection errors related to LAPIS and provide options for solutions.
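
Combined with the intermission allele ratio introduced in v0.6.0, Issue #195 now counts alleles outside any parental region as intermissions. The sketch below shows one way to express that rule. It assumes each diagnostic allele is a hypothetical `(position, parent)` pair, that parental regions have inclusive bounds, and that the minor parent is the parent with the fewest supporting alleles; it is not the code in the `sc2rf` postprocess script.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    start: int   # first genome position of the parental region (inclusive)
    end: int     # last genome position of the parental region (inclusive)
    parent: str  # parent assigned to this region


def classify_alleles(
    alleles: List[Tuple[int, str]], regions: List[Region]
) -> Tuple[Counter, int]:
    """Count supporting alleles per parent, plus intermissions.

    `alleles` holds hypothetical (position, parent) pairs for diagnostic alleles.
    """
    support: Counter = Counter()
    intermissions = 0
    for position, allele_parent in alleles:
        region = next((r for r in regions if r.start <= position <= r.end), None)
        if region is not None and region.parent == allele_parent:
            support[allele_parent] += 1
        else:
            # Conflicts with its region, or lies outside every parental region
            intermissions += 1
    return support, intermissions


def is_false_positive(alleles: List[Tuple[int, str]], regions: List[Region]) -> bool:
    support, intermissions = classify_alleles(alleles, regions)
    # Assumed definition: minor parent = parent with the fewest supporting alleles
    minor_parent_alleles = min(support[r.parent] for r in regions)
    return intermissions >= minor_parent_alleles


regions = [Region(1, 20000, "BA.2"), Region(21000, 29000, "BA.5")]
alleles = [(100, "BA.2"), (5000, "BA.2"), (22000, "BA.5"),
           (23000, "BA.5"), (24000, "BA.5"), (20500, "BA.5")]
print(is_false_positive(alleles, regions))  # False: 1 intermission < 2 minor-parent alleles
```

The final allele at position 20500 falls between the two regions, so it is counted as an intermission rather than ignored.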
- 83ee0139 docs: update changelog for v0.6.1
- 00fe2fc8 docs: update notes for v0.6.1
- fa03ea96 workflow: fix bug where rbd_levels log was incorrectly named
- a281b75c workflow: make lapis optional param for #201 #202
- 75684b55 docs: update docs
- 1085ce0e script: postprocess count alleles outside regions as intermissions for #195
- c11770c1 param: add XAV to auto-pass for #104 #195
This is a major release that includes the following changes:
- Detection of all recombinants in Nextclade dataset 2022-10-27: `XA` to `XBE`.
- Implementation of recombinant sublineages (ex. `XBB.1`).
- Implementation of immune-related statistics (`rbd_level`, `immune_escape`, `ace2_binding`) from `nextclade`, the `Nextstrain` team, and Jesse Bloom's group:
  - https://github.com/nextstrain/ncov/blob/master/defaults/rbd_levels.yaml
  - https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS_Omicron/epistatic-shifts/
  - https://jbloomlab.github.io/SARS2_RBD_Ab_escape_maps/escape-calc/
  - https://doi.org/10.1093/ve/veac021
  - https://doi.org/10.1101/2022.09.15.507787
  - https://doi.org/10.1101/2022.09.20.508745
- Issue #168: Support for NULL collection dates and NULL country is implemented.
- `controls` was updated to include 1 strain from `XBB`, for a total of 22 positive controls. The 28 negative controls were unchanged from `v0.5.1`.
- `controls-gisaid` strain list was updated to include `XA` through to `XBE`, for a total of 528 positive controls. This includes sublineages such as `XBB.1` and `XBB.1.2`, which synchronizes with Nextclade Dataset 2022-10-19. The 187 negative controls were unchanged from `v0.5.1`.
- Issue #176: Upgrade Nextclade dataset to tag `2022-10-27` and upgrade Nextclade to `v2.8.0`.
- Issue #193: Use the nextclade dataset `sars-cov-2-21L` to calculate `immune_escape` and `ace2_binding`.
- Issue #193: Create new rule `rbd_levels` to calculate the number of key receptor binding domain (RBD) mutations.
- Issue #185: Use nextclade dataset Auspice tree for lineage hierarchy. Previously, the phylogeny of lineages was constructed from the cov-lineages website YAML. Instead, we now use the tree provided with nextclade datasets, to better synchronize the lineage model with the output. Rather than creating the output tree in `resources/lineages.nwk`, the lineage tree will output to `data/sars-cov-2_<DATE>/tree.nwk`. This is because different builds might use different nextclade datasets, so the tree is dataset-specific output.
- Issue #179: Fix bug where `sc2rf/recombinants.ansi.txt` is truncated.
- Issue #180: Fix recombinant sublineages (ex. `XAY.1`) missing their derived mutations in the `cov-spectrum_query`. Previously, the `cov-spectrum_query` mutations were only based on the parental alleles (before recombination). This led to sublineages (ex. `XAY.1`, `XAY.2`) all having the exact same query. Now, the `cov-spectrum_query` will include all substitutions shared between all sequences in the `cluster_id`.
- Issue #187: Document bug that occurs if duplicate sequences are present, and the initial validation was skipped by not running `scripts/create_profile.sh`.
. - Issue #191 and Issue #192: Reduce false positives by ensuring that each mode of sc2rf has at least one additional parental population that serves as the alternative hypothesis.
- Issue #195: Implement a filter on the ratio of intermissions to alleles. Sequences will be marked as false positives if the number of intermissions (i.e. alleles that conflict with the identified parental region) is greater than or equal to the number of alleles contributed by the minor parent. This ratio indicates that there is more evidence that conflicts with recombination than there is allele evidence that supports a recombinant origin.
- Issue #183: Recombinant sublineages. When nextclade calls a lineage (ex. `XAY.1`) which is a sublineage of a sc2rf lineage (`XAY`), we prioritize the nextclade assignment.
- Issue #193: Add immune-related statistics: `rbd_levels`, `rbd_substitutions`, `immune_escape`, and `ace2_binding`.
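
For Issue #180, the `cov-spectrum_query` is built from the substitutions shared by every sequence in a `cluster_id`, which amounts to a set intersection. A minimal sketch, assuming substitutions are strings such as `C4586T` and that the query is a comma-joined list in genome order (the real query format may differ; the helper names are hypothetical):

```python
from typing import Dict, List


def position(substitution: str) -> int:
    """Genome position of a substitution such as C4586T."""
    return int(substitution[1:-1])


def shared_substitutions(cluster: Dict[str, List[str]]) -> List[str]:
    """Substitutions present in every sequence of a cluster, in genome order."""
    if not cluster:
        return []
    shared = set.intersection(*(set(subs) for subs in cluster.values()))
    return sorted(shared, key=position)


cluster = {
    "sample1": ["C241T", "T2470C", "A23403G"],
    "sample2": ["C241T", "A23403G", "C26577G"],
}
# Comma-joined query string (format assumed for illustration)
print(",".join(shared_substitutions(cluster)))  # C241T,A23403G
```

Because derived mutations shared by a sublineage such as `XAY.1` survive the intersection, sublineages no longer collapse to the same parental-allele query.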
- Issue #57: Include substitutions within breakpoint intervals for breakpoint plots. This is a product of Issue #180 which provides access to all substitutions.
- Issue #112: Fix bug where breakpoints plot image was out of bounds.
- Issue #188: Remove the breakpoints distribution axis (ex. `breakpoints_clade.png`) in favor of putting the legend at the top. This significantly reduces plotting issues (ex. Issue #112).
- Issue #193: Create new plot `rbd_level`.
- Issue #85: `XAY`, updated controls
- Issue #178: `XAY.1`
- Issue #172: `XBB.1`
- Issue #175: `XBB.1.1`
- Issue #184: `XBB.1.2`
- Issue #173: `XBB.2`
- Issue #174: `XBB.3`
- Issue #181: `XBC.1`
- Issue #182: `XBC.2`
- Issue #171: `XBD`
- Issue #177: `XBE`
- Issue #198: `proposed1229`
- Issue #199: `proposed1268`
- Issue #197: `proposed1296`
- 2506e907 docs: update changelog and add v0.6.0 testing summary package
- 0cc421e0 docs: update all contributors
- cd9b6cbb resources: update issues
- 0fa2e3c1 docs: update readme
- 375c3a76 resources: add proposed lineages for #197 #198 #199
- dad989e7 param: remove BQ.1 from sc2rf mode VOC as its too close to BA.5.3
- d7cb005f docs: update issue template lineage-validation
- 1beac97e resources: add XBF to curated breakpoints for #196
- fae7bfdb script: sc2rf implement intermission allele ratio for #195
- 89a41265 script: additional manual curation of lineage_tree
- ebd3ce1f resources: update validation strains for controls-gisaid
- d8bff572 script: add RBD Level slide to report
- c1879c1d script: catch errors in rbd_level plotting with no recombinants
- 63545a08 script: fix bug in linelist with cluster_privates
- c24a7179 resources: update issues
- d32d557f docs: update development notes
- 7f825a41 script: manual fix for CK in lineage_tree
- fdd6f66d workflow: implement rbd levels for #193
- 0058dd6e param: upgrade nextclade dataset to 2022-10-27 and reduce breakpoints of XA mode
- fb062c32 env: upgrade nextclade to v2.8.0
- 800c1e9c param: experiment with XAJ mode for 191
- 6958337f env: version control pip to v22.3 on conda-forge
- 2bf5141b env: change anaconda channel to conda-forge for #170
- 9b484d3b script: allow NULL dates in metadata for #168 remove breakpoint dist axis for #188 fix breakpoints plot out of bounds for #112
- 90b153e7 docs: update dev notes
- 5057814a resources: remove deprecated lineages and geo_resolutions
- 28c1b8bd resources: update issues and breakpoints
- 379112b4 profile: update validation values
- f8f4273c profile: update controls-gisaid to XBE
- e462cead profile: add XBB.1 to controls
- c2aca577 profile: increase max jobs in controls-gisaid-hpc
- eabbed42 param: add more BA.5 sublineages to sc2rf lineages
- 75e674e0 workflow: implement sublineages for #183
- 5caaaa4f workflow: use nextclade dataset for phylogeny for #185
- 732f5eea resources: add XBC.1 to breakpoints for #181
- 2c1f09a5 resources: add new lineage XAY.1 to curated breakpoints for #178
- b44e68e1 script: use all substitutions for cov-spectrum for #180 and #57
- 9d222e1f script: ignore issue 848 when downloading, is manually curated
- 98c8a912 script: bugfix where sc2rf ansi was truncated for #179
- 1e29d7ea env: upgrade nextclade and nextclade dataset for #176
- 409183f5 resources: update breakpoints for proposed1139 #165
- Issue #169: AttributeError: 'str' object has no attribute 'name'
- Issue #167: Alias key out of date, change source
- Issue #166: `proposed1138`
- Issue #165: `proposed1139`
- 799904eb docs: update CHANGELOG for v0.5.1
- 1f9cd623 docs: update docs for v0.5.1
- 43fc4d71 env: update tabulate channel for #169
- 3b9c3796 env: version control tabulate for #169
- 5f57bca2 script: update alias url for #167
- 31647371 docs: update readme hpc section
Please check out the `v0.5.0` Testing Summary Package for a comprehensive report.
This is a minor release that includes the following changes:
- Detection of all recombinants in Nextclade dataset 2022-09-27: `XA` to `XBC`.
- Create any number of custom `sc2rf` modes with CLI arguments.
- Issue #96: Create newick phylogeny of pango lineage parent child relationships, to get accurate sublineages including aliases.
- Issue #118: Fix missing pango-designation issues for XAY and XBA.
- Issue #25: Reduce positive controls to one sequence per clade. Add new positive controls `XAL`, `XAP`, `XAS`, `XAU`, and `XAZ`.
- Issue #92: Reduce negative controls to one sequence per clade. Add negative control for `22D (Omicron) / BA.2.75`.
- Issue #155: Add new profile and dataset `controls-gisaid`. Only a list of strains is provided, as GISAID policy prohibits public sharing of sequences and metadata.
- Issue #77: Report slurm command for `--hpc` profiles in `scripts/create_profiles.sh`.
- Issue #153: Fix bug where build parameters `metadata` and `sequences` were not implemented.
- Issue #78: Add new parameter `max_breakpoint_len` to `sc2rf_recombinants` to mark samples with too much uncertainty in the breakpoint interval as false positives.
- Issue #79: Add new parameter `min_consec_allele` to `sc2rf_recombinants` to ignore recombinant regions with fewer than this number of consecutive alleles (both diagnostic SNPs and diagnostic reference alleles).
- Issue #80: Migrate sc2rf from a submodule to a subdirectory (including LICENSE!). This is to simplify the updating process and avoid errors where submodules became out of sync with the main pipeline.
- Issue #83: Improve error handling in `sc2rf_recombinants` when the input stats files are empty.
- Issue #89: Reduce the default value of the parameter `min_len` in `sc2rf_recombinants` from `1000` to `500`. This is to handle `XAP` and `XAJ`.
- Issue #90: Auto-pass select nextclade lineages through `sc2rf`: `XN`, `XP`, `XAR`, `XAS`, and `XAZ`. This requires differentiating the nextclade inputs as separate parameters `--nextclade` and `--nextclade-no-recom`.

  `XN`, `XP`, and `XAR` have extremely small recombinant regions at the terminal ends of the genome. Depending on sequencing coverage, `sc2rf` may not reliably detect these lineages.

  The newly designated `XAS` and `XAZ` pose a challenge for recombinant detection using diagnostic alleles. The first region of `XAS` could be either `BA.5` or `BA.4` based on substitutions, but is most likely `BA.5` based on deletions. Since the region contains no diagnostic alleles to discriminate `BA.5` vs. `BA.4`, breakpoints cannot be detected by `sc2rf`.

  Similarly for `XAZ`, the `BA.2` segments do not contain any `BA.2` diagnostic alleles, but instead are all reversions from `BA.5` alleles. The `BA.2` parent was discovered by deep, manual investigation in the corresponding pango-designation issue. Since the `BA.2` regions contain no diagnostic alleles for `BA.2`, breakpoints cannot be detected by `sc2rf`.
- Issue #95: Generalize `sc2rf_recombinants` to take any number of ansi and csv input files. This allows greater flexibility in command-line arguments to `sc2rf`, which are no longer locked into the hardcoded `primary` and `secondary` parameter sets.
- Issue #96: Include sub-lineage proportions in the `parents_lineage_confidence`. This reduces underestimation of the confidence of a parental lineage.
- Issue #150: Fix bug where `sc2rf` would write empty output csv files if no recombinants were found.
- Issue #151: Fix bug where samples that failed to align were missing from the linelists.
- Issue #158: Reduce `sc2rf` param `--max-intermission-length` from `3` to `2` to be consistent with Issue #79.
- Issue #161: Implement selection method to pick best results from various `sc2rf` modes.
- Issue #162: Upgrade `sc2rf/virus_properties.json`.
- Issue #163: Use LAPIS `nextcladePangoLineage` instead of `pangoLineage`. Also disable default filter `max_breakpoint_len` for `XAN`.
- Issue #164: Fix bug where false positives would appear in the filtered `sc2rf` ansi output (`recombinants.ansi.txt`).
- The optional `lapis` parameter for `sc2rf_recombinants` has been removed. Querying LAPIS for parental lineages is no longer experimental and is now an essential component (cannot be disabled).
- The mandatory `mutation_threshold` parameter for `sc2rf` has been removed. Instead, `--mutation-threshold` can be set independently in each of the `sc2rf` modes.
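
The `max_breakpoint_len` (Issue #78) and `min_consec_allele` (Issue #79) filters can be sketched as below. The comparison operators, the `start:end` parsing, and the choice to merge neighbouring runs of the same parent after a short region is dropped are all assumptions; the threshold values in the example are arbitrary and are not the pipeline defaults.

```python
from itertools import groupby
from typing import List, Tuple


def breakpoint_too_uncertain(breakpoint: str, max_breakpoint_len: int) -> bool:
    """True if a 'start:end' breakpoint interval is wider than the limit (assumed '>')."""
    start, end = (int(x) for x in breakpoint.split(":"))
    return end - start > max_breakpoint_len


def filter_regions(allele_parents: List[str], min_consec_allele: int) -> List[Tuple[str, int]]:
    """Collapse alleles (in genome order) into parental runs, dropping short runs."""
    runs = [(parent, len(list(group))) for parent, group in groupby(allele_parents)]
    kept: List[Tuple[str, int]] = []
    for parent, count in runs:
        if count < min_consec_allele:
            continue  # too few consecutive alleles: ignore this region
        if kept and kept[-1][0] == parent:
            # Assumption: neighbours of an ignored region merge back together
            kept[-1] = (parent, kept[-1][1] + count)
        else:
            kept.append((parent, count))
    return kept


print(breakpoint_too_uncertain("17411:21617", 1000))  # True (interval spans 4206 bases)
print(filter_regions(["BA.2"] * 4 + ["BA.5"] + ["BA.2"] * 3 + ["BA.5"] * 3, 2))
# [('BA.2', 7), ('BA.5', 3)]
```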
- Issue #157: Create new parameters `min_lineage_size` and `min_private_muts` to control lineage splitting into `X*-like`.
- Issue #17: Create script to plot lineage assignment changes between versions using a Sankey diagram.
- Issue #82: Change epiweek start from Monday to Sunday.
- Issue #111: Fix breakpoint distribution axis that was empty for clade.
- Issue #152: Fix file saving bug when largest lineage has `/` characters.
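
Starting epiweeks on Sunday (Issue #82) means mapping each collection date back to the most recent Sunday, as in CDC epiweeks. A small Python sketch of that calculation (the function name `epiweek_start` is hypothetical, not taken from the pipeline):

```python
from datetime import date, timedelta


def epiweek_start(day: date) -> date:
    """Return the Sunday that starts the epiweek containing `day`."""
    # date.weekday() is 0 for Monday ... 6 for Sunday, so this counts
    # the days elapsed since the most recent Sunday (0 if `day` is a Sunday).
    days_since_sunday = (day.weekday() + 1) % 7
    return day - timedelta(days=days_since_sunday)


print(epiweek_start(date(2022, 10, 27)))  # 2022-10-23
```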
- Issue #88: Add pipeline and nextclade versions to powerpoint slides as footer. This required adding `--summary` as param to `report`.
- Issue #56: Change rule `validate` from simply counting the number of positives to validating the fields `lineage`, `breakpoints`, `parents_clade`. This involves adding a new default parameter `expected` for rule `validate` in `defaults/parameters.yaml`.
- Issue #149: `XA`
- Issue #148: `XB`
- Issue #147: `XC`
- Issue #146: `XD`
- Issue #145: `XE`
- Issue #144: `XF`
- Issue #143: `XG`
- Issue #141: `XH`
- Issue #142: `XJ`
- Issue #140: `XK`
- Issue #139: `XL`
- Issue #138: `XM`
- Issue #137: `XN`
- Issue #136: `XP`
- Issue #135: `XQ`
- Issue #134: `XR`
- Issue #133: `XS`
- Issue #132: `XT`
- Issue #131: `XU`
- Issue #130: `XV`
- Issue #129: `XW`
- Issue #128: `XY`
- Issue #127: `XZ`
- Issue #126: `XAA`
- Issue #125: `XAB`
- Issue #124: `XAC`
- Issue #123: `XAD`
- Issue #122: `XAE`
- Issue #120: `XAF`
- Issue #121: `XAG`
- Issue #119: `XAH`
- Issue #117: `XAJ`
- Issue #116: `XAK`
- Issue #115: `XAL`
- Issue #110: `XAM`
- Issue #109: `XAN`
- Issue #108: `XAP`
- Issue #107: `XAQ`
- Issue #87: `XAS`
- Issue #105: `XAT`
- Issue #103: `XAU`
- Issue #104: `XAV`
- Issue #105: `XAW`
- Issue #85: `XAY`
- Issue #87: `XAZ`
- Issue #94: `XBA`
- Issue #114: `XBB`
- Issue #160: `XBC`
- Issue #99: `proposed808`
- pull/113 docs: add issues template for lineage validation
- b48ad6d7 docs: fix CHANGELOG pr
- 04b17918 docs: update readme and changelog
- 72dd5a8f docs: add testing summary package for v0.4.2 to v0.5.0
- 558f7d79 resources: fix breakpoints for XAE #122
- 91e5843b script: bugfix sc2rf ansi output for #164
- 9bc13639 docs: update issues and validation table order
- b63520e5 script: implement lineage check in dups for #117 #161
- 901898da sc2rf updates for #158 #161 #162 #163
- 96fa6af1 dataset: update controls-gisaid strain list and validation
- 84466a10 workflow: new param dup_method for #161
- 9ca0c71e script: implement duplicate reconciliation for #161
- 112ea684 param: upgrade nextclade dataset for #159
- 859b92c8 script: add more detail to validate table for failing samples
- 5e285912 script: add param --min-link-size to compare_positives
- bd01a5e4 workflow: added failed validate output to rule log
- 8e5b90fb workflow: don't use metadata for sc2rf_recombinants when exclude_negatives is true
- cdf45407 param: add new params min-lineage-size and min-private-muts for #157
- bc04fddf workflow: update validation strains for #155
- 6aa95221 param: fix typo of missing --mutation-threshold
- 25df848c param: remove param mutation_threshold as universal param for sc2rf
- 46d2ee95 dataset: remove false positive LC0797902 from negative controls
- b106b9d1 profile: change default hpc jobs from 2 to 10
- d6d02721 workflow: update validation table
- d308d54b script: fix node ordering in compare_positives
- ea65c6ff ci: remove GISAID workflow for #154
- 122e579c ci: test storing csv files as secrets for #154
- d281add4 ci: experiment with secrets with test data for #154
- 1d5406f5 script: generalize compare_positives to use other lineage columns
- ffd4f159 scripts: fix bug where metadata and sequences param were not implemented for #153
- 7974860a resources: special handling of proposed808 issues and breakpoints for #99
- d0e3f41a script: fix file saving bug in report for #152
- f7d3157b script: fix file saving bug in plot for #152
- f9931bb3 script: fix missing samples in sc2rf output for #151
- 18b94df5 script: force sc2rf to always output csvfile headers for #150
- e1f14dfe resources: update breakpoints for proposed808 #99
- 35aaa922 resources: update breakpoints for XA - XAZ
- 05cba895 resources: update breakpoints for XV #130
- 1b0a02bf resources: add gauntlet samples (all XA*) to validation
- 29c7798d param: add XAR to sc2rf auto-pass for 106
- 0e6b413e docs: change next ver from v0.4.3 to v0.5.0
- f8197e80 workflow: fix bug in rule validate where path was hard-coded
- abb4dec6 resources: update breakpoints for XAA #126
- d012b936 resources: update breakpoints for XAG and XAH for #120 and #121
- d389e048 param: add new XAJ mode for sc2rf for #117
- be853ac1 scripts: update rule validate for #56
- a02b4deb docs: add issues template for lineage validation #113
- 1b3cb780 script: fix bug of missing issues for #118
- 82d9ce32 docs: update validation release notes
- 1b705a6a resources: update XAU breakpoints for #103
- 18021fe3 docs: add XAQ issue #107 to release notes
- 7ca8f06f docs: add XAQ issue #107 to release notes
- e9d8f905 docs: add issue #111 to release notes
- a638835a script: fix bug in plot_breakpoints when axis empty for #111
- ba7ec30c resources: update breakpoints for XAP #108
- d3952e44 docs: fix typo in relesae notes
- d56956d6 docs: add issues #86 and #87 to release notes
- 00a706a8 script: remove redundant --clades arg in sc2rf bash script
- 378eea62 param: add new sc2rf modes XB and proposed808 for #98 and #99
- bb802647 docs: add issue #17 to release notes
- b6688c86 env: add plotly to conda env and control all versions
- adc6777e script: improve directory creation in compare positives
- 15a1fc04 script: add breakpoint axis label for #97
- f256a75f docs: add notes for v0.4.3
- c3785fa0 env: upgrade csvtk to v0.24.0
- 7936e0ef param: fix typo in mode omicron_omicron
- d014d0a1 param: revert XAS mode to default for #86
- ea83d78e script: fix bug in postprocess where max_breakpoint_len was not checked
- ab030e21 param: add new XAS mode to default sc2rf runs for #86
- 6ccd0aa8 workflow: first draft of pango lineage tree for #96
- 9d2382cb workflow: add param fix for postprocess inputs
- 93007726 script: fix cli --clades arg parsing for scr2rf.sh
- bed29fdb script: add new csv col alleles to sc2rf
- bd681d5e workflow: generalize sc2rf_recombinants inputs for #95
- 0315ffd6 docs: update development docs
- 70111447 resources: update breakpoints and issues
- 076e14bb dataset: reduce controls to one sequence per clade for #25,92
- 77d2210d workflow: update rules for #46, #88, #89, #90
- d84205a0 script: add new param auto_pass for #90
- 99b895a4 params: update params for #46, #89, #90
- 3e3d1022 script: add pipeline version to report for #88
- 2e6ac558 script: remove sc2rf_ver col from summary for #80
- 18c35940 env: upgrade nextclade to v2.5.0 for #91
- a67e1159 workflow: autopass XAS through sc2rf for #86
- 2e16dac5 resources: update breakpoints and mutations
- a7026450 workflow: upgrade nextclade dataset to 2022-08-23 for #81
- 53fb4a8a workflow: re-add sc2rf as subdirectory for #80
- 4b8b3fab workflow: remove sc2rf submodule again
- b650e562 workflow: add sc2rf as subdirectory for #80
- 073f3b94 workflow: remove sc2rf as submodule
- 5ab5c1b5 resources: updated curated breakpoints
- 5635b872 resources: update issues
- 78e1f064 script: change epiweek to start on Sundary (cdc) for #82
- 37f40480 script: add tables to compare positives between versions for #17
- 8eef7548 script: create new script to compare positives between versions
- ebf1e222 script: compare linelists from different versions for #17
- 8401c353 workflow: add new param max_breakpoint_len for #78
- 8bbcc041 script: report slurm command for --hpc profiles for #77
- c40a6791 workflow: restrict config rules to one thread
- c2b1ea57 script: revert unpublished lineages for #76
- dbe359c8 resources: add 882 to breakpoints
This is a minor bug fix and enhancement release with the following changes:
- Issue #70: Fix missing `sc2rf` version from `recombinant_classifier_dataset`.
- Issue #74: Correctly identify `XN-like` and `XP-like`. Previously, these were just assigned `XN`/`XP` regardless of whether the estimated breakpoints conflicted with the curated ones.
- Issue #76: Mark undesignated lineages with no matching sc2rf lineage as `unpublished`.
- Issue #71: Only truncate `cluster_id` while plotting, not in table generation.
- Issue #72: For all plots, truncate the legend labels to a set number of characters. The exception to this is parent labels (clade, lineage), because the full label is informative.
- Issue #73, #75: For all plots except breakpoints, lineages will be defined by the column `recombinant_lineage_curated`. Previously it was defined by the combination of `recombinant_lineage_curated` and `cluster_id`, which made cluttered plots that were too difficult to interpret.
- New parameter `--lineage-col` was added to `scripts/plot_breakpoints.py` to give more control over whether to plot the raw lineage (`lineage`) or the curated lineage (`recombinant_lineage_curated`).
- 8953ef03 docs: add CHANGELOG for v0.4.2
- 7ec5ccc6 docs: add notes for v0.4.2
- 1b3b1f1d script: restore column name to recombinant_classifer_dataset
- 901caf98 script: restore recombinant_lineage_curated of -like lineages
- d6be9611 script: change internal delim of classifier for #70
- cdb4a78a script: fix recombinant_classifier missing sc2rf for #70
- bf7a4e57 script: mark undesignated lineages with no matching sc2rf lineage as unpublished for #76
- 46f6d754 workflow: update linelists and plotting for #74 and #75
- c03dd3be script: don't split largest by cluster id for #73
- e9802e79 script: majority of plots will not split by cluster_id for #73
- bafb38fb script: fix cluster ID truncation for issue #71
- ab712593 resources: curate and test breakpoints for proposed895
This is a minor bug fix release with the following changes:
- Issue #63: Remove `usher` and `protobuf` from the conda environment.
- Issue #68: Remove ncov as a submodule.
- Issue #69: Remove 22C and 22D from `sc2rf/mapping.csv` and `sc2rf/virus_properties.json`, as these interfere with breakpoint detection for XAN.
- 88650696 docs: add CHANGELOG for v0.4.1
- 00a2eec3 docs: add notes for v0.4.1
- d74a81d3 sc2rf: revert 22C and 22D clade addition
- 7b662940 env: remove usher for issue #63
- adf92399 submodule: remove ncov for issue #68
- 0790aa04 docs: update CHANGELOG for v0.4.0
v0.4.0 has been trained and validated on the latest generation of SARS-CoV-2 Omicron clades (ex. 22A/BA.4 and 22B/BA.5). Recombinant sequences involving BA.4 and BA.5 can now be detected, unlike in v0.3.0 where they were not included in the `sc2rf` models.
v0.4.0 is also a major update to how sequences are categorized into lineages/clusters. A recombinant lineage is now defined as a group of sequences with a unique combination of:
- Lineage assignment (ex. `XM`)
- Parental clades (ex. `Omicron/21K,Omicron/21L`)
- Breakpoints (ex. `17411:21617`)
- NEW: Parental lineages (ex. `BA.1.1,BA.2.12.1`)
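
Because a recombinant lineage is the unique combination of these four fields, grouping sequences amounts to grouping on a composite key. A minimal sketch, assuming a hypothetical `Sample` record; its field names (`parents_lineage` in particular) are illustrative and may not match the actual linelist columns.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Key = Tuple[str, str, str, str]


class Sample(NamedTuple):
    # Hypothetical record; field names are illustrative only
    strain: str
    lineage: str          # ex. XM
    parents_clade: str    # ex. Omicron/21K,Omicron/21L
    breakpoints: str      # ex. 17411:21617
    parents_lineage: str  # ex. BA.1.1,BA.2.12.1


def group_recombinants(samples: List[Sample]) -> Dict[Key, List[str]]:
    """Group strains by the combination of fields that defines a recombinant lineage."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for s in samples:
        key = (s.lineage, s.parents_clade, s.breakpoints, s.parents_lineage)
        groups[key].append(s.strain)
    return dict(groups)


samples = [
    Sample("s1", "XM", "Omicron/21K,Omicron/21L", "17411:21617", "BA.1.1,BA.2.12.1"),
    Sample("s2", "XM", "Omicron/21K,Omicron/21L", "17411:21617", "BA.1.1,BA.2.12.1"),
    Sample("s3", "XM", "Omicron/21K,Omicron/21L", "11538:12879", "BA.1.1,BA.2.12.1"),
]
print(len(group_recombinants(samples)))  # 2
```

Here `s3` shares the lineage assignment with `s1` and `s2` but has different breakpoints, so it forms its own group.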
Novel recombinants (i.e. undesignated) can be identified by a lineage assignment that does not start with `X*` (ex. `BA.1.1`) or by a lineage assignment that contains `-like` (ex. `XM-like`). A cluster of sequences may be flagged as `-like` if one of the following criteria applies:
- The lineage assignment by Nextclade conflicts with the published breakpoints for a designated lineage (`resources/breakpoints.tsv`).
  - Ex. An `XE` assigned sample has breakpoint `11538:12879`, which conflicts with the published `XE` breakpoint (ex. `8394:12879`). This will be renamed `XE-like`.
- The cluster has 10 or more sequences, which share at least 3 private mutations in common.
  - Ex. A large cluster of sequences (N=50) is assigned `XM`. However, these 50 samples share 5 private mutations `T2470C,C4586T,C9857T,C12085T,C26577G` which do not appear in true `XM` sequences. These will be renamed `XM-like`. Upon further review of the reported matching pango-designation issues (`460,757,781,472,798`), we find this cluster to be a match to `proposed798`.
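
The cluster-based `-like` criterion can be sketched as follows. The thresholds of 10 sequences and 3 shared private mutations come from the text (and were later exposed as `min_lineage_size` and `min_private_muts`); the function names and the dict-of-sets input are assumptions. The breakpoint-conflict criterion is not shown, since its exact conflict rule is not spelled out here.

```python
from typing import Dict, Set


def shared_private_mutations(cluster: Dict[str, Set[str]]) -> Set[str]:
    """Private mutations carried by every sequence in the cluster."""
    if not cluster:
        return set()
    return set.intersection(*cluster.values())


def curated_lineage(
    lineage: str,
    cluster: Dict[str, Set[str]],
    min_lineage_size: int = 10,
    min_private_muts: int = 3,
) -> str:
    """Append '-like' when a large cluster shares enough private mutations (sketch)."""
    if len(cluster) >= min_lineage_size and (
        len(shared_private_mutations(cluster)) >= min_private_muts
    ):
        return f"{lineage}-like"
    return lineage


shared = {"T2470C", "C4586T", "C9857T", "C12085T", "C26577G"}
# 50 sequences: the 5 shared mutations plus one hypothetical mutation unique to each
cluster = {f"sample{i}": shared | {f"G{100 + i}A"} for i in range(50)}
print(curated_lineage("XM", cluster))  # XM-like
```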
The ability to identify parental lineages and private mutations is largely due to improvements in the newly released nextclade datasets, which have increased recombinant lineage accuracy. As novel recombinants can now be identified without the use of the custom UShER annotations (ex. `proposed771`), all UShER rules and output have been removed. This significantly improves runtime and reduces the need to drop non-recombinant samples for performance. The result is more comparable output between different dataset sizes (4 samples vs. 400,000 samples).
Note! Default parameters have been updated! Please regenerate your profiles/builds with:

`scripts/create_profile.sh --data data/custom`
- Issue #49: The tutorial lineages were changed from `XM`, `proposed467`, `miscBA1BA2Post17k`, to `XD`, `XH`, `XAN`. The previous tutorial sequences had genome quality issues.
- Issue #51: Add `XAN` to the controls dataset. This is a BA.2/BA.5 recombinant.
- Issue #62: Add `XAK` to the controls dataset. This is a BA.2/BA.1 VUM recombinant monitored by the ECDC.
- Issue #46: `nextclade` is now run twice: once with the regular `sars-cov-2` dataset and once with the `sars-cov-2-no-recomb` dataset. The `sars-cov-2-no-recomb` dataset is used to get the nucleotide substitutions before recombination occurred. These are identified by taking the `substitutions` column and excluding the substitutions found in `privateNucMutations.unlabeledSubstitutions`. The pre-recombination substitutions allow us to identify the parental lineages by querying cov-spectrum.
- Issue #48: Make the `exclude_clades` parameter completely optional. Previously, an error would be raised if the user didn't specify any.
- Issue #50: Upgrade `nextclade` from `v1.11.0` to `v2.3.0`. Also upgrade the default dataset tags to 2022-07-26T12:00:00Z, which had significant bug fixes.
- Issue #51: Relax the recombinant criteria by flagging sequences with ANY labelled private mutations as a potential recombinant for further downstream analysis. This was specifically for BA.5 recombinants (ex. `XAN`), as no other columns from the `nextclade` output indicated this could be a recombinant.
- Restrict `nextclade` output to `fasta,tsv` (alignment and QC table). This saves on file storage, as the other default output is not used.
- Issue #51: `sc2rf` is now run twice. First, to detect recombination between clades (ex. `Delta/21J` & `Omicron/21K`). Second, to detect recombination within Omicron (ex. `Omicron/BA.2/21L` & `Omicron/BA.5/22B`). It was not possible to define universal parameters for `sc2rf` that worked for both distantly related clades and closely related Omicron lineages.
- Issue #51: Rename parameter `clades` to `primary_clades` and add new parameter `secondary_clades` for detecting BA.5.
- Issue #53: Identify the parental lineages by splitting up the observed mutations (from `nextclade`) into regions by breakpoint. Then query the list of mutations in https://cov-spectrum.org and report the lineage with the highest prevalence.
- Tested out `--enable-deletions` again, which caused issues for `XD`. This confirms that using deletions is still ineffective for defining breakpoints.
- Add `B.1.631` and `B.1.634` to `sc2rf/mapping.tsv` and as potential clades in the default parameters. These are parents for `XB`.
- Add `B.1.438.1` to `sc2rf/mapping.tsv` and as a potential clade in the default parameters. This is a parent for `proposed808`.
- Require a recombinant region to have at least one substitution unique to the parent (i.e. diagnostic). This reduces false positives.
- Remove the debugging mode, as it produced overly verbose output. It is more efficient to rerun manually with custom parameters tailored to the kind of debugging required.
- Change parent clade nomenclature from `Omicron/21K` to the more comprehensive `Omicron/BA.1/21K`. This makes it clear which lineage is involved, since it's not always obvious how Nextclade clades map to pango lineages.
- Issue #63: All UShER rules and output have been removed. First, because the latest releases of nextclade datasets (tag `2022-07-26T12:00:00Z`) have dramatically improved lineage assignment accuracy for recombinants. Second, to improve the runtime and simplicity of the workflow, as UShER adds significantly to runtime.
- Issue #30: Fixed the bug where distinct recombinant lineages would occasionally be grouped into one `cluster_id`. This was resolved by the new definition of recombinant lineages (see the General section), which now includes parental lineages and has sufficient resolving power.
- Issue #46: Added new column `parents_subs`, which contains the substitutions found in the parental lineages before recombination occurred, using the `sars-cov-2-no-recomb` nextclade dataset. Also added new columns `parents_lineage` and `parents_lineage_confidence`, based on querying `cov-spectrum` for the substitutions found in `parents_subs`.
- Issue #53: Added new column `cov-spectrum_query`, which includes the substitutions that are shared by ALL sequences of the recombinant lineage.
- Added new column `cluster_privates`, which includes the private substitutions shared by ALL sequences of the recombinant lineage.
- Renamed `parents` column to `parents_clade`, to differentiate it from the new column `parents_lineage`.
- Issue #4, Issue #57: Plot distributions of each parent separately, rather than stacking on one axis. Also plot the substitutions as ticks on the breakpoints figure.
[Figure: comparison of v0.3.0 and v0.4.0 plots]
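The Issue #46 and Issue #53 steps described above (pre-recombination substitutions, splitting mutations into regions by breakpoint, and requiring a diagnostic substitution per region) can be sketched in plain Python. This is a minimal illustration, not the pipeline's code. It assumes substitutions are comma-separated strings in the `C241T` format (as in the nextclade `substitutions` and `privateNucMutations.unlabeledSubstitutions` columns) and that breakpoints are sorted `(start, end)` intervals. All function names are hypothetical, the example values are illustrative only, and the cov-spectrum query itself is not shown.

```python
import re

# Hypothetical helpers; assumes nextclade-style substitutions such as "C241T".
SUB_PATTERN = re.compile(r"^[ACGT](\d+)[ACGT]$")


def parse_subs(field):
    """Split a comma-separated substitutions field into a list."""
    return [s for s in field.split(",") if s] if field else []


def pre_recombination_subs(substitutions, unlabeled_private):
    """Keep substitutions that are NOT unlabeled private mutations (Issue #46)."""
    private = set(parse_subs(unlabeled_private))
    return [s for s in parse_subs(substitutions) if s not in private]


def sub_position(sub):
    """Return the genomic coordinate of a substitution such as 'C241T'."""
    match = SUB_PATTERN.match(sub)
    if match is None:
        raise ValueError(f"Unrecognized substitution: {sub}")
    return int(match.group(1))


def split_by_breakpoints(subs, breakpoints):
    """Group substitutions into the regions between breakpoints (Issue #53).

    breakpoints: sorted (start, end) intervals. A substitution inside an
    interval cannot be assigned to either side, so it is skipped.
    """
    regions = [[] for _ in range(len(breakpoints) + 1)]
    for sub in subs:
        pos = sub_position(sub)
        region, inside = 0, False
        for start, end in breakpoints:
            if pos < start:
                break
            if pos <= end:
                inside = True
                break
            region += 1
        if not inside:
            regions[region].append(sub)
    return regions


def has_diagnostic_sub(region_subs, parent_subs, other_parent_subs):
    """True if the region has a substitution found only in its assigned parent."""
    diagnostic = set(parent_subs) - set(other_parent_subs)
    return any(sub in diagnostic for sub in region_subs)


if __name__ == "__main__":
    subs = pre_recombination_subs("C241T,C3037T,A23403G,G28881A", "G28881A")
    print(split_by_breakpoints(subs, [(3000, 3100)]))  # [['C241T'], ['A23403G']]
```

Each resulting region would then be submitted to cov-spectrum as one mutation query, and a region failing `has_diagnostic_sub` would be treated as a likely false positive.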
- Issue #46: Plot breakpoints separately by clade and lineage. In addition, distinct clusters within the same recombinant lineage are noted by including their cluster ID as a suffix. As an example, please see `XM (USA)` and `XM (England)` below, where the lineage is the same (`XM`) but the breakpoints differ, as do the parental lineages (`BA.2` vs `BA.2.12.1`). These clusters are distinct because `XM (England)` lacks substitutions occurring around position 20000.
[Figure: breakpoints by Clade and by Lineage]
- Issue #58: Restrict breakpoint plotting from all lineages to just those observed in the reporting period, except for the breakpoint plots in `plots_historical`.
- Issue #59: Improved error handling of breakpoint plotting when a breakpoint could not be identified by `sc2rf`. This is possible if `nextclade` was the only program to detect recombination (and thus, we have no breakpoint data from `sc2rf`).
- Issue #64: Improved error handling for when the lag period (ex. 4 weeks) falls outside the range of collection dates (ex. 2 weeks).
- Issue #65: Improved error handling of distribution plotting when only one sequence is present.
- Issue #67: Plot legends are placed above the figure and are dynamically sized.
[Figure: comparison of v0.3.0 and v0.4.0 plots]
- Issue #60: Remove changelog from final slide, as this content did not display correctly.
- Issue #61: Fixed bug in the `report.xlsx` where the number of proposed and unpublished recombinant lineages/sequences was incorrect.
- Issue #58: New rule (`validate`) to validate the number of positives in controlled datasets (ex. controls, tutorials) against `defaults/validation.tsv`. If validation fails based on an incorrect number of positives, the pipeline will exit with an error. This is to make it more obvious when results have changed during Continuous Integration (CI).
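The `validate` rule described above can be sketched as follows. This is a minimal illustration, not the rule's actual script. The column names `dataset` and `positives` in the validation table are hypothetical, and the functions are illustrative names. Following the commit "include unpublished in positive status for rule validate", designated, proposed, and unpublished sequences all count as positives.

```python
import csv
import io
import sys

# Statuses counted as recombinant positives (unpublished included).
POSITIVE_STATUSES = {"designated", "proposed", "unpublished"}


def read_expected(tsv_text):
    """Parse a validation table; the `dataset`/`positives` columns are hypothetical."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["dataset"]: int(row["positives"]) for row in reader}


def count_positives(statuses):
    """Count sequences whose status marks them as a recombinant positive."""
    return sum(1 for status in statuses if status.lower() in POSITIVE_STATUSES)


def validate(dataset, observed, expected):
    """Return a list of error messages; an empty list means validation passed."""
    if dataset not in expected:
        return [f"{dataset}: no expected count in validation table"]
    if observed != expected[dataset]:
        return [f"{dataset}: expected {expected[dataset]} positives, observed {observed}"]
    return []


def main(dataset, statuses, tsv_text):
    errors = validate(dataset, count_positives(statuses), read_expected(tsv_text))
    if errors:
        # sys.exit with a string prints it to stderr and exits with status 1,
        # so the workflow job (and the CI run) fails.
        sys.exit("\n".join(errors))
```

Exiting with a non-zero status, rather than only logging a warning, is what makes a changed result visible as a failed CI job.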
- `c027027b` docs: remove dev notes
- `77d9f01b` ci: remove unit test workflow
- `574f8c15` docs: update instructions in README
- `79b61fe2` ci: remove unit tests
- `3d53ebd2` docs: update notes for v0.4.0
- `b968dc6d` script: add more subs columns to lineages linelist
- `75477ec4` script: adjust lag epiweek by 1 when overlaps
- `20409f5c` sc2rf: mapping change 22C to BA.2.12.1 and add 22D
- `432b6b79` docs: add breakpoints lineage output for v0.4.0
- `4fbde4b9` docs: add breakts output image to compare v0.3.0 and v0.4.0
- `af2f25d3` docs: add lineage output image for v0.4.0 to compare
- `5615f113` docs: add lineage output image for v0.3.0 to compare
- `57a08096` profile: adjust plotting min and max dates for tutorial
- `4226c85b` script: mark nextclade recombinants unverified by sc2rf as false positives
- `d6700e7a` docs: rearrange sections in README
- `7f773ba7` docs: update report and slides images and links
- `3599a7f4` script: don't use proposed* lineages from sc2rf as consensus lineage
- `76106b50` docs: add new image for lineages plot
- `86c2fa2e` docs: update README and images
- `488ea6c7` script: dynamic legends for #67
- `bec01845` script: move plot legend to top for #67
- `42d1281b` script: make sure X*-like lineages have proposed status
- `230f9495` resources: update issues
- `3c2f4c84` script: detect novel recombinants based on private mutations
- `d9e279be` resources: update breakpoints for XAF and XAG
- `2d070cb2` script: improve linelist efficiency for assigning cluster ids
- `b65437f2` script: add create_logger function to general functions import
- `489154f6` sc2rf: explicitly call variables _primary and _secondary
- `8f464a14` script: simplify extra columns for summary script
- `30f90443` workflow: formally run sc2rf in primary/secondary mode in parallel
- `6688c828` defaults: restore validation counts for XN and XP
- `043b691e` script: add special processing for XN and XP
- `939cc967` script: only adjust lineage status in linelist if positive
- `28d91363` dataset: update tutorial strains for #49
- `09e53a94` workflow: remove usher for #63
- `22c63de0` workflow: add XAK to controls for Issue #62
- `342fa2a3` script: create separate slides for designated, proposed, and unpublished for issue #61
- `84c9b57d` script: create separate plots for designated, proposed, and unpublished for issue #61
- `81c58931` script: if lineage is proposed* use as curated lineage rather than cluster_id
- `2db6100b` ci: fix typo in profile creation job names
- `18102431` script: remove changelog from report slides for issue #60
- `b7784b30` docs: use new breakpoints path for README
- `66ee7d1d` nextclade: upgrade datasets to tag 2022-07-26 for issue #50
- `7880327c` ci: don't trigger pipeline on images changes
- `e8c71171` docs: update breakpoints images
- `e13138e8` script: rename NA to Unknown parent when plotting breakpoints
- `fc1b6129` script: remove unneeded constants in plot
- `77d3614a` dataset: change controls proposed771 to XAN
- `f2d72330` script: catch empty plot when using tight_layout for breakpoints
- `b55dbe95` workflow: include unpublished in positive status for rule validate
- `557295b5` script: plotting catch when breakpoints are NA
- `2306305d` profile: set exclusions for tutorial to default
- `c049ccba` resource: update curated breakpoints
- `218bab52` env: upgrade nextclade to v2.3.0 for issue #50
- `a9ea0bd4` workflow: fix typo in rule validate that hard-coded controls
- `cf613c34` workflow: control breakpoint plotting by clusters file
- `0d4b50a4` resource: update breakpoints figures
- `6520457a` script: plot subs along with breakpoints for issue #57
- `8110faa7` script: create a plot for cluster_id mostly for breakpoint plotting
- `6f02f09c` script: for plot import function categorical_palette
- `1f6195df` profile: by default do not retry jobs
- `2842a4f9` workflow: add rule validate for issue #56
- `693b07df` script: empty df catching in plot breakpoints
- `12567fde` workflow: classify any sequence with unlabeled private mutations as a potential positive
- `534ac899` docs: add more comments to summary script
- `82b6a696` script: fix bug in parent palette for plot breakpoints
- `42510719` config: remove explicit conda activation in slurm script and profiles
- `ab44e5d7` docs: update readme contributors
- `e8e2b134` dataset: upgrade nextclade dataset
- `086768c7` parameters: restrict breakpoints and parents to 10 for sc2rf
- `83af3a73` sc2rf: output NA for false positives breakpoints
- `4baee0de` script: catch empty dataframe in script plot
- `1dc84bf6` profile: adjust plot end date for positive controls
- `be431cdf` workflow: restrict nextclade clades again
- `76e31f68` resources: add breakpoints by clade
- `07d96a1e` workflow: fix bug in plot_historical
- `2086b254` sc2rf: update lineage mapping
- `0d911b20` env: reorganize dependencies
- `c165074b` workflow: separate plot_breakpoints into separate script
- `0497dffd` profile: controls-negative include false positives
- `8bb41c45` script: add separate report slides for clade/lineage parent breakpoints
- `411bc235` workflow: remove subtree params and add secondary clades
- `f80386aa` sc2rf: catch empty secondary csv
- `6d1b03e0` sc2rf: add optional secondary csv for #51
- `a629eb06` workflow: detect recombination with BA.5 for #51
- `c44b468a` env: remove plotly and kaleido from env
- `a485215d` dataset: add proposed771 to controls for #51
- `eba827a9` script: define lineages by parental lineages for #46
- `d6319bad` workflow: relax nextclade exclusion filter for #48
- `fb7a0a4f` env: upgrade nextclade to v2.2.0 for #50
- `31ec45be` resources: update breakpoints parents nomenclature
- `7f00564d` (unverified) sc2rf,postprocess: add unique_subs to output
- `3e3e46c2` workflow: implement cov-spectrum query to identify parent lineages
- `041de538` workflow: customize nextclade to run with our without recombinant dataset
- `7acc598f` script: remove edges from stacked bar plots for issue #43
- `ab589bb7` docs: add mark horsman for ideas and design
- `151e481d` bug: fix svg font export for issues #42
- `b34c1452` bug: fix ouput typo in issues_download
- `601a0c7a` resources: also output svg for issues breakpoints
- `a2a8b00d` workflow: add plotting issues breakpoints to rule issues_download
- `5a5a2f41` resources: update breakpoints plot
- `2113f6f9` script: plot breakpoints of curate lineages in resources
- `5f4b901d` docs: add new breakpoints image to readme
- `bfdf2191` workflow: cleanup Thumbs
- `869f3b4e` script: add breakpoints as a plot and report slide
- `e988f251` bug: fix missing rule_name for _historical
- `8bc6aef4` env: add plotly and kaleido to env
- `066e0c00` resources: add 781 789 798 to curated breakpoints
- `4ff587c5` resources: update issues and curated breakpoints
- `cc146425` docs: add instructions for updating conda env
- `8592a156` bug: fix usher_collapse metadata output to allow for hCoV-19 prefix
- `9ea4f1b3` docs: add --recurse-submodules instruction to updating
- Default parameters have been updated! Please regenerate your profiles/builds with: `bash scripts/create_profile.sh --data data/custom`
- Rule outputs are now in sub-directories for a cleaner `results` directory.
- The in-text report (`report.pptx`) statistics are no longer cumulative counts of all sequences. Instead, they will match the reporting period in the accompanying plots.
- Improve subtree collapse efficiency (#35).
- Improve subtree aesthetics and filters (#20).
- Fix issues rendering as float (#29).
- Explicitly control the dimensions of plots for PowerPoint embedding.
- Remove hard-coded `extra_cols` (#26).
- Fix mismatch in lineages plot and description (#21).
- Downstream steps no longer fail if there are no recombinant sequences (#7).
- Add new rule `usher_columns` to augment the base usher metadata.
- Add new script `parents.py`, plots, and report slide to summarize recombinant sequences by parent.
- Make rules `plot` and `report` more dynamic with regards to plot creation.
- Exclude the reference genome from alignment until `faToVcf`.
- Include the log path and expected outputs in the message for each rule.
- Use sub-functions to better control optional parameters.
- Make sure all rules write to a log if possible (#34).
- Convert all rule inputs to snakemake rule variables.
- Create and document a `create_profile.sh` script.
- Implement the `--low-memory` mode parameter within the script `usher_metadata.sh`.
- Create new controls datasets:
  - `controls-negatives`
  - `controls-positives`
  - `controls`
- Add versions to `genbank_accessions` for `controls`.
- Upgrade UShER to v0.5.4 (possibly this was done in a prior version).
- Remove `taxonium` and `chronumental` from the conda env.
- Add parameters to control whether negatives and false positives should be excluded:
  - `exclude_negatives: false`
  - `exclude_false_positives: false`
- Add new optional param `max_placements` to rule `linelist`.
- Remove `--show-private-mutations` from `debug_args` of rule `sc2rf`.
- Add optional param `--sc2rf-dir` to `sc2rf` to enable execution outside of the `sc2rf` dir.
- Add params `--output-csv` and `--output-ansi` to the wrapper `scripts/sc2rf.sh`.
- Remove params `nextclade_ref` and `custom_ref` from rule `nextclade`.
- Change `--breakpoints 0-10` in `sc2rf`.
- Re-rename tutorial action to pipeline, and add different jobs for different profiles:
  - Tutorial
  - Controls (Positive)
  - Controls (Negative)
  - Controls (All)
- Output new `_historical` plots and slides for plotting all data over time.
- Output new file `parents.tsv` to summarize recombinant sequences by parent.
- Order the colors/legend of the stacked bar plots by number of sequences.
- Include lineage and cluster id in filepaths of largest plots and tables.
- Rename the linelist output:
  - `linelist.tsv`
  - `positives.tsv`
  - `negatives.tsv`
  - `false_positives.tsv`
  - `lineages.tsv`
  - `parents.tsv`
- The `report.xlsx` now includes the following tables:
  - lineages
  - parents
  - linelist
  - positives
  - negatives
  - false_positives
  - summary
  - issues
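Ordering the stacked bar colors and legend by number of sequences, as described above, amounts to sorting labels by their counts before assigning palette colors. A minimal sketch, assuming the plot receives one lineage label per sequence; `legend_order` and `assign_colors` are hypothetical names, not the plotting script's functions.

```python
from collections import Counter


def legend_order(lineages):
    """Return lineage labels sorted by sequence count, most frequent first.

    Counter.most_common keeps first-seen order for labels with equal counts.
    """
    return [label for label, _ in Counter(lineages).most_common()]


def assign_colors(lineages, palette):
    """Map each lineage to a palette color in legend order (the palette cycles if short)."""
    return {label: palette[i % len(palette)] for i, label in enumerate(legend_order(lineages))}
```

Assigning colors from the same ordering keeps the largest group on the first palette color in every plot.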
pull/19
docs: add lenaschimmel as a contributor for code
- `2f8b498a` docs: update changelog for v0.3.0
- `0486d3be` docs: add updating section to readme for issue #33
- `e8eda400` resources: updates issues with curate breakpoints
- `12e3700f` bug: catch empty dataframe in plot
- `d1ccca2a` workflow: first successful high-throughput run
- `cd741a10` workflow: add new rules plot_historical and report_historical
- `c2cc1380` env: remove openpyxl from environment
- `7dc7c039` workflow: remove rule report_redact #31
- `9ca5f822` script: rearrange merge file order in summary
- `aa28eb9f` workflow: create new rule report_redact for #31
- `4748815d` env: add openpyxl to environment for excel parsing in python
- `0060904a` script: template duplicate labelling in usher_collapse for later
- `a82359a7` data: add accession versions to controls metadata
- `af7341aa` data: add accession versions to controls metadata
- `d860a4c8` workflow: add new rule usher_columns to augment the base usher metadata
- `2511673d` improve subtree collapse effiency (#35) and output aesthetics (#20)
- `1e81be3b` bug: remove non-existant param --log in rule usher_metadata
- `02198b4c` script: add logging to usher_collapse
- `d40d3d78` ci: don't run pipeline just for images changes
- `b880d9c8` docs: update powerpoint image to proper ver
- `2d6514a0` docs: update demo excel and slides with links and content
- `59c24ffe` bug: fix typo that prevented low_memory_mode from activating
- `4d94df1d` bug: continue supply missing build param to params functions
- `c16c3377` bug: supply missing build param to _params_linelist
- `c31c2204` docs: remove plotting data table from FAQ
- `5461cbf2` docs: describe how to include more custom metadata columns
- `7295c8c0` script: implement low memory mode within usher_metadata script
- `6588f619` workflow: restore original config merge logic
- `ae96cf3d` docs: rearrange sections in README
- `e99cdef9` docs: add tips for speeding up usher in FAQ
- `753d1e1d` resources: add proposed759 to curated breakpoints
- `1ea5610e` docs: change troubleshooting section to FAQ
- `42152710` workflow: add logging to sc2rf_recombinants for issue #34
- `ca930fe3` bug: fix status of designated recombinants missed by sc2rf (XB)
- `2c6102a6` script: in plotting data replace counts that are empty string with 0
- `0c7fa988` docs: tidy up comments in default parameters.yaml
- `43c61d43` bug: fix sc2rf postprocessing bug where sequences with only parent were missing filtered regions
- `6a00c866` ci: split jobs by profile for testing profile creation (#27)
- `aeabf009` ci: add new action profile_creation to test script create_profile.sh
- `9a6758e2` add controls section to README
- `ef250a22` script: add -controls suffix to profiles created with --controls param
- `150a3e17` docs: update troubleshooting section
- `90b406c8` script: remove --partition flag to scripts/slurm.sh
- `a0c6ece2` docs: update google drive link to example slides
- `a37afeea` docs: update instructions for create_profile.sh
- `f9d050d2` add execute permissions to scripts
- `38b5b422` bug: use a full loop to check issue formatting
- `307b4f67` catch issues list when converting to str
- `639f8c26` bug: fix issues rendering as float in tables for issue #29
- `35ea4be1` remove param --sc2rf-dir from scripts/sc2rf.sh
- `5a2a9520` docs: update images for excel and powerpoint
- `3ae737d5` env: comment out yarn which is a dev dependency
- `3842e898` improve logging in create_profile
- `84c684ca` workflow: separate profiles for controls,controls-positive,controls-negative
- `ad5e8e4b` limit missing strains output from create_profile
- `a4898ecf` docs: update development notes
- `34ee2fff` docs: add links to contributors plugins
- `b6b0c999` revert to automated all-contributors
- `e1a248f8` add @yatisht and @AngieHinrichs to credits for usher
- `e3f432c4` start adding contributors
- `862757bd` docs: create .all-contributorsrc [skip ci]
- `a0532792` docs: update README.md [skip ci]
- `6e67e73f` update unpublished regex to solve #21
- `5ba6b37b` remove taxonium and chronumental from env
- `2a5fc627` add artifacts for all pipelines
- `664b2e9b` fix trailing whitespace in metadata
- `eada2fa3` fix negative controls metadata
- `9aecd69a` fix plot dimensions for pptx embed
- `657e8838` fix outdir for linelist
- `6fb389dc` fix input type for controls build
- `8ed69ce0` upload tutorial pptx as artifact
- `c6e647d2` update ci profile for test action
- `19cdb8ed` lint pipeline
- `22e3aa6b` split controls action into positives,negatives,and all
- `33491320` rename action Tutorial to Pipeline
- `da2890d6` fix profile in install action
- `8a4d4fbb` lint all files
- `b167ea45` update readme with profile creation instructions
- `4d2848b9` add script to generate new profiles and builds
- `407f8aba` checkpoint before auto-generating builds
- `ccb3471b` add new negatives dataset
- `964a22f8` (broken) script overhaul
- `1f1ca1b4` add param --sc2rf-dir to allow execution outside of main directory
- `21541f02` add exclude_negatives and exclude_false_positives to parameters
- `0b5854a2` update docs
- `58b6396a` split controls data into positives and negatives
- `11f9f6a4` consolidate positives and negatives profiles into controls
- `581255c8` generalize hpc profiles
- `1e2a70a4` update HPC instructions in README
- `a18b19e3` (broken) add negatives data and profile
- `11817639` (broken) make plots and report dynamic
- `e833d151` create tutorial-hpc profile
- `c4ac5699` remove redundant profile laptop
- `c5107017` remove ci profile
- `4be07e79` actually rename pipeline to tutorial
- `89c4d6b5` rename pipeline action to tutorial
- `7614b399` exclude alpha beta gamma by default from Nextclade
- `d2c2461c` update dev docs
- `f9368d11` remove proposed636 which is now XZ
- `4fbd0ce4` add XAA and XAB to resources
- `65efd145` add xz to resources
- `f3641b19` add parents slide to report and excel
- `4e25f665` add new script parents to summarize recombinants by parent
- `0decd47a` catch when no designated sequences are present
- `189fbb2a` update resources breakpoints
- `3c5b4965` update sc2rf with new logic for terminal deletions
- `e37d68d9` update issues and breakpoints
- `761d41bf` use date in changelog for report
- `3c486dbc` add zip to environment
- `1ff37195` add more info about system config
- New optional param `motifs` for rule `sc2rf_recombinants`.
- New param `weeks` for new rule `plot`.
- Removed `prev_linelist` param.
- Switch from a PDF `report` to PowerPoint slides for better automation.
- Create summary plots.
- Split `report` rule into `linelist` and `report`.
- Output `svg` plots.
- New rule `plot`.
- Changed growth calculation from a comparison to the previous week to a score of sequences per day.
- Assign a `cluster_id` according to the first sequence observed in the recombinant lineage.
- Define a recombinant lineage as a group of sequences that share the same:
  - Lineage assignment
  - Parents
  - Breakpoints or phylogenetic placement (subtree)
    - For some sequences, the breakpoints are inaccurate and shifted slightly due to ambiguous bases. These sequences can be assigned to their corresponding cluster because they belong to the same subtree.
    - For some lineages, global prevalence has exceeded 500 sequences (which is the subtree size used). Sequences of these lineages are split into different subtrees. However, they can be assigned to the correct cluster/lineage because they have the same breakpoints.
- Confirmed not to use deletions to define recombinants and breakpoints (differs from published)?
- Move `sc2rf_recombinants.py` to `postprocess.py` in the ktmeaton fork of `sc2rf`.
- Add false positives filtering to `sc2rf_recombinants` based on parents and breakpoints.
- Add section `Configuration` to `README.md`.
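The recombinant lineage definition and the `cluster_id` rule above can be sketched as follows. This is one illustrative reading, not the pipeline's code. It assumes two sequences belong together when they share lineage and parents AND share either breakpoints or subtree, uses a union-find structure so that these "or" links chain together transitively, and treats the earliest collection date as "first observed". The `Sequence` fields and `assign_cluster_ids` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sequence:
    strain: str
    collected: date
    lineage: str
    parents: str
    breakpoints: str
    subtree: str


def assign_cluster_ids(seqs):
    """Map each strain to a cluster_id: the strain of the earliest sequence in its cluster."""
    root = list(range(len(seqs)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving keeps the trees shallow
            i = root[i]
        return i

    def union(i, j):
        root[find(j)] = find(i)

    # Link sequences sharing lineage + parents + (breakpoints OR subtree).
    first_index = {}
    for i, seq in enumerate(seqs):
        for key in (
            (seq.lineage, seq.parents, "breakpoints", seq.breakpoints),
            (seq.lineage, seq.parents, "subtree", seq.subtree),
        ):
            if key in first_index:
                union(first_index[key], i)
            else:
                first_index[key] = i

    # Name each cluster after its first observed (earliest collected) sequence.
    members = {}
    for i in range(len(seqs)):
        members.setdefault(find(i), []).append(i)
    cluster_ids = {}
    for group in members.values():
        first = min(group, key=lambda k: (seqs[k].collected, seqs[k].strain))
        for k in group:
            cluster_ids[seqs[k].strain] = seqs[first].strain
    return cluster_ids
```

The transitive linking reflects the two caveats listed above. A sequence with slightly shifted breakpoints still joins through its subtree, and a lineage split across subtrees still joins through its breakpoints.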
- `c2369c75` update CHANGELOG after README overhaul
- `9c8a774e` update autologs to exclude first blank line in notes
- `2a8a7af5` overhaul README
- `9c2bd2f5` change asterisks to dashes
- `46d4ec81` update autologs to allow more complex notes content
- `a01a903c` split docs into dev and todo
- `23e8d715` change color palette for plotting
- `785b8a19` add optional param motifs for sc2rf_recombinants
- `d1c1559e` restore pptx template to regular view
- `6adc5d32` add seaborn to environment
- `35a04471` add changelog to report pptx
- `99e98aa7` add epiweeks to environment
- `1644b1fc` add pptx report
- `1ab93aff` (broken) start plotting
- `094530f0` swithc sc2rf to a postprocess script
- `02193d6e` try generalizing sc2rf post-processing
- Fix bug in `sc2rf_recombinants` regions/breakpoints logic.
- Fix bug in `sc2rf` where a sample has no definitive substitutions.
- Allow `--breakpoints 0-4` for XN. We'll determine the breakpoints in post-processing.
- Bump up the `min_len` of `sc2rf_recombinants` to 1000 bp.
- Add param `mutation_threshold` to `sc2rf`.
- Reduce default `mutation_threshold` to 0.25 to catch [Issue #591](cov-lineages/pango-designation#591).
- Bump up subtree size from 100 sequences to 500 sequences.
  - Trying to future-proof against XE growth (200+ sequences).
- Discovered that `--primers` interferes with breakpoint detection; use only for debugging.
- Only use `--enable-deletions` in `sc2rf` for debug mode. Otherwise it changes breakpoints.
- Only use `--private-mutations` with `sc2rf` for debug mode. It produces unreadable output for bulk sample processing.
- Change `sc2rf_lineage` column to use NA for no lineage found.
  - This is to troubleshoot when only one breakpoint matches a lineage.
- Add `sc2rf_mutations_version` to summary based on a datestamp of `virus_properties.json`.
- Allow multiple issues in report.
- Use three status categories of recombinants:
  - Designated
  - Proposed
  - Unpublished
- Add column `status` to recombinants.
- Add column `usher_extra` to `usher_metadata` for 2022-05-06 tree.
- Separate out columns lineage and issue in `report`.
- Add optional columns to report.
- Fixed growth calculations in report.
- Add a Definitions section to the markdown/pdf report.
- Use parent order for breakpoint matching, as we see the same breakpoint with different parents.
- Add the number of usher placements to the summary.
- Set Auspice default coloring to `lineage_usher` where possible.
- Remove nwk output from `usher` and `usher_subtrees`:
  - Pull subtree sample names from json instead.
- Output `linelist.exclude.tsv` of false-positive recombinants.
- Update `nextclade_dataset` to 2022-04-28.
- Add `taxoniumtools` and `chronumental` to environment.
- Separate nextclade clades and pango lineage allele frequencies in `sc2rf`.
- Exclude BA.3, BA.4, and BA.5 for now, as their global prevalence is low and they are descendants of BA.2.
- Add a `tutorial` profile.
  - (N=2) Designated Recombinants (pango-designation)
  - (N=2) Proposed Recombinants (issues, UCSC)
  - (N=2) Unpublished Recombinants
- Add XL to `controls`.
- Add XN to `controls`.
- Add XR to `controls`.
- Add XP to `controls`.
- Split `usher_subtree` and `usher_subtree_collapse` into separate rules.
  - This speeds up testing for collapsing trees and styling the Auspice JSON.
- Force include `Nextclade` recombinants (auto-pass through `sc2rf`).
- Split `usher` and `usher_stats` into separate rules.
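Using parent order for breakpoint matching, as described above, means that the same breakpoint with parents in the opposite order points to a different lineage. The sketch below illustrates that idea only. The curated table, the `tolerance` parameter, and `match_lineage` are all hypothetical and are not the pipeline's matching logic. Returning "NA" when nothing matches mirrors the `sc2rf_lineage` convention noted above.

```python
# Hypothetical curated table: (lineage, parents in genome order, breakpoints).
CURATED = [
    ("lineage_A", ("P1", "P2"), (12000,)),
    ("lineage_B", ("P2", "P1"), (12000,)),
]


def match_lineage(parents, breakpoints, curated=CURATED, tolerance=500):
    """Return the curated lineage matching the ordered parents and breakpoints.

    Parents are compared as an ordered tuple, so ("P1", "P2") and ("P2", "P1")
    are different lineages even with the same breakpoint. Breakpoints match if
    each is within `tolerance` bp (a hypothetical parameter). Returns "NA" when
    nothing matches.
    """
    for lineage, curated_parents, curated_breakpoints in curated:
        if tuple(parents) != curated_parents:
            continue
        if len(breakpoints) != len(curated_breakpoints):
            continue
        if all(abs(b - c) <= tolerance for b, c in zip(breakpoints, curated_breakpoints)):
            return lineage
    return "NA"
```

Comparing ordered tuples, rather than sets of parents, is the design choice that separates the two curated entries in this example.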
pull/13
Three status categories: designated, proposed, unpublished
- `10388a6e` update docs for v0.2.0
- `32b9e8ab` separate usher and usher_stats rule and catch 3 or 4 col usher
- `70da837c` update github issues and breakpoints for 636 and 637
- `9ed10f17` skip parsing github issues that don't have body
- `216cb28e` put the date in the usher data for tutorial
- `c95cca0e` update usher v0.5.3
- `98c91bee` finish reporting cycle 2022-05-11
- `e4755f16` new sc2rf mutation data by clade
- `4a501d56` separate omicron lineages from omicron clades
- `d6185aaf` testing change to auto-pass nextclade recombinants
- `2e02922f` add XL XN XR XP to controls
- `e2c9675b` add usher_extra and qc file to sc2rf recombinants
- `941a64c5` update github issues
- `943cde95` add usher placements to summary
- `0d0ffbd4` combine show-private-mutations with ignore-shared
- `f1d7e6c1` update sc2rf after terminal bugfixes
- `8f4fd95a` add country England to geo res
- `efeeb6ca` add mutation threshold param sep for sc2rf
- `9c42bc6c` limit table col width size in report
- `7d746e3f` fix growth calculation
- `0dc7f464` identify sc2rf lineage by breakpoints and parents
- `19fc3721` add parents to breakpoints and issues
- `27bbff0a` generate geo_resolutions from ncov defaults lat longs
- `86aa78ba` add map to auspice subtrees
- `2499827e` add taxoniumtools and chronumental to env
- `ec46a569` change tutorial seq names from underscores to dashes
- `27170a0b` fix issues line endings
- `e8eb1215` update nextclade dataset to 2022-04-28
- `89335c3a` (broken) updating columns in report
- `67475ecc` update sc2rf
- `79b7b2b9` add tip to readme
- `10df6a54` remove all sample extraction from usher
- `2ffbcb61` switch sc2rf submodule to ktmeaton fork
- `e2adaabe` disable snakemake report in pipeline ci
- `f12fef14` edit line linst preview instructions
- `d10eb730` add collection date to tutorial
- `d4e0aa86` very preliminary credits and tutorial
- `e9c41e6e` change ci pipeline to tutorial build
- `4de7370d` add tutorial data
- `8d8c88fc` set min version for click to troubleshoot env creation
- `c7fb50a4` better issue reporting
- `b2699823` update sc2rf
- Add lineage `XM` to controls.
  - There are now publicly available samples.
- Correct `XF` and `XJ` controls to match issues.
- Create a markdown report with program versions.
- Fix `sc2rf_recombinants` bug where samples with >2 breakpoints were being excluded.
- Summarize recombinants by parents and dates observed.
- Change `report.tsv` to `linelist.tsv`.
- Use `date_to_decimal.py` to create `num_date` for auspice subtrees.
- Add an `--exclude-clades` param to `sc2rf_recombinants.py`.
- Add param `--ignore-shared-subs` to `sc2rf`.
  - This makes regions detection more conservative.
  - The result is that regions/clade will be smaller and breakpoints larger.
  - These breakpoints more closely match pango-designation issues.
- Update breakpoints in controls metadata to reflect the output with `--ignore-shared-subs`.
- Bump up `min_len` for `sc2rf_recombinants` to 200 bp.
- Add column `sc2rf_lineage` to `sc2rf_recombinants` output.
  - Takes the form of `X*`, or `proposed{issue}` to follow UShER.
- Consolidate lineage assignments into a single column.
  - sc2rf takes priority if a single lineage is identified.
  - usher is next, to resolve ties or if sc2rf had no lineage.
- Slim down the conda environment and remove unnecessary programs:
  - `augur`
  - `seaborn`
  - `snipit`
  - `bedtools`
  - Comment out dev tools: `git` and `pre-commit`.
- Use github api to pull recombinant issues.
- Consolidate *to* files into `resources/issues.tsv`.
- Use the `--clades` param of `sc2rf` rather than using `exclude_clades`.
- Disabled `--rebuild-examples` in `sc2rf` because of requests error.
- Add column `issue` to `recombinants.tsv`.
- Get `ncov-recombinant` version using tag.
- Add documentation to the report.
  - What the sequences column format means: X (+X)
  - What the different lineage classifiers mean.
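The lineage consolidation priority above (sc2rf first when it reports a single lineage, then usher for ties or missing calls) can be sketched as a small function. This is a hedged illustration, not the report script. It assumes the sc2rf candidates arrive as a list and uses "NA" for a missing call; `consolidate_lineage` is a hypothetical name.

```python
def consolidate_lineage(sc2rf_lineages, usher_lineage):
    """Pick one lineage per sequence.

    sc2rf wins when it reports exactly one distinct lineage. Otherwise (a tie
    between several candidates, or no lineage) fall back to the usher
    assignment. Returns "NA" if neither source has a lineage.
    """
    candidates = []
    for lineage in sc2rf_lineages:
        if lineage and lineage != "NA" and lineage not in candidates:
            candidates.append(lineage)
    if len(candidates) == 1:
        return candidates[0]
    return usher_lineage or "NA"
```

Duplicate sc2rf calls are collapsed before counting, so repeated reports of the same lineage are not treated as a tie.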
pull/10
Automated report generation and sc2rf lineage assignments
- `941f0c08` update CHANGELOG
- `bce219b6` fix notes conflict
- `cdb3bc7f` fix duplicate pr output in autologs
- `0a8ffd84` update notes for v0.1.2
- `0075209b` add issue to recombinants report
- `0fd8fea0` (broken) troubleshoot usher collapse json
- `b0ab72b9` update breakpoints
- `a3058c57` update sc2rf examples
- `b5f2a0f8` update and clean controls data
- `f69f254c` add curated breakpoints to issues
- `4e9d2dc9` add an issues script and resources file
- `16059610` major environment reduction
- `5ced9193` remove bedtools from env
- `28777cb9` remove seaborn from env
- `01f459c4` remove snipit from env
- `7d6ef729` remove augur from env
- `9a2e948d` fix pandas warnings in report
- `78daee55` remove script usher_plot as now we rely on auspice
- `a7bd52f1` remove script update_controls which is now done manually
- `93a30200` consolidate lineage call in report
- `5f3e3633` hardcode columns for report
- `6333fde4` improve type catching in date_to_decimal
- `557627a4` update controls breakpoints and add col sc2rf_lineage
- `7e2ad531` add param --ignore-shared-subs to sc2rf
- `519a9eea` overhaul reporting workflow
- `6fa674f2` update controls metadata and dev notes
- `46155a84` add XM to controls
- `d2d4cd80` update notes for v0.1.2
- `17ae6eeb` add issue to recombinants report
- `2ab8dd30` (broken) troubleshoot usher collapse json
- `e4fd3352` update breakpoints
- `c743ad3d` update sc2rf examples
- `62e1ffc1` update and clean controls data
- `9c401a0c` add curated breakpoints to issues
- `7cf953ad` add an issues script and resources file
- `fbf35e51` major environment reduction
- `1b1f16e9` remove bedtools from env
- `7b650279` remove seaborn from env
- `b32608e7` remove snipit from env
- `9cab231e` remove augur from env
- `a69836a8` fix pandas warnings in report
- `b3137ed3` remove script usher_plot as now we rely on auspice
- `06fa080d` remove script update_controls which is now done manually
- `e8d46a64` consolidate lineage call in report
- `ccfea688` hardcode columns for report
- `5172755e` improve type catching in date_to_decimal
- `74f0e528` update controls breakpoints and add col sc2rf_lineage
- `d4664015` add param --ignore-shared-subs to sc2rf
- `659d7f83` overhaul reporting workflow
- `e8e3444f` update controls metadata and dev notes
- `07a5ff52` add XM to controls
- Add lineage `XD` to controls.
  - There are now publicly available samples.
- Add lineage `XQ` to controls.
  - Has only 1 diagnostic substitution: 2832.
- Add lineage `XS` to controls.
- Exclude lineage `XR` because it has no public genomes. `XR` descends from `XQ` in the UShER tree.
- Test `sc2rf` dev to ignore clade regions that are ambiguous.
- Add column `usher_pango_lineage_map` that maps github issues to recombinant lineages.
- Add new rule `report`.
- Filter non-recombinants from `sc2rf` ansi output.
- Fix `subtrees_collapse` failing if only 1 tree specified.
- Add new rule `usher_metadata` to merge metadata for subtrees.
- `b8f89d5e` update docs for v0.1.1
- `2b8772ab` update autologs for pr date matching
- `f2a7547d` add low_memory_mode for issue #9
- `94fc9426` add log to report rule
- `861ffb17` update usher output image
- `fdee0da6` add new rule usher_metadata
- `c5003453` add max parents param to sc2rf recombinants
- `70cad049` add max ambig filter to defaults for sc2rf
- `b4cc40f4` add script to collapse usher metadata for auspice
- `847c6d24` catch single trees in usher collapse
- `636778e0` rename sc2rf txt output to ansi
- `bae50814` change final target to report
- `9a830085` add new rule report
- `601e1728` add XD to controls
- `baa1d64e` relax sc2rf --unique from 2 to 1 for XQ
- `bffbb9ad` add column sc2rf_clades_filter
- `109ed5d2` test sc2rf dev to not report ambiguous regions
- `d9fdffef` fix tab spaces at end of usher placement
- `085e1764` update sc2rf for tsv/csv PR
- `d2363855` set threads and cpus to 1 for all single-thread rules
- `13027205` impose wildcard constraint on nextclade_dataset
- `30e1406b` fix typo in csv path for sc2rf
- `d6d8377a` add XS breakpoints to metadata
- `371e4069` add XS and XQ to controls
- `1e31e429` add breakpoints reference file in controls
- `3088ad60` catch if sc2rf has no output
- `64211360` catch all extra args in slurm
- `c7a6b9ce` remove unused nextclade_recombinants script
- `38645ce9` remove codecov badge
- `50fa9d89` update CHANGELOG for v0.1.0
- Add Stage 1: Nextclade.
- Add Stage 2: sc2rf.
- Add Stage 3: UShER.
- Add Stage 4: Summary.
- Add Continuous Integration workflows: `lint`, `test`, `pipeline`, and `release`.
- New representative controls dataset:
  - Exclude XA because this is an Alpha recombinant (poor lineage accuracy).
  - Exclude XB because of current issue
  - Exclude XC because this is an Alpha recombinant (poor lineage accuracy).
  - Exclude XD because there are no public genomes.
  - Exclude XK because there are no public genomes.
pull/3
Add Continuous Integration workflows: lint, test, pipeline, and release.
34c721b7 rearrange summary cols
18b389de disable usher plotting
0a101dd9 convert sc2rf_recombinants to a script
494fb60c change nextclade filtering to multi columns
f0676bd8 add python3 directive to unit tests
9da626fd update unit tests for new controls dataset
46717c6f fix summary script for new ver
9fc690ed fix usher public-latest links
1fb28ea0 change sc2rf to bash script
ba98d936 update sc2rf params
89647109 update sc2rf artpoon pr
2b7dcab4 ignore my_profiles
9df0f9b4 add debugging mode for sc2rf (lint)
caaf4deb add debugging mode for sc2rf
7ec53ac6 remove unnecessary param exp_input
a3810bca add program versions to summary
3376196b Add Continuous Integration workflows: lint, test, pipeline, and release. (#3)
db45768f more instructions for visualizing output
ee4c2660 update readme keywords
cb5b2c23 add instructions for slurm submission
f04eb28b adjust laptop profile to use 1 cpu
eafed44e add snakefile
eb214d72 add scripts
cb47b046 add README.md
3794d8de add images dir
f1c0cef8 add profiles laptop and hpc
6f5fd84b add release notes
92b0c476 add default parameters
c79bfe12 add submodule ncov
2a65e92d add submodule sc2rf
0f8bfe33 add submodule autologs
b6f1e1d6 add report captions
68fd2dec add conda env
8907375e add reference dataset
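The `improve type catching in date_to_decimal` commit is listed without further detail. As a minimal sketch of what defensive type handling in such a helper might look like, a converter can normalize each accepted input type and return `None` for anything it cannot parse instead of raising. The signature, the accepted input types, and the convention that 1 January maps to `YYYY.0` are all assumptions for illustration, not the pipeline's actual code.

```python
import math
from datetime import date, datetime


def date_to_decimal(value):
    """Hypothetical sketch: convert a date-like value to a decimal year.

    Assumed to accept datetime.date, datetime.datetime (including subclasses
    such as pandas.Timestamp), or an ISO 'YYYY-MM-DD' string. Missing or
    unparseable values return None instead of raising.
    """
    if value is None:
        return None
    # Empty cells read from a table often arrive as float NaN.
    if isinstance(value, float) and math.isnan(value):
        return None
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        value = value.date()
    elif isinstance(value, str):
        try:
            value = date.fromisoformat(value.strip())
        except ValueError:
            return None
    elif not isinstance(value, date):
        return None

    year_start = date(value.year, 1, 1)
    days_in_year = (date(value.year + 1, 1, 1) - year_start).days  # 365 or 366
    day_index = (value - year_start).days  # 0 for 1 January (assumed convention)
    return value.year + day_index / days_in_year


print(date_to_decimal("2020-07-02"))  # 2020.5 (day 183 of a 366-day leap year)
```

Returning `None` rather than raising lets a report step keep rows with missing or malformed dates; whether the real helper does this is not stated in the log.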