The BLAST workflow currently .collect()s all per-sample prepared CSVs into a single list and hands them to one LABKEY_UPLOAD_BLAST and one LABKEY_UPLOAD_FASTA process invocation. Those processes discover files via os.listdir() and loop over them. This means no upload can start until every sample finishes preparation, and a failure in any batch fails the entire upload.
The GOTTCHA2 workflow already uses the eager pattern: each upload process receives a per-sample queue channel tuple, fires as soon as that sample is ready, and emits its own log. Logs are .mix()ed together for downstream gating. This commit brings BLAST uploads in line with that pattern.
The change is entirely in the Nextflow wiring (bundle_blast_for_labkey.nf and stat_blast_workflow.nf). The Python upload scripts are unchanged: they already handle the single-file case correctly because their os.listdir() loop naturally finds one file when Nextflow stages one file.
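The single-file behaviour described above can be illustrated with a minimal sketch. This is not the actual upload script: the function names find_staged_csvs and upload_all and the upload_one callback are hypothetical, and the real LabKey upload call is stubbed out. It only shows why a directory-listing loop needs no changes when the staging directory holds one file instead of many.

```python
import os


def find_staged_csvs(work_dir):
    """Return the sorted paths of all CSV files staged in work_dir."""
    return sorted(
        os.path.join(work_dir, name)
        for name in os.listdir(work_dir)
        if name.endswith(".csv")
    )


def upload_all(work_dir, upload_one):
    """Call upload_one on every staged CSV and return the paths handled.

    upload_one stands in for the real upload call (an assumption here).
    The loop is identical whether one file or many files were staged.
    """
    handled = []
    for path in find_staged_csvs(work_dir):
        upload_one(path)
        handled.append(path)
    return handled
```

With the eager wiring, each process invocation would see a directory holding only its own sample's file, so the loop body runs once per invocation and a failure stays confined to that sample.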
The v2.4.0 schema was released with that tag and should not be modified. New params from feature branches (deacon, GOTTCHA2, etc.) need a new schema version to land in.
Split dedup into dedup_seq and dedup_pos
Why a separate PR for the schema version?
The v2.4.0 params schema was released alongside the v2.4.0 tag and is now immutable — users in the wild may be referencing it by URL in their params files for reproducibility. Any new pipeline parameters (from deacon integration, GOTTCHA2, etc.) need to land in a new schema version rather than retroactively modifying v2.4.0.
By putting the version bump in its own PR, we avoid the situation where every feature branch that adds params has to independently create the v2.5.0 file. Instead, this PR establishes v2.5.0 as the open development schema, and feature branches simply add their properties to it.
What changed
This PR creates nvd-params.v2.5.0.schema.json as an identical copy of the released v2.4.0 schema with only the $id field updated. It also points the latest symlink and the SCHEMA_URL in params.py at the new version. The v2.4.0 file is untouched.
Working with this in feature branches
If your feature branch adds new pipeline parameters, it should be based on top of this commit so that it can add properties to the v2.5.0 schema file.
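As a rough sketch of the relationship between the two schema versions, the snippet below models the development schema as a copy of the released one that differs only in $id, and then adds one property to it the way a feature branch would. Everything here is illustrative: the helper names new_schema_version and add_param, the example.invalid URLs, and the parameter names existing_param and example_feature_param are assumptions, not the real NVD schema contents.

```python
import copy


def new_schema_version(released, new_id):
    """Return a deep copy of a released schema with only "$id" replaced."""
    schema = copy.deepcopy(released)
    schema["$id"] = new_id
    return schema


def add_param(schema, name, spec):
    """Add one property to the development schema, refusing to overwrite."""
    props = schema.setdefault("properties", {})
    if name in props:
        raise KeyError(f"parameter already defined: {name}")
    props[name] = spec
    return schema


# Placeholder content standing in for the released schema (hypothetical).
v2_4_0 = {
    "$id": "https://example.invalid/nvd-params.v2.4.0.schema.json",
    "properties": {"existing_param": {"type": "string"}},
}

# The development schema starts out identical apart from "$id".
v2_5_0 = new_schema_version(
    v2_4_0, "https://example.invalid/nvd-params.v2.5.0.schema.json"
)

# A feature branch adds its own property to the development schema only.
add_param(v2_5_0, "example_feature_param", {"type": "boolean", "default": False})

# The released schema is unaffected by work on the development schema.
assert "example_feature_param" not in v2_4_0["properties"]
```

The deep copy matters in this sketch: a shallow copy would share the nested properties mapping, so adding a parameter to the new version would silently modify the released one.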
With jujutsu: If you created your branch off main before this landed, rebase onto the schema-v2.5.0 bookmark (or onto main after this merges). A new feature branch that depends on this one should start from the same bookmark.
With git: If your branch predates this change, rebase onto the schema-v2.5.0 branch (or onto main after this merges). A new feature branch should start from that branch.
Related
This is a dependency of #14 (deacon integration).