fix: Consolidate CLI #650

Merged
merged 33 commits into from
Jan 30, 2025
Changes from 26 commits
Commits
c0077ab
.
tedil Nov 28, 2024
cc28a90
use TranscriptSettings in test args
tedil Dec 16, 2024
2031dbd
lints
tedil Dec 16, 2024
133e7da
merge tx dbs separated by genome release
tedil Dec 16, 2024
e5fd61e
load csq predictors depending on genome release in the server subcmd
tedil Dec 16, 2024
0a29b1c
also skip freq and clinvar dbs if assembly does not match, more info …
tedil Dec 17, 2024
6e43c13
lint: explicit named lifetimes
tedil Dec 17, 2024
1f568f8
rephrase info about skipped databases
tedil Dec 17, 2024
8f4decb
check whether the predictor could be successfully instantiated, other…
tedil Dec 17, 2024
b6df08c
fix typo
tedil Dec 17, 2024
7f159db
rename fn seqvars to consequence
tedil Dec 17, 2024
d64eaad
remove databases.is_empty assertion because that is tested anyway
tedil Dec 17, 2024
d0672ab
update sources help texts
tedil Jan 2, 2025
eea025e
fmt
tedil Jan 2, 2025
e93b5f1
include strand in seqvars/csq
tedil Jan 2, 2025
ff126ce
add frequency endpoint (wip)
tedil Jan 2, 2025
551d6cf
update 'try: …' hints to openapi
tedil Jan 2, 2025
40796ab
do not initialize predictors/annotators multiple times
tedil Jan 2, 2025
9a70495
add frequency to apidocs
tedil Jan 2, 2025
cc07876
also allow multiple --frequencies and --clinvar options
tedil Jan 2, 2025
b0af0ea
whitespace
tedil Jan 2, 2025
4097675
add clinvar endpoint
tedil Jan 2, 2025
720b934
merge origin/main
tedil Jan 2, 2025
05784eb
update warning for multiple clinvar or freq dbs
tedil Jan 2, 2025
ed2b713
update openapi schema
tedil Jan 2, 2025
f1c010f
update entrypoint to match new server run cli
tedil Jan 3, 2025
ee9ebf1
fix server run clinvar docstrings
tedil Jan 3, 2025
5fd01b5
update openapi.schema.yaml accordingly
tedil Jan 3, 2025
78d646c
merge origin/main
tedil Jan 24, 2025
ec077dc
remove unused path_db from test
tedil Jan 24, 2025
61ee096
only print hints for available endpoints
tedil Jan 24, 2025
096c922
server: add exemplary Grch38 hints
tedil Jan 24, 2025
fa2cbeb
fix typo in fn name
tedil Jan 24, 2025
345 changes: 344 additions & 1 deletion openapi.schema.yaml

Large diffs are not rendered by default.

123 changes: 123 additions & 0 deletions src/annotate/cli.rs
@@ -0,0 +1,123 @@
use clap::Args as ClapArgs;
use strum::{Display, VariantArray};

#[derive(Debug, ClapArgs)]
#[group(required = true, multiple = true)]
pub struct Sources {
    /// Transcript database containing the transcript information.
    ///
    /// Pre-built databases are available at https://github.com/varfish-org/mehari-data-tx/releases
    #[arg(long)]
    pub transcripts: Option<Vec<String>>,

    /// Frequency database.
    ///
    /// The frequency database contains gnomAD frequencies for the variants.
    /// Pre-built databases are available at TODO
    #[arg(long)]
    pub frequencies: Option<Vec<String>>,

    /// ClinVar database.
    ///
    /// The ClinVar database contains clinical significance information for the variants.
    /// Pre-built databases are available at https://github.com/varfish-org/annonars-data-clinvar/releases
    #[arg(long)]
    pub clinvar: Option<Vec<String>>,
}
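
As a reading aid for the `#[group(required = true, multiple = true)]` attribute above: clap rejects an invocation that supplies none of the grouped options, while any combination of them is accepted. The sketch below mimics that rule with the standard library only. The `provided` and `validate` helpers and the file names in `main` are hypothetical and not part of this PR; the real struct relies on `clap::Args` for this check.

```rust
// Std-only stand-in for the `Sources` argument group from the diff.
// The clap derive is omitted so that the sketch compiles without dependencies.
#[derive(Debug, Default)]
pub struct Sources {
    pub transcripts: Option<Vec<String>>,
    pub frequencies: Option<Vec<String>>,
    pub clinvar: Option<Vec<String>>,
}

impl Sources {
    /// Hypothetical helper: names of the source kinds that were supplied.
    pub fn provided(&self) -> Vec<&'static str> {
        let mut kinds = Vec::new();
        if self.transcripts.is_some() {
            kinds.push("transcripts");
        }
        if self.frequencies.is_some() {
            kinds.push("frequencies");
        }
        if self.clinvar.is_some() {
            kinds.push("clinvar");
        }
        kinds
    }

    /// Hypothetical check mirroring `required = true, multiple = true`:
    /// at least one source must be present, several may be combined.
    pub fn validate(&self) -> Result<(), String> {
        if self.provided().is_empty() {
            Err("at least one of --transcripts, --frequencies, --clinvar is required".to_string())
        } else {
            Ok(())
        }
    }
}

fn main() {
    // Placeholder paths, chosen only for illustration.
    let sources = Sources {
        transcripts: Some(vec!["txs-a.bin.zst".into(), "txs-b.bin.zst".into()]),
        clinvar: Some(vec!["clinvar-db".into()]),
        ..Default::default()
    };
    println!("{:?} -> {:?}", sources.provided(), sources.validate());
    println!("none -> {:?}", Sources::default().validate());
}
```

Because every field is an `Option<Vec<String>>`, each option can also be repeated, which matches the commit "also allow multiple --frequencies and --clinvar options".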

#[derive(Debug, ClapArgs, Default, Clone)]
pub struct TranscriptSettings {
    /// The transcript source.
    #[arg(long, value_enum, default_value_t = TranscriptSource::Both)]
    pub transcript_source: TranscriptSource,

    /// Whether to report only the most severe consequence, grouped by gene, transcript, or allele.
    #[arg(long)]
    pub report_most_severe_consequence_by: Option<ConsequenceBy>,

    /// Which kind of transcript to pick / restrict to. Default is not to pick at all.
    ///
    /// Depending on `--pick-transcript-mode`, if multiple transcripts match the selection,
    /// either the first one is kept or all are kept.
    #[arg(long)]
    pub pick_transcript: Vec<TranscriptPickType>,

    /// Determines how to handle multiple transcripts. Default is to keep all.
    ///
    /// When transcript picking is enabled via `--pick-transcript`,
    /// either keep the first one found or keep all that match.
    #[arg(long, default_value = "all")]
    pub pick_transcript_mode: TranscriptPickMode,
}

#[derive(
    Debug,
    Copy,
    Clone,
    PartialEq,
    Eq,
    PartialOrd,
    Ord,
    Display,
    clap::ValueEnum,
    VariantArray,
    parse_display::FromStr,
)]
pub enum ConsequenceBy {
    Gene,
    Transcript,
    // or "Variant"?
    Allele,
}

#[derive(
    Debug,
    Copy,
    Clone,
    PartialEq,
    Eq,
    PartialOrd,
    Ord,
    Display,
    clap::ValueEnum,
    VariantArray,
    parse_display::FromStr,
)]
pub enum TranscriptPickType {
    ManeSelect,
    ManePlusClinical,
    Length,
    EnsemblCanonical,
    RefSeqSelect,
    GencodePrimary,
    Basic,
}

#[derive(Debug, Copy, Clone, Display, clap::ValueEnum, Default)]
pub enum TranscriptPickMode {
    #[default]
    First,
    All,
}

/// Enum that allows selecting the transcript source.
#[derive(
    Debug,
    Clone,
    Copy,
    PartialEq,
    Eq,
    Default,
    serde::Deserialize,
    serde::Serialize,
    clap::ValueEnum,
)]
pub enum TranscriptSource {
    /// ENSEMBL
    Ensembl,
    /// RefSeq
    RefSeq,
    /// Both
    #[default]
    Both,
}
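
One detail in this file may be worth checking, if the diff is read correctly: `TranscriptPickMode` derives `Default` with `#[default]` on `First`, whereas the `--pick-transcript-mode` option declares `default_value = "all"`. Settings built with `Default::default()` would therefore start from `First`, while settings parsed from a command line without that flag would presumably get `All`. The std-only sketch below illustrates the difference; `parse_pick_mode` is a hypothetical stand-in for clap's value parsing and does not exist in the PR.

```rust
// Std-only stand-ins; the clap and strum derives from the diff are omitted.
#[derive(Debug, Copy, Clone, PartialEq, Eq, Default)]
pub enum TranscriptPickMode {
    #[default]
    First,
    All,
}

// Trimmed to the single field that matters for this illustration.
#[derive(Debug, Clone, Default)]
pub struct TranscriptSettings {
    pub pick_transcript_mode: TranscriptPickMode,
}

/// Hypothetical stand-in for clap turning the string default into a variant.
pub fn parse_pick_mode(value: &str) -> Option<TranscriptPickMode> {
    match value {
        "first" => Some(TranscriptPickMode::First),
        "all" => Some(TranscriptPickMode::All),
        _ => None,
    }
}

fn main() {
    // Value obtained when code constructs the settings via `Default`.
    let programmatic = TranscriptSettings::default().pick_transcript_mode;
    // Value assumed for a CLI run that omits `--pick-transcript-mode`.
    let from_cli = parse_pick_mode("all").expect("known value");
    println!("Default::default(): {:?}", programmatic);
    println!("CLI default:        {:?}", from_cli);
}
```

Whether the two defaults are meant to differ is not stated in the PR; aligning them, or documenting the difference, would remove the ambiguity.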
1 change: 1 addition & 0 deletions src/annotate/mod.rs
@@ -4,6 +4,7 @@ use noodles::vcf::header::FileFormat;
 use noodles::vcf::variant::record::samples::series::value::genotype::Phasing;
 use noodles::vcf::variant::record_buf::samples::sample::value::Genotype;

+pub(crate) mod cli;
 pub mod seqvars;
 pub mod strucvars;

46 changes: 12 additions & 34 deletions src/annotate/seqvars/csq.rs
@@ -1,4 +1,9 @@
 //! Compute molecular consequence of variants.
+use super::{
+    ann::{Allele, AnnField, Consequence, FeatureBiotype, FeatureType, Pos, Rank, SoFeature},
+    provider::Provider as MehariProvider,
+};
+use crate::annotate::cli::{ConsequenceBy, TranscriptSource};
 use crate::pbs::txs::{GenomeAlignment, Strand, TranscriptBiotype, TranscriptTag};
 use enumflags2::BitFlags;
 use hgvs::parser::{NoRef, ProteinEdit, UncertainLengthChange};
@@ -14,12 +19,6 @@ use std::cmp::Ordering;
 use std::ops::Range;
 use std::{collections::HashMap, sync::Arc};

-use super::{
-    ann::{Allele, AnnField, Consequence, FeatureBiotype, FeatureType, Pos, Rank, SoFeature},
-    provider::Provider as MehariProvider,
-    ConsequenceBy,
-};
-
 /// A variant description how VCF would do it.
 #[derive(Debug, PartialEq, Eq, Clone, Default)]
 pub struct VcfVariant {
@@ -33,28 +32,6 @@ pub struct VcfVariant {
     pub alternative: String,
 }

-/// Enum that allows to select the transcript source.
-#[derive(
-    Debug,
-    Clone,
-    Copy,
-    PartialEq,
-    Eq,
-    Default,
-    serde::Deserialize,
-    serde::Serialize,
-    clap::ValueEnum,
-)]
-pub enum TranscriptSource {
-    /// ENSEMBL
-    Ensembl,
-    /// RefSeq
-    RefSeq,
-    /// Both
-    #[default]
-    Both,
-}
-
 /// Configuration for consequence prediction.
 #[derive(Debug, Clone, derive_builder::Builder)]
 #[builder(pattern = "immutable")]
@@ -84,7 +61,7 @@ impl Default for Config {
 pub struct ConsequencePredictor {
     /// The internal transcript provider for locating transcripts.
     #[derivative(Debug = "ignore")]
-    provider: Arc<MehariProvider>,
+    pub(crate) provider: Arc<MehariProvider>,
     /// Assembly mapper for variant consequence prediction.
     #[derivative(Debug = "ignore")]
     mapper: assembly::Mapper,
@@ -1247,10 +1224,10 @@ impl ConsequencePredictor {
 #[cfg(test)]
 mod test {
     use super::*;
+    use crate::annotate::cli::{TranscriptPickType, TranscriptSettings};
     use crate::annotate::seqvars::provider::ConfigBuilder as MehariProviderConfigBuilder;
     use crate::annotate::seqvars::{
         load_tx_db, run_with_writer, Args, AsyncAnnotatedVariantWriter, PathOutput,
-        TranscriptPickType,
     };
     use crate::common::noodles::{open_variant_reader, open_variant_writer, NoodlesVariantReader};
     use csv::ReaderBuilder;
@@ -1729,10 +1706,11 @@ mod test {
                 path_output_vcf: Some(output.as_ref().to_str().unwrap().into()),
                 path_output_tsv: None,
             },
-            transcript_source: Default::default(),
-            report_most_severe_consequence_by: Some(ConsequenceBy::Allele),
-            pick_transcript: vec![TranscriptPickType::ManeSelect],
-            pick_transcript_mode: Default::default(),
+            transcript_settings: TranscriptSettings {
+                report_most_severe_consequence_by: Some(ConsequenceBy::Allele),
+                pick_transcript: vec![TranscriptPickType::ManeSelect],
+                ..Default::default()
+            },
             max_var_count: None,
             hgnc: None,
             sources: crate::annotate::seqvars::Sources {
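
The last hunk replaces four flat fields on the test's `Args` with one nested `TranscriptSettings` value and fills the unspecified fields via struct update syntax. A minimal std-only sketch of that pattern follows. The types are trimmed stand-ins with the clap derives removed, and `Args` here is a hypothetical two-field reduction of the real struct, so this only illustrates the Rust idiom and not the actual test.

```rust
// Trimmed stand-ins for the types in `src/annotate/cli.rs`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsequenceBy {
    Gene,
    Transcript,
    Allele,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TranscriptPickType {
    ManeSelect,
    ManePlusClinical,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TranscriptSource {
    Ensembl,
    RefSeq,
    #[default]
    Both,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct TranscriptSettings {
    pub transcript_source: TranscriptSource,
    pub report_most_severe_consequence_by: Option<ConsequenceBy>,
    pub pick_transcript: Vec<TranscriptPickType>,
}

/// Hypothetical reduction of the test's `Args` to two fields.
#[derive(Debug, Clone)]
pub struct Args {
    pub transcript_settings: TranscriptSettings,
    pub max_var_count: Option<usize>,
}

/// Builds arguments the way the updated test does: name the fields that
/// matter and let `..Default::default()` supply the remaining ones.
pub fn test_args() -> Args {
    Args {
        transcript_settings: TranscriptSettings {
            report_most_severe_consequence_by: Some(ConsequenceBy::Allele),
            pick_transcript: vec![TranscriptPickType::ManeSelect],
            ..Default::default()
        },
        max_var_count: None,
    }
}

fn main() {
    let args = test_args();
    // `transcript_source` was not named above, so it takes its derived default.
    println!("{:?}", args.transcript_settings);
}
```

A side effect of this style is that fields added to `TranscriptSettings` later do not require touching every test that constructs it, as long as their derived defaults are acceptable there.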