generated from populationgenomics/cpg-python-template-repo
Open
Description
cpg-flow/src/cpg_flow/inputs.py
Lines 69 to 92 in 9e47409
    def get_multicohort() -> MultiCohort:
        """
        Return the cohort or multicohort object based on the workflow configuration.
        """
        input_datasets = config_retrieve(['workflow', 'input_datasets'], None)
        # pull the list of cohort IDs from the config
        custom_cohort_ids = config_retrieve(['workflow', 'input_cohorts'], None)
        if input_datasets:
            raise ValueError('Argument input_datasets is deprecated, use input_cohorts instead')
        if isinstance(custom_cohort_ids, list) and len(custom_cohort_ids) <= 0:
            raise ValueError('No custom_cohort_ids found in the config')
        # NOTE: When configuring sgs in the config is deprecated, this will be removed.
        if custom_cohort_ids and not isinstance(custom_cohort_ids, list):
            raise ValueError('Argument input_cohorts must be a list')
        # After the check for no cusotom_cohort_ids in the config convert
        # to a tuple for the cache decorator
        custom_cohort_ids = tuple() if not custom_cohort_ids else tuple(custom_cohort_ids)
        return create_multicohort(custom_cohort_ids)
This code block checks for both input_datasets (cpg_workflows) and input_cohorts (cpg_flow). It raises if input_datasets is present, then goes on to do some type checking on input_cohorts.
Because input_cohorts defaults to None when it is missing from the config, the check for a 0-length list never triggers: None is not a list, so the condition is False and no error is raised:
if isinstance(custom_cohort_ids, list) and len(custom_cohort_ids) <= 0:
...
The None value is then converted to an empty tuple, so nothing is retrieved from Metamist. With no targets for any of the Stage types, nothing is planned.
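The fall-through can be reproduced in isolation. The sketch below copies the three checks from the quoted function into a stand-alone helper; `normalise_cohort_ids` is a hypothetical name, and the config value is passed in directly rather than read with `config_retrieve`, so this only illustrates the logic and is not the cpg-flow code itself.

```python
def normalise_cohort_ids(custom_cohort_ids):
    """Apply the same checks as the quoted get_multicohort body (hypothetical helper)."""
    # Only triggers for an actual empty list; None is not a list, so it passes.
    if isinstance(custom_cohort_ids, list) and len(custom_cohort_ids) <= 0:
        raise ValueError('No custom_cohort_ids found in the config')
    # None is falsy, so this check is skipped as well.
    if custom_cohort_ids and not isinstance(custom_cohort_ids, list):
        raise ValueError('Argument input_cohorts must be a list')
    # None silently becomes an empty tuple here.
    return tuple() if not custom_cohort_ids else tuple(custom_cohort_ids)


# A missing config entry (None) is accepted and yields no cohorts at all.
missing = normalise_cohort_ids(None)

# An explicit empty list is the only "empty" case that is rejected.
try:
    normalise_cohort_ids([])
    empty_list_rejected = False
except ValueError:
    empty_list_rejected = True

# 'COH1' is a made-up cohort ID used purely for illustration.
populated = normalise_cohort_ids(['COH1'])
```

So an explicit `[]` raises, while an absent key is accepted and produces an empty tuple.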
Example consequence: Slack Thread
Resolution?
- Remove any processing of input_datasets
- Remove any tolerance of 0-length input_cohorts
- Check after retrieving cohorts from Metamist that at least one SG was retrieved during workflow setup (i.e. don't tolerate 0-SG Cohorts as valid input)
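One possible shape for the second and third points, as a minimal sketch only. Both function names are hypothetical, and the `sg_ids_by_cohort` mapping is an assumed stand-in for whatever the Metamist query returns during workflow setup; the real cohort objects in cpg-flow will look different.

```python
def validate_input_cohorts(raw: object) -> tuple[str, ...]:
    """Reject missing, non-list, and empty input_cohorts values (hypothetical helper)."""
    if raw is None:
        raise ValueError('workflow.input_cohorts is missing from the config')
    if not isinstance(raw, list):
        raise ValueError('Argument input_cohorts must be a list')
    if not raw:
        raise ValueError('Argument input_cohorts must contain at least one cohort ID')
    # Tuples are hashable, which the original comment notes is needed for the cache decorator.
    return tuple(raw)


def check_cohorts_have_sgs(sg_ids_by_cohort: dict[str, list[str]]) -> None:
    """Fail if no cohorts were retrieved, or if any cohort has zero SGs.

    `sg_ids_by_cohort` is an assumed structure: cohort ID -> list of SG IDs.
    """
    if not sg_ids_by_cohort:
        raise ValueError('No cohorts were retrieved')
    empty = sorted(cohort for cohort, sgs in sg_ids_by_cohort.items() if not sgs)
    if empty:
        raise ValueError(f'Cohorts with no sequencing groups: {empty}')
```

Failing at both points means a misconfigured run stops during setup with a specific message, rather than completing with nothing planned.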
Alternative?
- Create a Stage type for non-SG related stages? e.g. processing of reference data.
- This would be a substantial deviation from the current stage planning logic, and probably a maintenance nightmare