Workflow generation should fail if no cohorts are provided #118

@MattWellie

Description

def get_multicohort() -> MultiCohort:
    """
    Return the cohort or multicohort object based on the workflow configuration.
    """
    input_datasets = config_retrieve(['workflow', 'input_datasets'], None)
    # pull the list of cohort IDs from the config
    custom_cohort_ids = config_retrieve(['workflow', 'input_cohorts'], None)
    if input_datasets:
        raise ValueError('Argument input_datasets is deprecated, use input_cohorts instead')
    if isinstance(custom_cohort_ids, list) and len(custom_cohort_ids) <= 0:
        raise ValueError('No custom_cohort_ids found in the config')
    # NOTE: When configuring sgs in the config is deprecated, this will be removed.
    if custom_cohort_ids and not isinstance(custom_cohort_ids, list):
        raise ValueError('Argument input_cohorts must be a list')
    # After the check for no cusotom_cohort_ids in the config convert
    # to a tuple for the cache decorator
    custom_cohort_ids = tuple() if not custom_cohort_ids else tuple(custom_cohort_ids)
    return create_multicohort(custom_cohort_ids)

This code block checks for both input_datasets (cpg_workflows) and input_cohorts (cpg_flow), fails if input_datasets is present, then type-checks input_cohorts.

Because input_cohorts defaults to None when it is missing from the config, the check for a 0-length list never fires. None is not a list, so the isinstance test is False and no error is raised:

if isinstance(custom_cohort_ids, list) and len(custom_cohort_ids) <= 0:
    ...

The None value is then converted to an empty tuple, so nothing is retrieved from Metamist. With no targets for any of the Stage types, nothing is planned, and workflow generation completes without failing.
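The pass-through can be reproduced without any config machinery. This is a minimal sketch, assuming a hypothetical helper `check_cohort_ids` that mirrors only the two input_cohorts checks and the tuple conversion from `get_multicohort`; the config lookup and `create_multicohort` are left out, and `'COH1'`-style IDs elsewhere would be placeholders:

```python
def check_cohort_ids(custom_cohort_ids):
    """Hypothetical stand-in mirroring the input_cohorts checks in get_multicohort."""
    # Only fires for an explicit empty list, not for a missing key (None)
    if isinstance(custom_cohort_ids, list) and len(custom_cohort_ids) <= 0:
        raise ValueError('No custom_cohort_ids found in the config')
    # None is falsy, so this type check is skipped as well
    if custom_cohort_ids and not isinstance(custom_cohort_ids, list):
        raise ValueError('Argument input_cohorts must be a list')
    return tuple() if not custom_cohort_ids else tuple(custom_cohort_ids)


# A missing input_cohorts key (None) slips through as an empty tuple
missing_result = check_cohort_ids(None)

# An explicit empty list is the only "no cohorts" case that is rejected
try:
    check_cohort_ids([])
    empty_list_rejected = False
except ValueError:
    empty_list_rejected = True
```

Here `missing_result` is an empty tuple while `empty_list_rejected` is True, so the two "no cohorts" cases are handled inconsistently.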

Example consequence: Slack Thread

Resolution?

  • Remove any processing of input_datasets
  • Remove any tolerance of 0-length input_cohorts
  • Check after retrieving cohorts from Metamist that at least one SG was retrieved during workflow setup (i.e. don't tolerate 0-SG Cohorts as valid input)
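The second and third points might look something like the following. This is a sketch rather than a proposed patch: `validate_input_cohorts` and `ensure_sequencing_groups` are hypothetical names, and the cohort-to-SG mapping passed to the second function is an assumed stand-in for whatever is actually retrieved from Metamist.

```python
def validate_input_cohorts(raw_value):
    """Hypothetical strict check: require a non-empty list of cohort IDs."""
    # Rejects None (missing key) as well as strings, dicts, etc.
    if not isinstance(raw_value, list):
        raise ValueError('Argument input_cohorts must be a list')
    if not raw_value:
        raise ValueError('No input_cohorts found in the config')
    # Tuple so the value stays hashable for a cache decorator
    return tuple(raw_value)


def ensure_sequencing_groups(sgs_by_cohort):
    """Hypothetical post-retrieval check: every cohort needs at least one SG.

    sgs_by_cohort is an assumed mapping of cohort ID -> list of SG IDs.
    """
    if not sgs_by_cohort:
        raise ValueError('No cohorts were retrieved')
    empty = sorted(cohort for cohort, sgs in sgs_by_cohort.items() if not sgs)
    if empty:
        raise ValueError(f'Cohorts with no sequencing groups: {empty}')
```

With these checks, a missing key, an empty list, and a 0-SG cohort would all raise during workflow setup instead of producing an empty plan.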

Alternative?

  • Create a Stage type for non-SG related stages? e.g. processing of reference data.
    • This would be a substantial deviation from the current stage planning logic, and probably a maintenance nightmare
