The DDL calls 4 functions (modules or subworkflows) described below:

    //
    // MODULE: EXTRACTS ANCESTRALLY LINKED BUSCO GENES FROM FULL TABLE
    //
    EXTRACT_ANCESTRAL(
        ch_grab,
        ancestral_table
    )
    ch_versions             = ch_versions.mix(EXTRACT_ANCESTRAL.out.versions)
    //
    // LOGIC: STRIP OUT METADATA
    //
    ch_grab
        .map { meta, fulltable
                -> fulltable
            }
        .set { assignanc_input }
    //
    // MODULE: ASSIGN EXTRACTED GENES TO ANCESTRAL GROUPS
    //
    ASSIGN_ANCESTRAL(
        EXTRACT_ANCESTRAL.out.comp_location,
        assignanc_input
    )
    ch_versions             = ch_versions.mix(ASSIGN_ANCESTRAL.out.versions)
    //
    // MODULES: SORT THE BED FILE
    //
    BEDTOOLS_SORT(
        ASSIGN_ANCESTRAL.out.assigned_bed,
        []
    )
    ch_versions             = ch_versions.mix(BEDTOOLS_SORT.out.versions)
    //
    // MODULES: CONVERT BED TO INDEXED BIGBED
    //
    UCSC_BEDTOBIGBED(
        BEDTOOLS_SORT.out.sorted,
        dot_genome.map{ it[1] },    // Pull file from tuple(meta, file)
        buscogene_as
    )
    ch_versions             = ch_versions.mix(UCSC_BEDTOBIGBED.out.versions)
    emit:
    ch_ancestral_bigbed     = UCSC_BEDTOBIGBED.out.bigbed
    versions                = ch_versions.ifEmpty(null)
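
For readers who do not speak Nextflow, the data flow above can be read as four shell commands run in order. The sketch below only builds the command strings; it does not run anything. It rests on several assumptions: the name of the file buscopainter.py writes is not given in this README, so it is passed in as `comp_location`; the `_sorted.bed` file name is hypothetical; and `-as=` is assumed to be how the `buscogene_as` file reaches bedToBigBed.

```python
import shlex


def plan_commands(fulltable, ancestral_table, comp_location, sizes, autosql, prefix):
    """Return the four shell command lines in workflow order.

    comp_location is the file buscopainter.py is assumed to produce; its
    real name is not stated in the README, so the caller supplies it.
    """
    assigned = f"{prefix}_assigned.bed"
    sorted_bed = f"{prefix}_sorted.bed"  # hypothetical name for the sorted BED
    bigbed = f"{prefix}.bigBed"
    return [
        # EXTRACT_ANCESTRAL
        shlex.join(["buscopainter.py", "-r", ancestral_table, "-q", fulltable]),
        # ASSIGN_ANCESTRAL
        shlex.join(["assign_anc.py", "-l", comp_location, "-f", fulltable, "-c", assigned]),
        # BEDTOOLS_SORT (bedtools writes to stdout, hence the redirect)
        shlex.join(["bedtools", "sort", "-i", assigned]) + " > " + shlex.quote(sorted_bed),
        # UCSC_BEDTOBIGBED (assumes the AutoSql file is passed with -as=)
        shlex.join(["bedToBigBed", sorted_bed, sizes, f"-as={autosql}", bigbed]),
    ]


cmds = plan_commands(
    "full_table.tsv", "ancestral.tsv", "comp_location.tsv",
    "genome.sizes", "buscogene.as", "sample",
)
for line in cmds:
    print(line)
```

Each string corresponds to one candidate Galaxy tool, which is why the list of new tools needed is short.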

The first step calls a local NF DDL module, EXTRACT_ANCESTRAL, which is mostly setup code in preparation for running the following command:

buscopainter.py -r $ancestraltable -q $fulltable

buscopainter is not part of BUSCO, so a new Galaxy tool is needed. A content expert will have to advise on how these parameters should be set up in the Galaxy tool XML, and test the result.
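
As a starting point for that discussion, here is a minimal sketch that generates a skeleton Galaxy tool XML for the command above. It is not a working wrapper. The `tabular` format, the labels, the tool id and the version are assumptions, and the `<outputs>` section is left out because the files buscopainter.py writes are not described in this README.

```python
import xml.etree.ElementTree as ET


def tool_skeleton(tool_id, name, command, inputs):
    """Build a bare-bones Galaxy tool XML string.

    inputs is a list of (param_name, label) pairs; every input is assumed
    to be a tabular dataset, which a content expert would need to confirm.
    """
    tool = ET.Element("tool", id=tool_id, name=name, version="0.1.0")
    cmd = ET.SubElement(tool, "command")
    cmd.text = command
    inputs_el = ET.SubElement(tool, "inputs")
    for param_name, label in inputs:
        ET.SubElement(
            inputs_el, "param",
            name=param_name, type="data", format="tabular", label=label,
        )
    ET.indent(tool)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(tool, encoding="unicode")


xml_text = tool_skeleton(
    "buscopainter",
    "buscopainter",
    "buscopainter.py -r '$ancestraltable' -q '$fulltable'",
    [("ancestraltable", "Ancestral table"), ("fulltable", "BUSCO full table")],
)
print(xml_text)
```

The `$ancestraltable` and `$fulltable` placeholders carry over unchanged from the DDL command, which is what makes these wrappers close to one-liners.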

The second step, ASSIGN_ANCESTRAL, is a local NF module. It runs a one-line bash script:

assign_anc.py -l $comp_location -f $fulltable -c ${prefix}_assigned.bed

That Python code is also found in the treeval/bin directory. Again, a new tool is needed, with a content expert to advise and to test. The $variables are DDL but work for Galaxy tools too. _${prefix}_ is a subtask name idiom in DDL. A content expert is needed to help work out how these parameters should be set up in Galaxy tool XML.

BG: The Python script uses pandas to manipulate the table. With a bit of luck, we can use https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy0, a tool that offers much of pandas' functionality.

The third step is a bedtools sort on that output, already available as an IUC bedtools tool.

The last step is UCSC-bedtobigbed using:

  bedToBigBed \\
        $bed \\
        $sizes \\
        $as_option \\
        $args \\
        ${prefix}.bigBed

Not found in the Toolshed, so again, another trivial new tool is needed, along with expert advice on how to get those parameters from users. The details can probably be decoded by reading all the DDL carefully but frankly, life is too short.
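
To make the parameter question concrete, the sketch below shows one way the optional pieces of that command could collapse into an argument list. It assumes that `$as_option` expands to `-as=<file>` when an AutoSql file is supplied and to nothing otherwise, and that `$args` is a free-form list of extra bedToBigBed options. Both are assumptions to be checked against the DDL module, and the function name is hypothetical.

```python
def bedtobigbed_argv(bed, sizes, prefix, autosql=None, extra_args=()):
    """Assemble a bedToBigBed argument list in the order used above."""
    argv = ["bedToBigBed", bed, sizes]
    if autosql:
        # Assumed expansion of $as_option
        argv.append(f"-as={autosql}")
    # Assumed expansion of $args: any further options, passed through as-is
    argv.extend(extra_args)
    argv.append(f"{prefix}.bigBed")
    return argv


# Only the required inputs
print(bedtobigbed_argv("sorted.bed", "genome.sizes", "sample"))
# With an AutoSql file and one extra option
print(bedtobigbed_argv("sorted.bed", "genome.sizes", "sample",
                       autosql="buscogene.as", extra_args=["-tab"]))
```

In a Galaxy wrapper, the AutoSql file would most likely become an optional dataset input and the extra options a small set of explicit parameters, but that is a decision for the content expert.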

A quick check of the Toolshed shows that bedtools sort (sort_bed.xml) is available, but the 3 other requirements could not be found, so 3 new Galaxy tools need to be built and acceptance-tested by a content expert:

  1. buscopainter
  2. assign_ancestry
  3. ucsc bedtobigbed

They are mostly one-line scripts to be turned into simple new tools. Finding all the necessary test data will be more work.