run the post-processing steps in the prepare pipeline in parallel #14

aryarm · 2020-06-09T18:15:17Z

At the end of the prepare pipeline, a couple of post-processing steps are performed on the merged TSV before we feed it to the classify pipeline. All of the scripts used in these steps support reading from stdin and writing to stdout, except for fillna.bash.

- remove the first parameter from fillna.bash and make it read the TSV from stdin instead
- connect all of the post-processing steps together via pipes
  - this will let us save on file IO and on the time wasted compressing and uncompressing the file between steps
- remove extra config params that nobody uses (like keepna, pure_numerics, and friends) - they just make things more complicated
- mark extra files as temp
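
For illustration, here is a rough sketch of what the first two items could look like. It is not the actual fillna.bash: the function names, the "." NA marker, the default replacement value, and the second step are all assumptions made up for this example. The point is only that once every step reads stdin and writes stdout, the steps can be chained with pipes and run concurrently, with no intermediate files.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for fillna.bash (NOT the real script).
# Reads a TSV from stdin and writes it to stdout, replacing NA values.
# Assumptions: NA is an empty field or ".", and the first line is a header.
fillna_stdin() {
    local replacement="${1:-0}"
    awk -F '\t' -v OFS='\t' -v rep="$replacement" '
        NR == 1 { print; next }   # pass the header through unchanged
        {
            for (i = 1; i <= NF; i++)
                if ($i == "." || $i == "") $i = rep
            print
        }
    '
}

# Hypothetical second post-processing step, included only to show chaining.
drop_first_column() {
    cut -f 2-
}

# All steps connected by pipes: no temporary files between them.
postprocess() {
    fillna_stdin 0 | drop_first_column
}

# Tiny demo on an in-memory TSV.
printf 'chrom\tpos\tscore\n1\t100\t.\n1\t200\t5\n' | postprocess
```

With gzipped input and output (the file names here are placeholders), the whole chain would decompress and compress only once: `gzip -dc merged.tsv.gz | postprocess | gzip > final.tsv.gz`.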



aryarm added the enhancement and low-priority labels Jun 9, 2020

aryarm added this to the VarCA v2.0.0 milestone Jul 12, 2021

aryarm self-assigned this Jul 12, 2021

aryarm added a commit that referenced this issue Jul 16, 2021

make fillna.bash read from stdin and connect steps via pipes (resolves …

5b92328

…#14) also remove extra config params that nobody uses and mark some extra files as temp
