Mapping rate no longer reported by any pipeline #85

IanSudbery · 2019-01-03T09:22:19Z

Since the mapping pipeline was split into mapping and bamstats, as far as I can tell no pipeline now reports very basic statistics about mapped files, such as % mapping rate, % spliced reads, etc.

By preference I think that the mapping pipeline should report this for two reasons:

I can't imagine anyone ever mapping a set of reads and not wanting to see the mapping rate
The best way to obtain the mapping rate is going to depend on the mapper used. For example, STAR reports it in its output file, whereas for BWA it will need to be calculated from the BAM file.
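
For the STAR case, a minimal sketch of what reading the rate from the log could look like. It assumes the usual `Log.final.out` layout of `field name |<tab>value` lines; the function names are hypothetical and not part of the pipeline, and summing the unique and multi-mapping percentages is only one possible definition of "mapping rate".

```python
def parse_star_final_log(text):
    """Return {field name: value string} from the text of a STAR Log.final.out."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers and blank lines carry no value
        key, _, value = line.partition("|")
        key, value = key.strip(), value.strip()
        if value:
            stats[key] = value
    return stats


def star_mapping_rate(text):
    """Percentage of reads mapped uniquely or to multiple loci.

    Assumption: reads STAR reports as mapped to "too many loci" are not
    counted as mapped here.
    """
    stats = parse_star_final_log(text)
    unique = float(stats["Uniquely mapped reads %"].rstrip("%"))
    multi = float(stats["% of reads mapped to multiple loci"].rstrip("%"))
    return unique + multi
```

Called on the contents of a sample's `Log.final.out`, `star_mapping_rate` returns a single percentage that could be loaded into a summary table alongside the other fields.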

I will try to have a look at this and the mapping tuples/compression option thing #80 this week if I find time.

Acribbs · 2019-02-21T15:33:55Z

I now usually use multiqc stats as my mapping rate.

IanSudbery · 2019-02-21T15:52:28Z

Running MultiQC via cgatflow mapping make build_report produces the following error:

Traceback (most recent call last):
      File "/shared/sudlab1/General/apps/conda/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 748, in run_pooled_job_without_exceptions
        register_cleanup, touch_files_only)
      File "/shared/sudlab1/General/apps/conda/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 632, in job_wrapper_output_files
        output_files_only=True)
      File "/shared/sudlab1/General/apps/conda/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 561, in job_wrapper_io_files
        ret_val = user_defined_work_func(*(params[1:]))
      File "/shared/sudlab1/General/apps/conda/cgat-flow-devel/cgatpipelines/tools/pipeline_mapping.py", line 2238, in renderMultiqc
        P.run(statement)
      File "/shared/sudlab1/General/apps/conda/cgat-core/cgatcore/pipeline/execution.py", line 1335, in run
        benchmark_data = r.run(statement_list)
      File "/shared/sudlab1/General/apps/conda/cgat-core/cgatcore/pipeline/execution.py", line 939, in run
        job_path)
      File "/shared/sudlab1/General/apps/conda/cgat-core/cgatcore/pipeline/execution.py", line 866, in collect_single_job_from_cluster
        job_id, retval.exitStatus, "".join(stderr), statement))
    OSError: ---------------------------------------
    Job 3564931 exited with error code 1: 
    The stderr was: 
    /etc/bashrc: line 12: PS1: unbound variable
    [WARNING]         multiqc : MultiQC Version v1.7 now available!
    [INFO   ]         multiqc : This is MultiQC v1.5.dev0
    [INFO   ]         multiqc : Template    : default
    [INFO   ]         multiqc : Searching '.'
    [WARNING]         multiqc : No analysis results found. Cleaning up..
    [INFO   ]         multiqc : MultiQC complete
    mv: cannot stat 'multiqc_report.html': No such file or directory
    
    export LC_ALL=en_GB.UTF-8 && export LANG=en_GB.UTF-8 && multiqc . -f && mv multiqc_report.html MultiQC_report.dir/

Acribbs · 2019-03-03T16:07:32Z

Which mapper were you using? I suspect that the logs we write for some of our mappers do not match the input MultiQC expects. I know this is the case for salmon in transdiffexpres, and I think it may also be the case for STAR in mapping. I think it is due to the way we redirect the outputs to logs.

IanSudbery · 2019-03-04T10:11:10Z

We mostly use STAR, Salmon and BWA.

BWA isn't even supported by MultiQC, mostly because I don't think it outputs a log file of any sort.

Acribbs · 2019-03-04T12:55:59Z

For STAR:
This MultiQC module parses summary statistics from the Log.final.out log files. Sample names are taken either from the filename prefix (sampleNameLog.final.out) when set with --outFileNamePrefix in STAR

However, the output of our STAR mapping produces this.

When I run the pipeline for STAR, it generates the appropriate output for both bowtie and STAR (I'm using our pipeline test data), but obviously not for BWA. The reason they don't support BWA is that the logs don't produce anything worth parsing, so their idea was to rely on downstream tools. See: MultiQC/MultiQC#162
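
As an illustration of the downstream-tool route for BWA, here is a rough sketch that reads a mapping rate out of `samtools flagstat` text output. It assumes the standard `N + M description` line format; the helper names are hypothetical, and secondary and supplementary alignments are subtracted so that each read is counted once.

```python
import re

# Matches lines such as "950 + 0 mapped (95.00% : N/A)".
_FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \(.*)?$")


def parse_flagstat(text):
    """Return {description: QC-passed count} from samtools flagstat output."""
    counts = {}
    for line in text.splitlines():
        match = _FLAGSTAT_LINE.match(line.strip())
        if match:
            counts[match.group(3)] = int(match.group(1))
    return counts


def flagstat_mapping_rate(text):
    """Fraction of primary alignment records that are mapped."""
    counts = parse_flagstat(text)
    extra = counts.get("secondary", 0) + counts.get("supplementary", 0)
    total = counts["in total"] - extra
    mapped = counts["mapped"] - extra
    return mapped / total if total else float("nan")
```

The input would be the captured output of running `samtools flagstat` on each BAM file, which is also a format MultiQC can parse through its samtools module.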

Did your mapper fail or is there something else that prevented logs being output from STAR?

IanSudbery · 2019-03-04T14:06:55Z

The particular example here is BWA (which is probably the mapper we use the most - we do most RNAseq with salmon these days).

We used to actually calculate the mapping rate rather than rely on logs.
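
For reference, calculating the rate directly from alignments can be sketched as below. This is not the code the pipeline used to run, just an illustration working on SAM-format text lines (for example from `samtools view`), using the flag bits defined in the SAM specification. The function name is hypothetical, and for paired-end data each mate is counted as a separate read.

```python
UNMAPPED = 0x4          # read is unmapped
SECONDARY = 0x100       # secondary alignment
SUPPLEMENTARY = 0x800   # supplementary alignment


def mapping_rate_from_sam(lines):
    """Fraction of reads with a mapped primary alignment.

    Secondary and supplementary records are skipped so that each read
    contributes exactly one record to the total.
    """
    total = mapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else float("nan")
```

Because it only relies on the FLAG column, the same calculation applies regardless of which mapper produced the BAM file.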

jscaber mentioned this issue Feb 15, 2022

bamstats counts unaligned reads as intergenic. #130

Open
