Mapping rate no longer reported by any pipeline #85

Open
IanSudbery opened this issue Jan 3, 2019 · 6 comments

@IanSudbery
Contributor

Since the mapping pipeline was split into mapping and bamstats, as far as I can tell no pipeline now reports very basic statistics about mapped files, such as % mapping rate, % spliced reads, etc.
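As an illustration of the "% spliced reads" statistic: in SAM/BAM output a read is usually counted as spliced when its CIGAR string contains an `N` (skipped reference region) operation. The sketch below is a hypothetical helper, not code from cgat-flow, and assumes the CIGAR strings of primary, mapped reads have already been extracted (for example from `samtools view` output or pysam's `cigarstring` attribute).

```python
import re
from typing import Iterable

# One CIGAR operation: a run length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar: str) -> bool:
    """True if the alignment skips a reference region (CIGAR op 'N')."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def percent_spliced(cigars: Iterable[str]) -> float:
    """Percentage of alignments whose CIGAR contains at least one 'N' op.

    Hypothetical helper: assumes one CIGAR string per primary, mapped read.
    Returns 0.0 when no CIGAR strings are given.
    """
    total = 0
    spliced = 0
    for cigar in cigars:
        total += 1
        if is_spliced(cigar):
            spliced += 1
    return 100.0 * spliced / total if total else 0.0


# Made-up CIGARs: only the second one contains an 'N' (intron-like gap).
print(percent_spliced(["100M", "40M2000N60M", "50M10I40M", "30S70M"]))
```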

By preference I think that the mapping pipeline should report this for two reasons:

  • I can't imagine anyone ever mapping a set of reads and not wanting to see the mapping rate.
  • The best way to obtain the mapping rate is going to depend on the mapper used. For example, STAR reports it in its output file, whereas for BWA it will need to be calculated from the BAM file.
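For a mapper like BWA that writes no summary, one way to calculate the mapping rate from the BAM file is to look at the SAM FLAG field. This is a minimal sketch under stated assumptions: the input is SAM text (e.g. produced by `samtools view -h` on the BAM), only primary records are counted so each read contributes once, and `mapping_rate` is a hypothetical name rather than an existing cgat-flow function.

```python
from typing import Iterable

FLAG_UNMAPPED = 0x4          # segment unmapped
FLAG_SECONDARY = 0x100       # secondary alignment
FLAG_SUPPLEMENTARY = 0x800   # supplementary alignment


def mapping_rate(sam_lines: Iterable[str]) -> float:
    """Percentage of primary SAM records that are mapped.

    Header lines ('@...') and blank lines are skipped; secondary and
    supplementary records are ignored so each read is counted once.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        total += 1
        if not flag & FLAG_UNMAPPED:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


# Made-up records: r1 and r3 mapped, r2 unmapped, r3's second line is secondary.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tchr2\t500\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t256\tchr5\t900\t0\t50M\t*\t0\t0\t*\t*",
]
print(mapping_rate(sam))
```

The same flag tests could be applied to pysam's `AlignedSegment.flag` instead of parsing text. For paired-end data each mate is a separate primary record, so this gives a per-read rather than a per-pair rate.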

I will try to look at this, and at the mapping tuples/compression option issue (#80), this week if I find time.

@Acribbs
Contributor

Acribbs commented Feb 21, 2019

I now usually use the MultiQC stats for my mapping rate.

@IanSudbery
Contributor Author

IanSudbery commented Feb 21, 2019

Running MultiQC via cgatflow mapping make build_report produces the following error:

Traceback (most recent call last):
      File "/shared/sudlab1/General/apps/conda/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 748, in run_pooled_job_without_exceptions
        register_cleanup, touch_files_only)
      File "/shared/sudlab1/General/apps/conda/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 632, in job_wrapper_output_files
        output_files_only=True)
      File "/shared/sudlab1/General/apps/conda/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 561, in job_wrapper_io_files
        ret_val = user_defined_work_func(*(params[1:]))
      File "/shared/sudlab1/General/apps/conda/cgat-flow-devel/cgatpipelines/tools/pipeline_mapping.py", line 2238, in renderMultiqc
        P.run(statement)
      File "/shared/sudlab1/General/apps/conda/cgat-core/cgatcore/pipeline/execution.py", line 1335, in run
        benchmark_data = r.run(statement_list)
      File "/shared/sudlab1/General/apps/conda/cgat-core/cgatcore/pipeline/execution.py", line 939, in run
        job_path)
      File "/shared/sudlab1/General/apps/conda/cgat-core/cgatcore/pipeline/execution.py", line 866, in collect_single_job_from_cluster
        job_id, retval.exitStatus, "".join(stderr), statement))
    OSError: ---------------------------------------
    Job 3564931 exited with error code 1: 
    The stderr was: 
    /etc/bashrc: line 12: PS1: unbound variable
    [WARNING]         multiqc : MultiQC Version v1.7 now available!
    [INFO   ]         multiqc : This is MultiQC v1.5.dev0
    [INFO   ]         multiqc : Template    : default
    [INFO   ]         multiqc : Searching '.'
    [WARNING]         multiqc : No analysis results found. Cleaning up..
    [INFO   ]         multiqc : MultiQC complete
    mv: cannot stat 'multiqc_report.html': No such file or directory
    
    export LC_ALL=en_GB.UTF-8 && export LANG=en_GB.UTF-8 && multiqc . -f && mv multiqc_report.html MultiQC_report.dir/

@Acribbs
Contributor

Acribbs commented Mar 3, 2019

Which mapper were you using? I suspect that, for some of our mappers, the logs we output do not match the input MultiQC requires. I know this is the case for Salmon in transdiffexpres, and I think possibly also for STAR in mapping. I think it is due to the way we redirect the outputs to logs.

@IanSudbery
Contributor Author

We mostly use STAR, Salmon and BWA.

BWA isn't even supported by MultiQC, mostly because I don't think it outputs a log file of any sort.

@Acribbs
Contributor

Acribbs commented Mar 4, 2019

For STAR:
This MultiQC module parses summary statistics from the Log.final.out log files. Sample names are taken either from the filename prefix (sampleNameLog.final.out) when set with --outFileNamePrefix in STAR
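Because STAR already writes the mapping rate to `Log.final.out`, a pipeline task could also read it directly instead of going through MultiQC. A sketch, assuming the usual `label |<TAB>value` layout of that file; the function names are hypothetical, and the sample-name rule simply strips the `Log.final.out` suffix, following the filename-prefix convention quoted above.

```python
from pathlib import Path
from typing import Dict, Iterable

STAR_LOG_SUFFIX = "Log.final.out"


def parse_star_log(lines: Iterable[str]) -> Dict[str, str]:
    """Turn 'label |<TAB>value' lines from Log.final.out into a dict.

    Section headers such as 'UNIQUE READS:' contain no '|' and are skipped.
    """
    stats = {}
    for line in lines:
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def sample_name(path: str) -> str:
    """'sampleNameLog.final.out' -> 'sampleName' (prefix set via --outFileNamePrefix)."""
    name = Path(path).name
    if name.endswith(STAR_LOG_SUFFIX):
        return name[: -len(STAR_LOG_SUFFIX)]
    return name


def unique_mapping_rate(stats: Dict[str, str]) -> float:
    """Uniquely mapped reads as a number, e.g. '89.23%' -> 89.23."""
    return float(stats["Uniquely mapped reads %"].rstrip("%"))


# Made-up excerpt in the Log.final.out layout.
example = [
    "                          Number of input reads |\t1000000",
    "                                    UNIQUE READS:",
    "                   Uniquely mapped reads number |\t892300",
    "                        Uniquely mapped reads % |\t89.23%",
]
stats = parse_star_log(example)
print(sample_name("results/sampleNameLog.final.out"), unique_mapping_rate(stats))
```

Whether multi-mapped reads should count towards a reported rate is a choice the pipeline would still have to make; STAR lists them separately under `% of reads mapped to multiple loci`.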

However, our STAR mapping produces this.

When I run the pipeline, it generates the appropriate output for both Bowtie and STAR (I'm using our pipeline test data), but obviously not for BWA. The reason they don't support BWA is that its logs don't contain anything worth parsing, so their idea was to rely on downstream tools. See: MultiQC/MultiQC#162
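Following the "rely on downstream tools" idea, a BWA mapping rate could be read from `samtools flagstat`. A minimal sketch, assuming the default text output in which the headline counts appear as `<passed> + <failed> in total` and `<passed> + <failed> mapped (...)`; the function name is hypothetical, and the wording of the other flagstat lines varies between samtools versions, so only these two are used.

```python
import re
from typing import Iterable, Optional

# '<QC-passed> + <QC-failed> in total ...' and '<QC-passed> + <QC-failed> mapped (...)'.
# Lines such as 'primary mapped' or 'with itself and mate mapped' do not match.
TOTAL_RE = re.compile(r"^(\d+) \+ (\d+) in total")
MAPPED_RE = re.compile(r"^(\d+) \+ (\d+) mapped \(")


def flagstat_mapping_rate(lines: Iterable[str]) -> float:
    """Mapped records as a percentage of all records (QC-passed plus QC-failed)."""
    total: Optional[int] = None
    mapped: Optional[int] = None
    for line in lines:
        m = TOTAL_RE.match(line)
        if m and total is None:
            total = int(m.group(1)) + int(m.group(2))
            continue
        m = MAPPED_RE.match(line)
        if m and mapped is None:
            mapped = int(m.group(1)) + int(m.group(2))
    if not total or mapped is None:
        raise ValueError("no usable 'in total' / 'mapped' lines found")
    return 100.0 * mapped / total


# Made-up flagstat output.
example = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
950 + 0 mapped (95.00% : N/A)
""".splitlines()
print(flagstat_mapping_rate(example))
```

Note that flagstat's totals include secondary and supplementary records, which BWA-MEM can emit for chimeric reads, so this can differ slightly from a strictly per-read rate.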

Did your mapper fail, or did something else prevent STAR from writing its logs?

@IanSudbery
Contributor Author

The particular example here is BWA (probably the mapper we use most; we do most RNA-seq with Salmon these days).

We used to actually calculate the mapping rate rather than rely on logs.
