slamdunk all command treats paired-end reads as separate samples #165
While I am not affiliated with SLAMDUNK, I can tell you it was only designed to analyze single-end data. Analyzing paired-end data has been discussed in several previous issues (e.g., #147, #148, #57). To summarize some of the suggestions in those posts, you could consider:
Best,
Dear Isaac, Thank you for your detailed response regarding SLAMDUNK and paired-end data analysis. Your suggestions are very helpful. Based on your recommendations, I'm considering two options:
Could you possibly provide any additional insights into the relative strengths of these three tools for paired-end data analysis? This would help me make a more informed decision. Thank you again for your assistance. Best regards,
Hello Wang, Full disclosure, I developed fastq2EZbakR and am not affiliated with the other two, but will try to provide an unbiased view of the pros and cons of each.
I hope this helps,
I'll also try to tag the developers of the other packages in case they want to weigh in: @popitsch @florianerhard
Hi Wang and Isaac,

Rnalib is meant to be(come) a general Python library for the analysis of RNAseq data, so nucleotide-conversion sequencing (NC-seq) is not its main focus (I am instead working on a more specialized package on top of rnalib for this purpose). You can, however, use rnalib to annotate the number of convertible positions and the number of actual conversions per read using its tag_tc() function [*].

In this simplified SLAMseq tutorial, reads are then simply classified as 'new' (one or more T-to-C conversions) or 'old' (no conversions), which is inaccurate, as Isaac correctly points out: it misclassifies 'new' reads that, by chance, contain no T-to-C conversion, and these make up a rather large fraction given the low 4sU concentrations in typical SLAMseq experiments. A better way is indeed to fit these data to a binomial mixture model, as implemented, e.g., in GRAND-SLAM and EZbakR (NB: I have zero experience with the latter tool). I will try to add a section demonstrating this to my SLAMseq tutorial.

So, in a nutshell, rnalib is very flexible, but it is not an end-to-end tool for NC-seq data, so you might resort to the mentioned approaches unless you are interested in developing a new method :)

BW
niko

[*] tag_tc() takes a BAM file that contains MD tags (e.g., added by a mapper such as STAR or via samtools calmd) and adds two tags (xt and xc) containing the number of T's (convertible positions) and the number of T-to-C conversions, respectively, while masking T-to-C SNPs and filtering for mapping and/or per-base qualities; see [here](https://github.com/popitsch/rnalib/blob/main/rnalib/tools.py) for details.
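The difference between the simple 'new'/'old' threshold and a binomial mixture model can be illustrated with a small EM fit. This is a minimal sketch under stated assumptions, not the model implemented by GRAND-SLAM or EZbakR: it assumes that, for each read, you already have the number of convertible Ts and the number of observed T-to-C conversions (the kind of values the xt/xc tags described above hold), and that each read is either 'new' (conversion rate p_new) or 'old' (background rate p_old). The function names and the toy read counts and rates in `simulate_reads` are hypothetical, not data from this thread.

```python
import math
import random
from collections import Counter


def _clamp(p: float, eps: float = 1e-9) -> float:
    """Keep rates strictly inside (0, 1) so the logs below stay finite."""
    return min(max(p, eps), 1.0 - eps)


def log_kernel(k: int, n: int, p: float) -> float:
    """log of p**k * (1 - p)**(n - k). The binomial coefficient C(n, k) is
    omitted: it is identical for both components and cancels in the posterior."""
    return k * math.log(p) + (n - k) * math.log1p(-p)


def naive_new_fraction(reads):
    """Threshold rule: a read is 'new' if it carries >= 1 T-to-C conversion."""
    return sum(1 for _, k in reads if k >= 1) / len(reads)


def fit_binomial_mixture(reads, theta=0.5, p_new=0.1, p_old=0.01,
                         max_iter=1000, tol=1e-8):
    """EM fit of a two-component binomial mixture to (n_T, n_TC) pairs.

    Returns (theta, p_new, p_old): fraction of new reads, T-to-C rate in new
    reads, and background T-to-C rate in old reads.
    """
    counts = Counter(reads)  # identical (n, k) pairs share the same posterior
    total = sum(counts.values())
    for _ in range(max_iter):
        log_t, log_1mt = math.log(theta), math.log1p(-theta)
        sum_w = kw = nw = k_old = n_old = 0.0
        for (n, k), c in counts.items():
            # E-step: posterior probability that a read with (n, k) is new
            a = log_t + log_kernel(k, n, p_new)
            b = log_1mt + log_kernel(k, n, p_old)
            m = max(a, b)  # log-sum-exp trick for numerical stability
            w = math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))
            sum_w += c * w
            kw += c * w * k
            nw += c * w * n
            k_old += c * (1.0 - w) * k
            n_old += c * (1.0 - w) * n
        # M-step: closed-form weighted maximum-likelihood updates
        new = (_clamp(sum_w / total), _clamp(kw / nw), _clamp(k_old / n_old))
        done = max(abs(x - y) for x, y in zip(new, (theta, p_new, p_old))) < tol
        theta, p_new, p_old = new
        if done:
            break
    return theta, p_new, p_old


def simulate_reads(n_reads, theta, p_new, p_old, seed=1):
    """Hypothetical toy data: each read has 20-60 convertible Ts and is
    'new' with probability theta."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        n = rng.randint(20, 60)
        p = p_new if rng.random() < theta else p_old
        k = sum(1 for _ in range(n) if rng.random() < p)
        reads.append((n, k))
    return reads


if __name__ == "__main__":
    reads = simulate_reads(10_000, theta=0.3, p_new=0.05, p_old=0.001)
    print("naive 'new' fraction:", round(naive_new_fraction(reads), 3))
    print("EM (theta, p_new, p_old):", fit_binomial_mixture(reads))
    # Chance that a truly new read with 20 Ts shows zero conversions at p_new = 0.05
    print(round((1 - 0.05) ** 20, 3))  # 0.358
```

The threshold rule undercounts new reads because a new read with n convertible Ts shows no conversion with probability (1 - p_new)^n, which is substantial at low conversion rates; the mixture model accounts for those reads explicitly instead of labeling them 'old'.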
Dear Isaac, Thank you for your thoughtful and detailed response. I sincerely appreciate your patience in getting back to me. Your explanation regarding SLAMDUNK's limitations with paired-end data is extremely helpful. I understand now that it was primarily designed for single-end data analysis. Both solutions you've proposed are valuable:
I'm particularly grateful for your recommendations of alternative tools that are better suited for paired-end data analysis:
These suggestions will definitely help guide my research in the right direction. I will carefully evaluate these options to determine the most suitable approach for my work. Thank you again for your expert guidance! Best regards,
Hello!
I am trying to analyze paired-end sequencing data with SLAM-DUNK using the slamdunk all command. However, it seems that SLAM-DUNK is treating my R1 and R2 reads as separate samples instead of paired-end data.
My command is: slamdunk all -r */hg38.fa -b */human/3utr.bed -o /output/ -5 0 -t 30 -ss -rl 150 s4u/.fq.gz
I expected SLAM-DUNK to recognize and process these files as paired-end data. However, the output directory contains separate results for each FASTQ file, as if they were independent samples.
My output files look like this:
dtt_R1.fq_slamdunk_mapped.bam
dtt_R2.fq_slamdunk_mapped.bam
How can I correctly specify paired-end data with the slamdunk all command? Or should I use separate slamdunk align commands for R1 and R2 followed by the other subcommands?
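The output above shows `slamdunk all` producing one BAM per input FASTQ, so `dtt_R1` and `dtt_R2` end up as two samples. Any paired-end workaround first needs to know which files are mates. The helper below is a hypothetical sketch, not part of SLAMDUNK: it assumes a `<sample>_R1` / `<sample>_R2` naming scheme with `.fq`/`.fastq` (optionally `.gz`) extensions, and the input paths in the usage are guesses based on the output names shown. It only groups files; it does not make SLAMDUNK paired-end aware.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: <sample>_R1.fq.gz / <sample>_R2.fq.gz (or .fastq[.gz])
MATE_RE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.f(?:ast)?q(?:\.gz)?$")


def pair_fastqs(paths):
    """Group FASTQ paths into {sample: {"R1": path, "R2": path}}."""
    pairs = defaultdict(dict)
    for p in paths:
        m = MATE_RE.match(Path(p).name)
        if m is None:
            raise ValueError(f"not an R1/R2 FASTQ name: {p}")
        pairs[m["sample"]][f"R{m['mate']}"] = p
    # Every sample must have exactly one R1 and one R2 file
    for sample, mates in pairs.items():
        if set(mates) != {"R1", "R2"}:
            raise ValueError(f"sample {sample!r} is missing a mate: {sorted(mates)}")
    return dict(pairs)


if __name__ == "__main__":
    print(pair_fastqs(["s4u/dtt_R1.fq.gz", "s4u/dtt_R2.fq.gz"]))
```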