General changes:
- Added lines describing the new `--coverage` option for the genome and transcriptome modes of the simulator to the main `README.md` file.
- Added the `-x` or `--coverage` flag to the `simulator.py` script. This option lets users specify a target coverage for the simulation without doing any additional calculations themselves. Coverage is defined as raw read coverage (using the Lander/Waterman equation). The mean read length is computed from the kernel density estimation functions for the aligned and unaligned read lengths, combined using the aligned/unaligned reads ratio. These functions are fitted to empirical data by the `read_analysis.py` script and passed to the simulator with the `--model_prefix` flag. The simulator then counts the number of bases in the reference and divides that number by the mean read length to obtain the number of reads required for 1x raw read coverage. Finally, it multiplies the number of reads for 1x coverage by the specified raw read coverage to obtain the number of reads to simulate (#242).
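The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `simulator.py`: the function name, its parameters, the weighted-mean combination of the two length distributions, and rounding up to a whole read are all assumptions made for the example, and the mean lengths are passed in directly rather than derived from fitted kernel density estimates.

```python
import math


def reads_for_coverage(ref_bases, aligned_mean, unaligned_mean,
                       aligned_fraction, coverage):
    """Estimate the number of reads needed for a target raw read coverage.

    Hypothetical helper: all names here are illustrative and do not
    correspond to functions in simulator.py.
    """
    # Mean read length as a weighted average of the aligned and unaligned
    # mean lengths (assumed way of applying the aligned/unaligned ratio).
    mean_len = (aligned_fraction * aligned_mean
                + (1.0 - aligned_fraction) * unaligned_mean)

    # Lander/Waterman: coverage = n_reads * mean_len / ref_bases,
    # so 1x coverage needs ref_bases / mean_len reads.
    reads_1x = ref_bases / mean_len

    # Scale to the requested coverage; rounding up is an assumption.
    return math.ceil(reads_1x * coverage)


# Made-up inputs: a 4.6 Mb reference, 75% aligned reads, 30x target.
n_reads = reads_for_coverage(
    ref_bases=4_600_000,
    aligned_mean=8000,
    unaligned_mean=2000,
    aligned_fraction=0.75,
    coverage=30,
)
print(n_reads)
```

With these made-up inputs the mean read length is 6500 bases, so about 708 reads give 1x coverage and the 30x target needs roughly 30 times as many.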
genome mode:
- For the `genome` mode of the `simulator.py` script, the coverage is calculated using the reference genome specified by the `-rg` or `--ref_g` flag.
transcriptome mode:
- For the `transcriptome` mode of the `simulator.py` script, the coverage is calculated using the reference transcriptome specified by the `-rt` or `--ref_t` flag.
metagenome mode:
- We currently do not support the `--coverage` option for the `metagenome` mode of the `simulator.py` script.
Notes:
- We expect this approach to estimate the coverage with sufficient precision. However, users should be aware that if they specify a minimum, maximum, or mean read length that differs substantially from the empirical data, the coverage of the simulated output may deviate from the requested coverage, because the number of reads is derived from the mean read length of the fitted model.
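
The effect described in this note follows directly from the Lander/Waterman relation and can be illustrated with a short Python sketch. The function name and all numbers are hypothetical and chosen only for illustration; they are not taken from the simulator or from any real dataset.

```python
def achieved_coverage(n_reads, actual_mean_len, ref_bases):
    """Raw read coverage actually obtained (Lander/Waterman: N * L / G).

    Hypothetical helper for illustration; not part of simulator.py.
    """
    return n_reads * actual_mean_len / ref_bases


# Assume the read count was computed for 30x coverage from a model mean
# read length of 6500 bases on a 4.6 Mb reference (made-up numbers).
planned_reads = 21231
ref_bases = 4_600_000

# If the output reads match the model, the target is met.
as_modelled = achieved_coverage(planned_reads, 6500, ref_bases)

# If user-specified length limits pull the realized mean down to 4000
# bases, the same number of reads covers far less of the reference.
with_shorter_reads = achieved_coverage(planned_reads, 4000, ref_bases)

print(round(as_modelled, 2), round(with_shorter_reads, 2))
```

In this made-up case the requested 30x coverage drops to roughly 18x, because the read count was fixed using the longer model mean.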