
03_Genome assembly

Denise Ravinale edited this page May 7, 2024 · 1 revision
  • Flye

To assemble the PacBio DNA sequencing reads, the software Flye was used initially.

Input: Fastq-files with raw PacBio DNA-reads.

Output: Fasta files of assembled genome.
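A Flye run of this kind could look roughly like the sketch below. The function name `run_flye`, the file names, and the thread count are hypothetical, and `--pacbio-raw` is assumed here because the input is described as raw PacBio reads; the exact options used in the project are not recorded on this page.

```shell
# Sketch only: wraps a Flye call for raw PacBio reads.
# $1 = Fastq file with PacBio reads (placeholder path)
# $2 = output directory; Flye writes the assembly to $2/assembly.fasta
# $3 = number of threads (optional, defaults to 4 in this sketch)
run_flye() {
    flye --pacbio-raw "$1" --out-dir "$2" --threads "${3:-4}"
}

# Example call with placeholder names:
# run_flye pacbio_reads.fastq.gz flye_out 4
```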

To assess the quality of the Flye assembly, Quast was used to create a quality report. The report showed a total assembly length of 34.7 Mb. This is reasonable, since scaffold 6 had a length of 26.6 Mb in the assembly reported by Bin Tean et al. Combined with the fact that the number of contigs was not too high, this led to the decision to keep the assembly.
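The two figures discussed above, total length and number of contigs, can also be checked directly from the Fasta file. The following is a minimal sketch using awk, not a replacement for the Quast report; the function name `assembly_stats` is hypothetical.

```shell
# Sketch only: count contigs and sum sequence length in a Fasta file.
# $1 = assembly in Fasta format
assembly_stats() {
    awk '
        /^>/ { contigs++; next }            # each header line starts a new contig
        {
            gsub(/[ \t\r]/, "")             # drop stray whitespace before counting
            total += length($0)             # add the bases on this sequence line
        }
        END { printf "contigs=%d total_bp=%d\n", contigs, total }
    ' "$1"
}

# Example call with a placeholder path:
# assembly_stats flye_out/assembly.fasta
```

The total reported here should agree with the "Total length" value in the Quast report, provided Quast is not filtering out short contigs.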

Figure 5. Initial Quast report.


  • BWA, Pilon and RepeatMasker

To further improve the quality of the assembly, Pilon was used to correct it with the Illumina short DNA-reads. Before running Pilon, the Illumina DNA-reads were aligned to the assembly using the aligner BWA.

Input: Fasta file containing the assembled PacBio reads, as well as the Illumina DNA-reads.

Output: By default, BWA outputs a SAM file containing the mapped Illumina DNA-reads; this was converted to a BAM file by piping the output to samtools view.
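The alignment step can be sketched as follows. This assumes paired-end Illumina reads in two Fastq files, which this page does not state; the function name `align_illumina` and all file names are hypothetical.

```shell
# Sketch only: index the assembly, align Illumina reads with BWA-MEM,
# and convert the SAM stream to BAM on the fly.
# $1 = assembly Fasta, $2 / $3 = paired Fastq files (assumed paired-end),
# $4 = output BAM, $5 = threads (optional, defaults to 2)
align_illumina() {
    bwa index "$1" &&
    bwa mem -t "${5:-2}" "$1" "$2" "$3" |
        samtools view -b -o "$4" -     # "-" reads SAM from the pipe, -b writes BAM
}

# Example call with placeholder names:
# align_illumina assembly.fasta reads_1.fq.gz reads_2.fq.gz illumina_to_assembly.bam 2
```

Piping straight into `samtools view` avoids writing the much larger intermediate SAM file to disk.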

To see how many of the Illumina DNA-reads had mapped to the assembled genome, Samtools flagstat was run with the following command. It showed that 97% of the reads were successfully mapped to the assembly:

samtools flagstat -@ 2 /proj/uppmax2024-2-7/nobackup/work/denise_paper4_bam_files/bwa/bwa_illumina_alignment_to_uncorrected_assembly.bam > /home/dera0219/project_paper4/02_genome_assembly/assembly_correction/bwa/samtools_flagstat_bwa_quality_check_result.txt
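The mapped percentage can be read out of the saved flagstat report with a small awk filter. This sketch assumes the default text output of `samtools flagstat`, where the relevant line looks like `N + 0 mapped (97.00% : N/A)`; the function name `mapped_percent` is hypothetical.

```shell
# Sketch only: print the overall mapped percentage from a flagstat report.
# $1 = text file written by "samtools flagstat"
mapped_percent() {
    awk '/^[0-9]+ \+ [0-9]+ mapped \(/ {
        pct = $5                   # fifth field has the form "(97.00%"
        gsub(/[(%]/, "", pct)      # strip the parenthesis and percent sign
        print pct
        exit                       # only the first matching line is needed
    }' "$1"
}

# Example call with a placeholder path:
# mapped_percent samtools_flagstat_bwa_quality_check_result.txt
```

The pattern deliberately requires `mapped (` directly after the two counts, so lines such as `primary mapped` or `with itself and mate mapped` are skipped.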

The resulting BAM-file was then sorted and indexed using Samtools sort and Samtools index. The sorted and indexed files were used, along with the assembled genome, as input to Pilon, which produced a corrected assembly. Following the Pilon correction, RepeatMasker was used to mask repeated regions so that they are not included in downstream analysis.
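The sort, index, correction and masking steps can be sketched as one function. Several things here are assumptions: that a `pilon` wrapper script is on the PATH (on some systems Pilon is started with `java -jar pilon.jar` instead), that the BAM is passed with `--bam` so Pilon classifies the library itself, and that RepeatMasker is run with its default repeat library and masking mode. The function name `correct_and_mask` and the output file names are hypothetical.

```shell
# Sketch only: sort and index the BAM, polish with Pilon, then mask repeats.
# $1 = assembly Fasta, $2 = unsorted BAM, $3 = output directory, $4 = threads
correct_and_mask() {
    mkdir -p "$3" &&
    samtools sort -@ "$4" -o "$3/illumina.sorted.bam" "$2" &&
    samtools index "$3/illumina.sorted.bam" &&
    # Pilon writes the corrected assembly to $3/pilon_corrected.fasta
    pilon --genome "$1" --bam "$3/illumina.sorted.bam" \
          --output pilon_corrected --outdir "$3" --threads "$4" &&
    # RepeatMasker writes the masked sequence next to its other output in $3
    RepeatMasker -pa "$4" -dir "$3" "$3/pilon_corrected.fasta"
}

# Example call with placeholder names:
# correct_and_mask assembly.fasta illumina_to_assembly.bam correction_out 2
```

Chaining the commands with `&&` stops the sequence at the first failing step, so Pilon never runs on a BAM that failed to sort or index.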

To see whether the correction improved the quality of the assembly, Quast was run once again.

Figure 6. Quast report following Pilon correction and RepeatMasker.

