This repository supports downstream filtering and analysis of the complex structural variants (CSVs) detected by SVision.
Please install pandas, numpy and intervaltree.
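A quick way to confirm that the three dependencies are importable before running the filter is sketched below. The helper name `missing_packages` is hypothetical and is not part of this repository.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["pandas", "numpy", "intervaltree"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```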
The call set for the paper is ./supports/HG00733.svision.s5.graph.vcf.
NOTE: The default values in the config file are the ones used to produce the results in the paper.
The config file requires:
- Chromosomes of interest. The default value contains the autosomes.
- The path to bedtools.
- The path to the RepeatMasker (RMSK) and Tandem Repeats Finder (TRF) annotations of the human reference genome GRCh38. Please download both annotations in BED format.
- Regions to exclude in the filter. A BED file is available.
- The path to the reference genome used in SV detection.
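Before running the filter, it can help to check that every configured path actually exists. The sketch below is only an illustration: the key names (`chromosomes`, `bedtools`, `trf_bed`, `rmsk_bed`, `exclude_bed`, `reference`), the dictionary layout, and the `chr` prefix on chromosome names are assumptions, not the format of the config file shipped with this repository.

```python
import os
import shutil

# Hypothetical key names; the real config file may name these differently.
FILE_KEYS = ("trf_bed", "rmsk_bed", "exclude_bed", "reference")


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    if not config.get("chromosomes"):
        problems.append("chromosomes: no chromosomes of interest listed")
    # bedtools may be given as a full path or as a name resolvable on PATH.
    bedtools = config.get("bedtools")
    if not bedtools or shutil.which(bedtools) is None:
        problems.append("bedtools: executable not found")
    for key in FILE_KEYS:
        path = config.get(key)
        if not path or not os.path.isfile(path):
            problems.append(f"{key}: file not found")
    return problems


if __name__ == "__main__":
    example = {
        # Autosomes, assuming "chr"-prefixed chromosome names.
        "chromosomes": [f"chr{i}" for i in range(1, 23)],
        "bedtools": "bedtools",
        # Placeholder paths; replace with your own files.
        "trf_bed": "/path/to/trf.bed",
        "rmsk_bed": "/path/to/rmsk.bed",
        "exclude_bed": "/path/to/exclude.bed",
        "reference": "/path/to/GRCh38.fa",
    }
    for problem in check_config(example):
        print(problem)
```

An empty list means all checks passed. Any entry names the setting that needs attention.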
python FilterMain.py supports/HG00733.svision.s5.graph.vcf -g ./supports/HG00733.graph_exactly_match.txt -w ./output_dir -i 0,3
This will generate three files:
prefix.filtered.vcf: SVision discoveries after removing calls in low-mapping-quality regions, gaps and centromeres.
prefix.Raw-CSVs.tsv: SVision CSVs filtered by graph structures.
prefix.HQ-CSVs.tsv: CSVs additionally filtered by tandem repeats.
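The region-based exclusion behind prefix.filtered.vcf can be pictured as an interval overlap test. The sketch below is not the code in FilterMain.py: it uses only the standard library rather than intervaltree, assumes half-open BED-style coordinates, and the function names and example coordinates are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group half-open (chrom, start, end) regions by chromosome and merge overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        merged = []
        for start, end in sorted(spans):
            if merged and start <= merged[-1][1]:
                # Overlapping or touching: extend the previous region.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_excluded(index, chrom, start, end):
    """Return True if [start, end) overlaps any excluded region on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Position of the last merged region that begins before the query ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


if __name__ == "__main__":
    excluded = build_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)])
    calls = [("chr1", 250, 260), ("chr1", 300, 400), ("chr3", 10, 20)]
    kept = [c for c in calls if not overlaps_excluded(excluded, *c)]
    print(kept)
```

Merging the regions first keeps each lookup to a single binary search per call.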