This repository has been archived by the owner on Jan 31, 2020. It is now read-only.
Post processing of Somatic Variation Models
zskidmor edited this page Apr 30, 2014 · 3 revisions
Somatic-variation models can be post-processed with the following Genome Model Tool: 'gmt somatic process-somatic-variation'
This tool deduplicates variant calls, filters out off-target sites, tiers the variants, and adds dbSNP and GMAF information, among other steps. It can also generate the files needed for manual review (XML and BED files).
If processing a large number of samples, you can give them all the same output-directory. Each model's output will be stored in a subdirectory with the sample name, and the manual-review files will all be grouped together in a review/ subdirectory.
This tool has many options, but a typical command might look like this:
$ gmt somatic process-somatic-variation --somatic-variation-model-id 12345678 --output-dir somatic-validation --add-dbsnp-and-gmaf --add-tiers --restrict-to-target-regions --create-review-files --tiers-to-review 1 --igv-reference-name=b37
- --add-dbsnp-and-gmaf appends columns with dbSNP IDs and global minor allele frequency (GMAF)
- --add-tiers appends a column containing the tier of each variant
- --create-review-files generates the BED and XML files needed for manual review
- --tiers-to-review chooses which tiers of variants are placed into the BED files for review (default: 1)
- --igv-reference-name provides the reference name for the IGV session (most commonly: b37)
- --get-readcounts appends readcounts from the normal and tumor BAMs for each variant
- --restrict-to-target-regions keeps only calls in target regions (as specified by the target_regions on the build)
- --filter-regions, --filter-sites Pass in either a list of variants to remove or a list of regions from which all variant calls will be removed.
- --sites-to-pass Pass in a list of sites that should be output, even if they would otherwise be filtered.
- --required-snv-callers If set to a value N greater than 1, requires that N variant callers independently call a variant before it is reported. Occasionally useful for filtering noisy data down to a manageable list.
- --sample-name This name will be used for the output directory, instead of the subject_name from the model.
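When several models share one output directory, the typical command above can be wrapped in a loop. The following is a minimal sketch and not part of the tool itself: the model IDs are placeholders, the `DRY_RUN` variable and the `run_batch` function are hypothetical names introduced for illustration, and the flags are copied unchanged from the typical command. By default it only prints the commands it would run.

```shell
#!/bin/sh
set -eu

# Placeholder model IDs (assumption); replace with real somatic-variation model IDs.
MODEL_IDS="12345678 23456789"
# One output directory shared by every model.
OUTPUT_DIR="somatic-validation"
# Hypothetical switch: 1 = print the commands only, anything else = run them.
DRY_RUN="${DRY_RUN:-1}"

run_batch() {
  for model_id in $MODEL_IDS; do
    # Build the command as positional parameters so each argument stays one word.
    set -- gmt somatic process-somatic-variation \
      --somatic-variation-model-id "$model_id" \
      --output-dir "$OUTPUT_DIR" \
      --add-dbsnp-and-gmaf --add-tiers --restrict-to-target-regions \
      --create-review-files --tiers-to-review 1 \
      --igv-reference-name=b37
    if [ "$DRY_RUN" = "1" ]; then
      printf '%s\n' "$*"
    else
      "$@"
    fi
  done
}

run_batch
```

Running the script with `DRY_RUN=0` in the environment would execute the commands instead of printing them; checking the printed commands first is a cheap way to catch a wrong model ID before any processing starts.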
The output directory contains:
- samplename/
  - snvs.indels.annotation Final output, containing filtered and annotated variants. Other columns are appended based on the options above (tier, dbSNP, readcounts, etc.)
  - snvs.indels.annotation.xls The same data as above, converted to xls format
  - snvs/ intermediate files from filtering, annotation, etc.
  - indels/ intermediate files from filtering, annotation, etc.
- review/ files generated for manual review
  - samplename.xml
  - samplename.bed
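After a run, it can be useful to confirm that the files listed above were actually produced. The sketch below is an illustration only: `check_outputs` is a hypothetical helper, and it assumes that review/ sits at the top level of the output directory (as the note about grouping review files suggests) and that the review files are named after the sample.

```shell
#!/bin/sh
set -eu

# check_outputs OUTPUT_DIR SAMPLE_NAME
# Reports every expected path that does not exist and returns 1 if any is missing.
check_outputs() {
  missing=0
  for path in \
    "$1/$2/snvs.indels.annotation" \
    "$1/$2/snvs.indels.annotation.xls" \
    "$1/$2/snvs" \
    "$1/$2/indels" \
    "$1/review/$2.xml" \
    "$1/review/$2.bed"
  do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# When called with two arguments, check that sample and report the result.
if [ "$#" -eq 2 ]; then
  if check_outputs "$1" "$2"; then
    echo "all expected outputs present for $2"
  else
    exit 1
  fi
fi
```

Saved under a name of your choosing, it would be called with the output directory and the sample name as its two arguments. The review files only exist when --create-review-files was given, so a run without that option will report them as missing.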