Processing WES data with ASCAT

The following files have been derived from the reference files in the Battenberg package. No filter related to genomic location was applied, so they contain exonic, intronic and intergenic SNPs. When using these files, one must either 1) provide a BED file that defines the regions of interest (BED_file in ascat.prepareHTS) or 2) downsample the files so that they are tailored to the sequencing design (option 2 will speed up ASCAT but is only recommended for advanced users). This is because a WES experiment covers only a small fraction of the genome, so we have to start from an exhaustive list of SNPs, knowing that only a subset will be covered. In contrast, reference files for WGS have a lower resolution (they are not meant to be downsampled) and must not be used for processing WES data. Conversely, because reference files for WES contain an exhaustive list of SNPs, they must not be used for processing WGS data.

Please note that such files can also be used for processing targeted sequencing data (with an appropriate BED file). Since they contain exonic/intronic/intergenic SNPs, they should be applicable to a broad range of designs.
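As a rough illustration of option 2 (downsampling), the sketch below keeps only the loci that fall inside the regions of a BED file. The function name filter_loci_by_bed is hypothetical and not part of ASCAT. It assumes that the BED file follows the usual 0-based, half-open convention, that positions in the loci files are 1-based, and that both files use the same chromosome naming. Only the loci files are handled here; whether the other reference files need the same treatment is not covered.

```bash
# Hypothetical helper (not part of ASCAT): print the loci that fall inside
# at least one BED region.
#   $1 = BED file (chrom, start, end; 0-based start, end exclusive)
#   $2 = loci file (chrom, position; position assumed to be 1-based)
filter_loci_by_bed() {
  awk '
    # First file (BED): store each region per chromosome as a 1-based
    # inclusive interval.
    NR == FNR {
      n[$1]++
      start[$1, n[$1]] = $2 + 1
      end[$1, n[$1]] = $3 + 0
      next
    }
    # Second file (loci): print the line if the position lies in any region
    # on the same chromosome. Linear scan, fine for a sketch only.
    {
      for (i = 1; i <= n[$1]; i++) {
        if ($2 >= start[$1, i] && $2 <= end[$1, i]) { print; break }
      }
    }
  ' "$1" "$2"
}
```

The function writes to standard output, so the original loci files stay untouched (for instance, filter_loci_by_bed design.bed G1000_loci_hg19_chr1.txt > subset_chr1.txt, where design.bed and subset_chr1.txt are placeholder names). For large designs, a dedicated interval tool would likely scale better than this per-locus scan.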

Data availability:

Loci files: hg19 & hg38 (unzip and set loci.prefix="G1000_loci_hg19_chr" in ascat.prepareHTS)
Allele files: hg19 & hg38 (unzip and set alleles.prefix="G1000_alleles_hg19_chr" in ascat.prepareHTS)
GC correction file: hg19 & hg38 (unzip and set GCcontentfile="GC_G1000_hg19.txt" in ascat.correctLogR)
Replication timing correction file: hg19 & hg38 (unzip and set replictimingfile="RT_G1000_hg19.txt" in ascat.correctLogR)

File format

Loci file

One file per chromosome, no header, the first column is the chromosome name and the second column is the position.


1	10642
1	11008
1	11012

Allele file

One file per chromosome, one header line. The first column is the position, and the second and third columns are the reference and alternate nucleotides, encoded as follows: A=1, C=2, G=3 and T=4.

position	a0	a1
10642	3	1
11008	2	3
11012	2	3

In the example above, SNP at position 10642 is G>A and SNPs at 11008 and 11012 are both C>G.
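To inspect an allele file by eye, the numeric codes can be mapped back to nucleotides. The sketch below is only a convenience for reading the files and is not something ASCAT requires; the function name decode_alleles is hypothetical, and the file is assumed to be tab-separated with a single header line, as described above.

```bash
# Hypothetical helper: print an allele file with the numeric codes
# (A=1, C=2, G=3, T=4) replaced by nucleotide letters.
#   $1 = allele file (tab-separated, one header line)
decode_alleles() {
  awk -F'\t' '
    BEGIN { OFS = "\t"; split("A C G T", nt, " ") }  # nt[1]="A" ... nt[4]="T"
    NR == 1 { print; next }                          # keep the header as-is
    { print $1, nt[$2], nt[$3] }
  ' "$1"
}
```

Applied to the example above, the line for position 10642 would read 10642, G, A. The decoded output is meant for inspection only; ASCAT expects the numeric encoding.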

GC correction file

One single file, one header, the first column is the SNP ID, the second/third columns are chromosome/position and the other columns are GC% around SNPs with different window sizes.

	Chr	Position	25bp	50bp	100bp	200bp	500bp	1kb	2kb	5kb	10kb	20kb	50kb	100kb	200kb	500kb
1_10642	1	10642	0.92	0.823529	0.762376	0.761194	0.722555	0.677323	0.625457	0.595799	0.590039	0.584571	0.533734	0.458927	0.421891	0.425195	0.423964
1_11008	1	11008	0.72	0.745098	0.722772	0.741294	0.730539	0.705295	0.594703	0.593501	0.594541	0.583212	0.534297	0.457987	0.42164	0.425088	0.423964
1_11012	1	11012	0.76	0.705882	0.742574	0.741294	0.726547	0.706294	0.595202	0.593964	0.594478	0.583182	0.53433	0.457971	0.421633	0.425084	0.423964

Replication timing file

One single file, one header, the first column is the SNP ID, the second/third columns are chromosome/position and the other columns are replication timing data in different cell lines.

	Chr	Position	Bg02es	Bj	Gm06990	Gm12801	Gm12812	Gm12813	Gm12878	Helas3	Hepg2	Huvec	Imr90	K562	Mcf7	Nhek	Sknsh
1_10642	1	10642	49.509453	62.858498	52.757858	61.294971	51.757736	43.72905	48.088467	54.11837	58.062084	47.565636	68.790581	68.970825	57.467934	56.897934	60.012413
1_11008	1	11008	49.509453	62.858498	52.757858	61.294971	51.757736	43.72905	48.088467	54.11837	58.062084	47.565636	68.790581	68.970825	57.467934	56.897934	60.012413
1_11012	1	11012	49.509453	62.858498	52.757858	61.294971	51.757736	43.72905	48.088467	54.11837	58.062084	47.565636	68.790581	68.970825	57.467934	56.897934	60.012413
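Both the GC correction file and the replication timing file share the same first three columns. In the examples above, each SNP ID looks like the chromosome and position joined by an underscore; that pattern is inferred from the examples rather than stated as a rule. Under that assumption, a quick sanity check on a custom or downsampled file could look like the sketch below (check_snp_ids is a hypothetical name).

```bash
# Hypothetical helper: count data rows whose SNP ID (column 1) differs from
# "<Chr>_<Position>" (columns 2 and 3). Assumes a tab-separated file with one
# header line; the ID pattern itself is inferred from the examples.
#   $1 = GC correction file or replication timing file
check_snp_ids() {
  awk -F'\t' '
    NR > 1 && $1 != ($2 "_" $3) { bad++ }
    END { print bad + 0 }   # prints 0 when every row is consistent
  ' "$1"
}
```

A non-zero count suggests that the file was altered or that its rows no longer line up with the ID scheme seen in the examples.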

'chr'-based versus non 'chr'-based reference

Please note that loci files provided above are not 'chr'-based (chromosome names are '1', '2', '3', etc. and not 'chr1', 'chr2', 'chr3', etc.). If your BAMs are 'chr'-based, you will need to add 'chr' (Bash: for i in {1..22} X; do sed -i 's/^/chr/' G1000_loci_hg19_chr${i}.txt; done). ASCAT will internally remove 'chr' so the other files (allele, GC correction and RT correction) should not be modified and chrom_names (ascat.prepareHTS) should be c(1:22,'X').
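The in-place sed command above adds 'chr' unconditionally, so running it twice would produce 'chrchr1'. As a hedged alternative, the sketch below writes the prefixed lines to standard output and leaves lines that already start with 'chr' unchanged. The function name add_chr_prefix is hypothetical.

```bash
# Hypothetical helper: print a loci file with 'chr' prepended to each line,
# skipping lines that already start with 'chr' (safe to run more than once).
#   $1 = loci file (chromosome name in the first column, no header)
add_chr_prefix() {
  awk '{ if ($0 !~ /^chr/) $0 = "chr" $0; print }' "$1"
}
```

Because the output goes to standard output, the prefixed copies can be written to new files, in which case loci.prefix would need to point to those new files.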
