Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add CNV #407

Open
MarineBergot opened this issue Jul 15, 2022 · 14 comments
Open

add CNV #407

MarineBergot opened this issue Jul 15, 2022 · 14 comments

Comments

@MarineBergot
Copy link

Hi ! thanks for your work, i would like to know if it's possible to add CNV/SNV on the plot? thanks :)

i'm using:
python 3.8
pyGenomeTracks 3.7

@lldelisle
Copy link
Collaborator

Hi @MarineBergot ,
Do you mean from a database? Common CNV/SNV? All we plot in pyGenomeTracks rely on local files. I don't know in which format are encoded SNV/CNV.

@MarineBergot
Copy link
Author

Thanks for your super quick answer! no it's local files, with our pipeline we have file with all CNV/SNV in vcf or gvcf format
My goal would be to plot area with TAD, gene expression, CNV/SNV and genes

@lldelisle
Copy link
Collaborator

For the moment, we don't support vcf. Could you make a scheme on how you would represent them, so when we have time to develop new features, we know how it could be useful? Meanwhile I guess you can convert the vcf to bed (if you don't know how to do, just do a head of your vcf I will give you the command)...

@MarineBergot
Copy link
Author

Hi again ! i'm trying to add CNV to my (not perfect) plot, i have issue with color. I tried to cheat and added color with bed12 but i don't have bed12. i converted vcf to bed with rgb color regarding the type of CNV but i don't know what to put in the 3 last columns then it's not working (i tried with 0 and 1 but no )
my file looks like that :

chr1 43593267 43594572 . 0 . 43593267 43594572 188,100,241
chr1 30195841 218197336 . 0 . 30195841 218197336 188,100,241
chr1 205209141 205210012 . 0 . 205209141 205210012 188,100,241
chr1 237402443 237403261 . 0 . 237402443 237403261 188,100,241
chr1 247146749 247149421 . 0 . 247146749 247149421 188,100,241
chr2 207644871 207645844 . 0 . 207644871 207645844 188,100,241
chr3 44699170 44701119 . 0 . 44699170 44701119 188,100,241
chr3 93470002 93471081 . 0 . 93470002 93471081 188,100,241
chr3 165092497 165093482 . 0 . 165092497 165093482 188,100,241
chr4 49104007 49658148 . 0 . 49104007 49658148 188,100,241

my .ini file

[x-axis]
where = top

[hic matrix]
file = dijhic008/hicMatrix/dijhic008_50000_merge_norm_correct.h5
title = 50kb dijhic008 chromothripsis

depth is the maximum distance plotted in bp. In Hi-C tracks

the height of the track is calculated based on the depth such

that the matrix does not look deformed

depth = 5000000
transform = log1p
file_type = hic_matrix

[tads]
file =dijhic008_TADs_50000_merge_domains.bed
file_type = domains
border_color = black
color = none

the tads are overlay over the hic-matrix

the share-y options sets the y-axis to be shared

between the Hi-C matrix and the TADs.

overlay_previous = share-y

[spacer]
height = 0.05

[hic matrix]
file = dijhic001_50000_norm_correct.h5
title = 50kb dijhic001 CTRL

depth is the maximum distance plotted in bp. In Hi-C tracks

the height of the track is calculated based on the depth such

that the matrix does not look deformed

depth = 1000000
transform = log1p
file_type = hic_matrix

[tads]
file = dijhic001_TADs_50000_domains.bed
file_type = domains
border_color = black
color = none

the tads are overlay over the hic-matrix

the share-y options sets the y-axis to be shared

between the Hi-C matrix and the TADs.

overlay_previous = share-y

[spacer]
height = 0.05

[test arcs overlay]
file =dijarn085.arc
line_width = 3
height=3

[gtf exons]
file = grch38.refseq.with_genes.gtf
height = 10
title = gtf (exons and transcripts)
fontsize = 12
file_type = gtf
prefered_name = gene_name

[CNV]
file = dijen080_cnv_bed12.bed
height = 7
title = genes (bed12) style = flybase; fontsize = 10; height_utr = 0.7
style = flybase
fontsize = 10
height_utr = 0.7

[bigwig]
file = ENCFF757LRW.bw
color = blue
height = 4
title = CTCF
min_value = 0
max_value = 30

[vlines]
file = genes.bed
type = vlines

do you have any idea i could do to have on color regarding the type of CNV?
(and how to put my 2 TAD plot on the same scale? )
and if you have something for RNAseq data? like to add some plot for each gene with the number of count to see if the gene is down or up ? sorry for all my asking and thanks again for your incredible help!

TAD_chromo

@lldelisle
Copy link
Collaborator

Hi,

  1. For the CNVs, you have currently a bed9 (9 fields) and you can use this in pyGenomeTracks. To see them, in color, you can use the option color = bed_rgb and I would not put label as they are '.' for all. So it would be:
[CNV]
file = dijen080_cnv_bed12.bed
height = 3
title = CNV
color = bed_rgb
labels = false
  1. For the genes, I don't know if you want to see all transcripts for a single gene. If you don't you can use the options merge_transcripts = true and merge_overlapping_exons = true.
  2. For the Hi-C matrices, I guess you did not specify the --binSize option in hicBuildMatrix. This is why you have 2 different resolutions. What I would do:
    a. Run hicInfo -m dijhic008/hicMatrix/dijhic008_50000_merge_norm_correct.h5 to get the resolution of the first matrix.
    b. Rerun hicBuildMatrix --samFiles .... --binSize ... for the second: dijhic001 with the binSize obtained in (a.)
    c. Rerun correction and TAD calling on corrected matrices
  3. For the RNA-seq, we don't have specific tracks. I think there are 3 options:
    a. You make a coverage from your RNA-seq with bedtools and you plot the 2 bedgraphs/bigwigs
    b. You compute FPKMs or other measure and you put this value in the fifth column of a bed and you plot the 2 beds with color = Reds or any other colormap (https://matplotlib.org/stable/tutorials/colors/colormaps.html).
    c. You extract from a DE analysis the log2 fold-change and you generate a bed with this value in the fifth field and you plot this bed with color = bwr or any other Diverging colormap (https://matplotlib.org/stable/tutorials/colors/colormaps.html#diverging).

@lldelisle
Copy link
Collaborator

PS: For the next exchanges, would you mind to put your ini file in a code block? You can generate a code block with triple ` followed by ini:

```ini
your ini file
```

Thanks

@MarineBergot
Copy link
Author

thanks for your quick answer, i tried as you suggested for CNV but nothing is shown :/
this is my bed (i'm supposed to have only one CNV in this area):

chr8	26054488	26055284	None	0	.	26054488	26055284	188,100,241
chr8	43238145	43242668	None	0	.	43238145	43242668	188,100,241
chr8	85261666	85262425	CA13	0	.	85261666	85262425	188,100,241
chr8	72111027	72111742	None~None	0	.	72111027	72111742	242,243,244
chr8	84952812	84953527	None~None	0	.	84952812	84953527	242,243,244
chr8	99145226	99145941	VPS13B~VPS13B	0	.	99145226	99145941	242,243,244
chr8	130897333	130898048	ADCY8~ADCY8	0	.	130897333	130898048	242,243,244
chr8	51817225	51817940	PCMTD1~None	0	.	51817225	51817940	242,243,244
chr8	51818564	51819279	PCMTD1~None	0	.	51818564	51819279	242,243,244
chr8	14488547	14489262	SGCZ~GXYLT1	0	.	14488547	14489262	242,243,244
chr8	15431497	15432212	None~KLF12	0	.	15431497	15432212	242,243,244
chr8	30287528	30288243	None~None	0	.	30287528	30288243	242,243,244
chr8	28576084	28576799	None~None	0	.	28576084	28576799	242,243,244
chr8	133959623	133960338	None~None	0	.	133959623	133960338	242,243,244
chr8	93653916	93654631	LINC00535~None	0	.	93653916	93654631	242,243,244
chr8	643942	649756	ERICH1	0	.	643942	649756	221,20,35
chr8	6111142	6115456	None	0	.	6111142	6115456	221,20,35
chr8	15919242	15930056	None	0	.	15919242	15930056	221,20,35
chr8	25114442	25133956	None	0	.	25114442	25133956	221,20,35
`[x-axis]
where = top

[hic matrix]
file = hicMatrix/dijhic008_25000_merge_norm_correct.h5
title = 25kb dijhic008 chromothripsis
# depth is the maximum distance plotted in bp. In Hi-C tracks
# the height of the track is calculated based on the depth such
# that the matrix does not look deformed
depth = 3000000
transform = log1p
file_type = hic_matrix
max_value=50

[tads]
file = TADs/dijhic008_TADs_25000_merge_domains.bed
file_type = domains
border_color = black
color = none
# the tads are overlay over the hic-matrix
# the share-y options sets the y-axis to be shared
# between the Hi-C matrix and the TADs.
overlay_previous = share-y


[spacer]
height = 0.05

[hic matrix]
file = hicMatrix/dijhic001_25000_norm_correct.h5
title = 25kb dijhic001 CTRL
# depth is the maximum distance plotted in bp. In Hi-C tracks
# the height of the track is calculated based on the depth such
# that the matrix does not look deformed
depth = 3000000
transform = log1p
file_type = hic_matrix
max_value = 50

[tads]
file =TADs/dijhic001_TADs_25000_domains.bed
file_type = domains
border_color = black
color = none
# the tads are overlay over the hic-matrix
# the share-y options sets the y-axis to be shared
# between the Hi-C matrix and the TADs.
overlay_previous = share-y

[spacer]
height = 0.05
[test arcs overlay]
file = TADs/dijarn085.arc
line_width = 3
height=3


[gtf exons]
file =grch38.refseq.with_genes.gtf
height = 12
title = gtf grch38 (exons and transcripts)
fontsize = 12
file_type = gtf
prefered_name = gene_name

[CNV]
file = dijen080_cnv_bed9.bed
height = 3
title = CNV
color = bed_rgb
labels = false



[bigwig]
file = ENCFF757LRW.bw
color = blue
height = 4
title = CTCF
min_value = 0
max_value = 30

`

  1. it was supposed to be same bin but after checking, yeah it wasn't, now it's working perfectly thanks
  2. i'm experimenting your options i will let you know the results!

TAD_chromo_merge

@MarineBergot
Copy link
Author

edit : CNV worked but scale is not right, apparently the black line is supposed to be my CNV
i tried with other region, do you have any idea how to improve the figure?
TAD_chromo_merge_test_cnv

@lldelisle
Copy link
Collaborator

What do you mean by 'scale is not right'.
In your bed I see:

chr8	643942	649756	ERICH1	0	.	643942	649756	221,20,35

Which is the one you see in your region (chr8:6-11Mb).
If the problem is that you don't see the color, add border_color = bed_rgb .
If you know the CNV does not overlap, you can put:

[CNV]
file = dijen080_cnv_bed9.bed
height = 1
display = collapsed
title = CNV
color = bed_rgb
border_color = bed_rgb
labels = false

@MarineBergot
Copy link
Author

ah it's perfect like that thanks !
TAD_chromo_merge

@MarineBergot
Copy link
Author

H! it's me again :) i'm still working on my plot to add RNA-seq data. I added easily coverage but i would like to add zscore, do you have something like that?
i tried with a bedgraph file but nothing is shown, i can't understand why

my file looks like that :


"chr8"	95133784	95156685	-2.51
"chr8"	9555911	9782346	0.44
"chr8"	96222946	96235545	-0.39
"chr8"	96239401	96261610	-0.97
"chr8"	96261901	96336995	-0.4
"chr8"	96493812	96611790	-1.97
"chr8"	96645241	97143501	0.95
"chr8"	97273487	97277928	0.01
"chr8"	97644183	97730260	-0.25
"chr8"	97775787	97853013	-1.05
"chr8"	97869063	98036724	-0.36
"chr8"	98041720	98045545	-0.02
"chr8"	98064567	98093609	1.24
"chr8"	98102343	98117171	-1.38
"chr8"	98117292	98159835	0.65
"chr8"	98189825	98294235	1.04
"chr8"	98454632	98825688	0.42
"chr8"	98944441	98952100	0.09
"chr8"	99013273	99877580	1.37
"chr8"	99877864	99893707	-1.45

my .ici file (beginning didn't change)

[SNV]
file = dijen080_snv.bed
title = SNV
height = 1
display = collapsed
color = bed_rgb
border_color = bed_rgb


[mutations CTCF]
file = mutations_CTCF.bed
title = mutations CTCF
height = 1
display = collapsed
color = bed_rgb
border_color = bed_rgb



[bedgraph RNA-Seq]
file = dijarn085.bedGraph
title = RNA-seq coverage
color = green
width = 0.2
# optional. Default is yes, set to no to turn off the visualization of data range
show data range = yes
# optional, otherwise guessed from file ending
file_type = bedgraph
height = 4


[bedgraph RNA-Seq zscore]
file = TADs/zscore.bg
title = RNA-seq Zscore
color = red
width = 0.2
# optional. Default is yes, set to no to turn off the visualization of data range
show data range = yes
# optional, otherwise guessed from file ending
file_type = bedgraph
height = 4


[bigwig Chipseq]
file = CTCF_data/ENCFF757LRW.bw
color = blue
height = 4
title = CTCF
min_value = 0
max_value = 30

TAD_chromo_merge_test_CTCF
thanks for your help!
Marine

@lldelisle
Copy link
Collaborator

It is because of the quotes around your chromosome name.

@MarineBergot
Copy link
Author

me again, i have issue with my CNV. Maybe it's not possible but in case of for future. I have different bed file :
-one with CNV
-one with SNV
-one with CNV/SNV which occured in CTCF sequences

this is really not so good when they are in differents lines, i tried to put everything in same bed but it wasn't good solution apparently.
do you think there is a way to draw with differents colors the differns types of CNV/SNV/CTCF mutations directly on the genes ?
thanks!
TAD_chromo_merge_test_CTCF

and maybe improve arc ?

@lldelisle
Copy link
Collaborator

I need more info to help you:

  1. Can you give me your actual ini file? And explain how did you obtain the file plotted as genes grch38?
  2. What are the arcs?
  3. Can you draw and take a picture of what it would look like ideally? For example, you want to see your genes on multiple lines or a single line?

For the overlay you are asking. There is an option: overlay_previous = yes which allow to draw one track on top of the other one, for example, TAD on top of heatmap but it can also be CNV on top of genes... I don't know if this helps you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants