Difference between EnhancerPredictionsFull_threshold0.02_self_promoter.tsv and EnhancerPredictions_threshold0.02_self_promoter.tsv #223

Open
NicoleYY77 opened this issue May 10, 2024 · 3 comments

@NicoleYY77

Hi, I'm having trouble interpreting the output files of the prediction step: EnhancerPredictionsAllPutative.tsv, EnhancerPredictionsFull_threshold0.02_self_promoter.tsv, and EnhancerPredictions_threshold0.02_self_promoter.tsv. I initially thought EnhancerPredictionsFull_threshold0.02_self_promoter.tsv was generated by selecting a subset of enhancer-gene pairs with ABC.Score > 0.02 from EnhancerPredictionsAllPutative.tsv, but the row counts don't match.
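A minimal sketch of the threshold-only filter described here, using Python's standard csv module and looking the score up by header name rather than by column position. It assumes the score column in EnhancerPredictionsAllPutative.tsv is named ABC.Score, as in the thresholded file shown later in this thread. It is only an illustration of "score > 0.02", not the repository's filter_predictions.py.

```python
import csv
from typing import Dict, Iterable, List

# Assumption: the score column is named "ABC.Score", as in the header of the
# thresholded file shown in this thread. Adjust if your header differs.
SCORE_COL = "ABC.Score"


def rows_above_threshold(
    lines: Iterable[str], threshold: float = 0.02, score_col: str = SCORE_COL
) -> List[Dict[str, str]]:
    """Return the data rows of a tab-separated file whose score is > threshold.

    The first line is treated as the header, and the score is looked up by
    column name, so the result does not depend on column positions.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if float(row[score_col]) > threshold]
```

Any iterable of lines works, including an open file handle, e.g. len(rows_above_threshold(open("EnhancerPredictionsAllPutative.tsv", newline=""))).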
I'm also confused about the relationship between EnhancerPredictionsFull_threshold0.02_self_promoter.tsv and EnhancerPredictions_threshold0.02_self_promoter.tsv. They have the same number of rows, so I assumed EnhancerPredictions_threshold0.02_self_promoter.tsv was generated by selecting some core columns from EnhancerPredictionsFull_threshold0.02_self_promoter.tsv, but I found that many ABC.Score values in EnhancerPredictionsFull_threshold0.02_self_promoter.tsv are less than 0.02. I'm not sure whether I'm misinterpreting these files; I'd really appreciate it if you could help me interpret them!
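One way to test the "same rows, fewer columns" hypothesis is to compare the two files row by row on the columns that identify an enhancer-gene pair, and to count Full-file rows at or below the threshold by column name. This is a hedged sketch: it assumes both files carry the columns chr, start, end, TargetGene and ABC.Score (all of which appear in the header of the non-Full file shown in this thread); the function name is hypothetical.

```python
import csv
from typing import Dict, Iterable

# Columns assumed to identify an enhancer-gene pair in both files.
KEY_COLS = ("chr", "start", "end", "TargetGene")


def compare_files(
    full_lines: Iterable[str],
    short_lines: Iterable[str],
    threshold: float = 0.02,
    score_col: str = "ABC.Score",
) -> Dict[str, object]:
    """Summarise how two tab-separated prediction files relate to each other."""
    full = list(csv.DictReader(full_lines, delimiter="\t"))
    short = list(csv.DictReader(short_lines, delimiter="\t"))
    full_keys = [tuple(r[c] for c in KEY_COLS) for r in full]
    short_keys = [tuple(r[c] for c in KEY_COLS) for r in short]
    return {
        "same_row_count": len(full) == len(short),
        # Same enhancer-gene pairs, in the same order
        "same_pairs_in_order": full_keys == short_keys,
        # Same enhancer-gene pairs, ignoring order
        "same_pair_set": set(full_keys) == set(short_keys),
        # Full-file rows that a strict "> threshold" filter would have dropped
        "full_rows_at_or_below_threshold": sum(
            float(r[score_col]) <= threshold for r in full
        ),
    }
```

If same_pair_set is True but full_rows_at_or_below_threshold is non-zero, it is worth checking which column is actually being read as the score.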

@atancoder
Collaborator

EnhancerPredictionsFull_threshold0.02_self_promoter.tsv is generated by selecting a subset of enhancer-gene pairs with ABC.Score > 0.02 from EnhancerPredictionsAllPutative.tsv

This should be true: https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/blob/main/workflow/scripts/filter_predictions.py#L50. Can you show your results with ABC scores < .02?
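To answer the request for rows with ABC scores below 0.02, a small sketch that pulls those rows out of a file by column name and formats the first few as TSV for pasting into the issue. It assumes the score column in the file being checked is named ABC.Score; rows_below and format_tsv are hypothetical helper names.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def rows_below(
    lines: Iterable[str], threshold: float = 0.02, score_col: str = "ABC.Score"
) -> Tuple[Optional[Sequence[str]], List[Dict[str, str]]]:
    """Return (header, rows whose score is strictly below threshold)."""
    reader = csv.DictReader(lines, delimiter="\t")
    fieldnames = reader.fieldnames  # reading this consumes the header line
    offending = [r for r in reader if float(r[score_col]) < threshold]
    return fieldnames, offending


def format_tsv(fieldnames: Sequence[str], rows: List[Dict[str, str]], limit: int = 10) -> str:
    """Render the header plus at most `limit` rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows[:limit])
    return buf.getvalue()
```

For example: names, rows = rows_below(open("EnhancerPredictionsFull_threshold0.02_self_promoter.tsv", newline="")), then print(format_tsv(names, rows)).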

@NicoleYY77
Author

Thank you for the reply!
Based on this script, it seems that we select the subset with ABC.Score > 0.02 that also does not belong to the promoter class, unless the element is a self-promoter?
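The selection rule as read here (score above the threshold, and either not a promoter or a self-promoter) can be written as a row predicate. This is a sketch of that reading, not a copy of filter_predictions.py. The column names class and isSelfPromoter are assumptions (the thread only refers to them as awk fields $5 and $17), and the "True" string value is taken from the awk command below.

```python
import csv
from typing import Dict, Iterable, List


def passes_filter(
    row: Dict[str, str],
    threshold: float = 0.02,
    score_col: str = "ABC.Score",      # assumed column name
    class_col: str = "class",          # assumed column name
    self_col: str = "isSelfPromoter",  # assumed column name
) -> bool:
    """Score above threshold AND (not a promoter OR a self-promoter)."""
    above = float(row[score_col]) > threshold
    not_promoter = row[class_col] != "promoter"
    is_self_promoter = row[self_col] == "True"
    return above and (not_promoter or is_self_promoter)


def filter_rows(lines: Iterable[str], **kwargs) -> List[Dict[str, str]]:
    """Apply passes_filter to every data row of a tab-separated file."""
    return [r for r in csv.DictReader(lines, delimiter="\t") if passes_filter(r, **kwargs)]
```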
My EnhancerPredictions_threshold0.02_self_promoter.tsv looks like this and has 105,037 rows:
chr start end name TargetGene TargetGeneTSS CellType ABC.Score
chr1 713881 714381 intergenic|chr1:713881-714381 ATAD3A 1447522 K562_hg19_0501 0.050366
chr1 713881 714381 intergenic|chr1:713881-714381 RNF223 1009687 K562_hg19_0501 0.021288
chr1 713881 714381 intergenic|chr1:713881-714381 PERM1 917497 K562_hg19_0501 0.032612
chr1 713881 714381 intergenic|chr1:713881-714381 PLEKHN1 901876 K562_hg19_0501 0.038985
chr1 713881 714381 intergenic|chr1:713881-714381 KLHL17 895966 K562_hg19_0501 0.038383
chr1 713881 714381 intergenic|chr1:713881-714381 SAMD11 861120 K562_hg19_0501 0.064575
chr1 713881 714381 intergenic|chr1:713881-714381 FAM87B 752750 K562_hg19_0501 0.117851
chr1 752446 753000 promoter|chr1:752446-753000 FAM87B 752750 K562_hg19_0501 1.000000
chr1 762648 763363 promoter|chr1:762648-763363 LINC01128 762970 K562_hg19_0501 1.000000
chr1 762648 763363 promoter|chr1:762648-763363 LINC00115 762902 K562_hg19_0501 1.000000
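Judging from these sample rows, the name column appears to encode the element class and coordinates as class|chrom:start-end, and each promoter row shown contains its own TargetGeneTSS (for example, FAM87B's TSS 752750 lies within 752446-753000), which is consistent with those rows being self-promoters. The sketch below parses that format and checks TSS containment. The name format is inferred from these ten rows only, the half-open [start, end) interval is an assumption, and "element contains the target gene's TSS" is used here only as a rough proxy, not as the ABC definition of a self-promoter.

```python
from typing import NamedTuple


class Region(NamedTuple):
    element_class: str  # e.g. "intergenic" or "promoter"
    chrom: str
    start: int
    end: int


def parse_name(name: str) -> Region:
    """Split a name such as 'intergenic|chr1:713881-714381' into its parts."""
    element_class, coords = name.split("|", 1)
    chrom, span = coords.rsplit(":", 1)
    start, end = span.split("-", 1)
    return Region(element_class, chrom, int(start), int(end))


def contains_tss(region: Region, tss: int) -> bool:
    """True if tss falls inside the element, treating it as half-open [start, end)."""
    return region.start <= tss < region.end
```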

But if I run awk '$25 > 0.02 {print $0}' EnhancerPredictionsAllPutative.tsv | wc -l, I get 116,067.
If I run awk '($5!="promoter"||$17=="True")&&$25> 0.02 {print $0}' EnhancerPredictionsAllPutative.tsv | wc -l, I get 69,537.
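Two awk behaviours can make positional counts diverge from a filter that uses column names. By default awk splits fields on runs of spaces and tabs, so an empty field in a TSV collapses and every later $N shifts left (a field containing a space shifts them the other way); running awk with -F'\t' avoids this. Also, without an NR > 1 guard the header line is tested as well and may be counted. Whether either affects this file cannot be told from the thread. The sketch below (standard library only, function names hypothetical) finds the awk field numbers for named columns and counts lines whose field numbering would change under whitespace splitting.

```python
from typing import Dict, Iterable


def awk_column_numbers(header_line: str, names: Iterable[str]) -> Dict[str, int]:
    """Map column names to the 1-based field numbers awk uses ($1, $2, ...),
    assuming a strictly tab-separated header."""
    fields = header_line.rstrip("\n").split("\t")
    return {name: fields.index(name) + 1 for name in names}


def shifts_under_whitespace_split(line: str) -> bool:
    """True if splitting on runs of whitespace (like awk's default FS) gives a
    different field count than splitting on single tabs, so field numbering on
    this line may not match the tab-separated columns."""
    return len(line.split()) != len(line.rstrip("\n").split("\t"))


def count_shifted_lines(lines: Iterable[str]) -> int:
    """Number of lines affected by whitespace-based field splitting."""
    return sum(shifts_under_whitespace_split(line) for line in lines)
```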

@atancoder
Collaborator

Yes, the filtered file also removes self-promoters. Take a look at that script; it outlines exactly how the all-putative file is turned into the thresholded file.
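To see which step accounts for the gap between the awk counts and the 105,037 rows in the thresholded file, one option is to count the rows that survive each candidate step. This sketch covers only the steps discussed in this thread (the score threshold, then the promoter rule with and without keeping self-promoters). It assumes the column names ABC.Score, class and isSelfPromoter, and it does not reproduce any other logic filter_predictions.py may apply.

```python
import csv
from typing import Dict, Iterable


def stepwise_counts(
    lines: Iterable[str],
    threshold: float = 0.02,
    score_col: str = "ABC.Score",      # assumed column name
    class_col: str = "class",          # assumed column name
    self_col: str = "isSelfPromoter",  # assumed column name
) -> Dict[str, int]:
    """Count rows remaining after each candidate filtering step."""
    rows = list(csv.DictReader(lines, delimiter="\t"))
    above = [r for r in rows if float(r[score_col]) > threshold]
    return {
        "all_rows": len(rows),
        "score_above_threshold": len(above),
        # Keep non-promoters, plus promoters flagged as self-promoters
        "above_and_non_promoter_or_self_promoter": sum(
            1 for r in above if r[class_col] != "promoter" or r[self_col] == "True"
        ),
        # Drop every promoter-class row
        "above_and_non_promoter": sum(1 for r in above if r[class_col] != "promoter"),
    }
```

Comparing each count with the row count of the thresholded file shows which reading of the filter, if any, matches it.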
