Added differential analysis function #28

nbpeterson3 · 2025-03-28T15:11:24Z

#2 - Differential analysis function that takes ECLIPSE output (GRanges object) and BAM files and utilizes csaw and edgeR to perform sliding window-based read counting of classified super enhancer regions, normalization, filtering based on global background, statistical testing, and merging of windows into consolidated regions. It returns a GRanges object with the merged intervals and combined test statistics.

j-andrews7

I am testing on a real dataset, so I may have more comments soon.

j-andrews7 · 2025-04-01T21:22:30Z

R/differentialEnhancers.R

+#'  Default is `150`.
+#' @param window.spacing An integer specifying the distance (bp) between windows.
+#'  Default is `50`.
+#' @param paired A boolean specifying whether reads are paired-end.


Note that this must be specified.

j-andrews7 · 2025-04-01T21:27:09Z

R/differentialEnhancers.R

+
+    # ------------ WINDOWS ------------
+    # concatenate super enhancer regions (super == T)
+    all.SEs <- c(se.1[se.1$super], se.2[se.2$super])


I think it'd be nice to allow this filtering to be turned off and the analysis run for all stitched regions if wanted. The default can be to limit to SEs as you have it here.

j-andrews7 · 2025-04-01T21:52:53Z

R/differentialEnhancers.R

+#'  Default is `50`.
+#' @param paired A boolean specifying whether reads are paired-end.
+#'  Default is `NULL`.
+#' @param read.length An integer specifying the read length. If paired-end, this argument is ignored.


This should be fragment.length instead of read.length.

j-andrews7 · 2025-04-01T21:55:19Z

R/differentialEnhancers.R

+                               window.size = 150, window.spacing = 50,
+                               # optional args to readParam for restricting chromosomes or excluding intervals
+                               # restrict = NULL, discard = NULL,
+                               paired = NULL, read.length = NULL, quality = 20,


Go ahead and set read.length (soon to be fragment.length) to 200 bp by default, as estimating it from the data is indeed pretty slow, especially if you have a lot of BAMs.

j-andrews7 · 2025-04-01T21:59:02Z

R/differentialEnhancers.R

+#' differential analysis results including log-fold changes, p-values, FDR, and other statistics.
+#'
+#' @importFrom GenomicRanges slidingWindows reduce mcols
+#' @importFrom csaw readParam correlateReads maximizeCcf regionCounts windowCounts filterWindowsGlobal normOffsets asDGEList


Missing mergeResults here.

j-andrews7 · 2025-04-01T22:54:04Z

Solid, this runs and the results look relatively sensible.

It'd be nice to have a way to prioritize which SEs to look at more closely based on how dramatic the differences are. If you wanted to write a summary function that spits out counts of differential regions on a per-SE basis, that'd be helpful.

Something that returns a GRanges with SE region, ID, num_windows_diff, num_windows_up, num_windows_down.

And then the last 3 columns but expressed as ratios of total windows for region.

Could also just do all of that in this function and return two ranges objects as a named list - one with the windows, one with the consensus SEs with this info appended. That might be cleaner actually. Thoughts?
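To make the proposed per-SE summary concrete, here is a minimal base R sketch. It assumes the window-level results have already been flattened to a data frame with hypothetical columns `se_id`, `logFC`, and `FDR`; the function name `summarize_se_windows` and the `fdr.cutoff` argument are also made up for illustration. A real version would presumably attach these columns to the consensus SE GRanges rather than return a data frame.

```r
# Hypothetical per-window results: one row per tested window, tagged with
# the ID of the SE it falls in. Values are invented for illustration.
windows <- data.frame(
  se_id = c("SE1", "SE1", "SE1", "SE2", "SE2"),
  logFC = c(1.8, 2.1, -0.2, -1.5, 0.1),
  FDR   = c(0.01, 0.03, 0.60, 0.02, 0.90)
)

# Count differential windows per SE and express them as fractions of all
# windows tested in that SE.
summarize_se_windows <- function(windows, fdr.cutoff = 0.05) {
  sig  <- windows$FDR < fdr.cutoff
  up   <- sig & windows$logFC > 0
  down <- sig & windows$logFC < 0
  se   <- factor(windows$se_id)

  out <- data.frame(
    se_id            = levels(se),
    num_windows      = as.vector(table(se)),
    num_windows_diff = as.vector(tapply(sig, se, sum)),
    num_windows_up   = as.vector(tapply(up, se, sum)),
    num_windows_down = as.vector(tapply(down, se, sum))
  )
  out$frac_windows_diff <- out$num_windows_diff / out$num_windows
  out$frac_windows_up   <- out$num_windows_up / out$num_windows
  out$frac_windows_down <- out$num_windows_down / out$num_windows

  # Most heavily altered SEs first, to help prioritize follow-up.
  out[order(-out$frac_windows_diff), ]
}

se.summary <- summarize_se_windows(windows)
print(se.summary)
```

Returning this alongside the window-level results in a named list would match the "two ranges objects" option described above.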

j-andrews7 · 2025-04-02T16:12:00Z

One other note is there is no way to control the comparison direction, e.g. group1 vs group2 or vice versa in terms of the results returned.

It might be cleaner to be more explicit with how the BAMs and groups are defined. bam.files could be split to group1.bams and group2.bams and then conditions replaced with group1.name and group2.name so that the group1 vs group2 comparison is explicit and the directionality of the output is clear. Right now, you have to look at the underlying data to determine what "up" and "down" mean in the results.
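A rough sketch of how the explicit grouping could fix the direction, using only base R. The helper name `build_design` and the example file names are hypothetical; the idea is simply that making group2 the reference factor level means the second design coefficient is group1 vs group2, so a positive log-fold change means higher signal in group1.

```r
# Build the BAM vector, group factor, and design matrix from two explicit
# groups. `build_design` is an illustrative helper, not part of the package.
build_design <- function(group1.bams, group2.bams,
                         group1.name = "group1", group2.name = "group2") {
  stopifnot(group1.name != group2.name)
  bam.files <- c(group1.bams, group2.bams)

  # group2 is the reference level, so the non-intercept coefficient is the
  # effect of group1 relative to group2.
  group <- factor(
    rep(c(group1.name, group2.name),
        times = c(length(group1.bams), length(group2.bams))),
    levels = c(group2.name, group1.name)
  )

  design <- model.matrix(~group)
  colnames(design) <- c("Intercept", paste0(group1.name, "_vs_", group2.name))

  list(bam.files = bam.files, group = group, design = design)
}

# Example with made-up file names.
d <- build_design(
  group1.bams = c("treated_rep1.bam", "treated_rep2.bam"),
  group2.bams = c("control_rep1.bam", "control_rep2.bam"),
  group1.name = "treated",
  group2.name = "control"
)
print(d$design)
```

With a design like this, testing the second coefficient in edgeR (for example `coef = 2`) would report treated vs control, and the column name documents the direction in the output.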

j-andrews7 · 2025-04-03T21:08:39Z

Also, let's stick with underscore delimiters for function names rather than camelCase. Maybe call this one find_differential.

j-andrews7 · 2025-04-04T19:03:10Z

After a bit more use, I think the bg.fc should probably be less aggressive by default, as I am seeing loss of "real" signal regions. I think being somewhat less aggressive with this setting makes sense since we're already feeding in enriched regions.

I will test a few other settings and see what value might be more generally appropriate. This will likely vary from dataset to dataset to some degree.

As an example, with default settings I see instances like this:

[image]

Here the bottom shows the regions being retained for testing, while the clearly "real" signal regions on the right are being filtered out.

j-andrews7 · 2025-04-07T21:11:02Z

I also think we shouldn't limit the max width of the combined regions by default.

j-andrews7 · 2025-04-07T22:13:00Z

I am gonna merge this, as I need the functionality to test with other new changes I'm working on.

Some of the easier stuff here (like default params), I will probably just swap, but I'll open issues for other stuff that could use their own PR.

nbpeterson3 · 2025-04-10T13:46:18Z

> After a bit more use, I think the bg.fc should probably be less aggressive by default, as I am seeing loss of "real" signal regions. I think being somewhat less aggressive with this setting makes sense since we're already feeding in enriched regions.
>
> I will test a few other settings and see what value might be more generally appropriate. This will likely vary from dataset to dataset to some degree.
>
> As an example, with default settings I see instances like this:
>
> Where the bottom are the regions being retained for testing and those clearly "real" signal regions on the right are being filtered.

From your testing, have you found a more reasonable threshold?

j-andrews7 · 2025-04-10T13:55:36Z

Yes, I have set it to 3, which seems to do okay. This may vary to some extent from experiment to experiment. I think I went ahead and changed the defaults yesterday in the dev branch.
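For anyone tuning this, a small sketch of how the threshold behaves, assuming bg.fc is applied the way the csaw user guide applies a global background filter: the `filter` element returned by `filterWindowsGlobal()` is a log2 enrichment over background, and windows are kept when it exceeds `log2(bg.fc)`. The numbers below are invented, `keep_windows` is a hypothetical helper, and the value 5 is only an arbitrary stricter setting for comparison.

```r
# Stand-in for the per-window log2 enrichment over global background
# (made-up values, not real data).
filter.stat <- c(0.4, 1.2, 1.7, 2.1, 2.6, 3.3)

# Keep windows whose enrichment exceeds the requested fold change.
keep_windows <- function(filter.stat, bg.fc = 3) {
  filter.stat > log2(bg.fc)
}

kept.at.3 <- sum(keep_windows(filter.stat, bg.fc = 3))
kept.at.5 <- sum(keep_windows(filter.stat, bg.fc = 5))
cat("kept at bg.fc = 3:", kept.at.3, "\n")
cat("kept at bg.fc = 5:", kept.at.5, "\n")
```

Lowering bg.fc retains more moderately enriched windows, which is the trade-off being discussed here.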


nbpeterson3 added 3 commits March 27, 2025 16:33

add differential analysis function

8679f57

Fix for FDR reporting and console message

ab15001

clarified readParam comment

8369a55

j-andrews7 requested changes Apr 1, 2025


j-andrews7 merged commit 2cf1bcd into j-andrews7:dev Apr 7, 2025

j-andrews7 mentioned this pull request Apr 23, 2025

Add differential analysis functions #2

Closed
