Added differential analysis function #28
I am testing on a real dataset, so may have more comments soon.
#' Default is `150`.
#' @param window.spacing An integer specifying the distance (bp) between windows.
#' Default is `50`.
#' @param paired A boolean specifying whether reads are paired-end.
Note that this must be specified.
# ------------ WINDOWS ------------
# concatenate super enhancer regions (super == T)
all.SEs <- c(se.1[se.1$super], se.2[se.2$super])
I think it'd be nice to allow this filtering to be turned off and the analysis run for all stitched regions if wanted. The default can be to limit to SEs as you have it here.
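One way the toggle could look, as a minimal sketch only. The `se.only` argument and the `combine_regions` helper are hypothetical names, and plain data frames with a logical `super` column stand in for the stitched-region objects that the PR subsets directly:

```r
# Hypothetical helper: combine stitched regions from two groups,
# optionally restricting to super enhancers (the PR's current behaviour).
# `se.only` is an assumed argument name, not part of the PR.
combine_regions <- function(se.1, se.2, se.only = TRUE) {
  if (se.only) {
    se.1 <- se.1[se.1$super, , drop = FALSE]
    se.2 <- se.2[se.2$super, , drop = FALSE]
  }
  rbind(se.1, se.2)
}

# Toy inputs standing in for the two groups' stitched regions.
se.1 <- data.frame(id = c("a", "b"), super = c(TRUE, FALSE),
                   stringsAsFactors = FALSE)
se.2 <- data.frame(id = c("c", "d"), super = c(FALSE, TRUE),
                   stringsAsFactors = FALSE)

only_se <- combine_regions(se.1, se.2)                       # default: SEs only
all_regions <- combine_regions(se.1, se.2, se.only = FALSE)  # every stitched region
```

Keeping `se.only = TRUE` as the default would preserve the behaviour in this PR while letting users opt in to all stitched regions.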
#' Default is `50`.
#' @param paired A boolean specifying whether reads are paired-end.
#' Default is `NULL`.
#' @param read.length An integer specifying the read length. If paired-end, this argument is ignored.
This should be `fragment.length` instead of `read.length`.
window.size = 150, window.spacing = 50,
# optional args to readParam for restricting chromosomes or excluding intervals
# restrict = NULL, discard = NULL,
paired = NULL, read.length = NULL, quality = 20,
Go ahead and set `read.length` (soon to be `fragment.length`) to 200 bp by default, as that step is indeed pretty slow, especially if you have a lot of BAMs.
#' differential analysis results including log-fold changes, p-values, FDR, and other statistics.
#'
#' @importFrom GenomicRanges slidingWindows reduce mcols
#' @importFrom csaw readParam correlateReads maximizeCcf regionCounts windowCounts filterWindowsGlobal normOffsets asDGEList
Missing `mergeResults` here.
Solid, this runs and the results look relatively sensible. It'd be nice to have a way to prioritize which SEs to look at more closely based on how dramatic the differences are. If you wanted to write a summary function that reports counts of differential regions on a per-SE basis, that'd be helpful: something that returns a GRanges with the SE region, ID, num_windows_diff, num_windows_up, and num_windows_down, plus those last three columns expressed as ratios of the total windows for the region. We could also just do all of that in this function and return two ranges objects as a named list: one with the windows, and one with the consensus SEs with this info appended. That might be cleaner, actually. Thoughts?
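A rough sketch of the per-SE summary, under stated assumptions: it works on a plain data frame rather than a GRanges to stay dependency-free, and the `se_id` and `direction` input columns, the `ratio_*` output columns, and the `summarize_se_windows` name are all hypothetical. Only the `num_windows_*` column names come from the comment above.

```r
# Hypothetical per-window results: `se_id` labels the parent SE and
# `direction` is "up", "down", or "ns" (not significant).
windows <- data.frame(
  se_id = c("SE1", "SE1", "SE1", "SE1", "SE2", "SE2"),
  direction = c("up", "up", "down", "ns", "ns", "down"),
  stringsAsFactors = FALSE
)

summarize_se_windows <- function(windows) {
  ids <- unique(windows$se_id)
  # Count, for each SE, how many windows satisfy the logical vector `keep`.
  count_by_se <- function(keep) {
    vapply(ids, function(id) sum(keep[windows$se_id == id]), integer(1))
  }
  out <- data.frame(
    se_id = ids,
    num_windows = count_by_se(rep(TRUE, nrow(windows))),
    num_windows_up = count_by_se(windows$direction == "up"),
    num_windows_down = count_by_se(windows$direction == "down"),
    row.names = NULL, stringsAsFactors = FALSE
  )
  out$num_windows_diff <- out$num_windows_up + out$num_windows_down
  # The same counts as ratios of all windows in the SE.
  out$ratio_diff <- out$num_windows_diff / out$num_windows
  out$ratio_up <- out$num_windows_up / out$num_windows
  out$ratio_down <- out$num_windows_down / out$num_windows
  out
}

se_summary <- summarize_se_windows(windows)
# Rank SEs by the share of their windows that changed.
se_summary <- se_summary[order(-se_summary$ratio_diff), ]
```

In the named-list variant, a table like this could be attached as metadata columns on the consensus SE ranges and returned alongside the window-level results.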
One other note is there is no way to control the comparison direction, e.g. group1 vs group2 or vice versa in terms of the results returned. It might be cleaner to be more explicit with how the BAMs and groups are defined.
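One possible way to make the direction explicit, sketched with base R only and limited to two groups. The sample sheet layout, the `make_design` helper, and its `reference`/`test` arguments are hypothetical. The idea is that the factor level order fixes which group is the baseline, so the second coefficient reads as `test` vs `reference` and a positive log-fold change would mean higher signal in `test`.

```r
# Hypothetical sample sheet: one row per BAM, with an explicit group label.
samples <- data.frame(
  bam = c("a1.bam", "a2.bam", "b1.bam", "b2.bam"),  # placeholder file names
  group = c("group1", "group1", "group2", "group2"),
  stringsAsFactors = FALSE
)

# Build a design matrix whose second coefficient is `test` vs `reference`.
make_design <- function(groups, reference, test) {
  stopifnot(reference %in% groups, test %in% groups, reference != test)
  # The first factor level is the baseline under treatment contrasts.
  g <- factor(groups, levels = c(reference, test))
  design <- model.matrix(~ g)
  colnames(design) <- c("intercept", paste0(test, "_vs_", reference))
  design
}

design_fwd <- make_design(samples$group, reference = "group1", test = "group2")
design_rev <- make_design(samples$group, reference = "group2", test = "group1")
```

Either matrix could then be handed to the model-fitting step, with the column name recording which direction the results refer to.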
Also, let's stick with underscore delimiters for function names rather than camelCase. Maybe call this one
I also think we don't limit the max width of the combined regions by default.
I am gonna merge this, as I need the functionality to test with other new changes I'm working on. Some of the easier stuff here (like default params), I will probably just swap, but I'll open issues for other stuff that could use their own PR.
Yes, I have set it to 3, which seems to do okay. This may vary to some
extent from experiment to experiment. I think I went ahead and changed the
defaults yesterday in the dev branch.
On Thu, Apr 10, 2025, 8:46 AM Nick Peterson (nbpeterson3) wrote in a comment on j-andrews7/ECLIPSE#28:
After a bit more use, I think the bg.fc should probably be less
aggressive by default, as I am seeing loss of "real" signal regions. I
think being somewhat less aggressive with this setting makes sense since
we're already feeding in enriched regions.
I will test a few other settings and see what value might be more
generally appropriate. This will likely vary from dataset to dataset to
some degree.
As an example, with default settings I see instances like this: [image]
At the bottom are the regions being retained for testing, and those
clearly "real" signal regions on the right are being filtered.
From your testing, have you found a more reasonable threshold?
#2 - Differential analysis function that takes ECLIPSE output (`GRanges` object) and BAM files and utilizes `csaw` and `edgeR` to perform sliding window-based read counting of classified super enhancer regions, normalization, filtering based on global background, statistical testing, and merging of windows into consolidated regions. It returns a `GRanges` object with the merged intervals and combined test statistics.