-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates and fixes #709
Updates and fixes #709
Conversation
- Enable support for subsetting `XcmsExperiment` objects with negative indices (issue #707).
- Remove the not-exported normalization functions. These have been moved to the MetaboCoreUtils package.
- Improve the performance to extract EICs (along with chromatographic peaks) from an `XcmsExperiment` object.
object, rt = fd[i, rtc], mz = fd[i, mzc], | ||
msLevel = chrs[i, 1]@msLevel, type = chromPeaks) | ||
f_s <- factor(.chromPeaks(object)[idx, "sample"], levels = js) | ||
pkl <- split.data.frame(.chromPeaks(object)[idx, , drop = FALSE], f_s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
splitting the chrom peaks matrix
and chrom peak data data.frame
considerably improved performance. Also, we try to use data.frame
instead of DataFrame
for chromPeakData
as much as possible: subset of the former is much faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I ask the reason of why the DataFrame
was initially used ? was it easier at the beginning ? I have not really end up using DataFrame
until now so I'm a bit curious as to when someone decide to prefer this over data.frame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. There were 2 reasons: firstly, the DataFrame
was used in e.g. MSnbase
and other Bioconductor packages (e.g. SummarizedExperiment
) and I wanted to be consistent with that, then, secondly, the DataFrame
would also allow to store S4 objects in columns (which the data.frame
would officially not). Not that we planned to store now S4 objects in DataFrame
s - so it was mostly reason 1 why we used DataFrame
.
Only (quite some time) later I figured out that subset or extract data from a DataFrame
s is quite slower than for data.frame
. Thus, recently, I prefer to use, whenever possible (internally) a data.frame
.
for (i in seq_len(nrow(chrs))) { | ||
pks <- chromPeaks(object, rt = fd[i, rtc], mz = fd[i, mzc], | ||
msLevel = chrs[i, 1]@msLevel, type = chromPeaks) | ||
idx <- .index_chrom_peaks( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of getting the filtered chromPeaks
matrix we simply get the indices of the rows matching the filter. That allows to more efficiently subset also the chromPeakData
.
#' requested filtering. | ||
#' | ||
#' @noRd | ||
.index_chrom_peaks <- function(object, rt = numeric(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is then also re-used in chromPeaks,XcmsExperiment
and chromPeaks,XCMSnExp
.
#' a `XcmsExperiment` object. | ||
#' | ||
#' @noRd | ||
.chromPeakData <- function(object, msLevel = integer()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little helper functions to avoid unnecessary conversion of data.frame
to DataFrame
for internal calls.
#' filtering from either a `XcmsExperiment` or `XCMSnExp` object | ||
#' | ||
#' @noRd | ||
.chromPeaks <- function(object) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function was required, because we would end up in (infinitely) recursive call with the chromPeaks
method: chromPeaks
uses .index_chrom_peaks
, which in turn calls chromPeaks
(that again calls .index_chrom_peaks
, ...). To extract the full chrom peaks matrix we can use .chromPeaks
, to get eventually filtered chrom peaks we use chromPeaks
(that's what the user will use).
#' but there could be room for improvement. | ||
#' from an XcmsExperiment. | ||
#' | ||
#' @param object `XcmsExperiment` or `XCMSnExp` object. | ||
#' | ||
#' @noRd | ||
.xmse_extract_chromatograms_old <- function(object, rt, mz, aggregationFun, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function was modified to improve performance of the chromatogram,XcmsExperiment
call.
Note: unit tests fail on ubuntu (and the Bioconductor docker image) because of internal changes in R devel. Because of this Rcpp, rlang and related packages fail to install (and hence also mzR, xcms etc). |
@sneumann @philouail , just a friendly reminder to eventually (if you find the time) have a look at this PR :) |
Hi, any need to bump dependency versions if code was moved to |
Nope, no need for any version bump - this internal code that was refactored and moved to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if no changes / version bumps in DESCCRIPTION are needed, this looks good to me. Yours, Steffen
pks_empty <- chromPeaks(object)[integer(), ] | ||
pkd_empty <- chromPeakData(object)[integer(), ] | ||
pks_empty <- .chromPeaks(object)[integer(), ] | ||
pkd_empty <- as(.chromPeakData(object)[integer(), ], "DataFrame") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is happening here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created new internal functions .chromPeaks
and .chromPeakData
to extract the content of the chromPeaks
(or chromPeakData
) slot from an xcms result object. We have two xcms result objects now, XCMSnExp
and XcmsExperiment
. The former stores the data in a quite clumsy way, and extracting the data is a little more tricky. The latter is the new preferred result object. There is also the chromPeaks
and chromPeakData
method available for each of the two result objects, but these methods do quite some additional checks and allow subset/filter the content. The new internal functions (are not expected to be called by the user) are simple helpers to extract the full chromPeaks and chromPeakData content without any filtering etc.
I could also have created a method for that, but I decided against that and just added a function with an if/else inside.
Then, for chromPeakData I have also to ensure that the result is returned as a DataFrame
, thus I'm calling in addition the as(..., "DataFrame"
. For XCMSnExp
.chromPeakData
will already return a DataFrame
, so nothing will happen, while for XcmsExperiment
I'm storing the data as a data.frame
, thus I need to convert first.
Hope this explained it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good for me, just have 2 questions in the comments ! also thanks for your comments explaining it helped me to understand the context of everything.
chromatogram
call forXcmsExperiment
objects.XcmsExperiment
with negative indices (negative subsetting removes chromPeaks #707).