Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates and fixes #709

Merged
merged 3 commits into from
Jan 15, 2024
Merged

Updates and fixes #709

merged 3 commits into from
Jan 15, 2024

Conversation

jorainer
Copy link
Collaborator

- Enable support for subsetting `XcmsExperiment` objects with negative
  indices (issue #707).
- Remove the not-exported normalization functions. These have been moved to the
  MetaboCoreUtils package.
- Improve the performance to extract EICs (along with chromatographic peaks)
  from an `XcmsExperiment` object.
object, rt = fd[i, rtc], mz = fd[i, mzc],
msLevel = chrs[i, 1]@msLevel, type = chromPeaks)
f_s <- factor(.chromPeaks(object)[idx, "sample"], levels = js)
pkl <- split.data.frame(.chromPeaks(object)[idx, , drop = FALSE], f_s)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

splitting the chrom peaks matrix and chrom peak data data.frame considerably improved performance. Also, we try to use data.frame instead of DataFrame for chromPeakData as much as possible: subset of the former is much faster.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask the reason of why the DataFrame was initially used ? was it easier at the beginning ? I have not really end up using DataFrame until now so I'm a bit curious as to when someone decide to prefer this over data.frame

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. There were 2 reasons: firstly, the DataFrame was used in e.g. MSnbase and other Bioconductor packages (e.g. SummarizedExperiment) and I wanted to be consistent with that, then, secondly, the DataFrame would also allow to store S4 objects in columns (which the data.frame would officially not). Not that we planned to store now S4 objects in DataFrames - so it was mostly reason 1 why we used DataFrame.

Only (quite some time) later I figured out that subset or extract data from a DataFrames is quite slower than for data.frame. Thus, recently, I prefer to use, whenever possible (internally) a data.frame.

for (i in seq_len(nrow(chrs))) {
pks <- chromPeaks(object, rt = fd[i, rtc], mz = fd[i, mzc],
msLevel = chrs[i, 1]@msLevel, type = chromPeaks)
idx <- .index_chrom_peaks(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of getting the filtered chromPeaks matrix we simply get the indices of the rows matching the filter. That allows to more efficiently subset also the chromPeakData.

#' requested filtering.
#'
#' @noRd
.index_chrom_peaks <- function(object, rt = numeric(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is then also re-used in chromPeaks,XcmsExperiment and chromPeaks,XCMSnExp.

#' a `XcmsExperiment` object.
#'
#' @noRd
.chromPeakData <- function(object, msLevel = integer()) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little helper functions to avoid unnecessary conversion of data.frame to DataFrame for internal calls.

#' filtering from either a `XcmsExperiment` or `XCMSnExp` object
#'
#' @noRd
.chromPeaks <- function(object) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That function was required, because we would end up in (infinitely) recursive call with the chromPeaks method: chromPeaks uses .index_chrom_peaks, which in turn calls chromPeaks (that again calls .index_chrom_peaks, ...). To extract the full chrom peaks matrix we can use .chromPeaks, to get eventually filtered chrom peaks we use chromPeaks (that's what the user will use).

#' but there could be room for improvement.
#' from an XcmsExperiment.
#'
#' @param object `XcmsExperiment` or `XCMSnExp` object.
#'
#' @noRd
.xmse_extract_chromatograms_old <- function(object, rt, mz, aggregationFun,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function was modified to improve performance of the chromatogram,XcmsExperiment call.

@jorainer
Copy link
Collaborator Author

Note: unit tests fail on ubuntu (and the Bioconductor docker image) because of internal changes in R devel. Because of this Rcpp, rlang and related packages fail to install (and hence also mzR, xcms etc).

@jorainer jorainer marked this pull request as ready for review December 20, 2023 09:46
@jorainer
Copy link
Collaborator Author

@sneumann @philouail , just a friendly reminder to eventually (if you find the time) have a look at this PR :)

@sneumann
Copy link
Owner

Hi, any need to bump dependency versions if code was moved to MetaboCoreUtils ? Otherwise looks good to me. Yours, Steffen

@jorainer
Copy link
Collaborator Author

Nope, no need for any version bump - this internal code that was refactored and moved to MetaboCoreUtils was never used within xcms (in any exported function).

Copy link
Owner

@sneumann sneumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if no changes / version bumps in DESCCRIPTION are needed, this looks good to me. Yours, Steffen

pks_empty <- chromPeaks(object)[integer(), ]
pkd_empty <- chromPeakData(object)[integer(), ]
pks_empty <- .chromPeaks(object)[integer(), ]
pkd_empty <- as(.chromPeakData(object)[integer(), ], "DataFrame")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is happening here ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created new internal functions .chromPeaks and .chromPeakData to extract the content of the chromPeaks (or chromPeakData) slot from an xcms result object. We have two xcms result objects now, XCMSnExp and XcmsExperiment. The former stores the data in a quite clumsy way, and extracting the data is a little more tricky. The latter is the new preferred result object. There is also the chromPeaks and chromPeakData method available for each of the two result objects, but these methods do quite some additional checks and allow subset/filter the content. The new internal functions (are not expected to be called by the user) are simple helpers to extract the full chromPeaks and chromPeakData content without any filtering etc.

I could also have created a method for that, but I decided against that and just added a function with an if/else inside.

Then, for chromPeakData I have also to ensure that the result is returned as a DataFrame, thus I'm calling in addition the as(..., "DataFrame". For XCMSnExp .chromPeakData will already return a DataFrame, so nothing will happen, while for XcmsExperiment I'm storing the data as a data.frame, thus I need to convert first.

Hope this explained it?

Copy link
Collaborator

@philouail philouail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good for me, just have 2 questions in the comments ! also thanks for your comments explaining it helped me to understand the context of everything.

@sneumann sneumann merged commit 548c248 into devel Jan 15, 2024
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants