fix: add examples for calculateMass with isotopes (issue #81)
jorainer committed Apr 12, 2024
1 parent bb3d08a commit 5610952
Showing 6 changed files with 85 additions and 45 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: MetaboCoreUtils
Title: Core Utils for Metabolomics Data
Version: 1.11.2
Version: 1.11.3
Description: MetaboCoreUtils defines metabolomics-related core functionality
provided as low-level functions to allow a data structure-independent usage
across various R packages. This includes functions to calculate between ion
@@ -61,4 +61,4 @@ BugReports: https://github.com/RforMassSpectrometry/MetaboCoreUtils/issues
URL: https://github.com/RforMassSpectrometry/MetaboCoreUtils
biocViews: Infrastructure, Metabolomics, MassSpectrometry
Roxygen: list(markdown=TRUE)
RoxygenNote: 7.2.3
RoxygenNote: 7.3.1
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,11 @@
# MetaboCoreUtils 1.11

## MetaboCoreUtils 1.11.3

- Add examples on how isotopes (including deuterium) can be used with
`calculateMass` (issue
[#81](https://github.com/rformassspectrometry/MetaboCoreUtils/issues/81))

## MetaboCoreUtils 1.11.2

- Add functions to compute quality check of the data (issue
12 changes: 11 additions & 1 deletion R/chemFormula.R
@@ -316,10 +316,15 @@ multiplyElements <- function(x, k) {
#'
#' @description
#'
#' `calculateMass` calculates the exact mass from a formula.
#' `calculateMass` calculates the exact mass from a formula. Isotopes are also
#' supported. For isotopes, the isotope type needs to be specified as an
#' element's prefix, e.g. `"[13C]"` for carbon 13 or `"[2H]"` for deuterium.
#' A formula with 2 carbon 13 isotopes and 3 carbons would thus contain e.g.
#' `"[13C2]C3"`.
#'
#' @param x `character` representing chemical formula(s) or a `list` of
#' `numeric` with element counts such as returned by [countElements()].
#' Isotopes and deuterated elements are supported (see examples below).
#'
#' @return `numeric` Resulting exact mass.
#'
@@ -332,7 +337,12 @@ multiplyElements <- function(x, k) {
#' calculateMass("C6H12O6")
#' calculateMass("NH3")
#' calculateMass(c("C6H12O6", "NH3"))
#'
#' ## Calculate masses for formulas containing isotope information.
#' calculateMass(c("C6H12O6", "[13C3]C3H12O6"))
#'
#' ## Calculate mass for a chemical with 5 deuterium.
#' calculateMass("C11[2H5]H7N2O2")
calculateMass <- function(x) {
if (is.character(x))
x <- countElements(x)
14 changes: 12 additions & 2 deletions man/calculateMass.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions man/quality_assessment.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

86 changes: 50 additions & 36 deletions vignettes/MetaboCoreUtils.Rmd
@@ -23,13 +23,14 @@ BiocStyle::markdown()

# Introduction

The `MetaboCoreUtils` defines metabolomics-related core functionality provided
as low-level functions to allow a data structure-independent usage across
various R packages [@rainer_modular_2022]. This includes functions to calculate between ion (adduct)
and compound mass-to-charge ratios and masses or functions to work with chemical
formulas. The package provides also a set of adduct definitions and information
on some commercially available internal standard mixes commonly used in MS
experiments.
The `r Biocpkg("MetaboCoreUtils")` package defines metabolomics-related core
functionality provided as low-level functions to allow a data
structure-independent usage across various R packages
[@rainer_modular_2022]. This includes functions to calculate between ion
(adduct) and compound mass-to-charge ratios and masses or functions to work with
chemical formulas. The package provides also a set of adduct definitions and
information on some commercially available internal standard mixes commonly used
in MS experiments.

For a full list of functions, see

@@ -66,16 +67,16 @@ library(MetaboCoreUtils)

## Conversion between ion m/z and compound masses

The `mass2mz` and `mz2mass` functions allow to convert between compound masses
and ion (adduct) mass-to-charge ratios (m/z). The `MetaboCoreUtils` package
provides definitions of common ion adducts generated by electrospray ionization
(ESI). These can be listed with the `adductNames` function.
The `mass2mz()` and `mz2mass()` functions allow to convert between compound
masses and ion (adduct) mass-to-charge ratios (m/z). The *MetaboCoreUtils*
package provides definitions of common ion adducts generated by electrospray
ionization (ESI). These can be listed with the `adductNames()` function.

```{r}
adductNames()
```

With that we can use the `mass2mz` function to calculate the m/z for a set of
With that we can use the `mass2mz()` function to calculate the m/z for a set of
compounds assuming the generation of certain ions. In the example below we
define masses for some theoretical compounds and calculate their expected m/z
assuming that ions `"[M+H]+"` and `"[M+Na]+"` are generated.
@@ -86,11 +87,11 @@ mass2mz(masses, adduct = c("[M+H]+", "[M+Na]+"))
```

As a result we get a `matrix` with each row representing one compound and each
column the m/z for one of the defined adducts. With the `mz2mass` we could
perform the reverse calculation, i.e. from m/z to compound masses.
column the m/z for one of the defined adducts. With the `mz2mass()` function we
could perform the reverse calculation, i.e. from m/z to compound masses.
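
The reverse calculation mentioned above is not shown in the vignette. A minimal sketch could look like the chunk below, assuming that `mz2mass()` takes the same `adduct` argument as `mass2mz()`. The masses in `masses_example` are hypothetical values chosen only for illustration.

```{r}
library(MetaboCoreUtils)

## Hypothetical compound masses, used only for illustration.
masses_example <- c(180.0634, 194.0803)

## Forward: m/z of the [M+H]+ ion for each mass (first matrix column).
mz_example <- mass2mz(masses_example, adduct = "[M+H]+")[, 1]

## Reverse: from the m/z values back to the compound masses.
mz2mass(mz_example, adduct = "[M+H]+")
```

If both functions use the same adduct definition, the last call should return the values in `masses_example` again.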

In addition, it is possible to calculate m/z values from chemical formulas with
the `formula2mz` function. Below we calculate the m/z values for `[M+H]+` and
the `formula2mz()` function. Below we calculate the m/z values for `[M+H]+` and
`[M+Na]+` adducts from the chemical formulas of glucose and caffeine.

```{r}
@@ -102,53 +103,66 @@ formula2mz(c("C6H12O6", "C8H10N4O2"), adduct = c("[M+H]+", "[M+Na]+"))

The lack of consistency in the format in which chemical formulas are written
poses a big problem comparing formulas coming from different resources. The
`MetaboCoreUtils` package provides functions to *standardize* formulas as well
*MetaboCoreUtils* package provides functions to *standardize* formulas as well
as combine formulas or subtract elements from formulas. Below we use an
artificial example to show this functionality. First we standardize a chemical
formula with the `standardizeFormula` function.
formula with the `standardizeFormula()` function.

```{r}
frml <- "Na3C4"
frml <- standardizeFormula(frml)
frml
```

Next we add `"H2O"` to the formula using the `addElements` function.
Next we add `"H2O"` to the formula using the `addElements()` function.

```{r}
frml <- addElements(frml, "H2O")
frml
```

We can also subtract elements with the `subtractElements` function:
We can also subtract elements with the `subtractElements()` function:

```{r}
frml <- subtractElements(frml, "H")
frml
```

Chemical formulas could also be multiplied with a scalar using the
`multiplyElements` function. The counts for individual elements in a chemical
formula can be calculated with the `countElements` function.
`multiplyElements()` function. The counts for individual elements in a chemical
formula can be calculated with the `countElements()` function.

```{r}
countElements(frml)
```
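
The multiplication with a scalar mentioned above has no example in the vignette. A minimal sketch, using water as an arbitrary illustrative formula and the `multiplyElements(x, k)` signature from `R/chemFormula.R`, might be:

```{r}
library(MetaboCoreUtils)

## Multiply every element count in water by 3 (illustrative input).
water3 <- multiplyElements("H2O", 3)
water3

## Inspect the resulting element counts.
countElements(water3)
```

The element counts returned by `countElements()` should then be three times those of `"H2O"`.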

The function `adductFormula` allows in addition to create chemical formulas of
The function `adductFormula()` allows in addition to create chemical formulas of
specific adducts of compounds. Below we create chemical formulas for `[M+H]+`
and `[M+Na]+` adducts for glucose and caffeine.

```{r}
adductFormula(c("C6H12O6", "C8H10N4O2"), adduct = c("[M+H]+", "[M+Na]+"))
```

Finally, `calculateMass()` can be used to calculate the (exact) mass for a given
chemical formula. This function supports also the definition of isotopes in the
formula. As an example we calculate below the mass of two chemical formulas,
one without isotopes and one with 3 of the carbon atoms replaced by the carbon
13 isotope.

```{r}
calculateMass(c("C6H12O6", "[13C3]C3H12O6"))
```

Note that isotopes are supported for all elements (deuterium could for example
be expressed as `"[2H]"`).
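
As a sketch of the deuterium case, the chunk below reuses the formula from the `calculateMass()` documentation and compares it with its unlabeled counterpart `"C11H12N2O2"`. The unlabeled formula is an assumption made here, derived by replacing the five deuterium atoms with hydrogen.

```{r}
library(MetaboCoreUtils)

## Formula from the calculateMass() examples: 5 of the 12 hydrogens
## are deuterium.
calculateMass("C11[2H5]H7N2O2")

## Difference to the unlabeled formula; expected to be roughly 5 times the
## mass difference between deuterium and hydrogen (about 1.00628 each).
calculateMass("C11[2H5]H7N2O2") - calculateMass("C11H12N2O2")
```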


## Kendrick mass defect calculation

Lipids and other homologous series based on fatty acyls can be found in data by
using Kendrick mass defects (KMD) or referenced Kendrick mass defects
(RKMD). The `MetaboCoreUtils` package provides functions to calculate everything
(RKMD). The *MetaboCoreUtils* package provides functions to calculate everything
around Kendrick mass defects. The following example calculates the KMD and RKMD
for three lipids (PC(16:0/18:1(9Z)), PC(16:0/18:0), PS(16:0/18:1(9Z))) and
checks whether they fit the RKMD of PCs detected as [M+H]+ adducts.
@@ -172,7 +186,7 @@ isRkmd(lipid_rkmd)

Retention times are often not directly comparable between two LC-MS systems,
even if nominally the same separation method is used. Conversion of retention
times to retention indices can overcome this issue. The `MetaboCoreUtils` package
times to retention indices can overcome this issue. The *MetaboCoreUtils* package
provides a function to perform this conversion. Below we use an example based on
indexing with a homologous series of N-Alkyl-pyridinium sulfonates (NAPS).

@@ -198,7 +212,7 @@ head(rti)
```


The indexing is performed using the function `indexRtime`.
The indexing is performed using the function `indexRtime()`.

```{r}
rtime$rindex_r <- indexRtime(rtime$rtime, rti)
@@ -212,7 +226,7 @@ head(rtime)

Conditions that shall be compared by the retention index might not perfectly
match. In case the deviation is linear a simple two-point correction can be
applied to the data. This is performed by the function `correctRindex`. The
applied to the data. This is performed by the function `correctRindex()`. The
correction requires two reference standards and their measured RIs and reference
RIs.

@@ -231,12 +245,12 @@ affected by technical noise or signal drifts. In particular, some of these
technical variances can be specific for individual metabolites, requiring hence
a per-feature adjustment of the abundances. One example of such noise is an
injection order dependent signal drift that can sometimes be observed in
untargeted metabolomics data from LC-MS experiments. The `fit_lm` function can
untargeted metabolomics data from LC-MS experiments. The `fit_lm()` function can
be used to model such drifts in the observed data of each single feature, for
example with a model of the form `y ~ injection_index` that models the
relationship between the measured abundances of a metabolite `y` on the index in
which the respective sample was injected (`injection_index`). Subsequently, the
data can be adjusted for the modeled drift with the `adjust_lm` function. This
data can be adjusted for the modeled drift with the `adjust_lm()` function. This
approach is similar to the one described by [@wehrens_improved_2016].

Below we perform such an injection order dependent signal adjustment on a small
@@ -443,14 +457,14 @@ qc_lm[qc_lm_summary[, "p.value"] > 0.05] <- NA
```

We can next adjust the data for the estimated signal drifts using the
`adjust_lm` function. We will thus adjust abundances in all samples (including
`adjust_lm()` function. We will thus adjust abundances in all samples (including
the study samples) using the linear models estimated on the QC samples. For
features for which no linear model is provided (i.e., with an `NA` in the `list`
of linear models) the original abundances will be returned *as is*. With
parameter `data` we need to provide a `data.frame` with all required covariates
for the fitted models (i.e., defined by the `formula` passed to the `fit_lm`
for the fitted models (i.e., defined by the `formula` passed to the `fit_lm()`
call). Also, since we fitted the models to the data in `log2` scale, we need
also to provide log2 transformed values to the `adjust_lm` function.
also to provide log2 transformed values to the `adjust_lm()` function.

```{r}
#' Adjust the data for the estimated signal drift
@@ -532,9 +546,9 @@ Adjustment, while not completely removing it for all features, globally reduced
the dependency of abundances on the injection index.

Summarizing, feature-wise biases in LC-MS data can be estimated, and adjusted
for using the `fit_lm` and `adjust_lm` functions. Ideally, such biases should be
estimated on (repeatedly measured) QC samples, with the QC samples being
representative of the study samples (e.g. a pool of all study samples). In
for using the `fit_lm()` and `adjust_lm()` functions. Ideally, such biases
should be estimated on (repeatedly measured) QC samples, with the QC samples
being representative of the study samples (e.g. a pool of all study samples). In
addition, due to the generally relatively low number of available data points,
the estimation of the signal drift can be unreliable and it is thus strongly
suggested to evaluate or visually inspect some of them to derive strategies
@@ -555,7 +569,7 @@ represents a feature. In this tutorial, we'll explore a set of functions
designed to calculate basic quality assessment metrics on which
metabolomics data can subsequently be filtered.

First, to get more information on the available functions you can check the documentation
First, to get more information on the available functions you can check the documentation

```{r}
?quality_assessment
@@ -624,7 +638,7 @@ print(blank_detection_result)

All of these computations can then be used to easily filter our data and remove
the features that do not fit our quality criteria. Below we remove all features
that have a D-ratio and coefficient of variation < 0.8 with no missing values
that have a D-ratio and coefficient of variation < 0.8 with no missing values
and are not flagged to be a possible solvent contaminant.

```{r}
