fix: add examples for calculateMass with isotopes (issue #81)

rformassspectrometry · Apr 12, 2024 · 5610952
1 parent bb3d08a
commit 5610952
Showing 6 changed files with 85 additions and 45 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: MetaboCoreUtils
 Title: Core Utils for Metabolomics Data
-Version: 1.11.2
+Version: 1.11.3
 Description: MetaboCoreUtils defines metabolomics-related core functionality
     provided as low-level functions to allow a data structure-independent usage
     across various R packages. This includes functions to calculate between ion
@@ -61,4 +61,4 @@ BugReports: https://github.com/RforMassSpectrometry/MetaboCoreUtils/issues
 URL: https://github.com/RforMassSpectrometry/MetaboCoreUtils
 biocViews: Infrastructure, Metabolomics, MassSpectrometry
 Roxygen: list(markdown=TRUE)
-RoxygenNote: 7.2.3
+RoxygenNote: 7.3.1
diff --git a/NEWS.md b/NEWS.md
@@ -1,5 +1,11 @@
 # MetaboCoreUtils 1.11
 
+## MetaboCoreUtils 1.11.3
+
+- Add examples on how isotopes (including deuterium) can be used with
+  `calculateMass` (issue
+  [#81](https://github.com/rformassspectrometry/MetaboCoreUtils/issues/81))
+
 ## MetaboCoreUtils 1.11.2
 
 - Add functions to compute quality check of the data (issue

diff --git a/R/chemFormula.R b/R/chemFormula.R
@@ -316,10 +316,15 @@ multiplyElements <- function(x, k) {
 #'
 #' @description
 #'
-#' `calculateMass` calculates the exact mass from a formula.
+#' `calculateMass` calculates the exact mass from a formula. Isotopes are also
+#' supported. For isotopes, the isotope type needs to be specified as an
+#' element's prefix, e.g. `"[13C]"` for carbon 13 or `"[2H]"` for deuterium.
+#' A formula with 2 carbon 13 isotopes and 3 carbons would thus contain e.g.
+#' `"[13C2]C3"`.
 #'
 #' @param x `character` representing chemical formula(s) or a `list` of
 #'     `numeric` with element counts such as returned by [countElements()].
+#'     Isotopes and deuterated elements are supported (see examples below).
 #'
 #' @return `numeric` Resulting exact mass.
 #'
@@ -332,7 +337,12 @@ multiplyElements <- function(x, k) {
 #' calculateMass("C6H12O6")
 #' calculateMass("NH3")
 #' calculateMass(c("C6H12O6", "NH3"))
+#'
+#' ## Calculate masses for formulas containing isotope information.
 #' calculateMass(c("C6H12O6", "[13C3]C3H12O6"))
+#'
+#' ## Calculate mass for a chemical with 5 deuterium.
+#' calculateMass("C11[2H5]H7N2O2")
 calculateMass <- function(x) {
     if (is.character(x))
         x <- countElements(x)

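The isotope notation documented above can be sanity-checked by hand. The following is a minimal base-R sketch, not the package's implementation: the isotope masses are hard-coded standard monoisotopic values, the element counts are written out manually, and `mass_from_counts()` is a hypothetical helper introduced only for this illustration.

```r
## Monoisotopic masses (in Da) of the isotopes used below. These values are
## hard-coded for illustration; they are not read from MetaboCoreUtils.
iso_mass <- c(C = 12, `13C` = 13.00335484, H = 1.00782503,
              `2H` = 2.01410178, O = 15.99491462)

## Hypothetical helper: sum the isotope masses weighted by their counts.
mass_from_counts <- function(counts, masses = iso_mass) {
    stopifnot(all(names(counts) %in% names(masses)))
    sum(masses[names(counts)] * counts)
}

## Element counts written out by hand for "C6H12O6" and "[13C3]C3H12O6".
glucose <- c(C = 6, H = 12, O = 6)
glucose_13c3 <- c(`13C` = 3, C = 3, H = 12, O = 6)

unlabelled <- mass_from_counts(glucose)
labelled <- mass_from_counts(glucose_13c3)

## Each 13C replaces one 12C, so the shift is 3 * (13.00335484 - 12).
labelled - unlabelled
```

If the package is installed, the two values should be comparable to the result of `calculateMass(c("C6H12O6", "[13C3]C3H12O6"))` from the roxygen example, up to the precision of the isotope masses used.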
diff --git a/man/calculateMass.Rd b/man/calculateMass.Rd
diff --git a/man/quality_assessment.Rd b/man/quality_assessment.Rd
diff --git a/vignettes/MetaboCoreUtils.Rmd b/vignettes/MetaboCoreUtils.Rmd
@@ -23,13 +23,14 @@ BiocStyle::markdown()
 
 # Introduction
 
-The `MetaboCoreUtils` defines metabolomics-related core functionality provided
-as low-level functions to allow a data structure-independent usage across
-various R packages [@rainer_modular_2022]. This includes functions to calculate between ion (adduct)
-and compound mass-to-charge ratios and masses or functions to work with chemical
-formulas. The package provides also a set of adduct definitions and information
-on some commercially available internal standard mixes commonly used in MS
-experiments.
+The `r Biocpkg("MetaboCoreUtils")` package defines metabolomics-related core
+functionality provided as low-level functions to allow a data
+structure-independent usage across various R packages
+[@rainer_modular_2022]. This includes functions to calculate between ion
+(adduct) and compound mass-to-charge ratios and masses or functions to work with
+chemical formulas. The package provides also a set of adduct definitions and
+information on some commercially available internal standard mixes commonly used
+in MS experiments.
 
 For a full list of functions, see
 
@@ -66,16 +67,16 @@ library(MetaboCoreUtils)
 
 ## Conversion between ion m/z and compound masses
 
-The `mass2mz` and `mz2mass` functions allow to convert between compound masses
-and ion (adduct) mass-to-charge ratios (m/z). The `MetaboCoreUtils` package
-provides definitions of common ion adducts generated by electrospray ionization
-(ESI). These can be listed with the `adductNames` function.
+The `mass2mz()` and `mz2mass()` functions allow to convert between compound
+masses and ion (adduct) mass-to-charge ratios (m/z). The *MetaboCoreUtils*
+package provides definitions of common ion adducts generated by electrospray
+ionization (ESI). These can be listed with the `adductNames()` function.
 
 ```{r}
 adductNames()
 ```
 
-With that we can use the `mass2mz` function to calculate the m/z for a set of
+With that we can use the `mass2mz()` function to calculate the m/z for a set of
 compounds assuming the generation of certain ions. In the example below we
 define masses for some theoretical compounds and calculate their expected m/z
 assuming that ions `"[M+H]+"` and `"[M+Na]+"` are generated.
@@ -86,11 +87,11 @@ mass2mz(masses, adduct = c("[M+H]+", "[M+Na]+"))
 ```
 
 As a result we get a `matrix` with each row representing one compound and each
-column the m/z for one of the defined adducts. With the `mz2mass` we could
-perform the reverse calculation, i.e. from m/z to compound masses.
+column the m/z for one of the defined adducts. With the `mz2mass()` function we
+could perform the reverse calculation, i.e. from m/z to compound masses.
 
 In addition, it is possible to calculate m/z values from chemical formulas with
-the `formula2mz` function. Below we calculate the m/z values for `[M+H]+` and
+the `formula2mz()` function. Below we calculate the m/z values for `[M+H]+` and
 `[M+Na]+` adducts from the chemical formulas of glucose and caffeine.
 
 ```{r}
@@ -102,53 +103,66 @@ formula2mz(c("C6H12O6", "C8H10N4O2"), adduct = c("[M+H]+", "[M+Na]+"))
 
 The lack of consistency in the format in which chemical formulas are written
 poses a big problem comparing formulas coming from different resources. The
-`MetaboCoreUtils` package provides functions to *standardize* formulas as well
+*MetaboCoreUtils* package provides functions to *standardize* formulas as well
 as combine formulas or subtract elements from formulas. Below we use an
 artificial example to show this functionality. First we standardize a chemical
-formula with the `standardizeFormula` function.
+formula with the `standardizeFormula()` function.
 
 ```{r}
 frml <- "Na3C4"
 frml <- standardizeFormula(frml)
 frml
 ```
 
-Next we add `"H2O"` to the formula using the `addElements` function.
+Next we add `"H2O"` to the formula using the `addElements()` function.
 
 ```{r}
 frml <- addElements(frml, "H2O")
 frml
 ```
 
-We can also subtract elements with the `subtractElements` function:
+We can also subtract elements with the `subtractElements()` function:
 
 ```{r}
 frml <- subtractElements(frml, "H")
 frml
 ```
 
 Chemical formulas could also be multiplied with a scalar using the
-`multiplyElements` function. The counts for individual elements in a chemical
-formula can be calculated with the `countElements` function.
+`multiplyElements()` function. The counts for individual elements in a chemical
+formula can be calculated with the `countElements()` function.
 
 ```{r}
 countElements(frml)
 ```
 
-The function `adductFormula` allows in addition to create chemical formulas of
+The function `adductFormula()` allows in addition to create chemical formulas of
 specific adducts of compounds. Below we create chemical formulas for `[M+H]+`
 and `[M+Na]+` adducts for glucose and caffeine.
 
 ```{r}
 adductFormula(c("C6H12O6", "C8H10N4O2"), adduct = c("[M+H]+", "[M+Na]+"))
 ```
 
+Finally, `calculateMass()` can be used to calculate the (exact) mass for a given
+chemical formula. This function supports also the definition of isotopes in the
+formula. As an example we calculate below the mass of two chemical formulas,
+one without isotopes and one with 3 of the carbon atoms replaced by the carbon
+13 isotope.
+
+```{r}
+calculateMass(c("C6H12O6", "[13C3]C3H12O6"))
+```
+
+Note that isotopes are supported for all elements (deuterium could for example
+be expressed as `"[2H]"`).
+
 
 ## Kendrick mass defect calculation
 
 Lipids and other homologous series based on fatty acyls can be found in data by
 using Kendrick mass defects (KMD) or referenced Kendrick mass defects
-(RKMD). The `MetaboCoreUtils` package provides functions to calculate everything
+(RKMD). The *MetaboCoreUtils* package provides functions to calculate everything
 around Kendrick mass defects. The following example calculates the KMD and RKMD
 for three lipids (PC(16:0/18:1(9Z)), PC(16:0/18:0), PS(16:0/18:1(9Z))) and
 checks, if they fit the RKMD of PCs detected as [M+H]+ adducts.
@@ -172,7 +186,7 @@ isRkmd(lipid_rkmd)
 
 Retention times are often not directly comparable between two LC-MS systems,
 even if nominally the same separation method is used. Conversion of retention
-times to retention indices can overcome this issue. The `MetaboCoreUtils` package
+times to retention indices can overcome this issue. The *MetaboCoreUtils* package
 provides a function to perform this conversion. Below we use an example based on
 indexing with a homologous series of N-Alkyl-pyridinium sulfonates (NAPS).
 
@@ -198,7 +212,7 @@ head(rti)
 ```
 
 
-The indexing is performed using the function `indexRtime`.
+The indexing is performed using the function `indexRtime()`.
 
 ```{r}
 rtime$rindex_r <- indexRtime(rtime$rtime, rti)
@@ -212,7 +226,7 @@ head(rtime)
 
 Conditions that shall be compared by the retention index might not perfectly
 match. In case the deviation is linear a simple two-point correction can be
-applied to the data. This is performed by the function `correctRindex`. The
+applied to the data. This is performed by the function `correctRindex()`. The
 correction requires two reference standards and their measured RIs and reference
 RIs.
 
@@ -231,12 +245,12 @@ affected by technical noise or signal drifts. In particular, some of these
 technical variances can be specific for individual metabolites, requiring hence
 a per-feature adjustment of the abundances. One example of such noise is an
 injection order dependent signal drift that can sometimes be observed in
-untargeted metabolomics data from LC-MS experiments. The `fit_lm` function can
+untargeted metabolomics data from LC-MS experiments. The `fit_lm()` function can
 be used to model such drifts in the observed data of each single feature, for
 example with a model of the form `y ~ injection_index` that models the
 relationship between the measured abundances of a metabolite `y` on the index in
 which the respective sample was injected (`injection_index`). Subsequently, the
-data can be adjusted for the modeled drift with the `adjust_lm` function. This
+data can be adjusted for the modeled drift with the `adjust_lm()` function. This
 approach is similar to the one described by [@wehrens_improved_2016].
 
 Below we perform such an injection order dependent signal adjustment on a small
@@ -443,14 +457,14 @@ qc_lm[qc_lm_summary[, "p.value"] > 0.05] <- NA
 ```
 
 We can next adjust the data for the estimated signal drifts using the
-`adjust_lm` function. We will thus adjust abundances in all samples (including
+`adjust_lm()` function. We will thus adjust abundances in all samples (including
 the study samples) using the linear models estimated on the QC samples. For
 features for which no linear model is provided (i.e., with an `NA` in the `list`
 of linear models) the original abundances will be returned *as is*. With
 parameter `data` we need to provide a `data.frame` with all required covariates
-for the fitted models (i.e., defined by the `formula` passed to the `fit_lm`
+for the fitted models (i.e., defined by the `formula` passed to the `fit_lm()`
 call). Also, since we fitted the models to the data in `log2` scale, we need
-also to provide log2 transformed values to the `adjust_lm` function.
+also to provide log2 transformed values to the `adjust_lm()` function.
 
 ```{r}
 #' Adjust the data for the estimated signal drift
@@ -532,9 +546,9 @@ Adjustment, while not completely removing it for all features, globally reduced
 the dependency of abundances on the injection index.
 
 Summarizing, feature-wise biases in LC-MS data can be estimated, and adjusted
-for using the `fit_lm` and `adjust_lm` functions. Ideally, such biases should be
-estimated on (repeatedly measured) QC samples, with the QC samples being
-representative of the study samples (e.g. a pool of all study samples). In
+for using the `fit_lm()` and `adjust_lm()` functions. Ideally, such biases
+should be estimated on (repeatedly measured) QC samples, with the QC samples
+being representative of the study samples (e.g. a pool of all study samples). In
 addition, due to the generally relatively low number of available data points,
 the estimation of the signal drift can be unreliable and it is thus strongly
 suggested to evaluate or visually inspect some of them to derive strategies
@@ -555,7 +569,7 @@ represents a feature. In this tutorial, we'll explore a set of functions
 designed to calculate basic quality assessment metrics on which
 metabolomics data can subsequently be filtered.
 
-First, to get more information on the available functions you can check the documentation 
+First, to get more information on the available functions you can check the documentation
 
 ```{r}
 ?quality_assessment
@@ -624,7 +638,7 @@ print(blank_detection_result)
 
 All of these computations can then be used to easily filter our data and remove
 the features that do not fit our quality criteria. Below we remove all features
-that have a D-ratio and coefficient of variation < 0.8 with no missing values 
+that have a D-ratio and coefficient of variation < 0.8 with no missing values
 and are not flagged to be a possible solvent contaminant.
 
 ```{r}