MeTaQuaC is an R package to facilitate efficient quality control (QC) for targeted metabolomics analysis with focus on Biocrates Kits. It provides a simple interface to create extensive, reproducible and well documented HTML QC reports in an automated fashion.
Please refer to our article to learn more about the underlying ideas:
Kuhring M. et al., 2020, Concepts and Software Package
for Efficient Quality Control in Targeted Metabolomics Studies:
MeTaQuaC
Recent changes:
- [v0.1.32] Add parameter to automatically export main data sets
- [v0.1.32] Add parameter to retain report intermediates (markdown and figures)
- [v0.1.32] Update and restructured documentation
- [v0.1.32] Changed default kit to "Biocrates MxP Quant 500 Kit"
MeTaQuaC supports the following Biocrates Kits:
- Biocrates Bile Acids Kit
- Biocrates AbsoluteIDQ p180 Kit
- Biocrates AbsoluteIDQ p400 HR Kit
- Biocrates AbsoluteIDQ Stero17 Kit
- Biocrates MxP Quant 500 Kit
- Generic Data (for custom targeted experiments)
Before using MeTaQuaC to create a QC report for Biocrates data, please make sure to be familiar with the Biocrates kit used, i.e familiarize yourself with the compounds, sample types, status values, terminology, analytical specification, etc. (please refer to Biocrates' manuals and documents provided with the kit used).
- MeTaQuaC reports sometimes struggle with memory issues,
either during pandoc conversion to HTML or when opening the HTML in a browser.
This is a result of providing integrated data sets
as feature-rich JavaScript tables and also in redundant occurrence.
To reduce report size, use the parameter
data_tables
("stats", so the main data tables in the report get disabled) in combination with the parameterdata_export_{long|wide}
(TRUE, so all the main data tables get automatically exported as csv). - Several of MeTaQuaC's filters and QC measures are designed around the integrity of technical replicates. MeTaQuaC intends to make use of Biocrates standard QC samples (QC Level 2, referred to as Reference QC in the reports) and pooled QC samples. To infer e.g. missing value rates, technical variability (%RSD) and batch drift, these QC samples need to be replicated uniformly within the sample sequence. A minimum of five replicates each is recommended. Note that by default, Biocrates kit plates only contain one single standard QC samples of three different concentrations each (QC Level 1, 2, and 3). Thus, QC Level 2 sample replicates need to be applied separately. Running replicate QC samples is highly recommended to assure reliable high quality data according to community standards.
In general, to run the following code please copy and paste the code chunks into your R interface and then press return.
MeTaQuaC is a package developed and tested with GNU R (version >= 3.4.4).
In R, install the devtools
package to enable installation from git:
install.packages("devtools")
Other package dependencies (as stated in the DESCRIPTION file) should be resolved automatically during installation.
To install and activate the latest release of MeTaQuaC execute the following commands in R:
devtools::install_github("https://github.com/bihealth/metaquac")
library(metaquac)
packageVersion("metaquac")
The MeTaQuaC package provides one simple function to create an extensive, reproducible and well documented report. The data for this can be taken directly from Biocrates IDQ software. There are instructions below on how to do this.
To create the report, execute the function metaquac::create_qc_report
in R,
including the relevant parameters below.
An example piece of code and full description of all parameters is provided
in the package and can be accessed in R (run ?metaquac::create_qc_report
).
create_qc_report
requires only one obligatory parameter data_files
,
consisting of an R list of at least one named batch
indicating corresponding files per vector, each (e.g.
list(Batch1 = c("Batch1_LC1.txt", "Batch1_LC2.txt"),
Batch2 = c("Batch2_LC1.txt", "Batch2_LC2.txt"))).
Furthermore, commonly used parameters should include the kit
(defaults to "Biocrates MxP Quant 500 Kit")
and the measurement_type
(defaults to "LC").
Always create separate reports for LC and FIA data. Indicating LC and FIA data files within one report is not supported and will result in unexpected behavior including results difficult to interpret.
Note: Depending on the number of compounds per kit and injection as well as the number of samples in a study, the resulting HTML reports may considerably grow in size due to the extensive amount of figures and intermediate data provided.
The data needed for the QC report is directly exported from Biocrates' MetIDQ software.
- After acquisition and processing with MetIDQ switch to the
MetSTAT
module (please refer to Biocrates' manuals and documents provided with the kit used to infer the relevant steps for your data). - Select your project and samples
(including at least biological, QC and calibration standards samples)
so they show up under the
Display Data
-Data
tab. - Data normalization may be applied as described in the MetIDQ manual.
- In the
Save Options
section, selectExport Status Information
andExport all types of values at once
. Do not selectTranspose Values
andAdd comments to Excel files
nor enable the replacement of any values! - Press
Export
and save the data as tab-separated text file. - Repeat the export for all corresponding injection types and batches
(watch out for a consistent
Display Unit
between batches).
The necessary settings are additionally highlighted in the figure below.
Instead of data exported from MetIDQ (recommended when using Biocrates kits), MeTaQuaC reports can be created using a basic generic data format, thus enabling QC reports for any custom targeted experiment.
MeTaQuaC adapts to different terminologies within the reports, in particular with respect to column names, sample types and status values (as these can be highly input specific). While MetIDQ exports data with predefined terms, the required or recommended terms for the generic data are definied below.
Currently, the generic data consists of one or several files (with same
dimensions) containing one type of measurement information each. I.e. one file
contains e.g. the concentrations, one the areas, one status values and so on.
Please refer to the parameter data_files
and generic_data_types
to see how
to map the currently supported data types with your files.
Generic data files are tab-separated text files containing typical sample (row)
to compound (column) matrices, plus some additional sample annotation columns.
Beside compounds in the latter columns (watch out to specify their position
correctly with parameter generic_index_first_compound
), the following columns
can be used by MeTaQuaC:
- Required:
Sample Identification
The ID of the sample.Sample Type
The type of sample (see below).Sample Position
The position of the sample in the acquisition sequence. and/orWell Position
The position of the sample on the well plate (currently only row-major order and only for 96 well plates!)
- Optional
Batch
The ID of a batch, if the data contains several batches.- Any study variables of interest to use with
*_variables
parameters.
The following sample types (in column Sample Type
) are recognized for specific
QC analysis:
Sample
The biological/study samples (required).Reference QC
QC samples based on injected standards (preferentially replicated).Pooled QC
QC samples based on pooling biological samples (preferentially replicated).- Other sample types may appear in stats and visualizations, but are currently not considered for specific processing such e.g. filtering, variances, etc.
As status values in generic data will be software specific, the parameter
preproc_keep_status
might need modification if other than "Valid" measurements
should be considered.
Generic data files would look like this (same format for different measurement informations, just with different values in compound columns, e.g. concentrations, areas or statuses):
Sample Identification | Sample Type | Well Position | Sex | Ala | Arg | Asn |
---|---|---|---|---|---|---|
A | Sample | 74 | Male | 123 | 68.6 | 29.2 |
B | Sample | 3 | Female | 205 | 111 | 32.7 |
C | Pooled QC | 27 | 192 | 67.5 | 35.3 | |
Cal2 | Standard L2 | 61 | 44.4 | 7.84 | 9.53 | |
QCb | Reference QC | 50 | 411 | 121 | 18.3 | |
QCa | Reference QC | 87 | 375 | 102 | 16.9 | |
Cal3 | Standard L3 | 73 | 189 | 43.4 | 46.9 | |
Cal1 | Standard L1 | 49 | 18.4 | 4.18 | 4.49 | |
... |
Currently, QC report creation is controlled by the one main function of the
MeTaQuaC package, metaquac::create_qc_report
.
To test the QC report creation, the package includes sample data for a Biocrates MxP Quant 500 Kit with one batch as well as for a Biocrates AbsoluteIDQ p400 HR Kit with four batches.
To improve documentation consistency,
parameters and example calls are no longer reported in this readme.
Please refer to the integrated package documenation in R
(run ?metaquac::create_qc_report
).
For filtering parameters, please note that:
- Default missing value ratio thresholds are set according to Broadhurst et al. 2018
- Default %RSD (precision) thresholds are set according to EMA 2018
- In case of the AbsoluteIDQ® p400 HR kit, MetIDQ may produce data exports without any notation of a metabolite if it hasn’t been measured in any sample of a batch (i.e. the compound is not mentioned in the export at all). However, listing the complete set of metabolites is a necessary information, in particular when analyzing e.g. missing value ratios and comparing batches. Therefore, imported data is extended using an internal list of all the potential metabolites of the AbsoluteIDQ® p400 HR kit. A prominent note in the report indicates this correction (if necessary) and lists the affected metabolites. We like to stress that this procedure does not alter any measurement data. This is merely a computational fix recreating the full list of target compounds. MeTaQuaC will just complement such missing compounds (i.e. basically add a column of missing values for these compounds). The selection of the kit (p400) as well as the injection type (LC or FIA) is relevant to know which set of compounds need to be used for completion.