Molecular formula class detection
In many chemical analyses, the detected compounds belong to a particular chemical class with a characteristic substructure pattern, such as lipids, perfluoroalkyl substances (PFAS), polychlorinated biphenyls (PCBs), polybrominated diphenyl ethers (PBDEs), polycyclic aromatic hydrocarbons (PAHs), and phthalates. The molecular formula enumeration method in IDSL.UFA can likewise generate molecular formulas that share repeating substructure patterns. To assist in identifying compounds belonging to these classes, the detect_formula_sets function from the IDSL.UFA package is recommended. This function detects chemical classes using two key attributes:
- Constant ΔH/ΔC ratios for polymeric (ΔH/ΔC = 2) and cyclic (ΔH/ΔC = 1/2) chain progressions within polymeric and cyclic classes as shown in Table S.2 - S.4.
- Constant number of carbons and fixed sum of hydrogens and halogens (Σ(H+Br+Cl+F+I)) which represents classes similar to PCBs and PBDEs as shown in Table S.5.
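The two attributes can be checked by hand with a few lines of base R. The sketch below is not part of IDSL.UFA and does not reproduce its algorithm: count_atoms, carbon_and_hx, and delta_ratio are hypothetical helpers that only handle simple formulas (no parentheses, isotopes, or charges), and the PAH and PCB formulas are illustrative examples chosen here.

```r
# Hypothetical helper: element counts for a simple molecular formula string.
count_atoms <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)   # element symbol, e.g. "Cl"
  counts <- sub("^[A-Za-z]+", "", tokens)  # trailing digits, "" means 1
  counts[counts == ""] <- "1"
  vapply(split(as.numeric(counts), elements), sum, numeric(1))
}

# Number of carbons and the sum H + Br + Cl + F + I for one formula.
carbon_and_hx <- function(formula) {
  atoms <- count_atoms(formula)
  hx <- atoms[names(atoms) %in% c("H", "Br", "Cl", "F", "I")]
  c(C = sum(atoms[names(atoms) == "C"]), HX = sum(hx))
}

# Ratio of the change in (H + halogens) to the change in C between
# consecutive formulas of a series.
delta_ratio <- function(formulas) {
  m <- vapply(formulas, carbon_and_hx, c(C = 0, HX = 0))
  unname(diff(m["HX", ]) / diff(m["C", ]))
}

# Linear chain progression (perfluoroalkyl sulfonates): each CF2 unit adds 1 C and 2 F.
pfsa_ratio <- delta_ratio(c("C3F7O3S", "C4F9O3S", "C5F11O3S"))
print(pfsa_ratio)  # 2 2

# Cyclic progression (naphthalene, anthracene, tetracene): each ring adds 4 C and 2 H.
pah_ratio <- delta_ratio(c("C10H8", "C14H10", "C18H12"))
print(pah_ratio)   # 0.5 0.5

# PCB-like class: carbons stay at 12 and H + Cl stays at 10.
pcb <- t(vapply(c("C12H9Cl", "C12H8Cl2", "C12H7Cl3"), carbon_and_hx, c(C = 0, HX = 0)))
print(pcb)
```

For the PCB-like series the ratio itself is undefined (0/0), which is why that class is described by a constant carbon count and a fixed Σ(H+Br+Cl+F+I) rather than by a slope.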
The detect_formula_sets function aggregates a vector of mixed molecular formulas by class to make similar molecular formulas easier to identify. For example, a vector of mixed molecular formulas can be obtained from an aligned annotated molecular formula table to detect related molecular formulas across a study. This approach was also used to detect the presence of chlorinated perfluorotriether alcohols (Cl-PFTrEAs) in human specimens from the ST001430 study.
detect_formula_sets(molecular_formulas, ratio_delta_HBrClFI_C, mixed.HBrClFI.allowed,
min_molecular_formula_class, max_number_formula_class, number_processing_threads = 1)
molecular_formulas: a vector of molecular formulas
ratio_delta_HBrClFI_C: c(2, 1/2, 0). 2 to detect structures with linear carbon chains such as PFAS, lipids, chlorinated paraffins, etc. 1/2 to detect structures with cyclic chains such as PAHs. 0 to detect molecular formulas with fixed structures but changing H/Br/Cl/F/I atoms similar to PCBs, PBDEs, etc.
mixed.HBrClFI.allowed: c(TRUE, FALSE). Select FALSE to detect halogenated-saturated compounds similar to PFOS or select TRUE to detect mixed halogenated compounds with hydrogen.
min_molecular_formula_class: minimum number of molecular formulas in each class. This number should be greater than or equal to 2.
max_number_formula_class: maximum number of molecular formulas in each class
number_processing_threads: Number of processing threads for multi-threaded computations
## Example
library(IDSL.UFA)
## A vector of mixed molecular formulas
molecular_formulas <- c("C3F7O3S", "C4F9O3S", "C5F11O3S", "C6F9O3S", "C8F17O3S",
"C9F19O3S", "C10F21O3S", "C7ClF14O4", "C10ClF20O4", "C11ClF22O4", "C11Cl2F21O4",
"C12ClF24O4")
##
ratio_delta_HBrClFI_C <- 2 # to aggregate polymeric classes
mixed.HBrClFI.allowed <- FALSE # To detect only halogen saturated classes
min_molecular_formula_class <- 2
max_number_formula_class <- 20
##
classes <- detect_formula_sets(molecular_formulas, ratio_delta_HBrClFI_C, mixed.HBrClFI.allowed,
min_molecular_formula_class, max_number_formula_class, number_processing_threads = 1)
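To build intuition for what aggregating by class means for the example vector above, the following base R sketch groups the same formulas by a simple invariant. It is an assumption-laden illustration, not the algorithm used by detect_formula_sets, and its grouping may differ from the package output (for instance, it sums Cl and F together). The helpers count_atoms and class_key are hypothetical names introduced here.

```r
# Hypothetical helper: element counts for a simple molecular formula string.
count_atoms <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"
  vapply(split(as.numeric(counts), elements), sum, numeric(1))
}

# Hypothetical class key: members of one chain progression with slope `ratio`
# share the offset (H + halogens) - ratio * C and the same remaining elements.
class_key <- function(formula, ratio = 2) {
  atoms <- count_atoms(formula)
  is_hx <- names(atoms) %in% c("H", "Br", "Cl", "F", "I")
  is_c <- names(atoms) == "C"
  offset <- sum(atoms[is_hx]) - ratio * sum(atoms[is_c])
  rest <- atoms[!is_hx & !is_c]
  rest <- rest[order(names(rest))]
  paste0("offset=", offset, ";", paste0(names(rest), rest, collapse = ""))
}

molecular_formulas <- c("C3F7O3S", "C4F9O3S", "C5F11O3S", "C6F9O3S", "C8F17O3S",
                        "C9F19O3S", "C10F21O3S", "C7ClF14O4", "C10ClF20O4", "C11ClF22O4",
                        "C11Cl2F21O4", "C12ClF24O4")

# Group formulas that share a key, then drop groups with fewer than 2 members
# (analogous in spirit to min_molecular_formula_class = 2).
groups <- split(molecular_formulas, vapply(molecular_formulas, class_key, character(1)))
groups <- groups[lengths(groups) >= 2]
print(groups)
```

Under this invariant, C6F9O3S falls outside the sulfonate series because its offset is -3 rather than 1, so it is left ungrouped.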