Streamlining identification (#157)
* make it easier to do matches, this way many will be able to bypass the preprocessing.
* updates to sig_noise make it more flexible
* add attributes
* new CRAN submission

---------

Co-authored-by: Zacharias Steinmetz <git@zsteinmetz.de>
wincowgerDEV and zsteinmetz authored Nov 25, 2023
1 parent 2697f6f commit 5d88bf2
Showing 18 changed files with 480 additions and 226 deletions.
6 changes: 3 additions & 3 deletions CRAN-SUBMISSION
@@ -1,3 +1,3 @@
Version: 1.0.5
Date: 2023-10-31 10:27:37 UTC
SHA: 419b04607656039958a19393f6218f3ca61b817d
Version: 1.0.6
Date: 2023-11-25 12:56:02 UTC
SHA: 11f89935f939a7a7430eceaba1fda06445134587
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: OpenSpecy
Type: Package
Title: Analyze, Process, Identify, and Share Raman and (FT)IR Spectra
Version: 1.0.5
Date: 2023-10-31
Version: 1.0.6
Date: 2023-11-25
Authors@R: c(person("Win", "Cowger", role = c("cre", "aut", "dtc"),
email = "wincowger@gmail.com",
comment = c(ORCID = "0000-0001-9226-3104")),
9 changes: 9 additions & 0 deletions NEWS.md
@@ -1,3 +1,12 @@
# OpenSpecy 1.0.6

## Minor Improvements

- Add attributes to `OpenSpecy` objects
- More flexible `sig_noise()`
- Simpler matching


# OpenSpecy 1.0.5

## Minor Improvements
144 changes: 83 additions & 61 deletions R/as_OpenSpecy.R
@@ -13,6 +13,8 @@
#' per spectrum.
#' @param metadata metadata for each spectrum with one row per spectrum,
#' see details.
#' @param attributes a list of attributes describing critical aspects for interpreting the spectra,
#' see details.
#' @param coords spatial coordinates for the spectra.
#' @param session_id logical. Whether to add a session ID to the metadata.
#' The session ID is based on current session info so metadata of the same
@@ -32,67 +34,75 @@
#' provides or is harvested from the files themselves.
#'
#' The \code{metadata} argument may contain a named list with the following
#' details (\code{*} = minimum recommended):
#' details (\code{*} = minimum recommended).
#'
#' \describe{
#' \item{`file_name*`}{The file name, defaults to
#' \code{\link[base]{basename}()} if not specified}
#' \item{`user_name*`}{User name, e.g. "Win Cowger"}
#' \item{`contact_info`}{Contact information, e.g. "1-513-673-8956,
#' wincowger@@gmail.com"}
#' \item{`organization`}{Affiliation, e.g. "University of California,
#' Riverside"}
#' \item{`citation`}{Data citation, e.g. "Primpke, S., Wirth, M., Lorenz, C.,
#' & Gerdts, G. (2018). Reference database design for the automated analysis
#' of microplastic samples based on Fourier transform infrared (FTIR)
#' spectroscopy. \emph{Analytical and Bioanalytical Chemistry}.
#' \doi{10.1007/s00216-018-1156-x}"}
#' \item{`spectrum_type*`}{Raman or FTIR}
#' \item{`spectrum_identity*`}{Material/polymer analyzed, e.g.
#' "Polystyrene"}
#' \item{`material_form`}{Form of the material analyzed, e.g. textile fiber,
#' rubber band, sphere, granule }
#' \item{`material_phase`}{Phase of the material analyzed (liquid, gas, solid) }
#' \item{`material_producer`}{Producer of the material analyzed, e.g. Dow }
#' \item{`material_purity`}{Purity of the material analyzed, e.g. 99.98%}
#' \item{`material_quality`}{Quality of the material analyzed, e.g.
#' consumer product, manufacturer material, analytical standard,
#' environmental sample }
#' \item{`material_color`}{Color of the material analyzed,
#' e.g. blue, #0000ff, (0, 0, 255) }
#' \item{material_other}{Other material description, e.g. 5 µm diameter
#' fibers, 1 mm spherical particles }
#' \item{`cas_number`}{CAS number, e.g. 9003-53-6 }
#' \item{`instrument_used`}{Instrument used, e.g. Horiba LabRam }
#' \item{instrument_accessories}{Instrument accessories, e.g.
#' Focal Plane Array, CCD}
#' \item{`instrument_mode`}{Instrument modes/settings, e.g.
#' transmission, reflectance }
#' \item{`intensity_units*`}{Units of the intensity values for the spectrum,
#' options transmittance, reflectance, absorbance }
#' \item{`spectral_resolution`}{Spectral resolution, e.g. 4/cm }
#' \item{`laser_light_used`}{Wavelength of the laser/light used, e.g.
#' 785 nm }
#' \item{`number_of_accumulations`}{Number of accumulations, e.g 5 }
#' \item{`total_acquisition_time_s`}{Total acquisition time (s), e.g. 10 s}
#' \item{`data_processing_procedure`}{Data processing procedure,
#' e.g. spikefilter, baseline correction, none }
#' \item{`level_of_confidence_in_identification`}{Level of confidence in
#' identification, e.g. 99% }
#' \item{`other_info`}{Other information }
#' \item{`license`}{The license of the shared spectrum; defaults to
#' \code{"CC BY-NC"} (see \url{https://creativecommons.org/licenses/by-nc/4.0/}
#' for details). Any other creative commons license is allowed, for example,
#' CC0 or CC BY}
#' \item{`session_id`}{A unique user and session identifier; populated
#' automatically with \code{paste(digest(Sys.info()), digest(sessionInfo()),
#' sep = "/")}}
#' \item{`file_id`}{A unique file identifier; populated automatically
#' with \code{digest(object[c("wavenumber", "spectra")])}}
#' }
#'
#' The \code{attributes} argument may contain a named list with the following
#' details; when set, they are used to automate transformations and warning messages:
#'
#' \tabular{ll}{
#' \code{file_name*}: \tab The file name, defaults to
#' \code{\link[base]{basename}()} if not specified\cr
#' \code{user_name*}: \tab User name, e.g. "Win Cowger"\cr
#' \code{contact_info}: \tab Contact information, e.g. "1-513-673-8956,
#' wincowger@@gmail.com"\cr
#' \code{organization}: \tab Affiliation, e.g. "University of California,
#' Riverside"\cr
#' \code{citation}: \tab Data citation, e.g. "Primpke, S., Wirth, M., Lorenz,
#' C., & Gerdts, G. (2018). Reference database design for the automated analysis
#' of microplastic samples based on Fourier transform infrared (FTIR)
#' spectroscopy. \emph{Analytical and Bioanalytical Chemistry}.
#' \doi{10.1007/s00216-018-1156-x}"\cr
#' \code{spectrum_type*}: \tab Raman or FTIR\cr
#' \code{spectrum_identity*}: \tab Material/polymer analyzed, e.g.
#' "Polystyrene"\cr
#' \code{material_form}: \tab Form of the material analyzed, e.g. textile fiber,
#' rubber band, sphere, granule \cr
#' \code{material_phase}: \tab Phase of the material analyzed (liquid, gas,
#' solid) \cr
#' \code{material_producer}: \tab Producer of the material analyzed,
#' e.g. Dow \cr
#' \code{material_purity}: \tab Purity of the material analyzed, e.g. 99.98%
#' \cr
#' \code{material_quality}: \tab Quality of the material analyzed, e.g.
#' consumer product, manufacturer material, analytical standard,
#' environmental sample \cr
#' \code{material_color}: \tab Color of the material analyzed,
#' e.g. blue, #0000ff, (0, 0, 255) \cr
#' \code{material_other}: \tab Other material description, e.g. 5 µm diameter
#' fibers, 1 mm spherical particles \cr
#' \code{cas_number}: \tab CAS number, e.g. 9003-53-6 \cr
#' \code{instrument_used}: \tab Instrument used, e.g. Horiba LabRam \cr
#' \code{instrument_accessories}: \tab Instrument accessories, e.g.
#' Focal Plane Array, CCD\cr
#' \code{instrument_mode}: \tab Instrument modes/settings, e.g.
#' transmission, reflectance \cr
#' \code{intensity_units*}: \tab Units of the intensity values for the spectrum,
#' options transmittance, reflectance, absorbance \cr
#' \code{spectral_resolution}: \tab Spectral resolution, e.g. 4/cm \cr
#' \code{laser_light_used}: \tab Wavelength of the laser/light used, e.g.
#' 785 nm \cr
#' \code{number_of_accumulations}: \tab Number of accumulations, e.g 5 \cr
#' \code{total_acquisition_time_s}: \tab Total acquisition time (s), e.g. 10 s
#' \cr
#' \code{data_processing_procedure}: \tab Data processing procedure,
#' e.g. spikefilter, baseline correction, none \cr
#' \code{level_of_confidence_in_identification}: \tab Level of confidence in
#' identification, e.g. 99% \cr
#' \code{other_info}: \tab Other information \cr
#' \code{license}: \tab The license of the shared spectrum; defaults to
#' \code{"CC BY-NC"} (see
#' \url{https://creativecommons.org/licenses/by-nc/4.0/} for details). Any other
#' creative commons license is allowed, for example, CC0 or CC BY \cr
#' \code{session_id}: \tab A unique user and session identifier; populated
#' automatically with \code{paste(digest(Sys.info()), digest(sessionInfo()),
#' sep = "/")}\cr
#' \code{file_id}: \tab A unique file identifier; populated automatically
#' with \code{digest(object[c("wavenumber", "spectra")])}\cr
#' \describe{
#' \item{`intensity_unit`}{supported options include `"absorbance"`,
#' `"transmittance"`, or `"reflectance"`}
#' \item{`derivative_order`}{supported options include `"0"`, `"1"`, or
#' `"2"`}
#' \item{`baseline`}{supported options include `"raw"` or `"nobaseline"`}
#' \item{`spectra_type`}{supported options include `"ftir"` or `"raman"`}
#' }
#'
#' @return
@@ -250,6 +260,12 @@ as_OpenSpecy.default <- function(x, spectra,
level_of_confidence_in_identification = NULL,
other_info = NULL,
license = "CC BY-NC"),
attributes = list(
intensity_unit = NULL,
derivative_order = NULL,
baseline = NULL,
spectra_type = NULL
),
coords = "gen_grid",
session_id = FALSE,
...) {
@@ -266,7 +282,13 @@ as_OpenSpecy.default <- function(x, spectra,
if (length(x) != nrow(spectra))
stop("'x' and 'spectra' must be of equal length", call. = F)

obj <- structure(list(), class = c("OpenSpecy", "list"))
obj <- structure(list(),
class = c("OpenSpecy", "list"),
intensity_unit = attributes$intensity_unit,
derivative_order = attributes$derivative_order,
baseline = attributes$baseline,
spectra_type = attributes$spectra_type
)

obj$wavenumber <- x[order(x)]
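
The hunk above attaches the entries of `attributes` to the object through `structure()`. As a rough illustration of that mechanism (a base-R sketch, not the package code): `structure()` drops any attribute passed as `NULL`, so only the entries a caller actually sets end up on the object. The names `make_sketch` and `OpenSpecy_sketch` are hypothetical stand-ins introduced for this example.

```r
# Illustrative stand-in for the attribute handling shown in the hunk above.
# `make_sketch` and the "OpenSpecy_sketch" class are hypothetical names.
make_sketch <- function(attributes = list(intensity_unit = NULL,
                                          derivative_order = NULL,
                                          baseline = NULL,
                                          spectra_type = NULL)) {
  structure(list(),
            class = c("OpenSpecy_sketch", "list"),
            intensity_unit = attributes$intensity_unit,
            derivative_order = attributes$derivative_order,
            baseline = attributes$baseline,
            spectra_type = attributes$spectra_type)
}

obj <- make_sketch(list(intensity_unit = "absorbance", spectra_type = "ftir"))

# Attributes that were set can be read back with attr();
# attributes left as NULL are simply absent from the object.
attr(obj, "intensity_unit")
is.null(attr(obj, "baseline"))
```

Because unset entries never reach the object, downstream code can use `is.null(attr(obj, "..."))` to tell "not declared" apart from a declared value.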

2 changes: 1 addition & 1 deletion R/def_features.R
@@ -119,7 +119,7 @@ def_features.OpenSpecy <- function(x, features, ...) {
#' @importFrom stats dist
.def_features <- function(x, binary, name = NULL) {
# Label connected components in the binary image
binary_matrix <- matrix(binary, ncol = max(x$metadata$y) + 1, byrow = T)
binary_matrix <- matrix(binary, ncol = max(x$metadata$x) + 1, byrow = T)
labeled_image <- imager::label(imager::as.cimg(binary_matrix),
high_connectivity = T)
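
The one-line change in this hunk takes the matrix width from `max(x$metadata$x)` instead of `max(x$metadata$y)`. A small base-R sketch of why the width matters when filling with `byrow = TRUE` follows. It assumes a zero-based grid in which x varies fastest; that ordering is an assumption made for the example and is not confirmed by the diff.

```r
# Assumption: spectra sit on a zero-based grid and are ordered with x varying
# fastest, as expand.grid() produces. This ordering is illustrative only.
meta <- expand.grid(x = 0:3, y = 0:1)   # 4 positions along x, 2 along y
binary <- meta$x >= 2                   # flag the right half of the map

# Width taken from x: each matrix row corresponds to one y value.
by_x <- matrix(binary, ncol = max(meta$x) + 1, byrow = TRUE)

# Width taken from y: same values, but rows no longer line up with the map.
by_y <- matrix(binary, ncol = max(meta$y) + 1, byrow = TRUE)

dim(by_x)
dim(by_y)
```

On a square map both widths coincide, so a mismatch of this kind would only show up on non-square maps.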

39 changes: 33 additions & 6 deletions R/match_spec.R
@@ -14,6 +14,8 @@
#' \code{filter_spec()} filters an Open Specy object.
#'
#' @param x an \code{OpenSpecy} object, typically with unknowns.
#' @param conform logical; whether to conform the spectra to the library wavenumbers before matching.
#' @param type the type of conformation to apply; passed on to \code{conform_spec()}.
#' @param library an \code{OpenSpecy} or \code{glmnet} object representing the
#' reference library of spectra or model to use in identification.
#' @param na.rm logical; indicating whether missing values should be removed
@@ -93,11 +95,30 @@ cor_spec.default <- function(x, ...) {
#' @rdname match_spec
#'
#' @export
cor_spec.OpenSpecy <- function(x, library, na.rm = T, ...) {
cor_spec.OpenSpecy <- function(x, library, na.rm = T, conform = F,
type = "roll", ...) {
if(conform) x <- conform_spec(x, library$wavenumber, res = NULL, type)

if(!is.null(attr(x, "intensity_unit")) &&
attr(x, "intensity_unit") != attr(library, "intensity_unit"))
warning("Intensity units between the library and unknown are not the same")

if(!is.null(attr(x, "derivative_order")) &&
attr(x, "derivative_order") != attr(library, "derivative_order"))
warning("Derivative orders between the library and unknown are not the same")

if(!is.null(attr(x, "baseline")) &&
attr(x, "baseline") != attr(library, "baseline"))
warning("Baselines between the library and unknown are not the same")

if(!is.null(attr(x, "spectra_type")) &&
attr(x, "spectra_type") != attr(library, "spectra_type"))
warning("Spectra types between the library and unknown are not the same")

if(sum(x$wavenumber %in% library$wavenumber) < 3)
stop("there are less than 3 matching wavenumbers in the objects you are ",
"trying to correlate; this won't work for correlation analysis. ",
"Consider first conforming the spectra to the same wavenumbers.",
"trying to correlate; this won't work for correlation analysis; ",
"consider first conforming the spectra to the same wavenumbers",
call. = F)

if(!all(x$wavenumber %in% library$wavenumber))
@@ -134,11 +155,12 @@ match_spec.default <- function(x, ...) {
#' @rdname match_spec
#'
#' @export
match_spec.OpenSpecy <- function(x, library, na.rm = T, top_n = NULL,
order = NULL, add_library_metadata = NULL,
match_spec.OpenSpecy <- function(x, library, na.rm = T, conform = F,
type = "roll", top_n = NULL, order = NULL,
add_library_metadata = NULL,
add_object_metadata = NULL, fill = NULL, ...) {
if(is_OpenSpecy(library)) {
res <- cor_spec(x, library = library) |>
res <- cor_spec(x, library = library, conform = conform, type = type) |>
ident_spec(x, library = library, top_n = top_n,
add_library_metadata = add_library_metadata,
add_object_metadata = add_object_metadata)
@@ -259,6 +281,11 @@ filter_spec.OpenSpecy <- function(x, logic, ...) {
x$spectra <- x$spectra[, logic, with = F]
x$metadata <- x$metadata[logic,]

# metadata holds one row per spectrum, so an empty result shows up in its rows
if(ncol(x$spectra) == 0 || nrow(x$metadata) == 0)
stop("the OpenSpecy object created contains zero spectra; this is not well ",
"supported. If you have a specific scenario where this is required, ",
"please share it with the developers so we can provide a workaround",
call. = F)

return(x)
}
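
The attribute checks added to `cor_spec.OpenSpecy()` warn when the unknown and the library declare different values. One detail worth keeping in mind when reading them: in R, comparing a value with `NULL` (for example `"absorbance" != NULL`) yields a zero-length logical, which cannot be used as a condition. The sketch below shows one possible way to compare only when both sides carry the attribute. `attr_mismatch` is a hypothetical helper written for this illustration; it is not part of OpenSpecy.

```r
# `attr_mismatch` is a hypothetical helper, not a function from the package.
attr_mismatch <- function(x, library, name) {
  a <- attr(x, name, exact = TRUE)
  b <- attr(library, name, exact = TRUE)
  # Compare only when both sides carry the attribute; a missing attribute
  # is treated as "unknown" rather than as a mismatch.
  !is.null(a) && !is.null(b) && !identical(a, b)
}

unknown <- structure(list(), intensity_unit = "absorbance",
                     spectra_type = "raman")
reference <- structure(list(), intensity_unit = "transmittance")

# Both sides declare intensity_unit and the values differ.
if (attr_mismatch(unknown, reference, "intensity_unit"))
  message("Intensity units between the library and unknown are not the same")

# The reference declares no spectra_type, so no mismatch is reported.
attr_mismatch(unknown, reference, "spectra_type")
```

Treating a missing attribute as "unknown" is a design choice of this sketch; a stricter variant could warn whenever only one side declares a value.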

63 changes: 44 additions & 19 deletions R/sig_noise.R
@@ -7,13 +7,19 @@
#'
#' @param x an \code{OpenSpecy} object.
#' @param metric character; specifying the desired metric to calculate.
#' @param step numeric; the window size, in number of intensity values, used by the \code{"run_sig_over_noise"} metric.
#' @param sig_min numeric; the minimum wavenumber value for the signal region.
#' @param sig_max numeric; the maximum wavenumber value for the signal region.
#' @param noise_min numeric; the minimum wavenumber value for the noise region.
#' @param noise_max numeric; the maximum wavenumber value for the noise region.
#' @param abs logical; whether to return the absolute value of the result.
#' Options include \code{"sig"} (mean intensity), \code{"noise"} (standard
#' deviation of intensity), \code{"sig_times_noise"} (absolute value of
#' signal times noise), \code{"sig_over_noise"} (absolute value of signal /
#' noise), \code{"run_sig_over_noise"} (absolute value of signal /
#' noise where signal is estimated as the max intensity and noise is
#' estimated as the height of a low intensity region.),
#' \code{"log_tot_sig"} (sum of the inverse log intensities, useful for spectra in log units),
#' noise where signal is estimated as the max intensity and noise is
#' estimated as the height of a low intensity region.),
#' \code{"log_tot_sig"} (sum of the inverse log intensities, useful for spectra in log units),
#' or \code{"tot_sig"} (sum of intensities).
#' @param na.rm logical; indicating whether missing values should be removed
#' when calculating signal and noise. Default is \code{TRUE}.
@@ -23,6 +29,7 @@
#' A numeric vector containing the calculated metric for each spectrum in the
#' \code{OpenSpecy} object.
#'
#' @seealso [restrict_range()]
#' @examples
#' data("raman_hdpe")
#'
@@ -49,32 +56,50 @@ sig_noise.default <- function(x, ...) {
#'
#' @export
sig_noise.OpenSpecy <- function(x, metric = "run_sig_over_noise",
na.rm = TRUE, ...) {
vapply(x$spectra, function(y) {
if(length(y[!is.na(y)]) < 20) {
warning("Need at least 20 intensity values to calculate the signal or ",
"noise values accurately; returning NA", call. = F)
return(NA)
}
na.rm = TRUE, step = 20,
sig_min = NULL, sig_max = NULL,
noise_min = NULL, noise_max = NULL, abs = T, ...) {

values <- vapply(x$spectra, function(y) {
if(metric == "run_sig_over_noise") {
max <- frollapply(y[!is.na(y)], 20, max)
max[(length(max) - 19):length(max)] <- NA
signal <- max(max, na.rm = T)#/mean(x, na.rm = T)
if(length(y[!is.na(y)]) < step) {
warning(paste0("Need at least ", step, " intensity values to calculate ",
"the signal or noise values accurately with ",
"run_sig_over_noise; returning NA"), call. = F)
return(NA)
}
max <- frollapply(y[!is.na(y)], step, max)
max[(length(max) - (step-1)):length(max)] <- NA
signal <- max(max, na.rm = T)
noise <- median(max[max != 0], na.rm = T)
}
else {
signal = mean(y, na.rm = na.rm)
noise = sd(y, na.rm = na.rm)
} else {
if(!is.null(sig_min) & !is.null(sig_max)){
sig_intens <- y[x$wavenumber >= sig_min & x$wavenumber <= sig_max]
} else {
sig_intens <- y
}
if(!is.null(noise_min) & !is.null(noise_max)){
noise_intens <- y[x$wavenumber >= noise_min & x$wavenumber <= noise_max]
} else {
noise_intens <- y
}
signal <- mean(sig_intens, na.rm = na.rm)
noise <- sd(noise_intens, na.rm = na.rm)
}

if(metric == "sig") return(signal)
if(metric == "noise") return(noise)
if(metric == "sig_times_noise") return(abs(signal * noise))
if(metric == "sig_times_noise") return(signal * noise)

if(metric %in% c("sig_over_noise", "run_sig_over_noise"))
return(abs(signal/noise))
return(signal/noise)
if(metric == "tot_sig") return(sum(y))
if(metric == "log_tot_sig") return(sum(exp(y)))
}, FUN.VALUE = numeric(1))

if(abs) {
return(abs(values))
} else {
return(values)
}
}
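
The region arguments introduced in this hunk (`sig_min`, `sig_max`, `noise_min`, `noise_max`) restrict which wavenumbers feed the signal mean and the noise standard deviation, and `abs` controls whether the sign is kept. A minimal base-R sketch of that idea for a single spectrum is given below. `region_sig_noise` is a hypothetical helper written for illustration, and the toy data are made up; this is not the package implementation.

```r
# `region_sig_noise` is a hypothetical single-spectrum helper, not package code.
region_sig_noise <- function(wavenumber, intensity,
                             sig_min = NULL, sig_max = NULL,
                             noise_min = NULL, noise_max = NULL,
                             abs = TRUE, na.rm = TRUE) {
  # Select a wavenumber window; fall back to the full range if a bound is unset.
  in_range <- function(lo, hi) {
    if (is.null(lo) || is.null(hi)) return(rep(TRUE, length(wavenumber)))
    wavenumber >= lo & wavenumber <= hi
  }
  signal <- mean(intensity[in_range(sig_min, sig_max)], na.rm = na.rm)
  noise <- stats::sd(intensity[in_range(noise_min, noise_max)], na.rm = na.rm)
  value <- signal / noise
  if (abs) base::abs(value) else value
}

# Toy spectrum: alternating +/-1 background with a negative band at 1000-1200.
wavenumber <- seq(500, 3000, by = 100)
intensity <- rep(c(-1, 1), length.out = length(wavenumber))
intensity[wavenumber >= 1000 & wavenumber <= 1200] <- -10

signed <- region_sig_noise(wavenumber, intensity,
                           sig_min = 1000, sig_max = 1200,
                           noise_min = 2000, noise_max = 2900,
                           abs = FALSE)
unsigned <- region_sig_noise(wavenumber, intensity,
                             sig_min = 1000, sig_max = 1200,
                             noise_min = 2000, noise_max = 2900)
```

With `abs = FALSE` the negative band yields a negative ratio, which is the information the previous hard-coded `abs()` calls discarded.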
2 changes: 1 addition & 1 deletion README.md
@@ -103,5 +103,5 @@ Needs an Open Source Community: Open Specy to the Rescue!”
[10.1021/acs.analchem.1c00123](https://doi.org/10.1021/acs.analchem.1c00123).

Cowger W, Steinmetz Z, Leong N, Faltynkova A (2023). “OpenSpecy: Analyze,
Process, Identify, and Share Raman and (FT)IR Spectra.” *R package*, **1.0.5**.
Process, Identify, and Share Raman and (FT)IR Spectra.” *R package*, **1.0.6**.
[https://github.com/wincowgerDEV/OpenSpecy-package](https://github.com/wincowgerDEV/OpenSpecy-package).
2 changes: 1 addition & 1 deletion cran-comments.md
@@ -1,6 +1,6 @@
## Test environments

* manjaro linux 6.5.5-1 (local), R-4.3.1
* manjaro linux 6.6.1-1 (local), R-4.3.2
* macOS latest (via GitHub Actions), R-release
* ubuntu latest (via GitHub Actions), R-devel
* ubuntu latest (via GitHub Actions), R-release