Develop 3.9.0 #417

Merged 63 commits on Aug 6, 2024
Commits
ebf65a3
LOD both methods additions
kathy-nevola Jun 20, 2024
78e90bf
shrink logo
kathy-nevola Jun 20, 2024
ac3c9ce
Merge remote-tracking branch 'origin/lod_updates' into lod_both_methods
kathy-nevola Jun 21, 2024
47fae4a
Add both section to tutorial
kathy-nevola Jun 21, 2024
ad8d800
Merge remote-tracking branch 'origin/lod_updates' into lod_both_methods
kathy-nevola Jun 21, 2024
5604b35
add documentation on both methods
kathy-nevola Jun 21, 2024
a2f93d8
updated tutorial links
kathy-nevola Jul 11, 2024
46619c1
Add instructions for multiple NPX file import in Vignett.html
dtopouza Jul 11, 2024
09808be
Added additional information about LOD from counts
kathy-nevola Jul 11, 2024
b6558fc
ignore quote for comma delim
kathy-nevola Jul 12, 2024
d4380f7
add new headers and remove QC_Warning as required
kathy-nevola Jul 12, 2024
905a52a
Add quant warning
kathy-nevola Jul 12, 2024
93e268a
update file to remove rowname and quotes
kathy-nevola Jul 12, 2024
b4a5c96
Merge branch 'develop' into lod_both_methods
kathy-nevola Jul 12, 2024
985e360
Update instructions to write parquet file
kathy-nevola Jul 17, 2024
8dd022b
Add HT Bridging sample recommendations
kathy-nevola Jul 17, 2024
ca19155
Add missing link and search terms
kathy-nevola Jul 19, 2024
54215b6
Merge pull request #402 from Olink-Proteomics/new_links
dtopouza Jul 19, 2024
d932387
Merge pull request #409 from Olink-Proteomics/HT_bridge_samples
dtopouza Jul 19, 2024
bba2342
Merge branch 'develop' into write_parquet
kathy-nevola Jul 22, 2024
1aab435
Update Description, News and Cran comments
kathy-nevola Jul 22, 2024
0680e8b
spelling fix
kathy-nevola Jul 22, 2024
2e6d31f
reworded for clarity
kathy-nevola Jul 22, 2024
fd9c454
Merge pull request #408 from Olink-Proteomics/write_parquet
kathy-nevola Jul 22, 2024
a8c1911
Merge pull request #406 from Olink-Proteomics/csv_added_support
kathy-nevola Jul 22, 2024
a4d5b12
Merge branch 'develop' into lod_both_methods
kathy-nevola Jul 24, 2024
523cc65
Merge branch 'develop' into LOD_count_info
kathy-nevola Jul 24, 2024
8bbb052
Add an error in ANOVA functions when control assays have not been rem…
dtopouza Jul 24, 2024
87b377f
Update OlinkAnalyze/vignettes/Vignett.Rmd
dtopouza Jul 24, 2024
1e6d74b
Update OlinkAnalyze/vignettes/Vignett.Rmd
dtopouza Jul 24, 2024
33eca47
Merge pull request #403 from Olink-Proteomics/develop_dgt
dtopouza Jul 25, 2024
e264934
updated with recommended changes from review
kathy-nevola Jul 25, 2024
9fcaa0d
Merge pull request #404 from Olink-Proteomics/LOD_count_info
kathy-nevola Jul 25, 2024
6ed722f
fix bullet points in documentation
kathy-nevola Jul 25, 2024
40276ad
fixed table and wording per review
kathy-nevola Jul 25, 2024
e62f21b
added documentation and checks for file
kathy-nevola Jul 25, 2024
46ff052
added ANOVA error update
kathy-nevola Jul 25, 2024
01e9a11
Apply suggestions from code review
dtopouza Jul 25, 2024
f2257be
Merge pull request #390 from Olink-Proteomics/lod_both_methods
kathy-nevola Jul 26, 2024
70d1154
Change if statement for assay type, add list of found control assays …
dtopouza Jul 29, 2024
09fa72d
Add error in Olink ANOVA functions when external controls are present…
dtopouza Jul 30, 2024
579169f
Document
dtopouza Jul 30, 2024
81e4fae
fix example documentation and make ctrl in name warning
kathy-nevola Jul 30, 2024
456b2b0
added control samples warning/error
kathy-nevola Jul 30, 2024
1696b58
remove ctrl sample check code and tests
kathy-nevola Jul 31, 2024
95b8b0a
update documentation
kathy-nevola Jul 31, 2024
97f3a55
Remove unnecessary tests
kathy-nevola Jul 31, 2024
ec9098a
update news per PR #416
kathy-nevola Jul 31, 2024
3bd24b3
remove sample test DF
kathy-nevola Jul 31, 2024
50e4676
update vignette
kathy-nevola Jul 31, 2024
5501558
update vignette boxplot
kathy-nevola Jul 31, 2024
54901c1
updated vignette posthoc
kathy-nevola Jul 31, 2024
38e10d6
Merge pull request #416 from Olink-Proteomics/OA-2272_trycatch_controls
kathy-nevola Jul 31, 2024
01d4633
Merge pull request #415 from Olink-Proteomics/news_3.9
kathy-nevola Jul 31, 2024
f04fbae
change to new pipe
kathy-nevola Aug 2, 2024
8bd3510
remove scale_name (now name)
kathy-nevola Aug 2, 2024
8e0fd0a
remove name arguement
kathy-nevola Aug 2, 2024
ae714ce
fix dot notation and remove deprecated do statement
kathy-nevola Aug 2, 2024
286405c
fix internal function documentation
kathy-nevola Aug 2, 2024
16a87b5
added scale_name as deprecated for support with older ggplot2 versions
kathy-nevola Aug 5, 2024
b15d97e
update scale name to work with ggplot per version
kathy-nevola Aug 5, 2024
4f09083
Merge pull request #421 from Olink-Proteomics/anova_pipes
kathy-nevola Aug 6, 2024
3aa59a0
spelling edit fixes
kathy-nevola Aug 6, 2024
6 changes: 3 additions & 3 deletions OlinkAnalyze/DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: OlinkAnalyze
Title: Facilitate Analysis of Proteomic Data from Olink
Version: 3.8.2
Version: 3.9.0
Authors@R: c(
person("Kathleen", "Nevola", , "biostattools@olink.com", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-5183-6444", Github = "kathy-nevola")),
@@ -27,6 +27,8 @@ Authors@R: c(
comment = c(Github = "leiliuC")),
person("Kristyn", "Chin", role = "aut",
comment = c(Github = "kristynchin-olink")),
person("Danai", "Topouza", role = "aut",
comment = c(Github = "dtopouza", ORCID = "0000-0002-6897-9281")),
person("Kristian", "Hodén", role = "ctb",
comment = c(ORCID = "0000-0003-0354-0662", Github = "kristianHoden")),
person("Per", "Eriksson", role = "ctb",
@@ -43,8 +45,6 @@ Authors@R: c(
comment = c(Github = "olofmansson")),
person("Ola", "Caster", role = "ctb",
comment = c(Github = "OlaCaster")),
person("Danai", "Topouza", role = "ctb",
comment = c(Github = "dtopouza", ORCID = "0000-0002-6897-9281")),
person("Olink", role = c("cph", "fnd"))
)
Description: A collection of functions to facilitate analysis of proteomic
17 changes: 17 additions & 0 deletions OlinkAnalyze/NEWS.md
@@ -1,3 +1,20 @@
# Olink Analyze 3.9.0
## Minor Changes
* Explore HT recommended bridging samples have been added to Introduction to Bridging tutorial (#409, @kathy-nevola)
* Support for CSVs with SampleQC column was added to read_NPX (#406, @kathy-nevola)
* Support for Olink Analyze Export parquets was added to read_NPX (#408, @kathy-nevola)
* Quantitative value CSVs will now give a warning about limited support for Quant data (#406, @kathy-nevola)
* Instructions for importing multiple NPX files have been added to the overview tutorial (#403, @dtopouza)
* Additional background information was added to the LOD tutorial to clarify how LOD is calculated from counts (#404, @kathy-nevola)
* LOD can now be calculated using fixed LOD, negative controls, or both methods (#390, @kathy-nevola)
* An error message will now appear when running anova and control assays are present (#416, @dtopouza)
* Danai Topouza's role has been changed from contributor to author (#415, @kathy-nevola)
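
The ANOVA change listed above means that control assays have to be dropped before `olink_anova` or `olink_anova_posthoc` is called. A minimal base R sketch of that pre-filtering step follows, using the rule stated in this PR's diff: `AssayType` other than "assay" when that column exists, otherwise an `Assay` name containing "control" or "ctrl". The helper name `find_control_assays`, the toy data frames, and their values are assumptions made for illustration and are not part of OlinkAnalyze.

```r
# Hypothetical helper (not an OlinkAnalyze function): list the control assays
# that the new ANOVA check would complain about.
find_control_assays <- function(df) {
  if ("AssayType" %in% names(df)) {
    # Rows whose AssayType is anything other than "assay" count as controls.
    # Missing AssayType values are treated as non-controls in this sketch.
    is_ctrl <- !is.na(df$AssayType) & df$AssayType != "assay"
  } else {
    # Fallback: match "control" or "ctrl" in the assay name, ignoring case.
    is_ctrl <- grepl("control|ctrl", df$Assay, ignore.case = TRUE)
  }
  unique(df$Assay[is_ctrl])
}

# Toy data invented for illustration only.
toy_with_type <- data.frame(
  Assay = c("IL6", "Incubation control", "TNF"),
  AssayType = c("assay", "inc_ctrl", "assay"),
  stringsAsFactors = FALSE
)
toy_without_type <- data.frame(
  Assay = c("IL6", "Ext Ctrl 1", "TNF"),
  stringsAsFactors = FALSE
)

found_a <- find_control_assays(toy_with_type)
found_b <- find_control_assays(toy_without_type)

# Drop the control assays before running the ANOVA functions.
clean <- toy_with_type[!(toy_with_type$Assay %in% found_a), ]
```

Treating a missing `AssayType` as a non-control is a choice made in this sketch; the check in the diff compares `AssayType` to "assay" directly.
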

## Bug Fixes
* URLs in tutorials will now direct to updated olink.com locations (#402, @kathy-nevola)
* Instructions to export parquet files with LOD have been updated (#408, @kathy-nevola)
* Removed the scale_name argument when ggplot2 3.5+ is installed (#421, @kathy-nevola)
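
The `scale_name` item describes behavior that depends on the installed ggplot2 version. The code that implements it is not shown in this section, so the following is only a sketch of how such a version guard could look in base R. The helper name `scale_args_for`, the `aesthetics` argument, and the placeholder value "olink" are assumptions for illustration.

```r
# Hypothetical helper: build an argument list for a scale constructor and
# include `scale_name` only when the ggplot2 version is older than 3.5.0.
# In real use the version string could come from
# as.character(utils::packageVersion("ggplot2")).
scale_args_for <- function(ggplot2_version, aesthetics = "colour") {
  args <- list(aesthetics = aesthetics)
  if (package_version(ggplot2_version) < package_version("3.5.0")) {
    args$scale_name <- "olink"  # placeholder value, an assumption
  }
  args
}

old_args <- scale_args_for("3.4.4")  # includes scale_name
new_args <- scale_args_for("3.5.1")  # omits scale_name
```

Comparing `package_version` objects avoids the pitfalls of comparing version strings lexically, for example "3.10.0" sorting before "3.5.0".
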

# Olink Analyze 3.8.2
## Bug Fixes
* update to URL hyperlink in LOD tutorial to include https
182 changes: 120 additions & 62 deletions OlinkAnalyze/R/Olink_anova.R
@@ -5,6 +5,8 @@
#'Samples that have no variable information or missing factor levels are automatically removed from the analysis (specified in a message if verbose = TRUE).
#'Character columns in the input dataframe are automatically converted to factors (specified in a message if verbose = TRUE).
#'Numerical variables are not converted to factors.
#'Control samples should be removed before using this function.
#'Control assays (AssayType is not "assay", or Assay contains "control" or "ctrl") should be removed before using this function.
#'If a numerical variable is to be used as a factor, this conversion needs to be done on the dataframe before the function call. \cr\cr
#'Crossed analysis, i.e. A*B formula notation, is inferred from the variable argument in the following cases: \cr
#'\itemize{
@@ -54,7 +56,7 @@
#'
#' library(dplyr)
#'
#' npx_df <- npx_data1 %>% filter(!grepl('control',SampleID, ignore.case = TRUE))
#' npx_df <- npx_data1 |> filter(!grepl('control|ctrl',SampleID, ignore.case = TRUE))
#'
#' #One-way ANOVA, no covariates.
#' #Results in a model NPX~Time
@@ -106,10 +108,28 @@ olink_anova <- function(df,
stop('The df and variable arguments need to be specified.')
}

# Stop if internal controls (assays) have not been removed
if ("AssayType" %in% names(df)) {
if (any(df$AssayType != "assay")) {
ctrl_assays <- df |>
dplyr::filter(AssayType != "assay")

stop(paste0('Control assays have not been removed from the dataset.\n Assays with AssayType != "assay" should be excluded.\n The following ', length(unique(ctrl_assays$Assay)) ,' control assays were found:\n ',
paste(strwrap(toString(unique(ctrl_assays$Assay)), width = 80), collapse = "\n")))
}
} else if (any(stringr::str_detect(df$Assay, stringr::regex("control|ctrl", ignore_case = TRUE)))) {
ctrl_assays <- df |>
dplyr::filter(stringr::str_detect(df$Assay, stringr::regex("control|ctrl", ignore_case = TRUE)))

stop(paste0('Control assays have not been removed from the dataset.\n Assays with "control" in their Assay field should be excluded.\n The following ', length(unique(ctrl_assays$Assay)) ,' control assays were found:\n ',
paste(strwrap(toString(unique(ctrl_assays$Assay)), width = 80), collapse = "\n")))
}


withCallingHandlers({

#Filtering on valid OlinkID
df <- df %>%
df <- df |>
dplyr::filter(stringr::str_detect(OlinkID,
"OID[0-9]{5}"))

@@ -176,13 +196,13 @@ olink_anova <- function(df,

for(effect in single_fixed_effects){

current_nas <- df %>%
dplyr::filter(!(OlinkID %in% npxCheck$all_nas)) %>% #Exclude assays that have all NA:s
dplyr::group_by(OlinkID, !!rlang::ensym(effect)) %>%
dplyr::summarise(n = dplyr::n(), n_na = sum(is.na(NPX))) %>%
dplyr::ungroup() %>%
dplyr::filter(n == n_na) %>%
dplyr::distinct(OlinkID) %>%
current_nas <- df |>
dplyr::filter(!(OlinkID %in% npxCheck$all_nas)) |> #Exclude assays that have all NA:s
dplyr::group_by(OlinkID, !!rlang::ensym(effect)) |>
dplyr::summarise(n = dplyr::n(), n_na = sum(is.na(NPX))) |>
dplyr::ungroup() |>
dplyr::filter(n == n_na) |>
dplyr::distinct(OlinkID) |>
dplyr::pull(OlinkID)


@@ -198,12 +218,12 @@ olink_anova <- function(df,
call. = FALSE)
}

number_of_samples_w_more_than_one_level <- df %>%
dplyr::group_by(SampleID) %>%
dplyr::summarise(n_levels = dplyr::n_distinct(!!rlang::ensym(effect), na.rm = TRUE)) %>%
dplyr::ungroup() %>%
dplyr::filter(n_levels > 1) %>%
nrow(.)
number_of_samples_w_more_than_one_level <- df |>
dplyr::group_by(SampleID) |>
dplyr::summarise(n_levels = dplyr::n_distinct(!!rlang::ensym(effect), na.rm = TRUE)) |>
dplyr::ungroup() |>
dplyr::filter(n_levels > 1) |>
nrow()

if (number_of_samples_w_more_than_one_level > 0) {
stop(paste0("There are ",
@@ -259,43 +279,41 @@ olink_anova <- function(df,


if(!is.null(covariates) & any(grepl(":", covariates))){
covariate_filter_string <- covariates[str_detect(covariates, ':')]
covariate_filter_string <- covariates[stringr::str_detect(covariates, ':')]
covariate_filter_string <- sub("(.*)\\:(.*)$", "\\2:\\1", covariate_filter_string)
covariate_filter_string <- c(covariates, covariate_filter_string)

}else{
covariate_filter_string <- covariates
}

p.val <- df %>%
dplyr::filter(!(OlinkID %in% npxCheck$all_nas)) %>% #Exclude assays that have all NA:s
dplyr::filter(!(OlinkID %in% nas_in_var)) %>%
dplyr::group_by(Assay, OlinkID, UniProt, Panel) %>%
dplyr::do(generics::tidy(car::Anova(stats::lm(as.formula(formula_string),
data=.,
contrasts = sapply(fact.vars,function(x) return(contr.sum),
simplify = FALSE)),type=3))) %>%

dplyr::ungroup() %>%
dplyr::filter(!term %in% c('(Intercept)','Residuals')) %>%
dplyr::mutate(covariates = term %in% covariate_filter_string) %>%
dplyr::group_by(covariates) %>%
dplyr::mutate(Adjusted_pval = p.adjust(p.value,method="fdr")) %>%
dplyr::mutate(Threshold = ifelse(Adjusted_pval<0.05,"Significant","Non-significant")) %>%
p.val <- df |>
dplyr::filter(!(OlinkID %in% npxCheck$all_nas)) |> #Exclude assays that have all NA:s
dplyr::filter(!(OlinkID %in% nas_in_var)) |>
dplyr::group_by(Assay, OlinkID, UniProt, Panel) |>
dplyr::group_modify(~ internal_anova(x = .x,
formula_string = formula_string,
fact.vars = fact.vars))|>
dplyr::ungroup()|>
dplyr::filter(!term %in% c('(Intercept)','Residuals')) |>
dplyr::mutate(covariates = term %in% covariate_filter_string) |>
dplyr::group_by(covariates) |>
dplyr::mutate(Adjusted_pval = p.adjust(p.value,method="fdr")) |>
dplyr::mutate(Threshold = ifelse(Adjusted_pval<0.05,"Significant","Non-significant")) |>
dplyr::mutate(Adjusted_pval = ifelse(covariates,NA,Adjusted_pval),
Threshold = ifelse(covariates,NA,Threshold)) %>%
dplyr::ungroup() %>%
dplyr::select(-covariates) %>%
dplyr::mutate(meansq=sumsq/df) %>%
Threshold = ifelse(covariates,NA,Threshold)) |>
dplyr::ungroup() |>
dplyr::select(-covariates) |>
dplyr::mutate(meansq=sumsq/df) |>
dplyr::select(Assay,OlinkID,UniProt,Panel,term,df,sumsq,
meansq,statistic,p.value,Adjusted_pval,Threshold) %>%
meansq,statistic,p.value,Adjusted_pval,Threshold) |>
dplyr::arrange(Adjusted_pval)


if(return.covariates){
return(p.val)
} else{
return(p.val %>%
return(p.val |>
dplyr::filter(!term%in%covariate_filter_string))
}

@@ -310,6 +328,8 @@ olink_anova <- function(df,
#'Performs a post hoc ANOVA test using emmeans::emmeans with Tukey p-value adjustment per assay (by OlinkID) for each panel at confidence level 0.95.
#'See \code{olink_anova} for details of input notation. \cr\cr
#'The function handles both factor and numerical variables and/or covariates.
#'Control samples should be removed before using this function.
#'Control assays (AssayType is not "assay", or Assay contains "control" or "ctrl") should be removed before using this function.
#'The posthoc test for a numerical variable compares the difference in means of the outcome variable (default: NPX) for 1 standard deviation difference in the numerical variable, e.g.
#'mean NPX at mean(numerical variable) versus mean NPX at mean(numerical variable) + 1*SD(numerical variable).
#'
@@ -349,7 +369,7 @@ olink_anova <- function(df,
#'
#' library(dplyr)
#'
#' npx_df <- npx_data1 %>% filter(!grepl('control',SampleID, ignore.case = TRUE))
#' npx_df <- npx_data1 |> filter(!grepl('control|ctrl',SampleID, ignore.case = TRUE))
#'
#' #Two-way ANOVA, one main effect (Site) covariate.
#' #Results in model NPX~Treatment*Time+Site.
@@ -361,10 +381,10 @@ olink_anova <- function(df,
#' #on the interaction effect Treatment:Time with covariate Site.
#'
#' #Filtering out significant and relevant results.
#' significant_assays <- anova_results %>%
#' filter(Threshold == 'Significant' & term == 'Treatment:Time') %>%
#' select(OlinkID) %>%
#' distinct() %>%
#' significant_assays <- anova_results |>
#' filter(Threshold == 'Significant' & term == 'Treatment:Time') |>
#' select(OlinkID) |>
#' distinct() |>
#' pull()
#'
#' #Posthoc, all pairwise comparisons
@@ -443,18 +463,39 @@ olink_anova_posthoc <- function(df,
stop("All effect terms must be included in the variable argument or model formula.")
}

# Stop if internal controls (assays) have not been removed
if ("AssayType" %in% names(df)) {
if (any(df$AssayType != "assay")) {
ctrl_assays <- df |>
dplyr::filter(AssayType != "assay")

stop(paste0(
'Control assays have not been removed from the dataset.\n Assays with AssayType != "assay" should be excluded.\n The following ', length(unique(ctrl_assays$Assay)), " control assays were found:\n ",
paste(strwrap(toString(unique(ctrl_assays$Assay)), width = 80), collapse = "\n")
))
}
} else if (any(stringr::str_detect(df$Assay, stringr::regex("control|ctrl", ignore_case = TRUE)))) {
ctrl_assays <- df |>
dplyr::filter(stringr::str_detect(df$Assay, stringr::regex("control|ctrl", ignore_case = TRUE)))

stop(paste0(
'Control assays have not been removed from the dataset.\n Assays with "control" in their Assay field should be excluded.\n The following ', length(unique(ctrl_assays$Assay)), " control assays were found:\n ",
paste(strwrap(toString(unique(ctrl_assays$Assay)), width = 80), collapse = "\n")
))
}


withCallingHandlers({

#Filtering on valid OlinkID
df <- df %>%
df <- df |>
dplyr::filter(stringr::str_detect(OlinkID,
"OID[0-9]{5}"))

if(is.null(olinkid_list)){
olinkid_list <- df %>%
dplyr::select(OlinkID) %>%
dplyr::distinct() %>%
olinkid_list <- df |>
dplyr::select(OlinkID) |>
dplyr::distinct() |>
dplyr::pull()
}

@@ -547,37 +588,37 @@ olink_anova_posthoc <- function(df,
e_form <- as.formula(paste0("pairwise~", paste(effect,collapse="+")))
}

anova_posthoc_results <- df %>%
dplyr::filter(OlinkID %in% olinkid_list) %>%
dplyr::filter(!(OlinkID %in% npxCheck$all_nas)) %>% #Exclude assays that have all NA:s
dplyr::mutate(OlinkID = factor(OlinkID, levels = olinkid_list)) %>%
dplyr::group_by(Assay, OlinkID, UniProt, Panel) %>%
anova_posthoc_results <- df |>
dplyr::filter(OlinkID %in% olinkid_list) |>
dplyr::filter(!(OlinkID %in% npxCheck$all_nas)) |> #Exclude assays that have all NA:s
dplyr::mutate(OlinkID = factor(OlinkID, levels = olinkid_list)) |>
dplyr::group_by(Assay, OlinkID, UniProt, Panel) |>
dplyr::do(data.frame(emmeans::emmeans(stats::lm(as.formula(formula_string),data=.),
specs=e_form,
cov.reduce = function(x) round(c(mean(x),mean(x)+sd(x)),4),
infer=c(TRUE,TRUE),
adjust=post_hoc_padjust_method)[[c("contrasts","emmeans")[1+as.numeric(mean_return)]]],
stringsAsFactors=FALSE)) %>%
dplyr::ungroup() %>%
dplyr::mutate(term=paste(effect,collapse=":")) %>%
stringsAsFactors=FALSE)) |>
dplyr::ungroup() |>
dplyr::mutate(term=paste(effect,collapse=":")) |>
dplyr::rename(conf.low=lower.CL,
conf.high=upper.CL)

if(mean_return){
anova_posthoc_results <- anova_posthoc_results %>%
anova_posthoc_results <- anova_posthoc_results |>
dplyr::select(all_of(c("Assay", "OlinkID", "UniProt", "Panel", "term",
effect, "emmean", "conf.low", "conf.high")))
} else if(!mean_return){
anova_posthoc_results <- anova_posthoc_results %>%
dplyr::rename(Adjusted_pval = p.value) %>%
dplyr::arrange(Adjusted_pval) %>%
anova_posthoc_results <- anova_posthoc_results |>
dplyr::rename(Adjusted_pval = p.value) |>
dplyr::arrange(Adjusted_pval) |>
dplyr::mutate(Threshold = if_else(Adjusted_pval < 0.05,
'Significant',
'Non-significant')) %>%
'Non-significant')) |>
dplyr::select(tidyselect::any_of(c("Assay", "OlinkID", "UniProt", "Panel", "term", "contrast", effect, "estimate",
"conf.low", "conf.high", "Adjusted_pval","Threshold")))

if(post_hoc_padjust_method=="none") anova_posthoc_results <- anova_posthoc_results %>% rename(pvalue=Adjusted_pval)
if(post_hoc_padjust_method=="none") anova_posthoc_results <- anova_posthoc_results |> rename(pvalue=Adjusted_pval)
}

return(anova_posthoc_results)
@@ -588,3 +629,20 @@ olink_anova_posthoc <- function(df,
})
}



#' Internal Anova function
#'
#' @param x grouped data frame
#' @param formula_string anova formula
#' @param fact.vars variables in factor form
#'
#' @return anova results
#' @noRd
internal_anova <- function(x, formula_string, fact.vars){
generics::tidy(car::Anova(stats::lm(as.formula(formula_string),
data=x,
contrasts = sapply(fact.vars,function(x) return(contr.sum),
simplify = FALSE)),type=3))
}
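
The `internal_anova` helper receives one group's rows at a time and returns a results table for that group. The sketch below shows the same "one helper call per assay" pattern in base R, so it runs without dplyr, car, or generics. It is an analog and not the package's code: `fit_group` is a hypothetical name, the toy data are invented, and `stats::anova` (sequential, Type I) stands in for `car::Anova(type = 3)`.

```r
# Hypothetical helper: fit one linear model to a single group's rows and
# return a small results table. stats::anova() is a stand-in for the
# Type III table that car::Anova() produces in the package.
fit_group <- function(x, formula_string) {
  fit <- stats::lm(stats::as.formula(formula_string), data = x)
  tab <- stats::anova(fit)
  data.frame(
    term = rownames(tab),
    df = tab[["Df"]],
    statistic = tab[["F value"]],
    p.value = tab[["Pr(>F)"]],
    stringsAsFactors = FALSE
  )
}

# Toy data invented for illustration: two assays, three treatment levels.
set.seed(1)
toy <- data.frame(
  OlinkID = rep(c("OID00001", "OID00002"), each = 12),
  Treatment = rep(c("A", "B", "C"), times = 8),
  NPX = rnorm(24),
  stringsAsFactors = FALSE
)

# Apply the helper to each assay's rows, then bind the per-assay tables.
per_assay <- lapply(split(toy, toy$OlinkID), function(x) {
  out <- fit_group(x, formula_string = "NPX ~ Treatment")
  out$OlinkID <- x$OlinkID[1]
  out
})
results <- do.call(rbind, per_assay)

# Keep only the model terms, as the package does when it drops "Residuals".
results <- results[results$term != "Residuals", ]
```

Keeping the model fit in a named helper makes the per-group step testable on a single data frame, independent of how the groups are produced.
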
