
4. Plot Particle Size Distribution (PSD)

Kelsy Cain edited this page Oct 7, 2024 · 4 revisions

Now that the particle size distribution (PSD) has been generated using PSD.R, you can begin visualizing the PSD data as ridgeline plots. To do this, open PSD_plotting.R.

Load Data

  1. Load the required packages.
library(tidyverse)
library(FCSplankton)
  2. Set the working directory.
setwd("PATH/TO/PROJECT/")

  3. Read in the PSD data from the .csv file generated by PSD.R.

project <- basename(getwd())
PSD_all <- read_csv(paste0("Influx_", project,"_PSD.csv"))
PSD_all[1:3,]
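The input file name is built from the name of the working directory, so the CSV written by PSD.R must follow the `Influx_<project>_PSD.csv` pattern and sit in the project folder. A minimal sketch of that naming logic, using the placeholder path from the `setwd()` step; `psd_file_name()` is a hypothetical helper, not part of FCSplankton:

```r
# Hypothetical helper: rebuild the file name that PSD_plotting.R expects.
# basename() drops the leading directories (and any trailing slash) from the path.
psd_file_name <- function(project_dir) {
  project <- basename(project_dir)
  paste0("Influx_", project, "_PSD.csv")
}

psd_file_name("PATH/TO/PROJECT/")
# "Influx_PROJECT_PSD.csv"

# With a real project, you could check the file exists before calling read_csv():
# file.exists(psd_file_name(getwd()))
```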
  4. Average the PSD across sample replicates. Depending on your data, you may need to change the columns you remove or group by.
PSD_all <- PSD_all %>%
  dplyr::select(-sample, -time, -file, -stain, -flag, -replicate, -comments, -volume) %>% # remove columns that would prevent averaging across replicates
  dplyr::group_by(station, depth, Qc, pop) %>%
  dplyr::summarize(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop") %>% # average every remaining column within each group
  dplyr::arrange(station)
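To see what the replicate-averaging step does, here is a sketch on a small made-up table (illustrative values only, not real data) holding two replicates of one bin. Columns that differ between replicates, such as `replicate` or `file`, are dropped first: left in, they would either be averaged into NA (character columns) or keep the replicates apart. `across()` is the current dplyr equivalent of `summarize_all()`.

```r
library(dplyr)

# Toy PSD rows: two replicates of the same station/depth/Qc bin/population.
toy <- tibble(
  station   = c(1, 1),
  depth     = c(5, 5),
  Qc        = c(0.05, 0.05),
  pop       = c("prochloro", "prochloro"),
  replicate = c("a", "b"),
  file      = c("s1_a.fcs", "s1_b.fcs"),
  abundance_per_bin = c(10, 14),
  biomass_per_bin   = c(0.5, NA)
)

toy_mean <- toy %>%
  select(-replicate, -file) %>%           # drop columns that differ between replicates
  group_by(station, depth, Qc, pop) %>%   # one output row per bin and population
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

toy_mean$abundance_per_bin  # 12
toy_mean$biomass_per_bin    # 0.5 (the NA replicate is ignored)
```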
  5. For depth-profile samples, create station position labels of the form "St. <station> (<lat>,<lon>)", with coordinates truncated to two decimal places.
PSD_all$lat <- trunc(PSD_all$lat *10^2)/10^2
PSD_all$lon <- trunc(PSD_all$lon*10^2)/10^2
PSD_all$position <- paste0("St. ",PSD_all$station," (",PSD_all$lat,",",PSD_all$lon,")")
PSD_all$position <- as_factor(PSD_all$position)
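Note that the coordinates are truncated, not rounded, before being pasted into the label. A quick base-R illustration with made-up coordinates; `truncate2()` is a hypothetical wrapper around the same expression used above:

```r
# Same arithmetic as trunc(x * 10^2) / 10^2 above (truncate2() is a hypothetical wrapper).
truncate2 <- function(x) trunc(x * 10^2) / 10^2

lat <- 22.7589    # made-up coordinates
lon <- -158.0049

truncate2(lat)    # 22.75, whereas round(lat, 2) gives 22.76
truncate2(lon)    # -158: trunc() moves toward zero, and trailing zeros are not printed

label <- paste0("St. ", 2, " (", truncate2(lat), ",", truncate2(lon), ")")
label             # "St. 2 (22.75,-158)"
```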

Visualize particle size distributions

  1. Set population group colors to match the color scheme used for the gated populations.
group_colors <- c(bacteria = "lightsalmon1",
                  prochloro=viridis::viridis(4)[1],
                  synecho=viridis::viridis(4)[2],
                  picoeuk=viridis::viridis(4)[3])
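Because `group_colors` is a named vector, `scale_fill_manual()` matches colors to the `pop` values by name, so the labels in the PSD file must be spelled exactly `bacteria`, `prochloro`, `synecho` and `picoeuk`; unmatched values are drawn in the scale's `na.value` color (grey by default). A minimal base-R check, using placeholder hex codes so it runs without viridis and a made-up vector of population labels:

```r
# Color lookup like the one above (placeholder hex codes instead of viridis calls).
group_colors <- c(bacteria = "lightsalmon1", prochloro = "#440154FF",
                  synecho = "#31688EFF", picoeuk = "#35B779FF")

# Hypothetical population labels from a PSD file; "synechococcus" is a deliberate mismatch.
pops_in_data <- c("bacteria", "prochloro", "synechococcus", "picoeuk")

missing_colors <- setdiff(unique(pops_in_data), names(group_colors))
missing_colors   # "synechococcus": this population would be drawn in na.value grey

# With real data: setdiff(unique(PSD_all$pop), names(group_colors))
```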
  2. Generate a ridgeline plot of the binned particle size distribution using ggridges::geom_density_ridges(), with biomass (µgC/L) setting the height of each bin. Grouping by interaction(depth, pop) draws one ridge per population at each depth within a single layer, so with panel_scaling = FALSE all ridges share one height scale and can be compared across populations and across the whole dataset. This makes it a more quantitative ridgeline plot than plotting each population's PSD as a separate layer without interaction(). See https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html for more details on ggridges.
PSD_all %>%
  group_by(position,depth) %>%
  ggplot() +
  ggridges::geom_density_ridges(aes(x = Qc, y = -depth, height = biomass_per_bin, fill =pop, group = interaction(depth,pop)), stat="identity", color="darkgrey", alpha=0.55,size=.15,panel_scaling=FALSE) + # panel_scaling=TRUE relative scaling is calculated separately for each panel. panel_scaling=FALSE, relative scaling is calculated globally
  scale_x_continuous(trans = "log10") +
  scale_fill_manual(name = 'Population', values = group_colors, breaks = c("bacteria",'prochloro','synecho',"picoeuk"), labels = c("Bacteria",'Pro','Syn',"Picoeuk")) +
  annotation_logticks(sides = "b") +
  theme_bw() +
  theme(legend.key.size = unit(.35, 'cm')) + # must come after theme_bw(), which resets all theme settings
  facet_wrap( . ~ position,ncol=4) +
  labs(x="Carbon Content Distribution (pgC)",
       y= "Depth (m)")
dir.create("plots", showWarnings = FALSE) # ggsave() does not create the ./plots folder by default
ggsave("biomass_distribution.png", path = "./plots",height=10,width=8)
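The `panel_scaling` comment can be pictured as a normalization step: ggridges rescales each `height` relative to a maximum, taken over the whole plot when `panel_scaling = FALSE` or within each facet when `panel_scaling = TRUE`. The sketch below reproduces only that idea on made-up numbers with dplyr; it is a simplification, not the ggridges source, which also applies the `scale` parameter and the spacing between ridges.

```r
library(dplyr)

# Made-up bin heights for two facets (stations) with very different totals.
toy <- tibble(
  position = c("St. 1", "St. 1", "St. 2", "St. 2"),
  height   = c(2, 4, 20, 40)
)

scaled <- toy %>%
  mutate(global = height / max(height)) %>%       # like panel_scaling = FALSE
  group_by(position) %>%
  mutate(per_panel = height / max(height)) %>%    # like panel_scaling = TRUE
  ungroup()

scaled$global     # 0.05 0.10 0.50 1.00: St. 1 ridges look small next to St. 2
scaled$per_panel  # 0.50 1.00 0.50 1.00: each facet fills its own vertical space
```

Global scaling keeps heights comparable between facets, which is why the biomass and abundance plots use `panel_scaling = FALSE`.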
  3. Generate a ridgeline plot of the binned particle size distribution with ggridges::geom_density_ridges() as above, except with abundance (cells/µL) setting the height of each bin.
PSD_all %>%
  group_by(position,depth) %>%
  ggplot() +
  ggridges::geom_density_ridges(aes(x = Qc, y = -depth, height = abundance_per_bin, fill =pop, group = interaction(depth,pop)), stat="identity", color="darkgrey", alpha=0.55,size=.15,panel_scaling=FALSE) + # panel_scaling=TRUE relative scaling is calculated separately for each panel. panel_scaling=FALSE, relative scaling is calculated globally
  scale_x_continuous(trans = "log10") +
  scale_fill_manual(name = 'Population', values = group_colors, breaks = c("bacteria",'prochloro','synecho',"picoeuk"), labels = c("Bacteria",'Pro','Syn',"Picoeuk")) +
  annotation_logticks(sides = "b") +
  theme_bw() +
  theme(legend.key.size = unit(.35, 'cm')) + # must come after theme_bw(), which resets all theme settings
  facet_wrap( . ~ position,ncol=4) +
  labs(x="Carbon Content Distribution (pgC)",
       y= "Depth (m)")
ggsave("abundance_distribution.png", path = "./plots",height=10,width=8)
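The biomass and abundance plots differ only in the bin height. Since 1 pgC/µL equals 1 µgC/L, multiplying an abundance (cells/µL) by a carbon content per cell (pgC) gives biomass in µgC/L with no extra conversion factor, which is why the biomass view gives more weight to bins with large Qc. The sketch assumes, for illustration only, that biomass per bin equals abundance per bin times the bin's Qc; check PSD.R for how the values in your file were actually computed.

```r
# Assumption for illustration: biomass per bin = abundance per bin x Qc of the bin.
Qc                <- c(0.05, 0.5, 5)   # pgC per cell (made-up bin values)
abundance_per_bin <- c(100, 10, 1)     # cells per µL (made-up)

# Unit check: 1 pgC/µL = 1e-12 g / 1e-6 L = 1e-6 g/L; dividing by 1e-6 g per µg gives 1 µgC/L
pg_per_uL_in_ug_per_L <- (1e-12 / 1e-6) / 1e-6
pg_per_uL_in_ug_per_L   # 1 (up to floating-point rounding)

biomass_per_bin <- abundance_per_bin * Qc * pg_per_uL_in_ug_per_L   # µgC/L
biomass_per_bin         # about 5 5 5: a 100-fold drop in abundance is offset by a 100-fold rise in Qc
```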
  4. Generate a ridgeline plot of individual PSDs for each population, with abundance (cells/µL) setting the height of each bin. Each population is drawn as its own layer, and within each plot facet the maximum abundance per PSD bin of each population is set to a relative height of 1, so ridgeline heights are not comparable across populations or facets. This makes it easier to compare a single population qualitatively across the samples in the dataset.
PSD_wide <- PSD_all %>%
  pivot_wider(names_from = pop, values_from = c("abundance_per_bin","biomass_per_bin"),values_fill=0) # if rows fail to combine, check for columns that still differ between populations (e.g. cell_diameter)
PSD_wide[1:3,]

PSD_wide %>%
  group_by(position,depth) %>%
  ggplot() +
  ggridges::geom_density_ridges2(aes(x = Qc, y = -depth, height = abundance_per_bin_bacteria, fill = "bacteria", group = depth), stat = "identity", color = "darkgrey", alpha = 0.4, size = .25, panel_scaling = TRUE) + # panel_scaling = TRUE: relative scaling is calculated separately for each panel (and for each layer); FALSE: calculated globally
  ggridges::geom_density_ridges2(aes(x = Qc, y = -depth, height = abundance_per_bin_prochloro, fill = "prochloro", group = depth), stat = "identity", color = "darkgrey", alpha = 0.55, size = .25, panel_scaling = TRUE) +
  ggridges::geom_density_ridges2(aes(x = Qc, y = -depth, height = abundance_per_bin_synecho, fill = "synecho", group = depth), stat = "identity", color = "darkgrey", alpha = 0.55, size = .25, panel_scaling = TRUE) +
  ggridges::geom_density_ridges2(aes(x = Qc, y = -depth, height = abundance_per_bin_picoeuk, fill = "picoeuk", group = depth), stat = "identity", color = "darkgrey", alpha = 0.55, size = .25, panel_scaling = TRUE) +
  scale_x_continuous(trans = "log10", breaks = c(.01, .1, 1, 10), minor_breaks = c(.005, .05, .5, 5)) +
  scale_fill_manual(name = 'Population', values = group_colors, breaks = c("bacteria",'prochloro','synecho',"picoeuk"), labels = c("Bacteria",'Pro','Syn',"Picoeuk")) +
  annotation_logticks(sides = "b") +
  theme_bw() +
  theme(legend.key.size = unit(.35, 'cm')) + # must come after theme_bw(), which resets all theme settings
  facet_wrap( . ~ position,ncol=4) +
  labs(x="Carbon Content Distribution (pgC)",
       y= "Depth (m)")
ggsave("individual_abundance_distribution.png", path = "./plots",height=10,width=8)
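`pivot_wider()` creates one column per population for each `values_from` variable, named `<variable>_<pop>`; that is where names such as `abundance_per_bin_bacteria` come from, and `values_fill = 0` writes zero where a population has no particles in a bin. Rows only combine when all remaining columns match, so an extra column whose values differ between populations (the hypothetical `cell_diameter` below) keeps the populations on separate rows. A small sketch with made-up values; `to_wide()` is a hypothetical helper:

```r
library(dplyr)
library(tidyr)

# Made-up long-format PSD rows (illustrative values only).
toy <- tibble(
  depth = c(5, 5, 5),
  Qc    = c(0.05, 0.05, 0.5),
  pop   = c("bacteria", "prochloro", "bacteria"),
  abundance_per_bin = c(30, 12, 4),
  biomass_per_bin   = c(1.5, 0.6, 2)
)

# Hypothetical helper wrapping the same pivot_wider() call used above.
to_wide <- function(df) {
  pivot_wider(df, names_from = pop,
              values_from = c("abundance_per_bin", "biomass_per_bin"),
              values_fill = 0)
}

wide <- to_wide(toy)
names(wide)
# "depth" "Qc" "abundance_per_bin_bacteria" "abundance_per_bin_prochloro"
# "biomass_per_bin_bacteria" "biomass_per_bin_prochloro"
nrow(wide)  # 2: one row per depth/Qc bin; prochloro at Qc = 0.5 is filled with 0

# Hypothetical per-population column: its values differ, so rows no longer combine.
toy$cell_diameter <- c(0.6, 0.7, 1.2)
nrow(to_wide(toy))  # 3 instead of 2
```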

The next step is to curate the PSD data into a single Excel spreadsheet formatted specifically for Simons CMAP ingestion. Code available here.
