- FlowSoFine
The goal of FlowSoFine is to quickly and easily generate data structures that make it easy to get statistical output starting from raw fcs-files and a metadata table. The grid approach is especially useful if there aren’t any populations clearly visible in the files. This works with a flexible number of channels and correction for spatial dependencies.
You can install the development version of FlowSoFine from GitHub with:
install.packages("remotes")
remotes::install_github("JonasKup/FlowSoFine")
A graphical user interface for the method is available here: https://github.com/JonasKup/FlowSoFineApp
First read the fcs-files and the metadata table.
library(flowCore)
metadata <- read.csv2("metadata.csv")
fcs <- read.flowSet(pattern = ".fcs")
fcs
## A flowSet with 28 experiments.
##
## column names(13): Time FSC-A ... FITC-A FITC-H
Check the available channels:
colnames(fcs)
## [1] "Time" "FSC-A" "FSC-H" "SSC-A" "SSC-H" "PE-A" "PE-H" "APC-A"
## [9] "APC-H" "DAPI-A" "DAPI-H" "FITC-A" "FITC-H"
We can quickly plot samples by using the plotFF-function:
library(FlowSoFine)
plotFF(flowFrame = fcs[[1]],
channels = c("SSC-H", "DAPI-H")) #plot the first sample
## Warning in FSFTemplate(flowSet(flowFrame), channels, resolution = resolution, :
## log transformations might shift events on the border because of events with an
## expression of 0. Consider using asinh instead.
Next create the FSFTemplate. Fine tuning of the FSFTemplate function is explained in the “FSFTemplate”-chapter.
template <- FSFTemplate(fcs, c("SSC-H", "DAPI-H"))
## Transforming and subsetting flowSet
## Warning in FSFTemplate(fcs, c("SSC-H", "DAPI-H")): log transformations might
## shift events on the border because of events with an expression of 0. Consider
## using asinh instead.
## Building coordinates for FSFtemplate with 2 dimensions and a resolution of 15
## Processing sample 1 / 28 Processing sample 2 / 28 Processing sample 3 / 28 Processing sample 4 / 28 Processing sample 5 / 28 Processing sample 6 / 28 Processing sample 7 / 28 Processing sample 8 / 28 Processing sample 9 / 28 Processing sample 10 / 28 Processing sample 11 / 28 Processing sample 12 / 28 Processing sample 13 / 28 Processing sample 14 / 28 Processing sample 15 / 28 Processing sample 16 / 28 Processing sample 17 / 28 Processing sample 18 / 28 Processing sample 19 / 28 Processing sample 20 / 28 Processing sample 21 / 28 Processing sample 22 / 28 Processing sample 23 / 28 Processing sample 24 / 28 Processing sample 25 / 28 Processing sample 26 / 28 Processing sample 27 / 28 Processing sample 28 / 28
We can create a distance matrix directly from the FSFTemplate using the weightedBray function. It automatically adjusts for spatial dependencies in the grid by assigning weights to every combination of bins based on their distance from another. Further Details and how to create a distance matrix without adjustments are explained in the “Weights and Distance Matrices”-chapter.
distM <- weightedBray(template)
To visualize the difference between the samples we can use a nmds-plot. For this we can use the metaMDS function that is included in the vegan package for community ecology.
library(vegan)
nmd <- metaMDS(distM, trace = FALSE)
plot(nmd, display = "sites")
ordispider(nmd,
groups = metadata$treatment)
ordiellipse(nmd,
groups = metadata$treatment,
draw = "polygon",
col = c("darkblue", "white", "black"),
label = TRUE)
The function adonis2 from the vegan package makes it possible to perform an analysis of variance on a distance matrix to test for significant effects in treatment groups.
First check for homogeneity in group variances:
bd <- betadisper(distM, metadata$treatment)
boxplot(bd)
permutest(bd, pairwise = T)
##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 2 0.002215 0.0011075 0.3654 999 0.738
## Residuals 25 0.075777 0.0030311
##
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
## control Treatment 1 Treatment 2
## control 0.98800 0.457
## Treatment 1 0.98257 0.508
## Treatment 2 0.40472 0.45512
Then check for a effect using vegan’s PERMANOVA implementation.
adonis2(distM ~ treatment, data = metadata)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = distM ~ treatment, data = metadata)
## Df SumOfSqs R2 F Pr(>F)
## treatment 2 0.17819 0.12727 1.8229 0.002 **
## Residual 25 1.22188 0.87273
## Total 27 1.40008 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The FlowSoFine package contains a function for performing a basic pairwise PERMANOVA.
pw.adonis2(distM, term = "treatment", data = metadata, adjust = "BH")
## ======================
## Adjusted p-values ( BH )
## -----------------
## control Treatment 1 : 0.0105
## -----------------
## control Treatment 2 : 0.141
## -----------------
## Treatment 1 Treatment 2 : 0.009
It might be interesting to know if the significant effect stems from a certain region within the SSC-H - DAPI-H area. For this we calculate t-scores based on Welch’s t-statistic independently for every bin and for every combination of levels in a given metadata group and use them as a measurement of difference.
The tscores function does this automatically. It returns a data.frame of t-scores with the columns representing every possible unique combination of levels in the given metadata group.
For visualization purposes we create a higher resolution FSFTemplate first.
template50 <- FSFTemplate(fcs, c("SSC-H", "DAPI-H"), resolution = 50)
## Transforming and subsetting flowSet
## Warning in FSFTemplate(fcs, c("SSC-H", "DAPI-H"), resolution = 50): log
## transformations might shift events on the border because of events with an
## expression of 0. Consider using asinh instead.
## Building coordinates for FSFtemplate with 2 dimensions and a resolution of 50
## Processing sample 1 / 28 Processing sample 2 / 28 Processing sample 3 / 28 Processing sample 4 / 28 Processing sample 5 / 28 Processing sample 6 / 28 Processing sample 7 / 28 Processing sample 8 / 28 Processing sample 9 / 28 Processing sample 10 / 28 Processing sample 11 / 28 Processing sample 12 / 28 Processing sample 13 / 28 Processing sample 14 / 28 Processing sample 15 / 28 Processing sample 16 / 28 Processing sample 17 / 28 Processing sample 18 / 28 Processing sample 19 / 28 Processing sample 20 / 28 Processing sample 21 / 28 Processing sample 22 / 28 Processing sample 23 / 28 Processing sample 24 / 28 Processing sample 25 / 28 Processing sample 26 / 28 Processing sample 27 / 28 Processing sample 28 / 28
ts <- tscores(template50, metadata$treatment)
head(ts)
## Treatment 1-control Treatment 2-control Treatment 2-Treatment 1
## [1,] -1.776647 0.11944332 2.925082
## [2,] -1.897241 0.05337584 2.826764
## [3,] -2.090760 -0.18363626 2.911060
## [4,] -2.329530 -0.19959967 2.846817
## [5,] -2.676013 0.19368096 3.076102
## [6,] -3.093052 0.27998864 3.052472
We can visualize the differences between two levels of a metadata group using the plotTscores function. .
plotTscores(template50, ts[, "Treatment 1-control", drop = F])
There are some adjustments we can make to the FSFTemplate function. We can change the resolution of the grid by using the resolution argument. Default is 15.
Furthermore we can give a function to transform the data to the transformation argument. If no function is given the data will be log10 transformed. If the data should not be transformed at all then transformation = NULL will do the trick.
The following creates the same FSFTemplate that was created in the “Quick Start”-chapter.
template <- FSFTemplate(flowset = fcs,
channels = c("SSC-H", "DAPI-H"),
resolution = 15,
transformation = log10)
The plot method for the FSFTemplate class and the plotTscores function return ggplot objects and are therefore highly customizable.
We can plot the empty grid:
plot(template)
Or fill it with the event percentages of sample no. 5
plot(template, sample = 5)
If additional layers should be added to the plot we first have to load the ggplot2 library.
Add the bin number to each bin
library(ggplot2)
plot(template, sample = 1) +
geom_text(aes(label = 1:template@nBins), size = 3)
Plot the number of events in every bin for sample no. 5
plot(template, sample = 5) +
geom_text(aes(label = template@counts[,5]), size = 2)
First turn the absolute event numbers contained in the FSFTemplate into frequencies. We can do that using the frequencies function. Next we can create a distance matrix using the vegdist function from the vegan package in conjunction with the frequency table.
freq <- frequencies(template)
library(vegan)
distM_vegan <- vegdist(freq, method = "bray")
The default way to create a distance matrix from a FSFTemplate is the weightedBray function which expands on the Bray-distance by adding in a weight matrix that contains a weight for every combination of bins We can give the function a custom weight matrix but the default weight matrix is calculated with exp(-1*gamma*euclDist) where gamma is a factor controlling the speed of the decline and euclDist is a matrix containing the euclidean distances between the bin’s centers. We can control the gamma factor directly with the gamma argument of the weightedBray function. Default is 8. Lower values increase the region around any bin that is taken into account when calculating the distance matrix. Higher values make the function behave closer to the default Bray-distance.
distM8 <- weightedBray(template, gamma = 8)
distM2 <- weightedBray(template, gamma = 2) #bigger region of spatial dependecy
If the default exponential approach is not what we want we can use the weightMatrix function and give it to the w argument of the weightedBray function. For the weightMatrix function we can choose between an exponential approach or a discrete approach by calling the function with either “exp” or “disc” as the method argument. If no custom weight matrix is given then weightedBray uses the exponential approach of the weightMatrix function by default.
The following creates the same distance matrix as in the “Quick Start”-chapter:
wM_exp <- weightMatrix(template, method = "exp", gamma = 8)
distM <- weightedBray(template, w = wM_exp)
The discrete approach gives us more control over the final matrix by letting us give a weight for every ranked distance between bins to the val argument of the weightMatrix function. However this approach depends on the resolution of the template’s grid.
wM_disc <- weightMatrix(template, method = "disc", val = c(.5,.25,.25))
We can visualize the different weight matrices by plotting the empty grid and giving a column of a weight matrix to the fill aesthetic of the plot function.
In this case we plot the weights of every bin relating to bin no. 82 as contained in the default exponential weight matrix.
wM <- weightMatrix(template, method = "exp", gamma = 8)
plot(template, mapping = aes(fill = wM[,82])) +
geom_text(aes(label = 1:template@nBins), size = 3)
The approach shown here is not limited to two dimensional data. Instead it’s possible to easily add channels when creating a FSFTemplate.
The following creates a template with four channels.
template4d <- FSFTemplate(fcs, c("SSC-H", "DAPI-H", "APC-H", "FITC-H"), resolution = 15)
## Transforming and subsetting flowSet
## Warning in FSFTemplate(fcs, c("SSC-H", "DAPI-H", "APC-H", "FITC-H"), resolution
## = 15): log transformations might shift events on the border because of events
## with an expression of 0. Consider using asinh instead.
## Building coordinates for FSFtemplate with 4 dimensions and a resolution of 15
## Processing sample 1 / 28 Processing sample 2 / 28 Processing sample 3 / 28 Processing sample 4 / 28 Processing sample 5 / 28 Processing sample 6 / 28 Processing sample 7 / 28 Processing sample 8 / 28 Processing sample 9 / 28 Processing sample 10 / 28 Processing sample 11 / 28 Processing sample 12 / 28 Processing sample 13 / 28 Processing sample 14 / 28 Processing sample 15 / 28 Processing sample 16 / 28 Processing sample 17 / 28 Processing sample 18 / 28 Processing sample 19 / 28 Processing sample 20 / 28 Processing sample 21 / 28 Processing sample 22 / 28 Processing sample 23 / 28 Processing sample 24 / 28 Processing sample 25 / 28 Processing sample 26 / 28 Processing sample 27 / 28 Processing sample 28 / 28
It is possible to quickly create a lower dimension FSFTemplate from a higher dimension template using the shrink-function.
template2d <- shrink(template4d, c("SSC-H", "DAPI-H"))
plot(template2d, sample = 1)
It’s possible to project the mean coordinates of a third channel’s bins onto a two dimensional grid using the colormap function.
colormap(shrink(template4d, c("SSC-H", "DAPI-H", "APC-H")), sample = 1, z = "APC-H")
It is also possible to plot a single channel.
plot(shrink(template4d, c("SSC-H")), sample = 1)
## Warning: Removed 1 rows containing missing values (geom_bar).
With the singleChannel.adonis2-function it is possible to automatically check which channels detect an effect in the data given a higher dimensional template. Keep in mind that a higher dimensional structure might still detect significant effects in the given metadata variable even if the individual channels do not. However, this is approach can show which channel on its own can detect effects in the data.
singleChannel.adonis2(template4d, term = "treatment", data = metadata, adjust = "BH")