summary and legends for one-way anova. improved summaries for two-sample and regression
3mmaRand committed Feb 23, 2024
1 parent bad873b commit 2cc4f49
Showing 23 changed files with 340 additions and 112 deletions.
Binary file modified adipocytes.png
2 changes: 1 addition & 1 deletion docs/ideas_about_data.html
@@ -404,7 +404,7 @@ <h1 class="title">
<ul>
<li><p>the range: the difference between the maximum value and the minimum value in a variable</p></li>
<li><p>the interquartile range: two values, the first quartile and the third quartile. The first quartile is halfway between the median value and the lowest value when the values are arranged in order and the third quartile is halfway between the median value and the highest value</p></li>
<li><p>the variance: the average of the squared differences between each value and the variable’s mean, <span class="math inline">\(\bar{x} = \frac{(\sum{x - \bar{x})^2}}{n - 1}\)</span></p></li>
<li><p>the variance: the average of the squared differences between each value and the variable’s mean, <span class="math inline">\(s^2 = \frac{\sum{(x - \bar{x})^2}}{n - 1}\)</span></p></li>
<li><p>the standard deviation: the square root of the variance.</p></li>
</ul></section><section id="discrete-data" class="level2" data-number="5.4"><h2 data-number="5.4" class="anchored" data-anchor-id="discrete-data">
<span class="header-section-number">5.4</span> Discrete data</h2>
Binary file modified docs/import_to_report_files/figure-html/unnamed-chunk-11-1.png
Binary file modified docs/import_to_report_files/figure-html/unnamed-chunk-13-1.png
Binary file modified docs/import_to_report_files/figure-html/unnamed-chunk-14-1.png
Binary file modified docs/import_to_report_files/figure-html/unnamed-chunk-15-1.png
Binary file modified docs/import_to_report_files/figure-html/unnamed-chunk-16-1.png
Binary file modified docs/import_to_report_files/figure-html/unnamed-chunk-17-1.png
Binary file modified docs/import_to_report_files/figure-html/unnamed-chunk-18-1.png
2 changes: 1 addition & 1 deletion docs/index.html
@@ -302,7 +302,7 @@ <h1 class="title">Computational Analysis for Bioscientists</h1>
<div>
<div class="quarto-title-meta-heading">Published</div>
<div class="quarto-title-meta-contents">
<p class="date">22 February, 2024</p>
<p class="date">23 February, 2024</p>
</div>
</div>

65 changes: 44 additions & 21 deletions docs/one_way_anova_and_kw.html

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions docs/search.json

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions docs/single_linear_regression.html
@@ -648,6 +648,16 @@ <h1 class="title">
</div>
</section></section><section id="summary" class="level2" data-number="12.4"><h2 data-number="12.4" class="anchored" data-anchor-id="summary">
<span class="header-section-number">12.4</span> Summary</h2>
<ol type="1">
<li><p>Single linear regression is appropriate when you have one continuous explanatory variable and one continuous response and the relationship between the two is linear.</p></li>
<li><p>Applying a single linear regression to data means putting a line of best fit through it. We estimate the <strong>coefficients</strong> (also called the <strong>parameters</strong>) of the model. These are the intercept, <span class="math inline">\(\beta_0\)</span>, and the slope, <span class="math inline">\(\beta_1\)</span>. We test whether the parameters differ significantly from zero.</p></li>
<li><p>We can use <code><a href="https://rdrr.io/r/stats/lm.html">lm()</a></code> to fit a linear regression, as sketched after this list.</p></li>
<li><p>In the output of <code><a href="https://rdrr.io/r/stats/lm.html">lm()</a></code> the coefficients are listed in a table in the Estimates column. The <em>p</em>-value for each coefficient is from the test of whether it differs from zero. At the bottom of the output there is a test of the model <em>overall</em>. In a single linear regression this is exactly the same as the test of <span class="math inline">\(\beta_1\)</span> and the <em>p</em>-values are identical. The R-squared value is the proportion of the variance in the response variable that is explained by the model.</p></li>
<li><p>The assumptions of the general linear model are that the residuals are normally distributed and have homogeneity of variance. A residual is the difference between the predicted value and the observed value.</p></li>
<li><p>We examine a histogram of the residuals and use the Shapiro-Wilk normality test to check the normality assumption. We check the variance of the residuals is the same for all fitted values with a residuals vs fitted plot.</p></li>
<li><p>If the assumptions are not met, we might need to transform the data or use a different type of model.</p></li>
<li><p>When reporting the results of a regression we give the significance, direction and size of the effect. Often we give the equation of the best fitting line. A figure should show the data and the line of best fit.</p></li>
</ol>
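<p>As a minimal sketch of the workflow summarised above (the data frame <code>dat</code> and its variables are hypothetical names used only for illustration):</p>
<pre class="sourceCode r"><code># fit a single linear regression of the response on the explanatory variable
mod &lt;- lm(response ~ explanatory, data = dat)

# coefficients with their p-values, the overall F test and R-squared
summary(mod)

# check the normality of the residuals
hist(residuals(mod))
shapiro.test(residuals(mod))

# check homogeneity of variance with a residuals vs fitted plot
plot(mod, which = 1)</code></pre>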


<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" role="list" style="display: none">
8 changes: 4 additions & 4 deletions docs/sitemap.xml
@@ -30,7 +30,7 @@
</url>
<url>
<loc>https://3mmarand.github.io/comp4biosci/ideas_about_data.html</loc>
<lastmod>2023-10-17T14:17:55.511Z</lastmod>
<lastmod>2024-02-23T11:50:54.420Z</lastmod>
</url>
<url>
<loc>https://3mmarand.github.io/comp4biosci/first_steps_rstudio.html</loc>
@@ -62,15 +62,15 @@
</url>
<url>
<loc>https://3mmarand.github.io/comp4biosci/single_linear_regression.html</loc>
<lastmod>2024-02-22T13:48:27.629Z</lastmod>
<lastmod>2024-02-23T10:52:10.379Z</lastmod>
</url>
<url>
<loc>https://3mmarand.github.io/comp4biosci/two_sample_tests.html</loc>
<lastmod>2024-02-22T13:57:51.528Z</lastmod>
<lastmod>2024-02-23T10:52:20.092Z</lastmod>
</url>
<url>
<loc>https://3mmarand.github.io/comp4biosci/one_way_anova_and_kw.html</loc>
<lastmod>2024-02-18T11:41:34.000Z</lastmod>
<lastmod>2024-02-23T14:05:48.867Z</lastmod>
</url>
<url>
<loc>https://3mmarand.github.io/comp4biosci/two_way_anova.html</loc>
10 changes: 6 additions & 4 deletions docs/two_sample_tests.html
@@ -1108,14 +1108,16 @@ <h1 class="title">
<span class="header-section-number">13.7</span> Summary</h2>
<ol type="1">
<li><p>A linear model with one explanatory variable with two groups and one continuous response is “a two-sample test”.</p></li>
<li><p>If pairs of observations in the groups have something in common that make them more similar to each other, than to other observations, then those observations are not independent</p></li>
<li><p>A paired-samples test is used when the observations are not independent.</p></li>
<li><p>If pairs of observations in the groups have something in common that makes them more similar to each other than to other observations, then those observations are not independent. A <strong>paired-samples test</strong> is used when the observations are not independent.</p></li>
<li><p>A linear model with one explanatory variable with two groups and one continuous response is also known as a <strong>two-sample <em>t</em>-test</strong> when the samples are independent and as a <strong>paired-samples <em>t</em>-test</strong> when they are not.</p></li>
<li><p>We can use <code><a href="https://rdrr.io/r/stats/lm.html">lm()</a></code> to do two-sample and paired-samples tests. We can also use <code><a href="https://rdrr.io/r/stats/t.test.html">t.test()</a></code> for these but using <code><a href="https://rdrr.io/r/stats/lm.html">lm()</a></code> helps us understand tests with more groups and/or more variables where we will have to use <code><a href="https://rdrr.io/r/stats/lm.html">lm()</a></code>. The output of <code><a href="https://rdrr.io/r/stats/lm.html">lm()</a></code> is also more typical of the output of statistical functions in R. A minimal sketch is given after this list.</p></li>
<li><p>We estimate the <strong>coefficients</strong> (also called the <strong>parameters</strong>) of the model. For a two-sample test these are the mean of the first group, <span class="math inline">\(\beta_0\)</span> (which might also be called the intercept) and the difference between the means of the first and second groups, <span class="math inline">\(\beta_1\)</span> (which might also be called the slope). For a paired-sample test there is just one parameter, the mean difference between pairs of values, <span class="math inline">\(\beta_0\)</span> (which might also be called the intercept). We test whether the parameters differ significantly from zero.</p></li>
<li><p>We can use <code><a href="https://rdrr.io/r/stats/lm.html">lm()</a></code> to fit a linear regression.</p></li>
<li><p>In the output of <code><a href="https://rdrr.io/r/stats/lm.html">lm()</a></code> the coefficients are listed in a table in the Estimates column. The <em>p</em>-value for each coefficient is from the test of whether it differs from zero. At the bottom of the output there is a test of the model <em>overall</em>. In this case, this is exactly the same as the test of <span class="math inline">\(\beta_1\)</span> and the <em>p</em>-values are identical. The R-squared value is the proportion of the variance in the response variable that is explained by the model.</p></li>
<li><p>The assumptions of the general linear model are that the residuals are normally distributed and have homogeneity of variance. A residual is the difference between the predicted value and the observed value.</p></li>
<li><p>We examine a histogram of the residuals and use the Shapiro-Wilk normality tests to check the normality assumption. We check the variance of the residuals is the same for all fitted values with a residuals vs fitted plot.</p></li>
<li><p>We examine a histogram of the residuals and use the Shapiro-Wilk normality test to check the normality assumption. We check the variance of the residuals is the same for all fitted values with a residuals vs fitted plot.</p></li>
<li><p>If the assumptions are not met, we can use alternatives known as non-parametric tests. These are applied with <code><a href="https://rdrr.io/r/stats/wilcox.test.html">wilcox.test()</a></code> in R.</p></li>
<li><p>When reporting the results of a test we give the significance, direction and size of the effect. Our figures and the values we give should reflect the type of test we have used. We use means and standard errors for parametric tests and medians and interquartile ranges for non-parametric tests. We also give the test statistic, the degrees of freedom (parametric) or sample size (non-parametric) and the p-value.</p></li>
<li><p>When reporting the results of a test we give the significance, direction and size of the effect. Our figures and the values we give should reflect the type of test we have used. We use means and standard errors for parametric tests and medians and interquartile ranges for non-parametric tests. We also give the test statistic, the degrees of freedom (parametric) or sample size (non-parametric) and the p-value. We annotate our figures with the p-value, making clear which comparison it applies to.</p></li>
</ol>
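<p>As a minimal sketch of the two-sample workflow summarised above (the data frame <code>dat</code>, the response <code>mass</code> and the two-group factor <code>treatment</code> are hypothetical names used only for illustration):</p>
<pre class="sourceCode r"><code># two-sample test as a linear model: beta_0 is the mean of the first group,
# beta_1 is the difference between the two group means
mod &lt;- lm(mass ~ treatment, data = dat)
summary(mod)

# check the assumptions on the residuals
shapiro.test(residuals(mod))
plot(mod, which = 1)

# non-parametric alternative if the assumptions are not met
wilcox.test(mass ~ treatment, data = dat)</code></pre>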


Binary file modified docs/two_sample_tests_files/figure-html/unnamed-chunk-13-1.png
Binary file modified docs/two_way_anova_files/figure-html/fig-para-1.png
4 changes: 2 additions & 2 deletions docs/workflow_rstudio.html
@@ -468,8 +468,8 @@ <h1 class="title">
<div class="sourceCode" id="cb6"><pre class="downlit sourceCode r code-with-copy"><code class="sourceCode R"><span><span class="co"># apply a log-square root transformation</span></span>
<span><span class="va">tnums</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/Log.html">log</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/MathFun.html">sqrt</a></span><span class="op">(</span><span class="va">nums</span><span class="op">)</span><span class="op">)</span></span>
<span><span class="va">tnums</span></span>
<span><span class="co">## [1] 2.087194 2.297560 1.386294 0.000000 1.242453 2.124248 1.151293 2.055437</span></span>
<span><span class="co">## [9] 2.109754 1.748254</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span><span class="co">## [1] 1.7917595 0.5493061 1.4166067 2.0715674 1.5222612 0.9729551 2.2213256</span></span>
<span><span class="co">## [8] 2.2924837 2.2154084 1.6836479</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>The first function to be applied is innermost. When we are using just two functions, the level of nesting does not cause too much difficulty in reading the code. However, you can imagine this gets more unreadable as the number of functions applied increases. It also makes it harder to debug and find out where an error might be. One solution is to create intermediate variables so the commands are given in order:</p>
<div class="cell">
110 changes: 97 additions & 13 deletions one_way_anova_and_kw.qmd
@@ -5,7 +5,7 @@
#| echo: false
source("_common.R")
status("polishing")
status("complete")
```

## Overview
@@ -134,14 +134,15 @@ Import the data:
culture <- read_csv("data-raw/culture.csv")
```


```{r}
#| echo: false
knitr::kable(culture) |>
kableExtra::kable_styling() |>
kableExtra::scroll_box(height = "200px")
```


The Response variable is colony diameters in millimetres and we would
expect it to be continuous. The Explanatory variable is type of media
and is categorical with 3 groups. It is known as “one-way ANOVA” or
@@ -420,19 +421,20 @@

### Report

There is a significant effect of media on the diameter of bacterial
colonies (*F* = 6.11; *d.f.* = 2, 27; *p* = 0.006) with colonies growing
significantly better when both sugar and amino acids are added to the
medium. Post-hoc testing with Tukey's Honestly Significant Difference
test [@tukey1949] revealed the colony diameters were significantly
larger when grown with both sugar and amino acids
($\bar{x} \pm s.e$: 11.4 $\pm$ 0.37 mm) than with neither
There was a significant effect of media on the diameter of bacterial
colonies (*F* = 6.11; *d.f.* = 2, 27; *p* = 0.006). Post-hoc testing
with Tukey's Honestly Significant Difference test [@tukey1949] revealed
the colony diameters were significantly larger when grown with both
sugar and amino acids ($\bar{x} \pm s.e$: 11.4 $\pm$ 0.37 mm) than with
neither
(10.2 $\pm$ 0.26 mm; *p* = 0.0092) or just sugar (10.1 $\pm$ 0.23 mm;
*p* = 0.0244). See @fig-culture.
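
A minimal sketch of how post-hoc comparisons like these can be obtained
with the **`emmeans`** package; the model object name `mod_culture` is an
assumption made here only for illustration:

```{r}
#| eval: false
# Tukey-adjusted pairwise comparisons of the group means, assuming the
# model was fitted with mod_culture <- lm(diameter ~ medium, data = culture)
library(emmeans)
emmeans(mod_culture, ~ medium) |> pairs()
```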



::: {#fig-culture}
```{r}
#| label: fig-culture
#| fig-cap: "Diameters of bacterial colonies grown on three types of media: control, with sugar added and with both sugar and amino acids added. Errors bars are ± 1 s.e."
#| code-fold: true
ggplot() +
geom_point(data = culture, aes(x = medium, y = diameter),
@@ -462,6 +464,18 @@ ggplot() +
theme_classic()
```

**Medium affects bacterial colony diameter**. Ten replicate colonies
were grown on three types of media: control, with sugar added and with
both sugar and amino acids added. Error bars show the mean $\pm$ 1 standard
error. There was a significant effect of media on the diameter of
bacterial colonies (*F* = 6.11; *d.f.* = 2, 27; *p* = 0.006). Post-hoc
testing with Tukey's Honestly Significant Difference test [@tukey1949]
revealed the colony diameters were significantly larger when grown with
both sugar and amino acids than with neither or just sugar. Data
analysis was conducted in R [@R-core] with tidyverse packages [@tidyverse].

:::

## Kruskal-Wallis

Our examination of the assumptions revealed a possible violation of the
@@ -552,9 +566,10 @@ grown with both sugar and amino acids ($median = 11.3 mm$) than with
neither ($median = 10.2 mm$; *p* = 0.031) or just sugar
($median = 10.2 mm$; *p* = 0.038). See @fig-culture-kw.
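
A minimal sketch of these non-parametric tests (chunk not evaluated here);
`culture`, `diameter` and `medium` are the names used earlier in this chapter:

```{r}
#| eval: false
# Kruskal-Wallis test of colony diameter across the three media
kruskal.test(diameter ~ medium, data = culture)

# post-hoc pairwise comparisons with the Dunn test from the FSA package
FSA::dunnTest(diameter ~ medium, data = culture)
```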


::: {#fig-culture-kw}
```{r}
#| label: fig-culture-kw
#| fig-cap: "Diameters of bacterial colonies grown on three types of media: control, with sugar added and with both sugar and amino acids added. Heavy line indicate the median, boxes the interquartile range and whiskers the range."
#| code-fold: true
ggplot(data = culture, aes(x = medium, y = diameter)) +
geom_boxplot() +
@@ -575,3 +590,72 @@ ggplot(data = culture, aes(x = medium, y = diameter)) +
label = expression(italic(p)~"= 0.031")) +
theme_classic()
```


**Medium affects bacterial colony diameter**. Ten replicate colonies
were grown on three types of media: control, with sugar added and with
both sugar and amino acids added. The heavy lines
indicate median diameter, boxes indicate the interquartile range
and whiskers the range. There was a significant effect of media on the
diameter of bacterial colonies (Kruskal-Wallis: *chi-squared* = 6.34,
*df* = 2, *p*-value = 0.042). Post-hoc testing with the Dunn test
[@dunn1964] revealed the colony diameters were significantly larger when
grown with both sugar and amino acids than with neither or just sugar.
Data analysis was conducted in R [@R-core] with
tidyverse packages [@tidyverse].

:::

## Summary

1. A linear model with one explanatory variable with two or more groups
is also known as a **one-way ANOVA**.

2. We estimate the **coefficients** (also called the **parameters**) of
the model. For a one-way ANOVA with three groups these are the mean
of the first group, $\beta_0$, the difference between the means of
the first and second groups, $\beta_1$, and the difference between
the means of the first and third groups, $\beta_2$. We test whether the
parameters differ significantly from zero.

3. We can use `lm()` to carry out a one-way ANOVA in R, as sketched after this list.

4. When we get a significant effect of our explanatory variable, it only
tells us that at least two of the means differ. To find out which
means differ, we need a **post-hoc** test. Here we use Tukey’s HSD
applied with the `emmeans()` and `pairs()` functions from the
**`emmeans`** package. Post-hoc tests make adjustments to the
*p*-values to account for the fact that we are doing multiple tests.

5. In the output of `lm()` the coefficients are listed in a table in the
Estimates column. The *p*-value for each coefficient is in the test
of whether it differs from zero. At the bottom of the output there
is a test of the model *overall*. Now we have more than two
parameters, this is different from the test on any one parameter. The
R-squared value is the proportion of the variance in the response
variable that is explained by the model. It tells us is the
explanatory variable is useful in predicting the response variable
overall.

6. The assumptions of the general linear model are that the residuals
are normally distributed and have homogeneity of variance. A residual
is the difference between the predicted value and the observed value.

7. We examine a histogram of the residuals and use the Shapiro-Wilk
normality test to check the normality assumption. We check the
variance of the residuals is the same for all fitted values with
a residuals vs fitted plot.

8. If the assumptions are not met, we can use the Kruskal-Wallis test
applied with `kruskal.test()` in R and follow it with the Dunn test
applied with `dunnTest()` in the package **`FSA`**.

9. When reporting the results of a test we give the significance,
direction and size of the effect. Our figures and the values we give
should reflect the type of test we have used. We use means and
standard errors for parametric tests and medians and interquartile
ranges for non-parametric tests. We also give the test statistic, the
degrees of freedom (parametric) or sample size (non-parametric) and
the p-value. We annotate our figures with the p-value, making clear
which comparison it applies to.
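
A compact sketch pulling these steps together (chunk not evaluated; the
model object name `mod` is chosen only for illustration, while `culture`,
`diameter` and `medium` are the names used earlier in this chapter):

```{r}
#| eval: false
# fit the one-way ANOVA as a linear model
mod <- lm(diameter ~ medium, data = culture)
summary(mod)   # coefficients, overall F test and R-squared

# post-hoc Tukey comparisons with emmeans
library(emmeans)
emmeans(mod, ~ medium) |> pairs()

# check the assumptions on the residuals
hist(residuals(mod))
shapiro.test(residuals(mod))
plot(mod, which = 1)   # residuals vs fitted plot
```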
