-
Notifications
You must be signed in to change notification settings - Fork 18
/
practical.Rmd
706 lines (532 loc) · 28.3 KB
/
practical.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
---
title: "Practical"
subtitle: "Introduction to Statistical Analysis Training Course"
output:
html_document:
css: stylesheet.css
toc: yes
toc_depth: 3
toc_float: yes
pdf_document:
toc: yes
toc_depth: 3
---
<!--- rmarkdown::render("/Volumes/Files/courses/cruk/IntroductionToStatisticalAnalysis/git_IntroductionToStats/practical.Rmd") --->
<img src = "crukci_logo.png" style = "display: block; position: absolute; top: 0; right: 0; padding: 10px;" width = 300/>
```{r setup, include = FALSE}
knitr::opts_chunk$set(include = FALSE, comment = NA, fig.width = 6, fig.height = 4)
library(tidyverse)
```
# Introduction
In this practical, we will use several 'real life' data sets to demonstrate some
of the concepts you have seen in the lectures. We will guide you through how to
analyse these data sets and the kinds of questions you should be asking yourself
when faced with similar data.
To explore the data sets and answer the questions in this practical we will be
using web applications we have developed in **[R](https://cran.r-project.org)**
using the **[Shiny](http://shiny.rstudio.com/gallery)** framework. **R** is a
freely available statistical programming language that is popular within
academic communities and commercial organizations. The functionality offered by
R compares favourably with other statistical packages but R has a steep learning
curve and involves writing code rather than use of a point-and-click user
interface. We have decided for this course to focus on the statistical concepts
rather than any particular statistical package. The online apps will allow you
to carry out the statistical analyses without needing to learn R.
## Shiny web applications
We will be using the following [Shiny web applications](https://www.cruk.cam.ac.uk/core-facilities/bioinformatics-core/software/statistics-web-applications)
for the exercises in this practical.
* https://bioinformatics.cruk.cam.ac.uk/stats/central-limit-theorem
* https://bioinformatics.cruk.cam.ac.uk/stats/shinystats
## Data sets
The data sets we will be using are pre-loaded within the Shiny apps but are also available for download using this
**[link](https://rawgit.com/bioinformatics-core-shared-training/IntroductionToStats/master/datasets.zip)**
in case you wish to do the exercises using your favourite statistics package.
------
# Central limit theorem
<span style="color:rgb(235, 7, 142)">**Part (i):**</span>
Using the 2 first tabs of the **[central limit theorem](https://bioinformatics.cruk.cam.ac.uk/stats/central-limit-theorem)**
app, answer the following questions:
1. Empirically check that the mean of normally distributed variables is itself normally distributed for all sample sizes. To do so, simulate **1000 samples** of **size n = 1000** of **Gaussian** IQ random variates
(with mean 100 and standard deviation 15) and compare the estimated mean's density (second tab)
to the one obatined when considering **1000 samples** of **size n = 5**.
2. Now consider Beta distributed random variates. First, simulate **1000 samples** of **size n = 20** of **Beta** random variates with mean 0.5 and shape parameter 0.1 and compare the estimated mean's density (second tab)
to the one obatined when considering with mean 0.1 and shape parameter 0.5. Why is it so different?
<!--
3. What is the probability that 0 confidence intervals out of 50 contain the
**true mean** if data are normally distributed? (too complex)
-->
<span style="color:rgb(235, 7, 142)">**Part (ii):**</span>
Using the **[central limit theorem](https://bioinformatics.cruk.cam.ac.uk/stats/central-limit-theorem)**
app, answer the following questions:
1. Simulate **1000 samples** of **size n = 10** of **Poisson** random variates,
first assuming a **mean of 0.25**, and then assuming a **mean of 100**. Compare
the coverage level of Student's confidence intervals for the mean of these two
simulation sets: How do you explain that the latter is better than the first
one?
2. Now consider **zero-inflated Poisson** variates with a **mean of 30** and a
**10% probability of belonging to the clump-at-zero**. Can you think of a random
variable having such a distribution? How large should the sample size be for the
Student's confidence intervals to have good properties?
3. A student lost a few points in the statistic exams as the use of Student's
confidence intervals for the probability of success of a **Bernoulli** variable
with **pi = 40%** and a **n=100** was not considered as suitable. Should they
contact the University to dispute their mark?
------
# One sample tests
Use the **[statistical tests app](http://bioinformatics.cruk.cam.ac.uk/stats/shinystats)**
to perform one-sample location tests.
For each exercise, select the corresponding numbered data set, e.g. '1.1 Effect
of disease X on height', in the drop-down list on the *Data input* tab page.
The data will be shown in a table on the same tab page for you to familiarize
yourself with. Then select the *One sample test* tab page and set the
hypothesized mean.
You can visualize the data in the *Plots* tab, check the assumptions for a
parametric test on the *Assumptions* tab, and carry out the test you decide is
most appropriate on the *Statistical tests* tab. The *Summary statistics* tab
contains a number of useful descriptive statistics about the data, including
measures of the central tendency (mean, median) and the variability in the data.
## 1.1 Effect of disease on height
A scientist knows that the mean height of females in England is 165cm and
wants to know whether her patients with a certain disease "X" have heights that
differ significantly from the population mean - we will use a one-sample t-test
to test this.
Click on the *Data input* tab from the navigation bar at the top of the page.
Then select the disease X data set from the drop-down list. Look at the values
for height for the patients in this data set in the table shown below.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
What are the null and alternative hypotheses?
```{r echo=FALSE}
cat("Null hypothesis: The mean height of female patients with disease X = 165cm
(the population mean for females)
Alternative hypothesis: The mean height of female patients with disease X != 165cm
")
```
Click on the *One sample test* tab in the navigation bar and select the *Plots*
tab to view a box plot and histogram of the patient heights.
There are options for changing the display of these plots in the side panel on
the left. For example, you can overlay points or a density plot on the box plot
or use a different number of bins for the histogram and overlay a normal
distribution.
<span style="color:rgb(235, 7, 142)">**Questions:**</span>
Do the data look normally distributed?
Based on the plots, is the parametric, one-sample t–test appropriate?
A *Q-Q plot* can also be helpful for assessing the normality of the data and can
be viewed on the *Assumptions* tab.
```{r}
data <- read_csv("Practical/diseaseX.csv", show_col_types = FALSE)
ggplot(data, aes(Height)) +
geom_histogram(aes(y = ..density..), binwidth = 1, colour = "black", fill = "white") +
stat_function(fun = dnorm, color = "red", args = list(mean = mean(data$Height), sd = sd(data$Height)))
```
```{r echo = FALSE}
cat("In this case the data look normally distributed. Therefore, the one-sample t-test is appropriate.")
```
We are interested in knowing whether the mean height in our sample of patients
with disease X is different from that of the general population. Perform a
**one-sample t-test** on the *Statistical test* tab.
<span style="color:rgb(235, 7, 142)">**Questions:**</span>
What is the mean height in your sample?
What is your value of t?
What is the p-value?
How do you interpret the p-value?
```{r}
t.test(data$Height, mu = 165, alternative = "two.sided")
```
```{r echo = FALSE}
cat("Mean height in sample = 171.1cm (95% CI: 169.3 - 172.9)
t = 7.23,
df = 19,
p <= 0.0001.
Under the null hypothesis, the probability of observing a t-statistic as extreme
as 7.23 is very small P(t <= 7.23 | t >= 7.23) < 0.0001).
Therefore, there is strong evidence to reject the null hypothesis in favour of
the alternative hypothesis.
There is strong evidence to suggest that the mean height in female patients with
disease X is different to the population mean height of females of 165cm.
")
```
## 1.2 Blood vessel formation
In blood plasma cancer, there is an increase in blood vessel formation in the
bone marrow. A stem cell transplant can be used as a treatment for blood plasma
cancer. The *Difference* column in this data set reports the differences in
bone marrow microvessel density after treatment for 7 patients.
We are interested in seeing whether there is a decrease in the bone marrow
microvessel density after treatment with a stem cell transplant.
Select the blood vessel formation data set (1.2) in the app on the *Data input*
page.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
What are the null and alternative hypotheses?
```{r echo = FALSE}
cat("Null hypothesis: the bone marrow microvessel density after treatment is
greater than or equal to the bone marrow microvessel density before treatment.
Alternative hypothesis: the bone marrow microvessel density after treatment is
less than the bone marrow microvessel density before treatment.
")
```
View the histogram and box plot of the paired differences on the *Plots* tab
in the 'One sample test* page.
<span style="color:rgb(235, 7, 142)">**Questions:**</span>
Do the differences look normally distributed?
Is the parametric t–test appropriate?
```{r}
data <- read_csv("Practical/bloodplasmacancer1.csv", show_col_types = FALSE)
ggplot(data, aes(Difference)) +
geom_histogram(aes(y = ..density..), binwidth = 50, colour = "black", fill = "white") +
stat_function(fun = dnorm, color = "red", args = list(mean = mean(data$Difference), sd = sd(data$Difference)))
```
```{r echo = FALSE}
cat("From this histogram it is difficult to tell whether the differences between
the densities before and after treatment are normally distributed. In situations
like this, we may need to draw on the experience of similar sets of
measurements.
")
```
<span style="color:rgb(235, 7, 142)">**Question:**</span>
What is the hypothesized mean for the null hypothesis in this case?
```{r echo = FALSE}
cat("We are interested in whether there is a decrease in the microvessel
density following treatment, i.e. the difference being below 0.
The null hypothesis is that there is no difference or that the density
increases. The hypothesized mean when dealing with differences is 0.
")
```
We are interested in seeing whether there is a decrease in the bone marrow
microvessel density after treatment with a stem cell transplant.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
Is this a one-tailed or two-tailed test?
```{r echo = FALSE}
cat("One-tailed as we are only interested in a decrease.
Usually a two-sided test is preferred unless there is a strong argument for a
one-sided test. In this case our treatment is only considered to be effective
if we see a reduction in the bone marrow microvessel density after treatment.
Observing an increase in bone marrow density after treatment would lead to the
same action or conclusion as if no difference had been observed – the treatment
might be dropped from the research programme but bear in mind that the sample
size is small and so only large differences may be detected.
")
```
Now select the appropriate options in the *Statistical test* tab in order to
perform the analysis.
<span style="color:rgb(235, 7, 142)">**Questions:**</span>
What is the mean difference?
What is your value of t?
What is the p-value?
How do you interpret the p-value?
```{r}
t.test(data$Difference, mu = 0, alternative = "less")
```
```{r echo = FALSE}
cat("Under the null hypothesis, the probability of observing a t-statistic as
extreme as -1.84, is 0.06, slightly greater than 0.05, our nominal significance
level. The test result is borderline.
There is insufficient evidence to reject the null hypothesis. Therefore, we
might conclude that there is an association of a decrease in bone marrow
microvessel density after treatment with bone marrow transplant.
It is important to note the small sample size here. Studying just 7 patients
means we will only be able to detect large differences.
")
```
# Two-Sample Tests
Use the **[statistical tests app](http://bioinformatics.cruk.cam.ac.uk/stats/shinystats)**
to perform two-sample location tests.
For each exercise, select the corresponding numbered data set, e.g. '2.1
Biological process duration', in thedrop-down list on the *Data input* tab page.
The data will be shown in a table on the same tab page. Then select the *Two
sample test* tab page and either select the categorical variable (the column
that defines which group each observation is in), the two groups to compare and
the numerical variable that is being compared for the two groups, or select
two numerical columns if the data contained paired observations for the same
subject.
You can visualize the data for the two groups in the *Plots* tab, check the
assumptions for a parametric test on the *Assumptions* tab, and carry out the
test you decide is most appropriate on the *Statistical tests* tab.
## 2.1 Biological processes duration
This data set contains durations of a biological process for two samples of
wild type and knock-out cells. We are interested in seeing if there is a
difference in the durations for the two types of cells. We shall use an
**independent t-test** to compare the two cell types.
Click on the *Data input* tab from the navigation bar at the top of the page.
Then select the biological process duration data set from the drop-down list.
Each row contains one observation. The cell type is given in the *Group* column
and the duration measurement is given in the *Time* column.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
What are the null and alternative hypotheses?
```{r echo = FALSE}
cat("Null hypothesis: there is no difference in the duration of the biological
process for the two cell types.
Alternative hypothesis: there is a difference in the duration of the biological
process for the two cell types.")
```
Click on the *Two sample test* tab in the navigation bar and make sure the
'Paired observations' checkbox is not selected. The app detects the group
column and numerical variable automatically.
Select the *Plots* tab to view box plots and histograms of the biological
process durations of the two groups.
<span style="color:rgb(235, 7, 142)">**Questions:**</span>
Do the data look normally distributed for each cell type?
Is the independent t-test appropriate?
What statistics are appropriate to report the location (mean or median) and
spread (standard deviation or interquartile range, IQR) of the data?
```{r}
data <- read_csv("Practical/bp_times.csv", show_col_types = FALSE)
ggplot(data, aes(Time)) +
geom_histogram(aes(y = ..density..), bins = 10, colour = "black", fill = "white") +
facet_wrap(vars(Group)) +
stat_function(fun = dnorm, color = "red", args = list(mean = mean(data$Time), sd = sd(data$Time)))
```
```{r echo = FALSE}
cat("The data do appear to be approximately normally distributed with the
histograms approximating the bell shape of a normal distribution. The
independent t-test is appropriate.
")
```
In order to apply the correct statistical test, we need to test to see if the
variances of the two groups are comparable. We can run an **F test** to compare
the two variances under the *Assumptions* tab. However, it is often easier to
eyeball the box plots of the data to decide if the variances are similar.
<span style="color:rgb(235, 7, 142)">**Questions:**</span>
What do you conclude from the p-value of the F test?
Does this agree with your impression of the variances from the box plot and
histograms?
How does it influence what two sample test to use?
```{r echo = FALSE}
cat("The test yields a p-value of 0.03568, which is sufficient evidence to
reject the null hypothesis that the variances of the two groups are the same.
Therefore we should apply Welch's correction.
")
```
Now use the appropriate two-sample t-test to compare the durations of the two
groups.
<span style="color:rgb(235, 7, 142)">**Questions:**</span>
What is your value of the test statistic?
What is the p-value?
How do you interpret the p-value?
```{r}
t.test(Time ~ Group, data = data, alternative = "two.sided", paired = FALSE, var.equal = FALSE)
```
```{r echo = FALSE}
cat("The p-value is 0.00125 which indicates there is evidence to reject the null
hypothesis.
")
```
## 2.2 Blood vessel formation
In blood plasma cancer, there is an increase in blood vessel formation in the
bone marrow. A stem cell transplant can be used as a treatment for blood plasma
cancer. The bone marrow microvessel density was measured before and after
treatment for 7 patients with blood plasma cancer.
We are interested in seeing whether there is a decrease in the bone marrow
microvessel density after treatment with a stem cell transplant. We will use a
paired two-sample t-test to compare the before and after bone marrow microvessel
densities.
Click on the *Data input* tab from the navigation bar at the top of the page.
Then select the blood vessel formation (2.2) data set from the drop-down list.
Each row contains before and after observations for a single patient - these
are paired observations.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
What are the null and alternative hypotheses?
Click on the *Two sample test* tab in the navigation bar and make sure the
'Paired observations' checkbox is selected. The app detects the two numerical
variables, Before and After, automatically.
Select the *Plots* tab to view box plots and histograms of the microvessel
densities for the two groups and for the differences following treatment.
Note that it is the differences that we're interested in for these paired
observation data so it is these that need to meet the normality assumption for
a parametric t-test to be appropriate.
Check the test statistic and p-value of the test.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
How is it that these match the ones from Exercise 1.2?
## 2.3 Gene expression in breast cancer patients
A gene expression study was performed on patients categorised into positive and
negative estrogen receptor (ER) status groups. It is well known that ER positive
patients have more treatment options available and better prognosis.
The gene NIBP was measured as part of this study. We are interested in seeing if
the expression level of this gene is different between the ER positive and ER
negative patients.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
What are the null and alternative hypotheses?
Now conduct an independent two-sample t-test to see if there is a difference in
expression between the two groups.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
What is the p-value from the test?
Do we achieve statistical significance at the 0.05 level?
Look closely at data distribution, calculated means for each group and the
estimated confidence interval.
<span style="color:rgb(235, 7, 142)">**Questions:**</span>
Is the finding likely to hold biological significance?
Would you be willing to put further resources into validating the finding?
## 2.4 Vitamin D levels and fibrosis
This data set contains data on vitamin D levels for subjects with ("Y") and
without ("N") fibrosis. We are interested in seeing if there is a difference
in these levels between the two groups.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
State the null and alternative hypotheses.
Examine the distribution of the data.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
Why doesn't a parametric analysis seem appropriate?
By selecting a **non-parametric test** in the *Statistical tests* tab you will
see the results of a **Wilcoxon rank-sum test**, also known as the
**Mann-Whitney U test**.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
How do you interpret the result of the test?
## 2.5 Birth weight of twins
Dr D. R. Peterson of the Department of Epidemiology, University of Washington,
collected the data on the birth weights of each of 20 dizygous twins. One twin
suffered Sudden Infant Death Syndrom (SIDS) and the other twin did not. The
hypothesis to be tested is that the SIDS child of each pair had a lower birth
weight.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
State the null and alternative hypothesises.
<span style="color:rgb(235, 7, 142)">**Questions:**</span>
Should the test be one-sided or two-sided?
Would it be appropriate to treat these as paired or independant samples?
Decide whether to run a parametric or non-parametric test.
<span style="color:rgb(235, 7, 142)">**Question:**</span>
How do you interpret the result of the test?
# Group exercises
In this section, we invite you to form small groups. Each group will be assigned
one of the exercises.
At the end of the time assigned for the exercise we will go through each of the
problems in turn and invite a representative of each group to present the
problem to the rest of the class along with the analysis (descriptive analysis,
statistical tests) the group felt was most appropriate and any conclusions made.
If time allows, groups may wish to familiarize themselves with some of the other
exercises so that they can contribute to the presentations made by other groups.
## 3.1 Plant growth
Darwin (1876) studied the growth of *pairs* of zea may (aka corn) seedlings, one
produced by cross-fertilization and the other produced by self-fertilization,
but otherwise grown under identical conditions. His goal was to demonstrate the
greater vigour of the cross-fertilized plants. The data recorded are the final
height (inches, to the nearest 1/8th) of the plants in each pair.
<span style="color:rgb(235, 7, 142)">*Is there evidence to support the hypothesis of greater growth in cross-fertilized plants?*</span>
```{r echo = FALSE}
cat("In the *Design of Experiments*, Fisher (1935) used these data to illustrate
a paired t-test (well, a one-sample test on the mean difference, cross - self).
Later in the book (section 21), he used this data to illustrate an early example
of a non-parametric permutation test, treating each paired difference as having
(randomly) either a positive or negative sign.
")
```
## 3.2 Florence Nightingale's hygiene regime
In the history of data visualization, Florence Nightingale is best remembered
for her role as a social activist and her view that statistical data, presented
in charts and diagrams, could be used as powerful arguments for medical reform.
After witnessing deplorable sanitary conditions in the Crimea, she wrote several influential texts (Nightingale, 1858, 1859), including polar-area graphs
(sometimes called "Coxcombs" or rose diagrams), showing the number of deaths in
the Crimean from battle compared to disease or preventable causes that could be
reduced by better battlefield nursing care.
Her Diagram of the Causes of Mortality in the Army in the East showed that most
of the British soldiers who died during the Crimean War died of sickness rather
than of wounds or other causes. It also showed that the death rate was higher in
the first year of the war, before a Sanitary Commissioners arrived in March 1855
to improve hygiene in the camps and hospitals.
<span style="color:rgb(235, 7, 142)">*Do the data support the claim that deaths due to avoidable causes decreased after a change in regime?*</span>
```{r eval = FALSE}
library(HistData)
data("Nightingale")
```
```{r echo = FALSE}
cat("Non-parametric
Non-paired
")
```
## 3.3 Effect of bran on diet of patients with diverticulosis
The addition of bran to the diet has been reported to benefit patients with
diverticulosis. Several different bran preparations are available, and a
clinician wants to test the efficacy of two of them on patients, since
favourable claims have been made for each. Among the consequences of
administering bran that requires testing is the transit time through the
alimentary canal. By random allocation the clinician selects two groups of
patients aged 40-64 with diverticulosis of comparable severity. Sample 1
contains 15 patients who are given treatment A, and sample 2 contains 12
patients who are given treatment B.
<span style="color:rgb(235, 7, 142)">*Does transit time differ in the two groups of patients taking these two preparations?*</span>
```{r echo = FALSE}
cat("http://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/7-t-tests
Parametric
Non-paired
Equal variances
")
```
## 3.4 Effect on repetitive behaviour of autism drug
Consider a clinical investigation to assess the effectiveness of a new drug
designed to reduce repetitive behaviors in children affected with autism. If the
drug is effective, children will exhibit fewer repetitive behaviors on treatment
as compared to when they are untreated. A total of 8 children with autism enroll
in the study. Each child is observed by the study psychologist for a period of 3
hours both before treatment and then again after taking the new drug for 1 week.
The time that each child is engaged in repetitive behavior during each 3 hour
observation period is measured. Repetitive behavior is scored on a scale of 0 to
100 and scores represent the percent of the observation time in which the child
is engaged in repetitive behavior. For example, a score of 0 indicates that
during the entire observation period the child did not engage in repetitive
behavior while a score of 100 indicates that the child was constantly engaged in
repetitive behavior.
<span style="color:rgb(235, 7, 142)">*Is there statistically significant improvement in repetitive behavior after 1 week of treatment?*</span>
```{r echo = FALSE}
cat("http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Nonparametric/BS704_Nonparametric5.html
Non-parametric
Paired
")
```
## 3.5 Effect of HIV drug on CD4 cell counts
CD4 cells are carried in the blood as part of the human immune system. One of
the effects of the HIV virus is that these cells die. The count of CD4 cells is
used in determining the onset of full-blown AIDS in a patient. In this study of
the effectiveness of a new anti-viral drug on HIV, 20 HIV-positive patients had
their CD4 counts recorded and then were put on a course of treatment with this
drug. After using the drug for one year, their CD4 counts were again recorded.
<span style="color:rgb(235, 7, 142)">*Do patients taking the drug have increased CD4 counts?*</span>
```{r echo = FALSE}
cat("http://vincentarelbundock.github.io/Rdatasets/csv/boot/cd4.csv
Parametric
Paired
Equal variances
")
```
## 3.6 Drink driving and reaction times
Drunk driving is one of the main causes of car accidents. Interviews with drunk
drivers who were involved in accidents and survived revealed that one of the
main problems is that drivers do not realize that they are impaired, thinking
“I only had 1-2 drinks... I am OK to drive.”
A sample of 100 drivers was chosen, and their reaction times in an obstacle
course were measured *before* and *after* drinking two beers. The purpose of
this study was to check whether drivers are impaired after drinking two beers.
<span style="color:rgb(235, 7, 142)">*Does drinking beer alter the reaction time of the driver?*</span>
```{r echo = FALSE}
cat("http://bolt.mph.ufl.edu/6050-6052/unit-4b/module-13/paired-t-test
Parametric
Paired
non-equal variances
")
```
## 3.7 Pollution in poplar trees
Laureysens et al. (2004) measured metal content in the wood of 13 poplar clones
growing in a polluted area, once in August and once in November. Concentrations
of aluminum (in micrograms of Al per gram of wood) are shown below.
<span style="color:rgb(235, 7, 142)">*Is there any evidence for an increase in pollution between November and August?*</span>
```{r echo = FALSE}
cat("http://www.biostathandbook.com/wilcoxonsignedrank.html
Non-parametric
Paired
'The differences are somewhat skewed; the Wolterson clone, in particular,
has a much larger difference than any other clone. To be safe,
the authors analyzed the data using a Wilcoxon signed-rank test.'
")
```
## 3.8 Gender and salaries of professors
The 2008-09 nine-month academic salaries for Assistant Professors, Associate
Professors and Professors in a college in the U.S. were collected as part of the
on-going effort of the college's administration to monitor salary
differences between male and female faculty members.
<span style="color:rgb(235, 7, 142)">*Is there evidence that female professors are paid differently to their male counterparts?*</span>
```{r eval = FALSE}
library(car)
data("Salaries")
```
```{r echo = FALSE}
cat("Parametric
Non-paired
Non-equal variances
")
```
-----