---
title: Producing Reports With knitr
teaching: 60
exercises: 30
questions:
- "How can I integrate software and reports?"
objectives:
- Understand the value of writing reproducible reports
- Learn how to recognise and compile the basic components of an R Markdown file
- Learn some basic syntax of Markdown
- Become familiar with R code chunks, and understand their purpose and structure
keypoints:
- "Mix reporting written in R Markdown with software written in R."
source: Rmd
---
```{r, include=FALSE}
source("bin/chunk-options.R")
knitr_fig_path("10-")
```
# Data analysis reports
Data analysts tend to write a lot of reports, describing their
analyses and results, for their collaborators or to document their
work for future reference.
Many new users begin by writing a single R script containing all of their
work, then share the analysis by emailing the script and various graphs
as attachments. But this can be cumbersome, requiring a lengthy discussion to
explain which attachment was which result.
Writing formal reports with Word or [LaTeX](http://www.latex-project.org/)
can simplify this by incorporating both the analysis report and output graphs
into a single document. But tweaking formatting to make figures look correct
and to fix awkward page breaks can be tedious and lead to a lengthy game of
"whack-a-mole", in which each formatting change introduces new mistakes to fix.
Creating a web page (as an html file) with R Markdown makes things easier.
The report can be one long stream, so tall figures that wouldn't ordinarily fit on
one page can be kept full size and are easier to read, since the reader can simply
keep scrolling. Formatting is simple and easy to modify, allowing you to spend
more time on your analyses instead of on writing reports.
## Literate programming
Ideally, such analysis reports are _reproducible_ documents: If an
error is discovered, or if some additional subjects are added to the
data, you can just re-compile the report and get the new or corrected
results (versus having to reconstruct figures, paste them into
a Word document, and further hand-edit various detailed results).
The key R package is [`knitr`](http://yihui.name/knitr/). It allows you
to create a document that is a mixture of text and chunks of
code. When the document is processed by `knitr`, chunks of code will
be executed, and graphs or other results inserted into the final document.
This sort of idea has been called "literate programming".
`knitr` allows you to mix basically any sort of text with code from different programming languages, but we recommend that you use `R Markdown`, which mixes Markdown
with R. [Markdown](https://www.markdownguide.org/) is a light-weight mark-up language for documents and web pages.
## Examples of Using R Markdown in the BC Government
R Markdown is quite versatile. Below are some examples of ways you can adapt it for a variety of purposes.
### Documentation
- Rendered: https://bcgov.github.io/bcdata/articles/bcdata.html
- R Markdown: https://github.com/bcgov/bcdata/blob/master/vignettes/bcdata.Rmd
### Teaching
- Rendered: https://bcgov.github.io/ds-intro-to-r-2-day/
- R Markdown: https://github.com/bcgov/ds-intro-to-r-2-day/blob/master/01-rstudio-intro.Rmd
### Presentation
- Rendered: https://bcgov.github.io/bcgov-rstats-public-presentations/2020-03-26_bcdata_lunch_and_learn/bcdata-2020-lunch-and-learn.html#1
- R Markdown: https://github.com/bcgov/bcgov-rstats-public-presentations/blob/master/2020-03-26_bcdata_lunch_and_learn/bcdata-2020-lunch-and-learn.Rmd
> ## Challenge 1 (5 minutes)
>
> Take a few minutes to discuss in your groups the typical ways in which
> you share results. What do you do in the scenario where the data changes
> but the analysis needs to stay the same?
>
>
## Creating an R Markdown file
Within RStudio, click File → New File → R Markdown and
you'll get a dialog box like this:
![](fig/New_R_Markdown.png)
You can stick with the default (HTML output), but give it a title and an author.
## Basic components of R Markdown
The initial chunk of text (header) contains instructions for R to specify what kind of document will be created, and the options chosen. You can use the header to give your document a title, author, date, and tell it that you're going to want
to produce html output (in other words, a web page).
```
---
title: "Initial R Markdown document"
author: "Luke Skywalker"
date: "May 7th, 2020"
output: html_document
---
```
You can delete any of those fields if you don't want them
included. The double-quotes aren't strictly _necessary_ in this case.
They're mostly needed if you want to include a colon in the title.
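
The quoting rule can be checked outside RStudio with the `yaml` package, which parses the same header syntax. This is a minimal sketch: it assumes `yaml` is installed, and the title text is made up for illustration.

```r
# Assumes the yaml package is installed (it is a dependency of rmarkdown).
library(yaml)

# A quoted title may safely contain a colon.
quoted <- yaml.load('title: "Gapminder: a first look"')
quoted$title

# Without quotes the second colon is ambiguous, so the parser signals an error.
# tryCatch() captures the error object instead of stopping the script.
unquoted <- tryCatch(
  yaml.load("title: Gapminder: a first look"),
  error = function(e) e
)
inherits(unquoted, "error")
```

If a header refuses to parse when you knit, an unquoted colon in one of the fields is a likely cause.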
RStudio creates the document with some example text to get you
started. Note below that there are chunks like
<pre>
```{r}
summary(cars)
```
</pre>
These are chunks of R code that will be executed by `knitr` and replaced
by their results. More on this later.
Also note the web address that's put between angle brackets (`< >`) as
well as the double-asterisks in `**Knit**`. This is
[Markdown](http://daringfireball.net/projects/markdown/syntax).
## Markdown
Markdown is a system for writing web pages by marking up the text much
as you would in an email rather than writing html code. The marked-up
text gets _converted_ to html, replacing the marks with the proper
html code.
For now, let's delete all of the stuff that's there and write a bit of
markdown.
You make things **bold** using two asterisks, like this: `**bold**`,
and you make things _italics_ by using underscores, like this:
`_italics_`.
You can make a bulleted list by writing a list with hyphens or
asterisks, like this:
```
* bold with double-asterisks
* italics with underscores
* code-type font with backticks
```
or like this:
```
- bold with double-asterisks
- italics with underscores
- code-type font with backticks
```
Each will appear as:
- bold with double-asterisks
- italics with underscores
- code-type font with backticks
You can use whatever method you prefer, but *be consistent*. This maintains the
readability of your code.
You can make a numbered list by just using numbers. You can even use the
same number over and over if you want:
```
1. bold with double-asterisks
1. italics with underscores
1. code-type font with backticks
```
This will appear as:
1. bold with double-asterisks
1. italics with underscores
1. code-type font with backticks
You can make section headers of different sizes by initiating a line
with some number of `#` symbols:
```
# Title
## Main section
### Sub-section
#### Sub-sub section
```
You _compile_ the R Markdown document to an html webpage by clicking
the "Knit" button in the upper-left.
![](fig/10-rmd-fig1.png)
> ## Challenge 2
>
> Create a new R Markdown document. Delete all of the R code chunks
> and write a bit of Markdown (some sections, some italicized
> text, and an itemized list).
>
> Convert the document to a webpage.
>
> > ## Solution to Challenge 2
> >
> > In RStudio, select File > New file > R Markdown...
> >
> > Delete the placeholder text and add the following:
> >
> > ```
> > # Introduction
> >
> > ## Background on Data
> >
> > This report uses the *gapminder* dataset, which has columns that include:
> >
> > * country
> > * continent
> > * year
> > * lifeExp
> > * pop
> > * gdpPercap
> >
> > ## Background on Methods
> >
> > ```
> >
> > Then click the 'Knit' button on the toolbar to generate an html document (webpage).
### A bit more Markdown
You can make a hyperlink like this:
`[text to show](http://the-web-page.com)`.
You can include an image file like this: `![caption](http://url/for/file)`
You can do subscripts (e.g., F~2~) with `F~2~` and superscripts (e.g.,
F^2^) with `F^2^`.
If you know how to write equations in
[LaTeX](http://www.latex-project.org/), you can use `$ $` and `$$ $$` to insert math equations, like
`$E = mc^2$` and
```
$$y = \mu + \sum_{i=1}^p \beta_i x_i + \epsilon$$
```
You can review Markdown syntax by navigating to the
"Markdown Quick Reference" under the "Help" field in the
toolbar at the top of RStudio.
### R code chunks
The real power of Markdown comes from
mixing Markdown with chunks of code. This is R Markdown. When the document is
processed, the R code is executed; if it produces figures, the
figures are inserted in the final document.
The main code chunks look like this:
<pre>
```{r load_data}
library("readr")
gapminder <- read_csv("data/gapminder_data.csv")
```
</pre>
That is, you place a chunk of R code between <code>```{r chunk_name}</code>
and <code>```</code>. You should give each chunk
a unique name: names help you to locate errors and, if any graphs are
produced, their file names are based on the name of the code chunk that
produced them.
## How R Markdown gets compiled
When you press the "Knit" button, [`knitr`](http://yihui.name/knitr) processes your R Markdown file to create a plain Markdown document (along with a set of figure files if needed).
If we specified html as the output format, the Markdown file (and any figure files) is then converted, or rendered, to an html file using the tool [`pandoc`](http://pandoc.org/).
```{r rmd_to_html_fig, fig.width=8, fig.height=3, fig.align="left", echo=FALSE}
par(mar=rep(0, 4), bty="n", cex=1.5)
plot(0, 0, type="n", xlab="", ylab="", xaxt="n", yaxt="n",
xlim=c(0, 100), ylim=c(0, 100))
xw <- 10
yh <- 35
xm <- 12
ym <- 50
rect(xm-xw/2, ym-yh/2, xm+xw/2, ym+yh/2, lwd=2)
text(xm, ym, ".Rmd")
xm <- 50
ym <- 80
rect(xm-xw/2, ym-yh/2, xm+xw/2, ym+yh/2, lwd=2)
text(xm, ym, ".md")
xm <- 50; ym <- 25
for(i in c(2, 0, -2))
rect(xm-xw/2+i, ym-yh/2+i, xm+xw/2+i, ym+yh/2+i, lwd=2,
border="black", col="white")
text(xm-2, ym-2, "figs/")
xm <- 100-12
ym <- 50
rect(xm-xw/2, ym-yh/2, xm+xw/2, ym+yh/2, lwd=2)
text(xm, ym, ".html")
arrows(22, 50, 38, 50, lwd=2, col="slateblue", len=0.1)
text((22+38)/2, 60, "knitr", col="darkslateblue", cex=1.3)
arrows(62, 50, 78, 50, lwd=2, col="slateblue", len=0.1)
text((62+78)/2, 60, "pandoc", col="darkslateblue", cex=1.3)
```
Different output document types require different tools for conversion. This is discussed below.
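
The two stages can also be run by hand, which is one way to see what each tool contributes. The sketch below assumes `knitr` is installed and treats `rmarkdown` and pandoc as optional; the file names `demo.Rmd`, `demo.md` and `demo.html` are hypothetical.

```r
# Assumes the knitr package is installed.
library(knitr)

# Write a small, made-up R Markdown file to a temporary directory.
work_dir <- tempfile("rmd_demo_")
dir.create(work_dir)
rmd_file <- file.path(work_dir, "demo.Rmd")

# The chunk fence is built with strrep() so that this example can itself
# sit inside a fenced code block.
fence <- strrep("`", 3)
writeLines(c(
  "# Demo",
  "",
  paste0(fence, "{r total}"),
  "sum(1:4)",
  fence
), rmd_file)

# Stage 1: knitr runs the chunk and writes a plain Markdown file.
md_file <- knit(rmd_file, output = file.path(work_dir, "demo.md"), quiet = TRUE)

# Stage 2: pandoc converts the Markdown file to html.
# Guarded, because rmarkdown and pandoc may not be present on every machine.
html_file <- file.path(work_dir, "demo.html")
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(md_file, to = "html", output = html_file)
}
```

Opening `demo.md` in a text editor shows the chunk's code followed by its printed result, which is the intermediate form pandoc receives. The "Knit" button performs both stages in one step.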
## Chunk options
As we saw above, code chunks are a very important part of using R Markdown. We will now explore some of the options for controlling how they show up in our document.
First, we need to name our code chunk. By default, code chunks are unnamed, i.e.:
<pre>
```{r}
```
</pre>
It is recommended to use a descriptive name such as `load_data` or `read_libraries`. This will help you break up your code and ensure each chunk has a unique name.
<pre>
```{r load_libraries}
library("dplyr")
library("ggplot2")
```
</pre>
We can control how the code and its output are shown in the final document by changing the chunk options, which follow the chunk name. For example:
<pre>
```{r load_libraries, eval = FALSE, echo = TRUE}
library("dplyr")
library("ggplot2")
```
</pre>
There are many different options you can adjust. More details can be found on the R Markdown cheatsheet or in the [knitr chunk options reference](https://yihui.org/knitr/options/#code-evaluation).
The ones you need to know first are:
- `cache`: TRUE/FALSE. Do you want the output of the chunk saved so you don't have to run it again next time?
- `eval`: Do you want the code to be evaluated?
- `echo`: Do you want the code to be printed in the output document?
- `include`: Do you want the chunk's code and results to be included in the output document? (The code is still run when `include = FALSE`.)
- `warning` and `message`: Do you want warnings or messages to be shown? Set `warning=FALSE` and `message=FALSE` to hide them.
- `fig.height` and `fig.width`: What size (in inches) do you want figures to be?
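
To see what value each of these options takes when you do not set it, you can ask `knitr` directly. A minimal sketch, assuming `knitr` is installed and run in a fresh R session (inside a document, the output format may adjust some of these values):

```r
# Assumes the knitr package is installed.
library(knitr)

# opts_chunk$get() returns the current value of one or more chunk options.
# In a fresh R session these are knitr's built-in defaults.
defaults <- opts_chunk$get(c("cache", "eval", "echo", "include",
                             "warning", "message", "fig.width", "fig.height"))

# str() prints one option per line with its type and value.
str(defaults)
```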
> Tips: Finding help with chunk options
> You can review all of the `R` chunk options by navigating to
> the "R Markdown Cheat Sheet" under the "Cheatsheets" section
> of the "Help" field in the toolbar at the top of RStudio.
>
Let's create a code chunk. First, give it the name `addup` and set the `eval` option to `TRUE`.
<pre>
```{r addup, eval = TRUE}
1 + 1
```
</pre>
When we knit the document we can see both the code and the result:
```{r adds, eval = TRUE}
1 + 1
```
If we don't want the code to appear in the output, we can use the `echo` option.
<pre>
```{r addup, eval = TRUE, echo = FALSE}
1 + 1
```
</pre>
When knit, the document will now show only the result and hide the code.
```{r addup, eval = TRUE, echo = FALSE}
1 + 1
```
### Global options
Often there will be particular options that you'll want to use
repeatedly; for this, you can set _global_ chunk options, like so:
<pre>
```{r global_options, echo=FALSE}
knitr::opts_chunk$set(fig.path="Figs/",
message=FALSE,
warning=FALSE,
echo=FALSE,
results="hide",
fig.width=11)
```
</pre>
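
Global options are usually set in a chunk near the top of the document, and options written in an individual chunk header override them for that chunk. The sketch below, which assumes `knitr` is installed, shows in the console that `opts_chunk$set()` changes the stored values and that `opts_chunk$restore()` returns them to the defaults; the object names are made up for illustration.

```r
# Assumes the knitr package is installed.
library(knitr)

# Set two options globally, as a setup chunk at the top of a document would.
opts_chunk$set(echo = FALSE, fig.width = 11)
after_set <- opts_chunk$get(c("echo", "fig.width"))

# restore() resets every chunk option to knitr's defaults.
opts_chunk$restore()
after_restore <- opts_chunk$get("echo")
```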
### Formatting figures
We can use chunk options to control the size and alignment of plots: `fig.height`, `fig.width`
and `fig.align`. We can also use `fig.cap` to add a caption.
We can use the **gapminder** dataset and the `ggplot2` package to create and format a plot within R Markdown.
<pre>
```{r pretty_plot, eval = TRUE, echo = FALSE, fig.cap = "A nice descriptive title"}
library("readr")
library("ggplot2")
gapminder <- read_csv("data/gapminder_data.csv")
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point()
```
</pre>
```{r plot1, eval = TRUE, echo = FALSE , message=FALSE, fig.cap = "A nice descriptive title", fig.width =5, fig.height = 5}
library(readr)
library(ggplot2)
gapminder <- read_csv("data/gapminder_data.csv")
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point()
```
> ## Challenge 4
>
> Generate a plot using the **gapminder** dataset and `ggplot2` package. Use the chunk options to control the size and alignment of the figure.
>
> > ## Solution to Challenge 4
> >
> > <pre>
> > ```{r pretty_plot, eval = TRUE, echo = FALSE, fig.cap = "Pretty Plot", fig.width = 5, fig.height = 5, fig.align = 'center'}
> > ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
> >   geom_point()
> > ```
> > </pre>
> >
> > ```{r pretty_plot, eval = TRUE, echo = FALSE, fig.cap = "Pretty Plot", fig.width = 5, fig.height = 5, fig.align = 'center'}
> > ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
> >   geom_point()
> > ```
> >
>
## Inline R code
To make your document reproducible, you can also use R to populate values within a sentence. This ensures that the values are updated automatically whenever the document is re-knit. We can use <code>`r</code> and <code>`</code> to designate in-line code.
For example: <code>`r round( value, 2)`</code>. The code will be
executed and replaced with the evaluated _value_ as the result.
For example:
<code>`r round(3.141593, 2)`</code> will show as
``` `r round(3.141593, 2)` ```
This is useful when incorporating data or calculations directly into a sentence:
<code> The ratio of a circle's circumference to its diameter is `r round(3.141593, 2) `</code>.
This will appear as:
``` The ratio of a circle's circumference to its diameter is `r round(3.141593, 2)` ```
We can also do calculations on the fly within our inline code. For example, we can calculate the number of instructors and populate a sentence. First, let's use `length()` to determine the number of instructors:
<code>length(c("Andy", "Gen", "Sam", "Steph"))</code>
When we run this in the R console we should get `4`.
Now we can add this as in-line code by using <code>`r</code> and <code>`</code> :
<code> this course has ` r length(c("Andy", "Gen", "Sam", "Steph"))` instructors </code>
The rendered format will look like
``` this course has `r length(c("Sam", "Steph", "Andy", "Gen"))` instructors ```
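
One way to keep inline expressions short is to compute the values in an ordinary code chunk first and then refer to the stored objects inline. A minimal sketch in plain R; the object names `instructors`, `n_instructors`, `pi_rounded` and `sentence` are made up for illustration:

```r
# Values prepared in an ordinary code chunk.
instructors <- c("Andy", "Gen", "Sam", "Steph")
n_instructors <- length(instructors)
pi_rounded <- round(3.141593, 2)

# Inline code inserts the printed value of an expression into the sentence.
# paste() reproduces the same sentence in the console for checking.
sentence <- paste("this course has", n_instructors, "instructors")
sentence
```

An inline expression then only needs to mention `n_instructors` or `pi_rounded`, and changing the `instructors` vector updates every sentence that uses it.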
> ## Challenge 5
>
> Try out a bit of in-line R code using a simple addition, e.g., 2 + 2.
>
> > ## Solution to Challenge 5
> >
> > Here's some inline code to determine that 2 + 2 = `` `r 2+2` ``.
> >
>
## Other outputs: Word, PDF and more...
We can convert R Markdown to a PDF or a Word document. Click the
little triangle next to the "Knit" button to get a drop-down
menu.
Alternatively we can change the YAML:
```{r, eval = FALSE}
---
title: "Exploring R Markdown"
output: html_document
---
```
```{r, eval = FALSE}
---
title: "Exploring R Markdown"
output: word_document
---
```
```{r, eval = FALSE}
---
title: "Exploring R Markdown"
output: pdf_document
---
```
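
The output format can also be chosen from the R console, without editing the header or using the "Knit" menu. This is a sketch under assumptions: `render_as()` is a hypothetical helper, `report.Rmd` is a hypothetical file name, and the `rmarkdown` package and pandoc must be installed (PDF output also needs LaTeX).

```r
# Hypothetical helper: render one R Markdown file to a chosen format.
# rmarkdown::render() knits the file, runs pandoc, and returns the path
# of the file it created.
render_as <- function(input, format = "html_document") {
  rmarkdown::render(input, output_format = format, quiet = TRUE)
}

# Usage (not run here, because "report.Rmd" is a made-up file name):
# render_as("report.Rmd")                    # html, the default
# render_as("report.Rmd", "word_document")   # Word
# render_as("report.Rmd", "pdf_document")    # PDF, needs LaTeX
```

A format passed this way takes the place of the `output` field for that one call, so the same file can be rendered to several formats without changing its header.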
### A note about Rmd outputs
It can be easy to get caught up with how your document looks. It is highly recommended to render to an html document and avoid compiling to PDF or Word until you really need to. This is also recommended by the author of `rmarkdown` and `knitr`, [Yihui Xie](https://yihui.org/en/2018/07/in-html-i-trust/). This means that you can spend time generating content rather than trying to get figures to line up correctly and wrestling with LaTeX libraries.
> ## Tips: Creating PDF documents
>
> Markdown documents can be compiled to PDF, but you will likely need additional
> software called LaTeX. This software can be tricky to install, and it is recommended to use
> [tinytex](https://yihui.org/tinytex/) as an alternative. This R package installs TinyTeX, a lightweight
> LaTeX distribution designed for R users. You can install it using the following commands:
>
>```{r install_tinytex, eval = FALSE}
> install.packages("tinytex")   # install the R package itself
> tinytex::install_tinytex()    # then download and install the TinyTeX distribution
>```
> There is lots of information and help available; see the [tinytex FAQ page](https://yihui.org/tinytex/faq/).
>
## Resources
* [Knitr in a knutshell tutorial](http://kbroman.org/knitr_knutshell)
* [Dynamic Documents with R and knitr](http://www.amazon.com/exec/obidos/ASIN/1482203537/7210-20) (book)
* [R Markdown documentation](http://rmarkdown.rstudio.com)
* [R Markdown cheat sheet](https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf)
* [Getting started with R Markdown](https://www.rstudio.com/resources/webinars/getting-started-with-r-markdown/)
* [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/) (book by RStudio team)
* [Reproducible Reporting](https://www.rstudio.com/resources/webinars/reproducible-reporting/)
* [The Ecosystem of R Markdown](https://www.rstudio.com/resources/webinars/the-ecosystem-of-r-markdown/)
* [Introducing Bookdown](https://www.rstudio.com/resources/webinars/introducing-bookdown/)