-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathpreserve_code.qmd
169 lines (105 loc) · 7.39 KB
/
preserve_code.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
---
title: "Preserve your code"
---
## Making your code readable
```{r, echo=FALSE, fig.cap= "https://twitter.com/cjm4189/status/1557346489613094914"}
knitr::include_graphics("img/code_handover.png")
```
It is important to make your code easy to read if you hope that others will reuse it. It starts with using a **consistent style** witing your scripts (at least within a project).
- Here is a good style guide for R: <https://style.tidyverse.org/>
- Style guide for Python: <https://www.python.org/dev/peps/pep-0008/>
```{python Zen of Python, echo=TRUE}
import this
```
There is also the **visual aspect** of the code that should not be neglected. Like prose, if you receive a long text without any paragraphs, you might be not very excited about reading it. _Indentation_, _spaces_, and _empty lines_ should be leveraged to make a script visually inviting and easy to read. The good news is that most Integrated Development Environments (IDEs) will help you to do so by auto formatting your scripts according to conventions. Note that also a lot of IDEs, such as RStudio, rely on some conventions to ease the navigation of scripts and notebooks. For example, _try to add four `-` or `#` after a line starting with one or several `#` in an R Script!_
## Comments
***Real Programmers don't comment their code. If it was hard to write, it should be hard to understand.*** \
Tom Van Vleck, based on people he knew..._ (<https://multicians.org/thvv/realprogs.html>)
Joke aside, it is really hard to comment too much your code, because even steps that might seem trivial today might not be so anymore in a few weeks or months for now. In addition, well commented code is more likely to be read by others. Note also that **comments should complement the code** and should not being seen as work around vague naming conventions of variables or functions, but neither should they simple restate what the code is doing.
```{r, eval=FALSE}
x <- 9.81 # gravitational acceleration POOR
gravity_acc <- 9.81 # gravitational acceleration BETTER
name_index <- 0 # set name_index to 0 POOR
name_index <- 0 BETTER
```
### Header
It is good to add a header to your script that will provide basic information such as:
- Purpose of the script (Long title style)
- Who are the authors
- A contact email
- How to run, if the code is a program
Optional:
- A longer description about the script purpose
- A starting date and potentially last updated one, although this information becomes redundant with repository information
Note that R Studio does something similar by default when creating a new R Markdown document!
### Inline
It does not matter if you are using a script or notebook. It is important to provide comments along your code to complement it by:
- Explaining what the code does
- Capturing decisions that were made on the analytical side. For example, why a specific value was used for a threshold.
- Specifying why and when code was added to handle an edge case such as an unexpected value in the data (so a new user doesn't have to guess what the code does and might want to delete it assuming it is not necessary)
Other thoughts:
- It is OK to state (what seems) the obvious (some might disagree with this statement)
- Try to keep comments to the point and short
### Functions
Both Python and R have conventions on how to document functions. Adopting those conventions will help you to make your code readable but also to automate part of the documentation development.
#### Roxygen2
The goal of roxygen2 is to make documenting your code as easy as possible. It can dynamically inspect the objects that it’s documenting, so it can automatically add data that you’d otherwise have to write by hand.
How do we insert it? Make sure you cursor is inside the function you want to document and from RStudio Menu _Code -> Insert Roxygen Skeleton_
Example:
```{r}
#' Add together two numbers
#'
#' @param x A number
#' @param y A number
#' @return The sum of \code{x} and \code{y}
#' @examples
#' add(1, 1)
#' add(10, 1)
add2 <- function(x, y) {
x + y
}
```
Try it!
- Copy the function (without the documentation) in a new script
- Add a third parameter to the function such as it sums 3 numbers
- Add the Roxygen skeleton
- Fill it to best describe your function
Note that when you are developing an R package, the Roxygen skeleton can be leveraged to develop the help pages of your package so you only have one place to update and the help will synchronize automatically.
#### Python Docstring
A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the `__doc__` special attribute of that object.
```{python, eval=FALSE}
def complex(real=0.0, imag=0.0):
"""Form a complex number.
Keyword arguments:
real -- the real part (default 0.0)
imag -- the imaginary part (default 0.0)
"""
if imag == 0.0 and real == 0.0:
return complex_zero
```
Go here for more information: <https://www.python.org/dev/peps/pep-0257/>
## Leveraging Notebooks
We have focused on and been experimenting with Notebooks during the week because they provide space to further develop content, such as methodology, around the code you are developing in your analysis. Notebooks also enable you to integrate the outputs of your scientific research with the code that was used to produce it. Finally, notebooks can be rendered into various formats that allows them to be shared with a broad audience.
Notebooks are not only used within the scientific community, see [here](https://peerj.com/preprints/3182.pdf) for some thoughts from Airbnb data science team.
---
## Hands-on
### Documenting
```{r, eval=FALSE}
getPercent <- function( value, pct ) {
result <- value * ( pct / 100 )
return( result )
}
```
Try adding the Roxygen Skeleton to this function and fill all the information you think is necessary to document the function
### Commenting
Let's try to improve the readability and documentation of this repository: <https://github.com/brunj7/better-comments>. Follow the instructions on the README
For inspiration, you can check out the NASA code for APOLLO 11 dating from 1969: <https://github.com/chrislgarry/Apollo-11>!!
---
## Code repositories
On-line code repositories are a great way to version and share your code. Here are a few examples of git-based code repositories:
- GitHub
- GitLab
- Bitbucket
- SourceForge
Note however that there is no long-term commitment of any of those main code repositories and that archiving the specific snapshot of your code that was used for a specific analysis along your data is a great idea. Several data repositories offer an integration that lets you do that with data repositories. For example, Zenodo has a great integration with GitHub that lets you issue a DOI for a specific release (read snapshot) of your repository and preserve it independently from the code repository. See [here](https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source/blob/master/content_development/Task_2.md) for more details.
Note that code repositories and data repositories complement each other: you can see your code repository as the live version of your work and the code snapshot archive as the historical trace that was produced for a specific analysis. In other words, we recommend linking both the code repository and its snapshot to the data archive.