-
Notifications
You must be signed in to change notification settings - Fork 0
/
tip_paths.qmd
50 lines (33 loc) · 3.12 KB
/
tip_paths.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
---
title: "Reproducible File Paths"
---
## Overview
This section contains our recommendations for handling file paths. When you code collaboratively (e.g., with GitHub), accounting for the difference between your folder structure and those of your colleagues becomes critical. **Ideally your code should be completely agnostic about (1) the operating system of the computer it is running on (i.e., Windows vs. Mac) and (2) the folder structure of the computer**. We can--fortunately--handle these two considerations relatively simply. This may seem somewhat dry but it is worth mentioning that failing to use relative file paths is a significant hindrance to reproducibility (see [Trisovic _et al._ 2022](https://www.nature.com/articles/s41597-022-01143-6)).
You may also find our [tutorial on storing user-specific information](https://lter.github.io/scicomp/tutorial_json.html) valuable in this context.
## 1. Preserve File Paths as Objects
Depending on the operating system of the computer, the slashes between folder names are different (`\` versus `/`). The `file.path` function automatically detects the computer operating system and inserts the correct slash. We recommend using this function and assigning your file path to an object.
```{r file.path_demo}
my_path <- file.path("path", "to", "my", "file")
my_path
```
Once you have that path object, you can use it everywhere you import or export information to/from the code (with another use of `file.path` to get the right type of slash!).
```{r import, eval = F}
# Import
my_raw_data <- read.csv(file = file.path(my_path, "raw_data.csv"))
# Export
write.csv(x = data_object, file = file.path(my_path, "tidy_data.csv"))
```
## 2. Create Necessary Sub-Folders in the Code
Using `file.path` guarantees that your code will work regardless of the upstream folder structure but what about the folders that you need to export or import things to/from? For example, say your `graphs.R` script saves a couple of useful exploratory graphs to the "Plots" folder, how would you guarantee that everyone running `graphs.R` *has* a "Plots folder"? You can use the `dir.create` function to create the folder in the code (and include your path object from step 1!).
```{r dir.create-demo, eval = F}
# Create needed folder
dir.create(path = file.path(my_path, "Plots"), showWarnings = FALSE)
# Then export to that folder
ggplot2::ggsave(filename = file.path(my_path, "Plots", "my_plot.png"))
```
The `showWarnings` argument of `dir.create` simply warns you if the folder you're creating already exists or not. There is no negative to "creating" a folder that already exists (nothing is overwritten!!) but the warning can be confusing so we can silence it ahead of time.
## File Paths Summary
We strongly recommend following these guidelines so that your scripts work regardless of (1) the operating system, (2) folders "upstream" of the working directory, and (3) folders within the project. This will help your code by flexible and reproducible when others are attempting to re-run your scripts!
<p align="center">
<img src="images/lter-photos/penguins.jpg" alt="Photo of two penguins swimming in icy water" width="80%"/>
</p>